Explore a unified trace representation capturing control flow and data dependence to optimize instruction scheduling and fetching. Discover how this approach reduces trace sizes while enhancing compressibility and runtime efficiency.
Extended Whole Program Paths • Sriraman Tallam, Rajiv Gupta, Xiangyu Zhang • University of Arizona
Control Flow and Dependence Traces • Control Flow Traces • Sequence of basic blocks. • Identification of hot paths. • Path Sensitive Instruction Scheduling and Optimization. • Path Prediction and Instruction Fetching. • Dependence Traces • Capture data dependences. • Flow from a definition to a use. • Data Speculative Optimizations for Itanium. • Computation of Dynamic Slices.
Control Flow and Dependence Traces • Control Flow Traces are smaller than Dependence Traces and can be compressed well. • Average size for Spec 2K benchmarks is 179 MB. • Compression Factor • Sequitur – 681 • VPC – 442 • Dependence Traces are large and do not compress as well as Control Flow Traces. • Average size for Spec 2K benchmarks is 565 MB. • Compression Factor • Sequitur – 1.31 • VPC – 5.8 • Is there an alternative trace representation ?
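To make the gap concrete, the averages quoted on this slide can be turned into rough compressed sizes. This is only an indicative sketch: it assumes that a compression factor means uncompressed size divided by compressed size, and it divides an average size by an average factor, which does not exactly reproduce per-benchmark results. The function name `compressed_size_mb` is made up for illustration.

```python
# Average uncompressed sizes (MB) and compression factors quoted on the slide.
traces = {
    "control flow": {"size_mb": 179, "sequitur": 681, "vpc": 442},
    "dependence": {"size_mb": 565, "sequitur": 1.31, "vpc": 5.8},
}


def compressed_size_mb(size_mb, factor):
    """Compressed size, ASSUMING factor = uncompressed size / compressed size."""
    return size_mb / factor


for name, t in traces.items():
    for compressor in ("sequitur", "vpc"):
        size = compressed_size_mb(t["size_mb"], t[compressor])
        print(f"{name:12s} {compressor:8s} {size:8.2f} MB")
```

Under these assumptions the compressed control flow trace comes out well below 1 MB with either compressor, while the compressed dependence trace stays in the range of roughly a hundred to several hundred MB, which is why the dependence trace dominates the combined size.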
Our Approach • Extended Control Flow Trace – Unified Trace Representation. • Capture both control flow and dependence information. • The data dependences are embedded as control flow. • The unified trace is smaller than control flow + dependence traces. • Our compressed unified trace is also smaller than the compressed control flow + compressed dependence traces.
Goals in Designing the eCF • The dependence can no longer be recovered, due to possible aliasing. • Additional Control Flow can capture the dependence. • The dependence can then be recovered from the Control Flow. [Figure: control flow graph with numbered blocks 1 to 6 containing the statements "X = _", "*p = _" and "= X", plus an added check "if p == &X".]
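The idea on this slide can be sketched with a toy program. The block numbering below is hypothetical and does not claim to match the figure: block 1 defines X, block 3 stores through p, block 4 uses X, and block 5 is an extra block that is entered only when the disambiguation check `p == &X` succeeds. The presence of block 5 in the trace is what records the aliasing.

```python
def run(alias_flags):
    """Execute the toy program once per flag and return the extended trace.

    Each flag says whether p points to X in that iteration (an assumption
    standing in for real pointer values).
    """
    trace = []
    for p_points_to_x in alias_flags:
        trace.append(1)          # block 1: X = _
        trace.append(3)          # block 3: *p = _
        if p_points_to_x:        # disambiguation check: p == &X
            trace.append(5)      # block 5: exists only to record the alias
        trace.append(4)          # block 4: _ = X
    return trace


def recover_dependences(trace):
    """Recover (defining block, using block) pairs for X from the trace alone."""
    last_def_of_x = None
    deps = []
    for block in trace:
        if block == 1:
            last_def_of_x = 1    # direct definition of X
        elif block == 5:
            last_def_of_x = 3    # the store in block 3 hit X
        elif block == 4:
            deps.append((last_def_of_x, 4))
    return deps


trace = run([False, True, False])
print(trace)
print(recover_dependences(trace))
```

No separate dependence record is written: the reaching definition of each use of X is read off the sequence of block ids.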
Cost of Capturing Dependences • No-cost capture • For these dependences, no disambiguation checks are needed. • Fixed cost capture • The number of disambiguation checks needed is a constant. • Variable cost capture. • The number of disambiguation checks varies.
No Cost Capture • All instances of the dependence can be recovered from the control flow trace.
Fixed Cost Capture • A single disambiguation check is sufficient to capture this dependence. Single Check
Variable Cost Capture • The instances of the dependence can be caused by any instance of the definition statement. Multiple Checks
Cost of Instrumentation and Trace Compressibility • Reducing the number of checks • Reducing the size of the generated trace. • Reduction in run-time overhead. • Improving the Compressibility • Similar Control Flow Signatures.
Two Phased Approach • Conservative nature of Static Pointer Analysis. • Too many potential dependences per use. • Two phased Approach • Filtering Phase • Find all dependences exercised. • Profiling Phase • Add disambiguation checks only for those dependences exercised.
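The two phases can be sketched as follows. This is a simplified illustration, not the authors' implementation: dependences are modeled as (definition, use) pairs of made-up statement labels, the filtering run is modeled as a list of observed pairs, and the sketch assumes one check per exercised pair, which ignores the no-cost and variable-cost cases.

```python
def filtering_phase(static_candidates, observed_runs):
    """Keep only the statically possible (def, use) pairs that actually occurred."""
    exercised = set()
    for run in observed_runs:
        exercised.update(pair for pair in run if pair in static_candidates)
    return exercised


def profiling_phase(exercised):
    """Plan disambiguation checks only for exercised pairs, grouped by use."""
    checks = {}
    for definition, use in sorted(exercised):
        checks.setdefault(use, []).append(definition)
    return checks


# Conservative pointer analysis reports many candidate definitions for use u1.
static_candidates = {("d1", "u1"), ("d2", "u1"), ("d3", "u1"), ("d4", "u2")}
observed_runs = [[("d1", "u1")], [("d1", "u1"), ("d4", "u2")]]

exercised = filtering_phase(static_candidates, observed_runs)
print(profiling_phase(exercised))
```

In this toy input, use u1 has three static candidates but only one exercised definition, so only one check is planned for it.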
Binary Search vs. Linear Search • Track the last definition and instance of every write to a memory address. • Search the address array using binary search instead of linear search.
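A minimal sketch of such a lookup structure, assuming the address array is kept sorted so that Python's standard `bisect` module can be used. The class name `LastWriteTable` and the statement labels are hypothetical, and the slides do not say how the original implementation keeps the array ordered.

```python
import bisect


class LastWriteTable:
    """Maps each written address to its last (definition statement, instance)."""

    def __init__(self):
        self.addresses = []    # sorted list of written addresses
        self.last_writes = []  # parallel list of (def_stmt, instance)

    def record_write(self, address, def_stmt, instance):
        i = bisect.bisect_left(self.addresses, address)
        if i < len(self.addresses) and self.addresses[i] == address:
            self.last_writes[i] = (def_stmt, instance)   # overwrite older write
        else:
            self.addresses.insert(i, address)            # keep array sorted
            self.last_writes.insert(i, (def_stmt, instance))

    def lookup(self, address):
        """Binary search: about log2(n) comparisons instead of up to n."""
        i = bisect.bisect_left(self.addresses, address)
        if i < len(self.addresses) and self.addresses[i] == address:
            return self.last_writes[i]
        return None


table = LastWriteTable()
table.record_write(0x2000, "s3", 1)
table.record_write(0x1000, "s1", 1)
table.record_write(0x2000, "s3", 2)
print(table.lookup(0x2000))
```

A read of an address then finds its reaching definition and instance with a logarithmic number of address comparisons, where a linear scan could touch every entry.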
Experimental Results • Implementation on the Microsoft Phoenix RDK. • Spec 2K benchmark binaries were rewritten to obtain instrumented versions. • Easy to implement using Phoenix. • Intermediate representation was low-level x86 instruction set. • Split dependences into register and memory. • Register dependences are always recoverable from control flow trace. • Memory dependences were recovered using our approach.
Register and Memory Dependences • A significant fraction (76 %) of dependences are register dependences, which can be recovered from the control flow trace.
Uncompressed Trace Sizes (table columns: Cont. + Dep., Unified, Ratio) • The unified trace is 62 % of the size of the Control Flow + Dependence Trace.
Sequitur Compressed (table columns: Cont. + Dep., Unified, Ratio) • The compressed unified trace is 4 % of the size of the compressed Control Flow + Dependence Trace.
VPC Compressed (table columns: Cont. + Dep., Unified, Ratio) • The compressed unified trace is 21 % of the size of the compressed Control Flow + Dependence Trace.
Memory Dependence Types • 30 % of dependences can be recovered at no cost.
Address Comparisons • Binary Search reduces the address comparisons by 4 orders of magnitude.
Run-time Overhead • There is a 20 % increase in run-time overhead in collecting the unified trace.
Conclusions • We have designed an extended control flow trace that captures both control flow and data dependence history. • The key to the unified trace is the ability to convert memory data dependences into control flow. • The resulting unified trace is smaller than the combined control flow + dependence trace. • The run-time overhead increases by 20 %. Our Thanks to Hoi Vo of Microsoft Corporation and the Phoenix Compiler Infrastructure Group.