Presenter: Shao-Jay Hou
This paper presents an innovative and efficient approach for capturing and compressing program execution traces in real-time utilizing a double move-to-front method. We discuss its performance benefits, highlighting a cost-effective hardware implementation that achieves only 0.12 bits per instruction in trace port bandwidth while maintaining a gate count of just 25K. Traditional tracing methods are often invasive or impractical, making our solution particularly valuable for modern SoC complexities. Our experimental benchmarks demonstrate significant compression ratios, ranging from 82.7:1 to 29389:1, showcasing the effectiveness of this method.
Presentation Transcript
A Real-Time Program Trace Compressor Utilizing Double Move-to-Front Method. Vladimir Uzelac, Aleksandar Milenkovic, The University of Alabama in Huntsville. DAC’09, July 26-31, 2009, San Francisco, California, USA. Presenter: Shao-Jay Hou
Abstract • This paper introduces a new unobtrusive and cost-effective method for the capture and compression of program execution traces in real-time, which is based on a double move-to-front transformation. We explore its effectiveness and describe a cost-effective hardware implementation. The proposed trace compressor requires only 0.12 bits per instruction of trace port bandwidth, at the cost of 25K gates.
What’s the problem? • The continual growth in SoC complexity makes traditional debugging approaches infeasible or impractical. • In-Circuit Emulator (ICE) • Invasive: the processor has to be stopped • Software approach • Uses breakpoints • Step-by-step software debugging wastes too much time. • There are other tracing systems, but their cost or area is too high. • C.F. Kao’s LZ-based program trace.
My research tree • My thesis (ideal) • Improvement of the tracer • Application of the tracer • An Embedded Infrastructure of Debug and Trace Interface for the DSP Platform • A Real-Time Program Trace Compressor Utilizing Double Move-to-Front Method
Related work (this paper) • Program trace systems for embedded processors: ARM ETM [1], Altera Nios II [2], Xilinx MicroBlaze [3], Lauterbach [4] • Other program trace systems and compression techniques: [5-7] • Experiment benchmark: MiBench [8] • Collecting the symbols in the system: SimpleScalar [9] • The basic MTF method and bzip compression: MTF [10], Bzip2 [11] • Hardware architecture: CAM [12]
Program Characteristics • Each instruction stream has two characteristics: • SA (starting address) • SL (stream length) • PC (program counter) • If the current PC differs from the previous PC by something other than the instruction length, the current instruction is the beginning of a new stream. • An unconditional direct branch does not terminate the current stream, because the address of the next instruction in sequence can be inferred directly from the binary.
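The stream rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a fixed 4-byte instruction length, SL counted in instructions, and the hypothetical function name `split_streams`. It does not handle the unconditional-direct-branch exception, which would require looking at the program binary.

```python
def split_streams(pcs, instr_len=4):
    """Group a sequence of PC values into (SA, SL) stream descriptors.

    Assumptions (not from the slides): fixed instruction length and
    SL measured in instructions.
    """
    streams = []
    if not pcs:
        return streams
    sa, sl = pcs[0], 1
    for prev, cur in zip(pcs, pcs[1:]):
        if cur - prev == instr_len:
            sl += 1                   # still sequential: extend the stream
        else:
            streams.append((sa, sl))  # PC jumped: close the current stream
            sa, sl = cur, 1           # and start a new one at the target
    streams.append((sa, sl))
    return streams


trace = [0x100, 0x104, 0x108, 0x200, 0x204, 0x100]
print(split_streams(trace))
```

For the sample trace, the three sequential instructions at 0x100 form one stream, the jump to 0x200 starts a second, and the jump back to 0x100 starts a third.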
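Move-To-Front method • Some parameters in this method: • ht (history table) • Input • Output • An easy example to explain it: • If the history table is ht=[C,B,A] • and the input is {AABC} • the output should be 2022 (the first A is found at index 2 and moves to the front, the second A is then at index 0, and B and C are each found at index 2)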
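The example on this slide can be checked with a minimal Python sketch of the basic move-to-front transform. The function name `mtf_encode` is hypothetical, and the sketch assumes every input symbol is already in the history table, as in the slide's example.

```python
def mtf_encode(symbols, history):
    """Return the history-table index of each symbol, moving each hit to the front."""
    table = list(history)                # work on a copy of the history table
    out = []
    for s in symbols:
        i = table.index(s)               # position of the symbol in the table
        out.append(i)
        table.insert(0, table.pop(i))    # move the hit entry to the front
    return out


print(mtf_encode("AABC", ["C", "B", "A"]))  # [2, 0, 2, 2]
```

Recently seen symbols get small indices, so a repetitive input turns into a run of small numbers that is cheap to encode.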
DMTF Method • Uses a two-level history table • mtf1 • mtf2 • The flow of DMTF:
DMTF Method(cont.) • The output bits of DMTF: • mtf2.zhr => mtf2 zero-entry hit rate • mtf2.ohr => mtf2 non-zero-entry hit rate • mtf1.hr => mtf1 hit rate • mtf1.size => mtf1 size • mtf2.size => mtf2 size
DMTF Method(cont.) • An example of DMTF
DMTF Method(cont.) • The result of DMTF
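The flow and example figures are not reproduced in this transcript, so the following Python sketch is only one plausible reading of a two-level move-to-front scheme, not the authors' exact design. It assumes that mtf1 holds stream descriptors (SA, SL), that mtf2 holds recently hit mtf1 indices, that a miss inserts at the front and evicts the last entry, and that the default sizes follow the DMTF(192,4) configuration named on the hardware slide. The names `MTFTable`, `dmtf_encode` and the record tags are hypothetical.

```python
class MTFTable:
    """Fixed-size move-to-front table (assumed policy: insert at front, evict the last entry)."""

    def __init__(self, size):
        self.size = size
        self.entries = []

    def access(self, item):
        """Return the hit index, or None on a miss; the item ends up at the front either way."""
        if item in self.entries:
            i = self.entries.index(item)
            self.entries.pop(i)
        else:
            i = None
            if len(self.entries) == self.size:
                self.entries.pop()       # table full: drop the oldest entry
        self.entries.insert(0, item)
        return i


def dmtf_encode(streams, mtf1_size=192, mtf2_size=4):
    """Encode (SA, SL) descriptors as tagged records (illustrative format only)."""
    mtf1, mtf2 = MTFTable(mtf1_size), MTFTable(mtf2_size)
    out = []
    for desc in streams:
        i1 = mtf1.access(desc)
        if i1 is None:
            out.append(("mtf1_miss", desc))  # send the raw descriptor
            continue
        i2 = mtf2.access(i1)                 # second level works on mtf1 indices
        if i2 is None:
            out.append(("mtf1_hit", i1))     # send the mtf1 index
        else:
            out.append(("mtf2_hit", i2))     # send the (shorter) mtf2 index
    return out


a, b = (0x100, 3), (0x200, 2)
for record in dmtf_encode([a, b, a, b, a, b]):
    print(record)
```

In this toy trace two streams alternate, as they would in a simple loop. After the warm-up misses every access hits mtf1 at index 1, so mtf2 keeps hitting its entry 0, which is the case the mtf2.zhr term measures.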
Enhanced DMTF Method • Last-Value Predictor for Upper Address Bits: • The upper bits of a stream's SA rarely change during program execution • Use SA[31:20] as HLV • Zero Hit Trace Counters: • mtf2 zero-entry hits happen often • Use a counter to count them and dynamically adjust the size of ZLC • Use a header bit “0”
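The last-value predictor idea can be sketched as follows. This is a simplified illustration, assuming 32-bit starting addresses and that the upper field SA[31:20] is sent only when it differs from the previously seen value. The function name `hlv_encode` and the output tuple format are hypothetical, and the zero-hit counter is not modelled here.

```python
def hlv_encode(start_addresses, upper_shift=20):
    """Split each SA into (upper, lower); upper is None when it matches the last value."""
    last_upper = None
    lower_mask = (1 << upper_shift) - 1
    out = []
    for sa in start_addresses:
        upper = sa >> upper_shift        # SA[31:20]
        lower = sa & lower_mask          # SA[19:0]
        if upper == last_upper:
            out.append((None, lower))    # predicted correctly: upper bits omitted
        else:
            out.append((upper, lower))   # misprediction: send the upper bits too
            last_upper = upper
    return out


for upper, lower in hlv_encode([0x00400100, 0x00400200, 0x7FF00010]):
    print(upper, hex(lower))
```

Because most streams stay within the same code region, the upper 12 bits are usually predicted and only the lower 20 bits need to be traced.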
Enhanced DMTF Method(cont.) • The block diagram of the enhanced DMTF trace format:
Enhanced DMTF Method(cont.) • bDMTF = basic DMTF • hDMTF = DMTF with HLV • eDMTF = enhanced DMTF
DMTF Hardware Implementation • content addressable memory (CAM) • most-recently used (MRU) stack • the gate count of DMTF(192,4) is less than 24,600
Conclusion • The paper presents a double move-to-front method; the compression ratio ranges from 82.7:1 to 29389:1 (268:1 on average) and the trace port bandwidth from 0.001 to 0.39 bits/instruction (0.12 on average) • The hardware is area-efficient • Compared to C.F. Kao [5], the area is half.
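The two sets of numbers on this slide are related by simple arithmetic. The slides do not state the uncompressed baseline, so the sketch below assumes one 32-bit address per instruction, which reproduces the reported figures after rounding. The constant and function names are hypothetical.

```python
RAW_BITS_PER_INSTRUCTION = 32  # assumption: uncompressed trace = one 32-bit address per instruction


def bits_per_instruction(compression_ratio, raw_bits=RAW_BITS_PER_INSTRUCTION):
    """Trace port bandwidth implied by a compression ratio of `compression_ratio`:1."""
    return raw_bits / compression_ratio


for ratio in (82.7, 268, 29389):
    print(f"{ratio}:1 -> {bits_per_instruction(ratio):.3f} bits/instruction")
```

Under this assumption the worst, average and best ratios map to about 0.39, 0.12 and 0.001 bits per instruction respectively.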
My Conclusions • The paper gives a good compression method. • The paper uses many examples to show how the method works. • But the paper does not say much about how the experiments were run or how the data were obtained.