
Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction

Xianfeng Li, Tulika Mitra, Abhik Roychoudhury. National University of Singapore.


Presentation Transcript


  1. Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction • Xianfeng Li, Tulika Mitra, Abhik Roychoudhury • National University of Singapore

  2. Why Timing Analysis? • Timing guarantees for real-time embedded systems • Real-time scheduling needs a worst-case bound on execution time • Tasks are guaranteed to be schedulable irrespective of inputs • The bound must be tight to avoid idle processor cycles • Extremely important for safety-critical systems

  3. Worst Case Execution Time (WCET) • Maximum execution time of a program on a micro-architecture over all possible inputs • Measurement: executing the program for all inputs is impractical; executing it for selected inputs gives a lower bound on WCET (Observed WCET) • Analysis: static analysis computes an upper bound on WCET (Estimated WCET) • Observed WCET ≤ Actual WCET ≤ Estimated WCET

  4. WCET Analysis • Program path analysis [Shaw’89, Healy’98, ...]: not all paths in the program are feasible • Micro-architectural modeling: instruction execution time varies dynamically • Cache, pipeline [Li’99, Theiling’00, Schneider’99, ...] • Speculative execution (branch prediction) [Mitra’02] • Combined modeling of cache + speculative execution

  5. Speculative Execution • No speculative execution • Misprediction • Correct prediction [Figure: execution timelines over blocks B, N, T and S for the three cases, with the misprediction penalty marked]

  6. Cache + Speculation: Destructive Effect • N and T map to the same cache block • Cache miss 1: loading into the cache from the speculated path • Cache miss 2: loading into the cache from the correct path [Figure: cache contents and execution timeline for blocks B, N, T and S]

  7. Destructive Effect: Extra Cache Misses • The cache miss penalty (CMP) along the speculative path is either fully masked by the branch misprediction penalty (BMP), or partially masked by BMP, in which case the processor waits for the cache miss to be serviced before executing the correct path • A cache miss penalty is also incurred along the correct path due to the fetch along the speculative path [Figure: timelines comparing BMP and CMP]

  8. Cache + Speculation: Constructive Effect • B and S map to the same cache block • Cache miss 1: loading into the cache from the speculated path • Cache hit: the correct block is already loaded into the cache [Figure: cache contents and execution timeline for blocks B, N and S]

  9. How serious is the effect?

  10. Technique: Integer Linear Programming • Integrate program analysis and micro-architectural modeling in an ILP framework [Li and Malik 1995] • Input: control flow graph (CFG) of the program; user-provided loop bounds, recursion depth, etc.; specification of the micro-architecture • Objective function: execution time (maximized) • Constraints: flow constraints from the control flow graph; constraints from micro-architectural modeling • ILP formulation of instruction cache + speculative execution

  11. Objective Function
      $$\mathrm{WCET} = \sum_{B} \left( cost_B \times count_B + BMP \times misprediction_B + CMP \times miss_B + mp\_delay_B \right)$$
      • $cost_B \times count_B$: execution time of basic block $B$ without cache misses or branch mispredictions
      • $BMP \times misprediction_B$: penalty due to mispredictions
      • $CMP \times miss_B$: penalty due to cache misses; includes the constructive and destructive effects of speculation along the correct path
      • $mp\_delay_B$: penalty due to partially masked cache misses along the speculative path (variable CMP)
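As a rough illustration of how the objective function above adds up, the following Python sketch evaluates it for fixed per-block values. In the actual analysis these values are ILP variables chosen by the solver to maximize the sum; the `BlockTerms` class, the function name and the example numbers are assumptions made for this sketch. Only the penalties (BMP = 5, CMP = 10 cycles) are taken from the experimental setup on slide 22.

```python
from dataclasses import dataclass


@dataclass
class BlockTerms:
    """Per-basic-block quantities appearing in the objective (hypothetical container)."""
    cost: int           # cycles per execution, no cache miss or misprediction
    count: int          # number of times the block executes
    misprediction: int  # mispredictions of the branch ending the block
    miss: int           # instruction cache misses along the correct path
    mp_delay: int       # cycles from partially masked speculative-path misses


def wcet_objective(blocks, bmp, cmp_penalty):
    """Sum cost*count + BMP*misprediction + CMP*miss + mp_delay over all blocks."""
    return sum(
        b.cost * b.count + bmp * b.misprediction + cmp_penalty * b.miss + b.mp_delay
        for b in blocks
    )


# Made-up values for two blocks, purely for illustration.
example = [
    BlockTerms(cost=4, count=10, misprediction=2, miss=1, mp_delay=0),
    BlockTerms(cost=6, count=100, misprediction=5, miss=1, mp_delay=7),
]
total = wcet_objective(example, bmp=5, cmp_penalty=10)
print(total)
```

The sketch only evaluates one candidate assignment; finding the maximizing assignment subject to the constraints is the job of the ILP solver.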

  12. Flow Constraints: Easy!
      • Inflow = basic block execution count = outflow (bounds $count_B$):
        $e_{S,1} + e_{3,1} = count_1 = e_{1,2} + e_{1,4}$
        $e_{1,2} + e_{2,2} = count_2 = e_{2,3} + e_{2,2}$
        $e_{2,3} + e_{4,3} = count_3 = e_{3,1} + e_{3,E}$
        $e_{1,4} = count_4 = e_{4,3}$
      • Loop bounds (bound on maximum loop iterations): $e_{2,2} \le 100$, $e_{3,1} \le 10$
      [Figure: control flow graph with basic blocks B1, B2, B3 and B4]
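The flow constraints above can be checked mechanically for any candidate assignment of edge counts. The sketch below encodes the four-block CFG from this slide and verifies inflow = outflow per block plus the two loop bounds. The data layout and function name are assumptions for illustration, and the sketch only checks feasibility; it does not solve the ILP.

```python
# Edges of the example CFG, keyed by the block they enter or leave.
EDGES_IN = {1: [("S", 1), (3, 1)], 2: [(1, 2), (2, 2)], 3: [(2, 3), (4, 3)], 4: [(1, 4)]}
EDGES_OUT = {1: [(1, 2), (1, 4)], 2: [(2, 3), (2, 2)], 3: [(3, 1), (3, "E")], 4: [(4, 3)]}
LOOP_BOUNDS = {(2, 2): 100, (3, 1): 10}  # upper bounds on back-edge counts


def block_counts(edge_count):
    """Return {block: count} if the edge counts satisfy all constraints, else None."""
    counts = {}
    for block in EDGES_IN:
        inflow = sum(edge_count.get(e, 0) for e in EDGES_IN[block])
        outflow = sum(edge_count.get(e, 0) for e in EDGES_OUT[block])
        if inflow != outflow:
            return None  # flow conservation violated
        counts[block] = inflow
    for edge, bound in LOOP_BOUNDS.items():
        if edge_count.get(edge, 0) > bound:
            return None  # loop bound violated
    return counts


# A candidate path: enter once, take both loops to their bounds, never visit B4.
feasible = {("S", 1): 1, (3, 1): 10, (1, 2): 11, (1, 4): 0,
            (2, 2): 100, (2, 3): 11, (4, 3): 0, (3, "E"): 1}
print(block_counts(feasible))

# Exceeding the inner loop bound makes the assignment infeasible.
too_many = dict(feasible)
too_many[(2, 2)] = 101
print(block_counts(too_many))
```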

  13. Other Constraints • Branch misprediction constraints: bound $misprediction_B$ • Details appeared in an earlier paper: T. Mitra, A. Roychoudhury and X. Li, "Timing Analysis of Embedded Software for Speculative Processors", ACM Intl. Symposium on System Synthesis (ISSS), 2002 • Instruction cache miss constraints: bound $miss_B$ [Li, Malik and Wolfe 1999]

  14. Modeling Cache-Speculation Interaction • Modify the instruction cache miss constraints to model the constructive/destructive effect of speculation along the correct path • Add constraints on $mp\_delay_B$: the penalty due to partially masked cache misses along the speculative path

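  15. Modeling Instruction Cache
      • Cache conflict graph: flow among blocks mapping to the same cache line
      • $p_{S,1} + p_{3,1} = count_1 = p_{1,3}$
      • $miss_1 = p_{S,1} + p_{3,1}$
      [Figure: control flow graph (B1 to B4) next to the cache conflict graph with nodes S, B1, B3 and E and edges $p_{S,1}$, $p_{1,3}$, $p_{3,1}$, $p_{3,E}$]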
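To see why $miss_1 = p_{S,1} + p_{3,1}$ is plausible, the following sketch compares conflict-graph flows with a direct simulation of one cache line. It assumes that B1 and B3 are the only blocks mapping to that line and that each block fits in it; the trace, the function names and the single-line cache model are illustrative assumptions, not part of the original slides.

```python
from collections import Counter


def conflict_flows(trace, conflicting):
    """Count edges p[(u, v)] between consecutive conflicting blocks, from S to E."""
    flows = Counter()
    prev = "S"
    for block in trace:
        if block in conflicting:
            flows[(prev, block)] += 1
            prev = block
    flows[(prev, "E")] += 1
    return flows


def simulate_misses(trace, conflicting):
    """Simulate a single cache line shared by the conflicting blocks."""
    resident = None
    misses = Counter()
    for block in trace:
        if block in conflicting and resident != block:
            misses[block] += 1  # block not in the line: miss, then load it
            resident = block
    return misses


# One hypothetical execution path through the CFG of the flow-constraint slide.
trace = ["B1", "B2", "B2", "B3", "B1", "B4", "B3"]
conflicting = {"B1", "B3"}

p = conflict_flows(trace, conflicting)
miss1 = p[("S", "B1")] + p[("B3", "B1")]
print(miss1, simulate_misses(trace, conflicting)["B1"])
```

For this trace both ways of counting agree, and the flow into B1 equals its execution count, matching $p_{S,1} + p_{3,1} = count_1 = p_{1,3}$.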

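  16. Constructive Effect of Speculation [Figure: control flow graph of blocks B1 to B4 with taken (T) and not-taken (N) edges, showing the speculative path through extra node B3 (2,T) with a partially masked CMP, and the correct path reaching B3]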

  17. Constructive Effect of Speculation • $miss_3$ will decrease by the amount of flow between B3 (2,T) and B3 [Figure: same graph as slide 16, with the access to B3 on the correct path marked as a hit]

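  18. Destructive Effect of Speculation • $miss_2$ will increase by the amount of flow between B4 (1,N) and B2 [Figure: control flow graph of blocks B1 to B4 with taken (T) and not-taken (N) edges, showing the speculative path through extra node B4 (1,N) with a partially masked CMP, and a hit turning into a miss on the correct path]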

  19. General Flow Involving Extra Nodes [Figure: four cases (Case 1 to Case 4) of flow among ordinary nodes b, n, b1, n1 and extra nodes m (b,X), m1 (b,X), m2 (b1,Y)]

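  20. Additional Constraints
      With $m_1(b,X), \ldots, m_n(b,X)$ the extra (speculative-path) nodes for branch $b$ and direction $X$, in the case CMP > BMP:
      $$count(m_i(b,X)) = misprediction(b,X) - \sum_{k=1}^{i-1} miss(m_k(b,X))$$
      $$mp\_delay(b,X) = \sum_{k=1}^{n} miss(m_k(b,X)) \times delay(m_k(b,X))$$
      $$delay(m_i(b,X)) = CMP - \left( BMP - \sum_{k=1}^{i-1} cost(m_k(b,X)) \right)$$
      And some others...
      [Figure: branch b with direction X followed by speculative blocks B1, B2, ..., Bn, with BMP marked]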
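The delay and mp_delay formulas above can be tried on small numbers. The sketch below is one reading of them: a miss in the i-th speculative block is masked only by the part of BMP not yet consumed by the earlier speculative blocks. The function names and the block costs are made up for illustration; BMP = 5 and CMP = 10 cycles follow the experimental setup, which matches the CMP > BMP case.

```python
def speculative_delays(costs, cmp_penalty, bmp):
    """delay(m_i) = CMP - (BMP - sum of costs of m_1..m_{i-1}) for each block m_i."""
    delays = []
    elapsed = 0  # cycles already spent in earlier speculative blocks
    for cost in costs:
        delays.append(cmp_penalty - (bmp - elapsed))
        elapsed += cost
    return delays


def mp_delay(misses, costs, cmp_penalty, bmp):
    """mp_delay = sum over k of miss(m_k) * delay(m_k)."""
    return sum(m * d for m, d in zip(misses, speculative_delays(costs, cmp_penalty, bmp)))


# Two speculative blocks of 2 cycles each (hypothetical costs).
costs = [2, 2]
print(speculative_delays(costs, cmp_penalty=10, bmp=5))
print(mp_delay([1, 0], costs, cmp_penalty=10, bmp=5))
print(mp_delay([0, 1], costs, cmp_penalty=10, bmp=5))
```

A miss later on the speculative path has less of BMP left to hide behind, so its visible delay is larger. The sketch omits the remaining constraints the slide alludes to.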

  21. Benchmarks

  22. Experimental Methodology • Observed WCET: simulation • SimpleScalar cycle-accurate architectural simulator • In-order execution, no pipeline, no data cache misses • Branch misprediction penalty = 5 cycles • Cache miss penalty = 10 cycles • Estimated WCET: prototype analyzer • Input: benchmark in assembly code, micro-architectural parameters, loop bounds • Output: ILP constraints • The constraints are fed to CPLEX, a commercial ILP solver

  23. Accuracy (Smaller Benchmarks)

  24. Accuracy (Larger Benchmarks)

  25. Scalability

  26. Summary • Micro-architectural modeling is crucial for tight estimation of Worst Case Execution Time (WCET) • Existing methods typically focus on a single micro-architectural feature: cache, pipeline, or speculation • This work is a step towards combining micro-architectural features that affect each other • Cache misses/hits due to speculation
