This paper explores how Thread Level Speculation (TLS) can be optimized for energy efficiency through intermediate checkpointing. The authors from the University of Manchester, University of Edinburgh, and Intel Labs Barcelona tackle the inherent energy costs associated with TLS due to re-execution from misspeculation. By proposing a novel checkpointing method guided by dependence prediction techniques, they demonstrate energy savings of up to 14%, averaging 7% over standard TLS execution, without adversely impacting performance. This research contributes significantly to improving speculative execution in multi-core processors.
Increasing the Energy Efficiency of TLS Systems Using Intermediate Checkpointing
Salman Khan (1), Nikolas Ioannou (2), Polychronis Xekalakis (3) and Marcelo Cintra (2)
(1) University of Manchester, (2) University of Edinburgh, (3) Intel Labs Barcelona - UPC
Introduction
• Power efficiency, complexity and time-to-market pressures have led to chip multiprocessors (CMPs)
• Problem:
  • No benefits for sequential applications
  • Even for mostly parallel applications, Amdahl's Law limits performance gains with many cores
• Solution: Thread Level Speculation (TLS)
• But performance through TLS comes at a cost in energy
Can we reduce the wastefulness of re-execution due to misspeculation without losing performance?
HiPC 2011
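As a reminder of why Amdahl's Law matters here, the standard form of the law is sketched below. The symbols are not from the slides: p is assumed to be the fraction of execution that can run in parallel and N the number of cores.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For example, with p = 0.9 on a four-core CMP, S(4) = 1 / (0.1 + 0.225) ≈ 3.08, and no number of cores can push the speedup past 10. TLS targets the remaining sequential fraction.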
Key Contributions
• Propose checkpointing to improve the efficiency of speculative execution
• Evaluate dependence prediction techniques to guide checkpoint placement
• Our approach yields energy savings of up to 14%, and 7% on average, over normal TLS execution, with no significant effect on speedup
Outline
• Introduction
• Checkpointing
• Dependence Predictors
• Checkpointing Policy
• Experimental Setup and Results
• Conclusions
Thread Level Speculation
Placing Checkpoints
• Stride
• Dependence Prediction
  • Address based
  • Program counter based
  • Hybrid
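The stride option is the simplest of these placements to picture. The sketch below is illustrative only: it assumes a stride means "one checkpoint every fixed number of instructions within a speculative task", and the function name and parameters are hypothetical rather than taken from the paper.

```python
def stride_checkpoints(task_length, stride):
    """Return the instruction indices at which a stride policy would checkpoint.

    Assumption: a checkpoint is taken every `stride` instructions, never at
    index 0 (the task start is already a restart point) and never at the end.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(range(stride, task_length, stride))


# A 1000-instruction task checkpointed every 250 instructions.
points = stride_checkpoints(1000, 250)
```

Stride placement ignores where dependences actually occur, which is what the prediction-based options aim to improve on.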
Dependence Prediction
Hybrid Dependence Predictor
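The slides do not preserve the predictor diagrams, so the following is only a toy model of the idea, not the authors' design. It assumes each predictor is a small table of keys (load addresses or load program counters) that previously caused a violation, with LRU replacement, and that the hybrid predicts a dependence when either table hits. The class names, the capacity and the OR combination rule are all assumptions.

```python
from collections import OrderedDict


class DependencePredictor:
    """Toy table of keys (addresses or PCs) seen in past violations, LRU-evicted."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.table = OrderedDict()

    def train(self, key):
        # Move the key to the most-recently-used position, evict the oldest if full.
        self.table.pop(key, None)
        self.table[key] = True
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)

    def predict(self, key):
        return key in self.table


class HybridPredictor:
    """Assumed combination rule: predict a dependence if either table hits."""

    def __init__(self, capacity=64):
        self.by_address = DependencePredictor(capacity)
        self.by_pc = DependencePredictor(capacity)

    def train(self, pc, address):
        self.by_pc.train(pc)
        self.by_address.train(address)

    def predict(self, pc, address):
        return self.by_pc.predict(pc) or self.by_address.predict(address)


hybrid = HybridPredictor(capacity=2)
hybrid.train(pc=0x400, address=0x1000)  # a load at this PC/address was violated
```

A positive prediction marks a load as a candidate checkpoint location; whether a checkpoint is actually taken is a separate policy decision.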
Placing Checkpoints
• Limited number of checkpoints
• Placing a checkpoint has a cost
• Checkpointing on every positive prediction results in too many checkpoints
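The constraints above suggest a filter between the predictor and the checkpoint mechanism. The sketch below is one possible filter, not the policy from the paper: it assumes a fixed checkpoint budget per task and a minimum instruction distance between checkpoints, and both parameters (`max_checkpoints`, `min_distance`) are hypothetical.

```python
class CheckpointPolicy:
    """Illustrative filter: accept a positive prediction only if budget and spacing allow."""

    def __init__(self, max_checkpoints=4, min_distance=50):
        self.max_checkpoints = max_checkpoints  # assumed hardware limit per task
        self.min_distance = min_distance        # assumed spacing that amortises the cost
        self.taken = []                         # instruction indices of checkpoints taken

    def should_checkpoint(self, instr_index, predicted_dependent):
        if not predicted_dependent:
            return False
        if len(self.taken) >= self.max_checkpoints:
            return False  # budget exhausted
        if self.taken and instr_index - self.taken[-1] < self.min_distance:
            return False  # too close to the previous checkpoint to be worth the cost
        self.taken.append(instr_index)
        return True


policy = CheckpointPolicy(max_checkpoints=2, min_distance=100)
decisions = [
    policy.should_checkpoint(10, True),    # accepted
    policy.should_checkpoint(50, True),    # rejected: too close
    policy.should_checkpoint(200, False),  # rejected: no predicted dependence
    policy.should_checkpoint(200, True),   # accepted
    policy.should_checkpoint(1000, True),  # rejected: budget used up
]
```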
Setup
• Simulator, Compiler and Benchmarks:
  • SESC (http://sesc.sourceforge.net/)
  • POSH (Liu et al., PPoPP '06)
  • SPEC 2000 Int
• Architecture:
  • Four-way CMP, 4-issue cores
  • 16KB L1 data (multi-versioned) and instruction caches
  • 1MB unified L2 caches
  • Cycles from violation to kill/restart: 12
  • Cycles to spawn: 12
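For readers who want to plug these numbers into their own back-of-the-envelope models, the architecture parameters can be captured as a small record. This is not SESC configuration syntax; the class and field names are made up for illustration, and reading "16KB" as the size of each L1 cache is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedCMP:
    """Architecture parameters as listed on the Setup slide (field names are hypothetical)."""

    cores: int = 4
    issue_width: int = 4
    l1_data_kb: int = 16          # multi-versioned; assumed 16KB each for data and instruction
    l1_instr_kb: int = 16
    l2_unified_kb: int = 1024
    violation_to_restart_cycles: int = 12
    spawn_cycles: int = 12

    def restart_overhead(self, squashes):
        """Fixed kill/restart cycles for a number of squashes (ignores re-execution time)."""
        return squashes * self.violation_to_restart_cycles


cmp_config = SimulatedCMP()
overhead = cmp_config.restart_overhead(10)  # 10 squashes at 12 cycles each
```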
Measuring Dependence Prediction
Wasted Instructions: Unnecessarily squashed instructions.
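One way to make this metric concrete is sketched below. It assumes that the instructions executed before the violating load did not need to be redone, so they count as wasted when a squash rolls back past them. With no checkpoint the task restarts from index 0; with checkpoints it restarts from the latest one at or before the violating load. The function and its arguments are illustrative, not the paper's measurement code.

```python
def wasted_instructions(violation_index, checkpoints=()):
    """Count instructions squashed unnecessarily for one violation.

    violation_index: index of the load whose value was violated (assumed known).
    checkpoints: instruction indices where checkpoints were taken.
    """
    usable = [c for c in checkpoints if c <= violation_index]
    restart_point = max(usable) if usable else 0  # no usable checkpoint: restart the task
    return violation_index - restart_point


no_checkpoint = wasted_instructions(600)
with_stride = wasted_instructions(600, [250, 500, 750])
with_prediction = wasted_instructions(600, [600])  # checkpoint placed right at the load
```

Under these assumptions the wasted work falls from 600 instructions to 100 with a 250-instruction stride, and to 0 when a predictor places the checkpoint at the dependent load.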
Conclusions
• Effective checkpointing improves the efficiency of TLS
• Placing checkpoints by stride is not sufficient to reduce waste significantly
• Checkpointing using dependence prediction obtains energy savings of up to 14%, and 7% on average, over normal TLS execution, with no significant effect on speedup
Read the paper for…
• Complete results
• Microarchitectural issues that arise from checkpointing running tasks
• The modified squash/restart mechanism needed to avoid performance degradation from checkpointing