Phase Analysis on Real Systems
Canturk Isci, Margaret Martonosi

Previously…
• Runtime processor power monitoring and estimation
• Power phase behavior of programs (Power Vectors)
Canturk Isci - Margaret Martonosi

Today!
• Phase detection on real systems:
  • Variability effects and potentials for repeatability
• Virtual memory behavior – Tuning
  • Initial results
• What’s going on?
  • BBVs – PMCs – PVs… and POWER
• Simple metric prediction studies
  • Short term vs. long term
[Slide labels: MAJOR, MINOR, MAYBE]

Phase Detection with Power Vectors
• The initial idea was to look at the phase distributions of applications and use signature analysis to detect and predict phases
• HOWEVER:
  • Multiple runs inevitably exhibit different real-system behavior
  • The quantities and durations vary
  • The phase distributions vary
[Figure labels: Metric Var, Time Var]

Variability Effects in Real System Behavior
• A direct apples-to-apples comparison of phase signatures is not very meaningful on real systems!

How do Phase Distributions Compare?
Ex: 2 runs of gcc
[Figure panels: We Want, We Get]

We Got Ourselves a Problem:
• How do we extract this recurrent behavior information?
• Speech/humming recognition:
  • Stored libraries, signal stats
  • Pitch tracking
• Image/biomedical:
  • Image warping
  • Registration/mutual information
• Architects:
  • Simple to apply online
  • Implementable without massive state and combinational logic

Interesting Observation with Transitions
• Trying to detect the application from its behavior
• Upper case: Hit!
• Lower case: False alarm?
• Tracking phase transitions rather than phase sequences proves to be more useful in detecting recurrent behavior*
[Figure panels: Gcc1-Gcc2, Gcc-Equake]

Our Transition-Guided Detection Framework (The INTRO)
• Benchmark run #1, Benchmark run #2
• Sample PMCs to form 12D vectors ⇒ Vector stream #1, Vector stream #2
• Identify transitions ⇒ Tinit #1, Tinit #2
• Apply glitch/gradient filtering ⇒ Tgg #1, Tgg #2
• Apply near-neighbor blurring ⇒ TggN #1
• Apply cross correlation
  • Match ⇒ Peak at best alignment
  • Mismatch ⇒ No observable peak

Sampling Effects: Glitches & Gradients
• Nothing happens without disturbances ⇒ Glitches
  • Glitch: Instability where before & after is the same ⇒ Spurious transitions
• Nothing happens instantaneously ⇒ Gradients
  • Gradient: Instability where before & after is different ⇒ A single true transition
GLITCHES:
Initial Transitions: 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0
Refined Transitions: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
GRADIENTS:
Initial Transitions: 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0
Refined Transitions: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

Glitch/Gradient Filtering
• Very simple: no consecutive transitions
• Leads to large reductions in transition count
• We call these “Refined Transitions (Tgg)”
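
The "no consecutive transitions" rule can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `refine_transitions` is hypothetical, and it only collapses each run of consecutive transition flags to its first sample (the gradient case). Removing glitches would additionally need the phase values before and after the run, which the slides do not spell out.

```python
def refine_transitions(transitions):
    """Keep only the first flag of every run of consecutive transition flags.

    `transitions` is a list of 0/1 flags, one per sample (assumed layout).
    """
    refined = []
    previous = 0
    for flag in transitions:
        # A flag survives only if the previous sample was not a transition.
        refined.append(1 if flag and not previous else 0)
        previous = flag
    return refined


# The GRADIENTS example sequence from the slides.
initial = [0] * 4 + [1] * 4 + [0] * 10 + [1] * 4 + [0] * 9 + [1] * 9 + [0] * 4
print(refine_transitions(initial))
```

On the GRADIENTS sequence this leaves one flag at the start of each of the three runs.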

Time Shifts
• We have binary information ⇒ we can do cheaper than shifted correlation coefficients
• Using cross-correlations shows equally useful results
• Easily implementable
• Ex: Matching and mismatch cases, and “The Peak”
[Figure panels: Gcc1-Gcc2, Gcc-Equake]
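
A minimal sketch of cross-correlating two binary transition sequences follows. Because the inputs are 0/1 flags, the correlation at each lag reduces to counting coinciding transitions. The function names, the lag range, and the toy sequences are illustrative assumptions, not taken from the slides.

```python
def cross_correlate(a, b, max_lag):
    """Count coinciding transitions of two 0/1 sequences for each lag.

    A positive lag means `b` is delayed relative to `a`.
    """
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        count = 0
        for i, flag in enumerate(a):
            j = i + lag
            if flag and 0 <= j < len(b) and b[j]:
                count += 1
        scores[lag] = count
    return scores


def best_alignment(scores):
    """Return (lag, score) of the highest correlation peak."""
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Toy data: run2 is run1 delayed by two samples.
run1 = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
run2 = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
print(best_alignment(cross_correlate(run1, run2, max_lag=4)))
```

A matching pair shows a single clear peak at the lag that aligns the runs; an unrelated pair would show only small, scattered counts.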

Time Dilations
• Observation: Dilations exist as small jitters (a few samples)
• Proposed solution: “Near-Neighbor Blurring”
  • Blur edges slightly ⇒ consider transitions as distributions around their actual locations
  • Tolerance: Spread of this distribution, [t-x, t+x] samples
• Ex: Matching improvement with tolerance=4:
[Figure panels: run1 vs. run2, Mismatch!; run1 vs. run2, Match!]
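
Near-neighbor blurring can be sketched as follows, assuming the simplest possible "distribution": a flat window that marks every sample in [t-x, t+x] around each transition. The slides do not say which shape was used, so the box shape, the function names, and the toy data are assumptions.

```python
def blur_transitions(transitions, tolerance):
    """Spread each transition over [t - tolerance, t + tolerance] samples."""
    n = len(transitions)
    blurred = [0] * n
    for t, flag in enumerate(transitions):
        if flag:
            for j in range(max(0, t - tolerance), min(n, t + tolerance + 1)):
                blurred[j] = 1
    return blurred


def overlap(a, b):
    """Number of samples where both sequences flag a transition."""
    return sum(1 for x, y in zip(a, b) if x and y)


# Toy data: the same two transitions, jittered by one sample between runs.
run1 = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
run2 = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(overlap(run1, run2))                       # without blurring
print(overlap(blur_transitions(run1, 1), run2))  # with tolerance = 1
```

Only one of the two streams is blurred here, mirroring the framework diagram in which blurring produces TggN #1.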

Our Transition-Guided Detection Framework (The SUMMARY)
• Benchmark run #1, Benchmark run #2
• Sample PMCs to form 12D vectors ⇒ Vector stream #1, Vector stream #2
• Identify transitions ⇒ Tinit #1, Tinit #2
• Apply glitch/gradient filtering ⇒ Tgg #1, Tgg #2
• Apply near-neighbor blurring ⇒ TggN #1
• Apply cross correlation
  • Match ⇒ Peak at best alignment
  • Mismatch ⇒ No observable peak

Results
• How do we quantify the strength of the peak?
• Matching Score:
• Detection Results: (green: highest match; red: highest mismatch)

Receiver Operating Characteristics
• Our best detection scheme (tolerance=1) achieves 100% hit detection with <5% false alarms.
• (For a uniform threshold!)
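
As an illustration of what a uniform threshold means for the ROC, the sketch below computes the hit rate and false-alarm rate for one threshold applied to all benchmark pairs. The scores are made-up placeholders, not the measured matching scores, and the function names are hypothetical.

```python
def roc_point(match_scores, mismatch_scores, threshold):
    """Hit rate and false-alarm rate for a single, uniform threshold."""
    hits = sum(1 for s in match_scores if s >= threshold)
    false_alarms = sum(1 for s in mismatch_scores if s >= threshold)
    return hits / len(match_scores), false_alarms / len(mismatch_scores)


def roc_curve(match_scores, mismatch_scores):
    """Sweep the threshold over every observed score, highest first."""
    thresholds = sorted(set(match_scores) | set(mismatch_scores), reverse=True)
    return [(t,) + roc_point(match_scores, mismatch_scores, t) for t in thresholds]


# Hypothetical scores: same-benchmark pairs vs. different-benchmark pairs.
match_scores = [0.9, 0.8, 0.7]
mismatch_scores = [0.75, 0.3, 0.2, 0.1]
for threshold, hit_rate, false_alarm_rate in roc_curve(match_scores, mismatch_scores):
    print(threshold, hit_rate, false_alarm_rate)
```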

Comparison of Methods
• Comparing 3 cases:
  • Original (value-based) phases vs. refined transitions vs. near-neighbor-blurred transitions
• In all cases transitions perform better
• In almost all cases near-neighbor blurring improves detection

Conclusions
• Phase-recurrent behavior detection on real systems has interesting problems resulting from system-induced variability
• Looking at phase transition information in part improves detection capabilities
• Supporting methods such as glitch/gradient filtering and near-neighbor blurring improve the detectability of transition signatures

Workload Phases ⇒ Memory Behavior?
• A few of the inspirations:
  • Redhat Magazine Issue #1 [Dec 2004]
  • Dynamically Tracking Page Miss Ratio Curve [ASPLOS 2005]
  • Gokul Kandiraju [PhD Thesis 2004]
• Can we track phase behavior from PMCs and VM-related stats to dynamically manage memory behavior?
  • Less page locality ⇒ fetch fewer contiguous pages at once
  • Recurring reference with high reuse distance ⇒ launder less aggressively
• Targets
  • Exec time & energy
[Diagram labels: Indicator, Action, Effect, with two links marked ??]
Canturk Isci - Margaret Martonosi - James Donald

Platform
• P4, no SMT, 256 MB memory, Linux 2.4.7-10
• SPEC2K is designed to fit in 256 MB
• Choose high-memory benchmarks + multiprogramming
• Multiprogramming combinations of these lead to lots of thrashing

Indicator, Action, Effect
• Non-intrusive tuning possibilities:
  • Kswapd:tries_base
    • Max # of pages the swapout daemon tries to free at once
  • Kswapd:swap_cluster
    • # of pages the swapout daemon writes at once
  • Page-cluster:
    • Log2(# of contiguous pages) the kernel reads at once at a page fault
• Intrusive tuning possibilities:
  • Page scanning period (overhead if tasks fit in memory)
  • Page age counters (reuse vs. pollution)
  • Inactive-clean percentage (balance I/O and memory demand)
  • Task memory allocation (workload-dependent memory demand)

Non-intrusive Results
• Gzip: gzip + gzip + gzip
• Gap: gap + gzip
• Bzip2: bzip2 + bzip2
• Tries_base and swap_cluster have no visible effect
• Page-cluster shows ~7% improvement wrt default

Conclusions and Todos
• Multiprogramming involving thrashing has a lot of potential for performance/power improvement
• The cases we experimented with don’t show promising actions
• Intrusive actions may be more useful, leading to effective actions as well as better (per-task) tracking
• NEXT STEPS:
  • Looking into mm for potential dynamic tunings
  • Defining indicators that track relevant behavior
    • Page miss ratio / swap rates / bus utilization
• Q: Is there any potential?

Tomorrow!
• Phase detection on real systems:
  • Variability effects and potentials for repeatability
• Virtual memory behavior – Tuning
  • Initial results
• What’s going on?
  • BBVs – PMCs – PVs… and POWER
• Simple metric prediction studies
  • Short term vs. long term

Comparing Phase Methods for Power
• All lead to different interesting characterizations
• How do these compare in terms of power representation?
• Is there a dominant method or does a (hierarchical) combination work better?
• We specifically look at BBVs (from sampled PC traces) & PMC-Power Vectors (from performance monitoring counters)

Ex: Dcache Microkernel
• Specify L1 hit rate, generate ~desired hits via random linked list traversal
[Figure labels: Cache Size, Different Phases]

Dcache Performance Traces
• Each hit rate range is obvious
• Trends are NOT identical across metrics: linear L1 misses vs. nonlinear IPC
• FOR A SINGLE METRIC: How you capture phases depends on the metric and the chosen threshold

Dcache PC Traces
• No visible phases from PC samples
• Address space sampling alone is NOT sufficient!!

Experiment Setup
• PIN kit 1795
• 3-level trace instrumentation
  • ~Every user trace: Conditional inlined trace count
  • Every 50-200K trace calls: Sample EIP
  • Every 5-20M trace calls: Generate BBV & collect PMCs & read PWR history
• Constraint: Instrumentation should not overwhelm power variations!!
• BBV generation:
  • Sample BBL heads ⇒ hash into 32 dimensions (based on Jenkins)
• PMC reading:
  • Single rotation subset
  • Sample via ‘popen’s due to platform conflicts
• Power reading:
  • Read from serial device buffer
  • No polling possible ⇒ disable device at major instrumentation & exhaust buffer
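
The BBV generation step can be sketched as below. The slides only say the hash is "based on Jenkins"; this sketch assumes Bob Jenkins' one-at-a-time hash over the 4-byte little-endian address and a simple modulo to pick one of the 32 dimensions. The function names and the sample addresses are hypothetical.

```python
def jenkins_one_at_a_time(key: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, truncated to 32 bits."""
    h = 0
    for byte in key:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


def build_bbv(bbl_heads, dims=32):
    """Hash sampled basic-block head addresses into a `dims`-entry count vector."""
    bbv = [0] * dims
    for addr in bbl_heads:
        # Assumption: addresses are 32-bit EIP samples, hashed as 4 bytes.
        key = (addr & 0xFFFFFFFF).to_bytes(4, "little")
        bbv[jenkins_one_at_a_time(key) % dims] += 1
    return bbv


# Made-up EIP samples for one BBV interval.
samples = [0x08048000, 0x08048010, 0x08048000, 0x0804A120]
print(build_bbv(samples))
```

Hashing keeps the vector at a fixed 32 entries regardless of how many distinct basic blocks are sampled, at the cost of occasional collisions between unrelated blocks.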

BBV Results
• Is sampling good enough? Are they meaningful?
[Figure panels: B. Calder’s full-blown BBV similarity matrices; our sampled & hashed BBV similarity matrices]

Power Results
• Do we still have the hook on power variability?
[Figure panels: From PIN, Native]

Currently…
• Still need to verify benchmarks for power and validity
• Constructing power vectors with the reduced set
• Applying symmetric phase analyses to BBVs and PMCs
• Power representation of phases wrt measurements
• 90-10 prediction with regression trees

Metric (IPC) Value Prediction
• No big challenge to get good results, but improving for edges is interesting
• Statistical predictor: Transition-guided, history-based (EWMA) IPC prediction
  • Instead of a fixed history window, use stable regions between transitions as your history in a circular buffer
  • Transitions based on a threshold
  • Threshold = 0 ⇒ “Last Value Predictor”
• Our experience:
  • Variabilities are bursty transitions
  • There are stable regions with probable gradients between transitions
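
A minimal sketch of a transition-guided EWMA predictor follows. It simplifies the slide's description: instead of a circular buffer of stable-region samples, it keeps a single running EWMA that is restarted whenever a sample deviates from the current estimate by more than the threshold. The function name, the relative-threshold definition, and `alpha` are assumptions.

```python
def predict_ipc(samples, threshold=0.10, alpha=0.5):
    """Return the prediction made for each sample before it is observed.

    The first entry is None because no history exists yet.
    """
    predictions = []
    estimate = None
    for x in samples:
        predictions.append(estimate)
        if estimate is None or abs(x - estimate) > threshold * abs(estimate):
            # Transition: discard the old history and restart from this sample.
            estimate = x
        else:
            # Stable region: fold the sample into the running average.
            estimate = alpha * x + (1 - alpha) * estimate
    return predictions


ipc = [1.0, 1.04, 2.0, 2.0]
print(predict_ipc(ipc, threshold=0.10))
print(predict_ipc(ipc, threshold=0.0))  # degenerates to last-value prediction
```

With a threshold of 0 every change counts as a transition, so the predictor always returns the most recent sample, which is the last-value case named on the slide.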

[Figure: Ammp, thr=0% (Last Value)]
[Figure: Ammp, thr=10%]
[Figure: Using Stability Considerations (8) in IPC Predictions]

Predicting Durations
• X=f(x) approach:
  • F(x) = x, x/2, x/8, …
  • Initial stability requirement: 2, 8, …
• Table based?
  • Idea was:
    • At each transition: predict once for the duration based on history:
    • Log(prev_duration) = key values [0,1,2,3,4,5]
    • History:
      • |5|3|5|3|5| ⇒ 3
      • |1|3|5|1|3| ⇒ 5
    • Need to filter bursts somehow
    • Partial matchings??
  • NOT EXPLORED!!

Ammp Duration Prediction
• Predict based on F(x)=x/8
• Stability criterion = 8 samples
• Extend the duration while stability continues
• IPC based on last value
• Predictions only at checkpoints

Long Term IPC Prediction with Gradients
• Last value is not very useful in the long term
• Instead of 0th order, consider a 1st-order prediction:
  • Need additional ΔIPC information
  • Next IPC = Current IPC + ΔIPC
• Ex: F(x)=x/8
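
The first-order rule "Next IPC = Current IPC + ΔIPC" can be written out as below. Taking ΔIPC as the difference of the last two samples and extrapolating it linearly over several steps are assumptions made for illustration; the slides do not state how ΔIPC is estimated.

```python
def last_delta(history):
    """Estimate ΔIPC as the difference between the two most recent samples."""
    return history[-1] - history[-2] if len(history) >= 2 else 0.0


def predict_first_order(current_ipc, delta_ipc, steps=1):
    """Extrapolate the IPC `steps` samples ahead along the current gradient."""
    return current_ipc + steps * delta_ipc


history = [1.0, 1.5]
delta = last_delta(history)
print(predict_first_order(history[-1], delta))           # one step ahead
print(predict_first_order(history[-1], delta, steps=4))  # four steps ahead
```

With ΔIPC = 0 this reduces to the 0th-order (last value) prediction.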

Improvements?
• Using prediction probability tables:
  • P{N more | 20 stable @ IPC}
  • Ex: Vortex
• Using adaptive functions based on history
• Table-based function approaches