
STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support (CCMS)

Yi-Min Wang, Chad Verbowski, John Dunagan, Yu Chen, Helen J. Wang, Chun Yuan, & Zheng Zhang, Microsoft Research, Redmond & Beijing.


Presentation Transcript


  1. STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support (CCMS)
     Yi-Min Wang, Chad Verbowski, John Dunagan, Yu Chen, Helen J. Wang, Chun Yuan, & Zheng Zhang, Microsoft Research, Redmond & Beijing

  2. The Problem: Computer Fragility
     • “It worked yesterday, but not today.”
     • “It worked for that user, but not this user.”
     • “It worked on that machine, but not this machine.”
     • “I restarted the application, rebooted the machine, but still can’t fix the problem!”
     • This paper focuses on Registry-related problems

  3. Scott and Susi’s Registry Problem

  4. Inspired by the Human Genome Project
     PC | Human
     200,000 Registry values | 3 billion DNA base pairs
     Desktop last week vs. desktop today: 99% the same | Human #1 vs. Human #2: 99.9% the same
     Desktop vs. laptop: 65% similarity | Human vs. mouse: 70% - 90% similarity
     > 11% “junk” entries | 50% “junk” DNA
     < 5% code for configuration changes | < 2% code for proteins
     Registry entries for “garbage fonts disease”: found at the Fonts key under HKLM\Software\Microsoft\Windows NT\CurrentVersion | Gene for Huntington's disease: found at the tip of the short arm of chromosome 4

  5. Contributions of STRIDER
     • Strider Principles
       - Key to handling complexity in CCMS
       - Problem decomposition into 7 Strider components
     • Strider Process
       - Conceptual use of Strider components to solve a particular CCMS problem
     • Strider Toolkit
       - Implementation of Strider components as command-line building blocks
     • Strider Troubleshooter
       - Root-cause analysis tool with a UI that strings together the command-line tools for troubleshooting

  6. Principle #1: State-Based Analysis
     First-level decomposition: Mechanical, Statistical, & Database
     • Symptom-based analysis: knowledge, experience, & support database; imprecise, nondeterministic search
     • State-based analysis: mechanical & statistical analysis; precise database lookup in the PC Genomics Database
       - “Is this a junk entry?”
       - “Who owns this entry?”
       - “Are there known problems with this entry?”
     [Diagram: app or action A, states B, C, Y, Z, and a persistent failure, annotated “Latency”.]

  7. Principle #2: Attack the Mess with the Mass
     Second-level decomposition: Diff, Trace, & Intersection
     • The Mess: freedom & flexibility plus a large install base mean the number of different configurations grows with the number of machines
     • The Mass: a large install base also means the number of data points grows with the number of machines
     [Diagram (mechanical): a 200,000-value WinXP Registry (label “77,000”); Good vs. Bad state diff using System Restore checkpoints; trace of the bad run; intersection of diff and trace.]
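The mechanical half of this decomposition can be sketched in a few lines. This is a minimal Python sketch, assuming a Registry snapshot is modeled as a dictionary from full value path to value data and a trace as the ordered list of paths the failing run accessed; `state_diff` and `intersect_with_trace` are hypothetical names, not the Strider Toolkit's actual commands.

```python
from typing import Dict, List, Set

# Hypothetical snapshot model: full Registry value path -> value data (as text).
Snapshot = Dict[str, str]


def state_diff(good: Snapshot, bad: Snapshot) -> Set[str]:
    """Paths that were added, deleted, or modified between two snapshots."""
    added_or_deleted = set(good.keys() ^ bad.keys())
    modified = {p for p in good.keys() & bad.keys() if good[p] != bad[p]}
    return added_or_deleted | modified


def intersect_with_trace(diff: Set[str], trace: List[str]) -> Set[str]:
    """Keep only changed paths that the failing run actually accessed."""
    return diff & set(trace)


if __name__ == "__main__":
    good = {r"HKCU\Software\App\Theme": "blue", r"HKCU\Software\App\Cache": "1"}
    bad = {r"HKCU\Software\App\Theme": "???", r"HKCU\Software\App\Cache": "2",
           r"HKCU\Software\Other\Setting": "on"}
    trace = [r"HKCU\Software\App\Theme", r"HKCU\Software\App\Window"]
    diff = state_diff(good, bad)
    print(sorted(diff))
    print(sorted(intersect_with_trace(diff, trace)))
```

The diff alone flags all three changed values; intersecting with the trace keeps only the one the failing run read, which is the narrowing the Mass makes possible.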

  8. Principle #3: Complexity-Noise Filtering
     Self-filtering of complexity as noise
     • Many of the differences are not significant for systems management and troubleshooting
       - Registry entries that change constantly are less important; they are simply “operational states”
         - Inverse Change Frequency (ICF) ranking
       - Registry entries that always differ across machines reflect the natural diversity among Windows machines
     • Start with a deterministic bad state, end with deterministic bad behavior
       - Nondeterministic activities in between are often less important
       - Intersecting multiple traces can filter out such noise
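Both noise filters above can be sketched in Python. This is a minimal sketch, not the paper's formula: it assumes snapshots are dictionaries from value path to data, counts a change whenever a path differs between consecutive checkpoints, and uses the hypothetical score 1 / (1 + changes) as the ICF. Function names are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical snapshot model: full Registry value path -> value data (as text).
Snapshot = Dict[str, str]


def change_counts(checkpoints: List[Snapshot]) -> Dict[str, int]:
    """For each path, count consecutive checkpoint pairs in which it changed.

    Additions and deletions count as changes (a missing path reads as None).
    """
    counts: Dict[str, int] = {}
    for before, after in zip(checkpoints, checkpoints[1:]):
        for path in before.keys() | after.keys():
            if before.get(path) != after.get(path):
                counts[path] = counts.get(path, 0) + 1
    return counts


def icf_score(path: str, counts: Dict[str, int]) -> float:
    """Assumed ICF score: 1 / (1 + number of observed changes)."""
    return 1.0 / (1 + counts.get(path, 0))


def icf_rank(candidates: Set[str], counts: Dict[str, int]) -> List[str]:
    """Highest ICF first, so rarely changing entries rank as likelier root causes."""
    return sorted(candidates, key=lambda p: (-icf_score(p, counts), p))


def intersect_traces(traces: List[List[str]]) -> Set[str]:
    """Keep only paths accessed in every failing run; run-specific noise drops out."""
    if not traces:
        return set()
    common = set(traces[0])
    for trace in traces[1:]:
        common &= set(trace)
    return common
```

Any monotonically decreasing function of change frequency gives the same ordering; the 1 / (1 + n) form is chosen only because it stays finite for entries that never changed.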

  9. [Diagram: local cross-time analysis for noise filtering & state ranking (Good vs. Bad diff using System Restore checkpoints, trace, and intersection; mechanical), extended with a global state-snapshot repository for global cross-machine analysis for noise filtering & state ranking (mechanical + statistical); 200,000-value WinXP Registry, label “77,000”.]

  10. Registry Change-Behavior Analysis
      • Four machines, each with 84 days of checkpoints
      • Percentage of values ever changed: 4.7% - 13.2%
      • Percentage operational: 1.9% - 5.6%
      • Percentage installation/configuration: 2.1% - 11.3%
      • Median number of changes per day: 302 (raw), 29 (noise filtered)
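Statistics of this kind can be computed from a machine's checkpoint series. The following is a minimal Python sketch, assuming one snapshot per day and a caller-supplied `noise` set standing in for the filtered operational entries; the slide does not say how that set was chosen, so the filter here is a hypothetical placeholder.

```python
import statistics
from typing import Dict, FrozenSet, List, Set

# Hypothetical snapshot model: full Registry value path -> value data (as text).
Snapshot = Dict[str, str]


def changed_paths(before: Snapshot, after: Snapshot) -> Set[str]:
    """Paths added, deleted, or modified between two checkpoints."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


def change_behavior(daily: List[Snapshot],
                    noise: FrozenSet[str] = frozenset()) -> Dict[str, float]:
    """Summarize one machine's daily checkpoints.

    `noise` is an assumed, caller-supplied set of operational paths; it only
    affects the filtered median.
    """
    if len(daily) < 2:
        raise ValueError("need at least two checkpoints")
    all_paths = set().union(*(s.keys() for s in daily))
    if not all_paths:
        raise ValueError("snapshots contain no values")
    per_day = [changed_paths(b, a) for b, a in zip(daily, daily[1:])]
    ever_changed = set().union(*per_day)
    return {
        "percent_ever_changed": 100.0 * len(ever_changed) / len(all_paths),
        "median_changes_per_day_raw": statistics.median(len(d) for d in per_day),
        "median_changes_per_day_filtered":
            statistics.median(len(d - noise) for d in per_day),
    }
```

The gap between the raw and filtered medians (302 vs. 29 on the slide) is what removing a small set of constantly changing entries buys.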

  11. Strider Components
      • Mechanical
        - State Diff: diff the “bad state” against the “last known working state”
        - Tracing: trace the failing app execution or boot
        - Intersection: intersect the diff & trace
      • Statistical
        - State Ranking:
          - Inverse Change Frequency (ICF) ranking: states with high change frequencies are less likely to be the root cause
          - Order ranking: states accessed later are more likely to be the result of execution divergence caused by the earlier root-cause entry
      • Database
        - PC Genomics Database: functional & failure information about states
          - 5.1. “Is this a junk entry?” (Noise Filtering)
          - 5.2. “Who owns this entry?” (Ownership Mapping)
          - 5.3. “Are there known problems with this entry?” (Support Database Lookup)
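Order ranking, as described above, reduces to sorting candidates by where they first appear in the failing trace. A minimal Python sketch, assuming the trace is an ordered list of accessed Registry paths; `order_rank` is a hypothetical name.

```python
from typing import Dict, List, Set


def order_rank(candidates: Set[str], trace: List[str]) -> List[str]:
    """Rank candidate paths by their first access in the failing trace.

    Earlier reads come first: entries read later may have been touched only
    because execution had already diverged. Candidates absent from the trace
    are placed last, in alphabetical order.
    """
    first_seen: Dict[str, int] = {}
    for position, path in enumerate(trace):
        first_seen.setdefault(path, position)  # keep only the first access
    return sorted(candidates, key=lambda p: (first_seen.get(p, len(trace)), p))


if __name__ == "__main__":
    trace = ["Fonts", "Theme", "Fonts", "Cache", "Crash"]
    print(order_rank({"Cache", "Theme", "Crash"}, trace))
```

Using the first access rather than the last matches the rationale: a root-cause entry is read before the divergence it causes, even if it is read again afterward.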

  12. Strider Process for Troubleshooting
      • User to tool: “The program keeps failing.” “It was working.” “Now it doesn’t work.”
      • Narrow-down phase: State Diff and Tracing, combined by Intersection, then Noise Filtering and State Ranking (drawing on the PC Genomics Database), produce a Filtered & Ranked Candidate Set
      • Solution-query phase: Ownership Mapping and Support Database Lookup on the candidate set lead to App Info, Config UI, Action, Doc, and Support Articles
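The solution-query phase can be pictured as annotating each ranked candidate with database facts. This is a minimal Python sketch, assuming the PC Genomics Database is a dictionary keyed by exact Registry path; `GenomicsRecord`, `Finding`, `solution_query`, and the article IDs in the test are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GenomicsRecord:
    """Hypothetical PC Genomics Database entry; field names are illustrative."""
    owner: str                                          # owning app or component
    articles: List[str] = field(default_factory=list)   # known-problem articles


@dataclass
class Finding:
    path: str
    owner: Optional[str]
    articles: List[str]


def solution_query(ranked: List[str],
                   db: Dict[str, GenomicsRecord]) -> List[Finding]:
    """Ownership mapping plus support-article lookup, preserving rank order."""
    findings = []
    for path in ranked:
        record = db.get(path)
        if record is None:
            findings.append(Finding(path, None, []))   # unknown entry: no owner, no articles
        else:
            findings.append(Finding(path, record.owner, list(record.articles)))
    return findings
```

Preserving the narrow-down ranking means the user sees the likeliest root cause, its owner, and any matching articles first; a real database would likely need prefix or key-level matching rather than exact paths.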

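  13. Strider Troubleshooter: Cross-Restore-Point Results
      [Chart: number of Registry values remaining at each stage (average Registry size, after state diff, after diff & trace intersection, after noise filtering) and the root cause's position after order ranking; the state diff cuts the count by two orders of magnitude, and the diff & trace intersection by another two.]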

  14. Cross-Machine Results
      [Chart: number of Registry values at each stage (average Registry size, after state diff, after diff & trace intersection, after noise filtering) and the root cause's position after order ranking.]

  15. Summary
      • Think outside the white box
        - Derive “black-box manifests” through PC Genomics (tracing, diffing, & behavior modeling) and show their benefits for CCMS
        - White-box & black-box approaches complement each other
      • State+symptom-based troubleshooting
        - State-based support articles can be retrieved by symptom-based search
        - Symptom-based search can be enhanced with additional state-based strings
        - Symptom-based matching can help state ranking

  16. Future Work
      • Long-term goal: develop new abstractions for systems management
        - Configuration change audits: “What has changed on my machine since last week, and who did it?”
        - Impact analysis: “Is applying this patch going to break my apps?”
        - Server drift: “What’s causing my server machines’ configurations to diverge?”
