Automating Misconfiguration Troubleshooting with PeerPressure: Enhancing PC Tech Support
This presentation explores the significance of a novel approach, PeerPressure, in automating the troubleshooting of misconfigurations in desktop PCs, which account for a sizeable portion of technical support costs. The authors present an architecture and algorithm that leverage Bayesian statistics to identify misconfigurations efficiently. By utilizing a "Golden State" concept, PeerPressure analyzes shared configuration settings to pinpoint anomalies. The prototype demonstrates effective performance and highlights future work on enhancing troubleshooting capabilities for complex application environments.
Presentation Transcript
Automatic Misconfiguration Troubleshooting with PeerPressure • Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang (Microsoft Research) • Presenter: Sara Salahi (Northwestern University)
Agenda • Importance of this work • Key ideas • PeerPressure: Architecture & Algorithm • Prototype • Performance • Future Work
Importance • Tech support accounts for 17% of the total cost of ownership of today’s desktop PCs • A large share of tech support effort is spent on troubleshooting • Many troubleshooting cases are due to misconfiguration • Misconfiguration is often caused by data in shared persistent stores (e.g. the Windows registry) • Slide callout: “Authors focus on this”
Key Ideas: Misconfigurations • Can have many different “root causes” • Seemingly innocuous changes to shared system configurations • System bugs • Security patches may introduce incompatible registry settings • Failed uninstallation of applications • Manual intervention using Registry editor
Key Ideas: The Golden State • “Golden State”: a perfect configuration • Assume that the golden state is in the mass, i.e. the values held by most machines are the healthy ones • Combine the statistical golden state with Bayesian statistics to identify anomalous misconfigurations on “sick” machines
Key Ideas: Goals of Troubleshooting • Effectiveness • System should identify a small set of sick configuration candidates in a short amount of time • Automation • Minimize number of manual steps and number of users involved
PeerPressure: Architecture • 1) Sick computer • 2) I found you • 3) Turns user- or machine-specific entries into canonicalized form • 4) Database containing a number of machine configuration snapshots • 5) Bayesian estimation used to calculate probability of a suspect being sick
PeerPressure: Architecture • Manual Steps • User runs the faulty application to record suspects • User determines whether the sickness is cured • Manual steps involve only the troubleshooting user and no second party
PeerPressure: Algorithm • Intuition and Objectives • e1: Probably healthy • e2: Most probably sick • e3: “Natural biological diversity” • Type I: application configuration states • e1 and e2 • Type II: operational states (timestamps, caches, etc.) • e3 • Want to weed these out; they are the most likely false positives
PeerPressure: Algorithm • Formulation • Combining (3) and (1): when m = 0 (no sample value matches the suspect value), P(S|V) = 1 • Bayesian estimation is used to overcome this • Vector pj: probability of the event happening and its outcome being Vj; pj follows a Dirichlet distribution • mj: number of sample values matching the suspect value
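The m = 0 problem and its Bayesian fix can be illustrated with a minimal Python sketch. The equations themselves did not survive in this transcript, so the following are assumptions made for illustration only: exactly one of t suspects is sick (prior P(S) = 1/t), a sick entry is equally likely to take any of its c possible values (P(V|S) = 1/c), and the Dirichlet prior contributes one pseudo-count per value. The names N, c, t and both function names are hypothetical.

```python
def sick_probability_ml(N, c, t, m):
    """P(S|V) with a maximum-likelihood estimate P(V|H) = m / N.

    N: number of sample machines, c: cardinality of the entry,
    t: number of suspects, m: samples whose value matches the suspect.
    All priors below are assumptions, not taken from the slides.
    """
    p_s = 1.0 / t            # assumed prior: one sick entry among t suspects
    p_h = 1.0 - p_s
    p_v_given_s = 1.0 / c    # assumed: a sick entry takes any value uniformly
    p_v_given_h = m / N      # maximum-likelihood estimate
    numerator = p_v_given_s * p_s
    return numerator / (numerator + p_v_given_h * p_h)


def sick_probability_smoothed(N, c, t, m):
    """Same Bayes rule, but P(V|H) is smoothed with one pseudo-count per value
    (a symmetric Dirichlet prior, assumed here), so it is never exactly zero."""
    p_s = 1.0 / t
    p_h = 1.0 - p_s
    p_v_given_s = 1.0 / c
    p_v_given_h = (m + 1) / (N + c)
    numerator = p_v_given_s * p_s
    return numerator / (numerator + p_v_given_h * p_h)


if __name__ == "__main__":
    # 87 samples, binary entry, 10 suspects, no sample matches the suspect.
    print(sick_probability_ml(87, 2, 10, 0))
    print(sick_probability_smoothed(87, 2, 10, 0))
```

Under these assumptions the maximum-likelihood variant declares any never-before-seen value certainly sick, while the smoothed variant only makes it highly suspicious, which keeps the ranking among suspects meaningful.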
PeerPressure: Algorithm Asymptotic Analysis:
Prototype • GeneBank Database: Microsoft SQL Server 2000 containing snapshots from 87 Windows XP PCs • PeerPressure troubleshooter implemented in C# • “Data Sanitization” • Unification of different representations of the same value • Dual Intel Xeon 2.4 GHz CPU workstation with 1 GB RAM hosts SQL Server
Performance: Response Time vs. Number of Suspects • 20 real-world troubleshooting cases used • Database queries dominate troubleshooting response time (one query per suspect entry)
Prototype: GeneBank • Registry characteristics in GeneBank • Unseen: a value unknown to GeneBank; it increments the observed cardinality by 1 • Any entry from GeneBank has a cardinality of at least 2 • Entries that do not exist on some sample machines take the value “no entry” on those machines • When cardinality is low, conformity among samples is strong
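The cardinality rules on this slide can be sketched in a few lines of Python. This is one reading of the bullets, not the prototype's code: cardinality is taken to be the number of distinct sample values (with a missing entry counted as the value "no entry") plus one for the unseen bucket, which is what makes every entry have cardinality of at least 2. The sentinel `NO_ENTRY` and both function names are hypothetical.

```python
NO_ENTRY = "<no entry>"  # hypothetical sentinel for "entry absent on this machine"


def _normalize(value):
    """Map a missing entry (None) to the explicit 'no entry' value."""
    return NO_ENTRY if value is None else value


def cardinality(sample_values):
    """Distinct observed values, plus 1 for values unseen in the samples."""
    distinct = {_normalize(v) for v in sample_values}
    return len(distinct) + 1


def matching_count(sample_values, suspect_value):
    """Number of sample machines whose value equals the suspect's value."""
    key = _normalize(suspect_value)
    return sum(1 for v in sample_values if _normalize(v) == key)


if __name__ == "__main__":
    samples = ["1", "1", None, "1"]        # one machine lacks the entry
    print(cardinality(samples))            # distinct: "1", "no entry", unseen
    print(matching_count(samples, "7"))    # suspect value never seen in samples
```

With this reading, an entry on which all samples agree has cardinality 2, so a suspect value that no sample shares stands out strongly.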
Performance: Root-Cause Ranking Results • 87% have cardinality of 2, 94% no more than 3, 97% no more than 4
Performance: False Positives • Large cardinality of root-cause entry • Relation between root-cause entry and other entries in the suspect set • GeneBank is not pristine
Performance: Sick Machine Sensitivity • Format: RootCauseRanking (NumberOfTies) / NumberOfSuspects
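The result format above could be produced from per-suspect sick probabilities roughly as follows. This is a hedged sketch: the slide does not define how ties are counted, so the code assumes that rank is one plus the number of suspects with a strictly higher probability, and that ties are the other suspects sharing the root cause's probability. The function name and the example entry names are hypothetical.

```python
def ranking_summary(sick_prob, root_cause):
    """Format a result as 'RootCauseRanking (NumberOfTies) / NumberOfSuspects'.

    sick_prob: dict mapping suspect entry name -> estimated sick probability.
    root_cause: name of the entry known to be the real root cause.
    """
    p_root = sick_prob[root_cause]
    # Assumed definition: rank 1 means no suspect looks sicker than the root cause.
    rank = 1 + sum(1 for p in sick_prob.values() if p > p_root)
    # Assumed definition: ties are other suspects with exactly the same probability.
    ties = sum(
        1 for name, p in sick_prob.items() if name != root_cause and p == p_root
    )
    return f"{rank} ({ties}) / {len(sick_prob)}"


if __name__ == "__main__":
    probs = {"entry_a": 0.9, "entry_b": 0.5, "entry_c": 0.5, "entry_d": 0.1}
    print(ranking_summary(probs, "entry_b"))
```

A lower rank with few ties means the user has fewer candidates to try before reaching the real root cause.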
Future Work • Multi-gene troubleshooting • Multiple sick entries among suspects • Cross-application misconfiguration • Heavy customization of apps can break assumption of strong conformance in most configuration entries • GeneBank maintenance – privacy issue