
Fingerprinting the Datacenter








  1. Fingerprinting the Datacenter • Marcel Flores • Shih-Chi Chen

  2. Motivation • Large datacenters often encounter large and complex crises • These typically manifest as performance dipping below SLAs • Crises are complex and difficult to diagnose • They can be costly to operators

  3. Approach • Quantify the state of the datacenter in a compact manner • The compact state can be compared to past crises • Allows for easy identification and diagnosis of crises

  4. Fingerprints • Tracks quantiles for each metric • Determines hot/normal/cold status for each metric • Includes only relevant metrics • Uses a similarity metric for comparison

  5. Fingerprint - details • Tracks quantiles of each metric • Quantiles are resistant to outliers • Measures the 25th, 50th, and 95th quantiles • Determines whether each summary value is Hot (>98th percentile), Cold (<2nd percentile), or Normal
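The quantile tracking and Hot/Cold/Normal labeling above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the data layout and function names are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of `values` (0 <= q <= 1).

    Hypothetical helper for illustration; any standard quantile
    routine would do.
    """
    s = sorted(values)
    idx = q * (len(s) - 1)
    lo, hi = int(idx), min(int(idx) + 1, len(s) - 1)
    return s[lo] + (idx - lo) * (s[hi] - s[lo])

def summarize_metric(epoch_values):
    """Summarize one metric in one epoch by its 25th, 50th, and 95th
    quantiles, as the slide describes -- robust to outlier samples."""
    return tuple(quantile(epoch_values, q) for q in (0.25, 0.50, 0.95))

def hot_cold_state(value, history):
    """Label a summary value Hot (> 98th percentile), Cold (< 2nd
    percentile), or Normal, relative to the metric's own history."""
    if value > quantile(history, 0.98):
        return "hot"
    if value < quantile(history, 0.02):
        return "cold"
    return "normal"
```

A fingerprint for an epoch would then be the vector of hot/cold/normal states over the selected metrics.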

  6. Relevant Metrics • Select metrics via feature selection and classification • Technique from statistical machine learning • Eliminates noise from the fingerprints
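As a rough intuition for the feature-selection step, one could rank metrics by how well their abnormal (hot/cold) epochs agree with labeled crisis epochs and keep the top k. This toy scoring is an assumption for illustration only; the slide only says a statistical machine learning technique is used, not which one.

```python
def select_relevant_metrics(states, crisis_labels, k):
    """Toy relevance ranking (hypothetical, not the paper's method).

    states: {metric_name: [0/1 per epoch]} where 1 = metric was hot/cold.
    crisis_labels: [0/1 per epoch] where 1 = a crisis was in progress.
    Returns the k metrics whose abnormality best agrees with the labels,
    discarding the noisy rest -- the goal the slide describes.
    """
    def score(flags):
        # Fraction of epochs where the metric's state matches the label.
        agree = sum(1 for f, c in zip(flags, crisis_labels) if f == c)
        return agree / len(crisis_labels)

    return sorted(states, key=lambda m: score(states[m]), reverse=True)[:k]
```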

  7. Identification • Define a similarity metric • Allows comparison between current state fingerprint and known crisis fingerprints • Identification Threshold determines when two fingerprints are considered the same
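The identification step above can be sketched with one plausible choice of similarity metric. Euclidean distance is an assumption here, as is the `identify` interface; the slide only states that some similarity metric and an Identification Threshold are used.

```python
import math

def distance(fp_a, fp_b):
    """Euclidean distance between two fingerprint vectors (an assumed
    similarity metric; smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))

def identify(current_fp, known_crises, threshold):
    """Compare the current state's fingerprint against known crisis
    fingerprints. Return the label of the closest known crisis if it
    falls within the Identification Threshold, else None (new crisis)."""
    best = min(known_crises.items(),
               key=lambda kv: distance(current_fp, kv[1]),
               default=None)
    if best is not None and distance(current_fp, best[1]) <= threshold:
        return best[0]
    return None
```

The threshold controls the trade-off the evaluation slides measure: too loose and distinct crises are merged, too tight and recurrences of a known crisis go unrecognized.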

  8. Evaluation • Used data gathered from a live production datacenter of hundreds of servers • 240 days of data • About 100 metrics per server

  9. Evaluation Criteria • Discrimination: when are two crises different? • Identification Stability: when does it provide a consistent suggestion? • Identification Accuracy: when does it provide the correct label?

  10. Offline • Uses all known data • Attempts to recall the crises that it saw • Provides a baseline: what is the best possible result if everything were known in advance? • Dominates existing methods; near-perfect accuracy

  11. Quasi-Online • More realistic, but still computes the thresholds offline • Doesn't know the future • Known and unknown accuracy of 85%

  12. Online • Everything computed online, on the fly • Including the Identification Threshold • Achieved both accuracies of 80% (with 10 seeding crises) • 78% known, 74% unknown (with 2 seeding crises) • Does well even with a smaller seeding set!

  13. A note on Thresholds • Hot/Cold thresholds were selected arbitrarily • Ran evaluations with alternative values derived from other statistical methods • These showed reduced discriminative power (95%, down from 99%) • Why mess with what works?
