Systems Support for End-to-End Performance Management

Sandip Agarwala. PhD Advisor: Karsten Schwan. College of Computing, Georgia Tech.

Presentation Transcript


  1. Systems Support for End-to-End Performance Management. Sandip Agarwala. PhD Advisor: Karsten Schwan. College of Computing, Georgia Tech.

  2. Complexity, complexity, complexity… Source: Gartner (December 2005)

  3. Reasons for Complexity • Application diversity • Interdependencies • Heterogeneous components • Too many different technologies and platforms • Too few hints from the system to the administrators • Legacy issues; application-specific solutions • Insufficient information about the system to drive self-management ⇒ Lack of automation

  4. Online System Management [Diagram labels: Analyze, Monitor, Control, Execute, Workload] • Scheduling • Capacity and SLA management • Design evaluation and tuning • Bottleneck detection • Resource provisioning, accounting, etc. Proposed approach: Service Path

  5. Service Path [Diagram labels: Database, Back-end, Application Logic (EJBs, etc.), Middle-tier, Servlet Server, Front-end, Web Servers, Proxy Server, Internet] • Service Path: a system abstraction that describes the dynamic dependencies between the different distributed application components • Service Class: an application-level request class, e.g. an SLA class

  6. Service Path Characteristics • End-to-End analysis • Online • Non-intrusive • Application-generic

  7. Outline • Background • Motivation • Service path • Discovery with E2EProf • Refinement with SysProf • Automated SLA Enforcement • Related Work • Future Plans

  8. E2EProf [Diagram: nodes A, B, C, D; labels D1, D2, X; time series for edges (AB) and (BC) plotted against time] • Black-box approach • Correlate per-edge time series signals • Monitor network packet traces (source, destination, timestamps) • Model traces as per-edge time series signals or density functions

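  9. Basic Approach [Diagram: nodes A, B, C, D; labels D1, D2, X; cross-correlation plots for (AB) with (BC) and for (AB) with (BD)] • Compute the cross-correlation between per-edge signals • Spike ⇒ causality • Spike's position ⇒ delay (e.g. the delay at B) • No spike ⇒ no causality inferred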
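The cross-correlation step on this slide can be sketched as follows. This is a minimal illustration, not E2EProf's implementation: the function names, the bin width, the maximum lag, and the spike threshold (a peak at least three times the mean correlation) are all assumptions made for the example, and the timestamps are made up.

```python
def to_time_series(timestamps, bin_width, num_bins):
    """Count packets per time bin, giving a per-edge time series signal."""
    series = [0] * num_bins
    for t in timestamps:
        idx = int(t // bin_width)
        if 0 <= idx < num_bins:
            series[idx] += 1
    return series


def cross_correlation(x, y, max_lag):
    """Return corr[d] = sum over t of x[t] * y[t + d], for d in 0..max_lag."""
    n = len(x)
    return [sum(x[t] * y[t + d] for t in range(n - d))
            for d in range(max_lag + 1)]


def find_spike(corr, threshold_ratio=3.0):
    """Return the lag of a dominant spike, or None if no lag stands out.

    The threshold rule is an assumption chosen for this sketch.
    """
    mean = sum(corr) / len(corr)
    peak_lag = max(range(len(corr)), key=lambda d: corr[d])
    if mean == 0 or corr[peak_lag] < threshold_ratio * mean:
        return None
    return peak_lag


# Hypothetical packet timestamps (ms) on three edges.
ab = [1, 5, 6, 12, 20, 21, 33]           # A -> B
bc = [t + 4 for t in ab]                 # B -> C, each forwarded 4 ms later
bd = [2, 3, 14, 27, 30, 38, 39]          # B -> D, unrelated traffic

sig_ab = to_time_series(ab, bin_width=1, num_bins=50)
sig_bc = to_time_series(bc, bin_width=1, num_bins=50)
sig_bd = to_time_series(bd, bin_width=1, num_bins=50)

delay_bc = find_spike(cross_correlation(sig_ab, sig_bc, max_lag=10))
delay_bd = find_spike(cross_correlation(sig_ab, sig_bd, max_lag=10))
print(delay_bc, delay_bd)
```

In this toy trace the (AB)/(BC) pair produces a spike at a lag of 4 bins, which is read as the delay at B, while the (AB)/(BD) pair has no dominant lag.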

  10. Evaluation with 4-tier RUBiS (http://rubis.objectweb.org/) [Diagram: Clients; Apache Web Server; Tomcat Server 1 and Tomcat Server 2; EJB Server 1 and EJB Server 2; MySQL Server. Labels: comment, bidding, I/O bound, CPU bound]

  11. Service Path Detection in RUBiS [Figure panels: Round-robin load balancer; Static server assignment. Annotations: highest-delay node(s)]

  12. Change Detection in RUBiS [Figure annotation: injected delay]

  13. Delta Air Lines’ Application: Revenue Pipeline • Total traffic: 1.34 million / day (56k / hour) [Diagram labels: TACSIN & TACSOUT; APEXIN & APEXOUT; XIN & XOUT; Error/Warning (Tivoli) Logs]

  14. Delta Air Lines’ Application [Figure: latency (sec) versus time of day; labels: Client requests, TACS, S1, S2, S3, S7, S8; annotation: huge request burst]

  15. Outline • Background • Motivation • Service path • Discovery with E2EProf • Refinement with SysProf • Automated SLA Enforcement • Related Work • Future Plans

  16. Beyond dependency and latency… [Diagram: nodes C1, C2 and S1 to S6] • Solution: zoom into the service path with SysProf • No application hints or instrumentation • Monitor resource usage on a per-class basis

  17. SysProf Methodology • Track the request context • Work done for processing a request class • May span user level and kernel level • Executes in more than one context (e.g. processes, threads, softirqs) • Happens in a system-visible event (e.g. system calls) [Diagram labels: system call parameters, PID, App functions; A1, A2, …, AN; User; Kernel; Scheduler; System Call; Net softirq; Network Stack; FS/VM/etc.; Context Switches; Disk I/O; Init CID; eth driver; BDD; From client; To client; Instrumentation points]

  18. Class ID Propagation • Process → CID • Msg → CID • Packet → CID • Inherits CID [Diagram labels: Front-Tier, Middle-Tier, End-Tier; User; Kernel; Init CID; From client; To client]
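Class ID (CID) inheritance can be illustrated with a small user-level model. This is only a sketch under assumptions: SysProf does this tracking inside the system without application changes, whereas the `CidTracker` class, its method names, the dictionary-based message format, and the CPU-time figures below are hypothetical and exist only to show how a CID could flow from message to process to outgoing message while usage is charged per class.

```python
from collections import defaultdict


class CidTracker:
    """Toy model of CID propagation and per-class resource accounting."""

    def __init__(self):
        self.process_cid = {}             # pid -> CID currently being served
        self.usage = defaultdict(float)   # CID -> accumulated CPU seconds

    def init_cid(self, payload, cid):
        """Tag a request arriving from a client with its class ID."""
        return {"cid": cid, "payload": payload}

    def on_receive(self, pid, msg):
        """A process that reads a tagged message inherits its CID."""
        self.process_cid[pid] = msg["cid"]

    def on_send(self, pid, payload):
        """An outgoing message is tagged with the sender's current CID."""
        return {"cid": self.process_cid.get(pid), "payload": payload}

    def charge(self, pid, cpu_seconds):
        """Attribute work done by a process to the class it is serving."""
        cid = self.process_cid.get(pid)
        if cid is not None:
            self.usage[cid] += cpu_seconds


tracker = CidTracker()

# Front tier (pid 100) receives a client request of class "bidding".
request = tracker.init_cid("GET /bid", cid="bidding")
tracker.on_receive(100, request)
tracker.charge(100, 0.25)

# The front tier forwards to the middle tier (pid 200), which inherits the CID.
forwarded = tracker.on_send(100, "placeBid()")
tracker.on_receive(200, forwarded)
tracker.charge(200, 0.5)

print(dict(tracker.usage))
```

Because every hop copies the CID forward, the work done in both tiers ends up charged to the same request class.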

  19. Application of SysProf • Resource Accounting • Utility Billing • Bottleneck detection • Capacity Estimation • Root-Cause Analysis • Black-Box SLA management

  20. Resource-Aware Adaptive Control • Cluster workloads contending for the same resources • Separate queue/controller for each cluster [Diagram: Front-end; Controller + Scheduler; Class 1, Class 2, Class 3; Tomcat Server 1 and Tomcat Server 2; EJB Server 1 and EJB Server 2; MySQL Server]

  21. Resource-Aware Adaptive Control • Capacity = 80 req/s per server [Figure panels: No SysProf; With SysProf]

  22. Summary • Service Path: system abstractions to represent dependencies and request paths • E2EProf and Pathmap: dependency and latency analysis • SysProf: service-based resource analysis • Aid the human operator and automate end-to-end performance management

  23. Thank You! Questions? Email: sandip@cc.gatech.edu

  24. Extra Slides

  25. Pathmap Optimizations [Diagram labels: packet timestamp trace; bursty traffic; sliding window (W); run-length compression; time-series signal or density function; upper bound on latency; cross-correlation series]
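Run-length compression suits bursty traffic because the binned signal contains long stretches of identical values, mostly zeros. The sketch below shows generic run-length encoding of such a signal; the slide does not specify Pathmap's actual encoding, so the function names and the (value, run length) pair format are assumptions.

```python
def rle_encode(series):
    """Collapse consecutive equal samples into (value, run_length) pairs."""
    runs = []
    for value in series:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original series."""
    series = []
    for value, length in runs:
        series.extend([value] * length)
    return series


# A bursty signal: two short bursts separated by idle bins.
signal = [0, 0, 0, 2, 2, 0, 0, 0, 0, 1]
encoded = rle_encode(signal)
print(encoded)
```

The idle periods shrink to a single pair each, so a window that is mostly silent costs little to store and to scan.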