
Got Predictability? Experiences with FT Middleware

This presentation explores the predictability of fault-tolerant middleware in IT infrastructures, motivated by service-level agreements, problem determination, and self-management, and presents empirical data on the unpredictability of both faulty and fault-free behaviour.



Presentation Transcript


  1. Got Predictability? Experiences with FT Middleware. Tudor Dumitraş, Priya Narasimhan, Carnegie Mellon University

  2. Who Needs Predictability? • Service-level agreements • Problem determination, fingerpointing • Self-management, autonomic computing • FT middleware protects the critical parts of IT infrastructures, so it faces higher predictability requirements

  3. Predictability of Fault-Tolerant Middleware • Faults are inherently unpredictable • What about the fault-free case? • Reportedly, max (response time) >> average (response time)

  4. Empirical Data Collected • MEAD Trace: micro-benchmark (client-server) • Middleware for Embedded Adaptive Dependability • Fault-Tolerant CORBA implementation • 1200 configurations • FTDS Trace: 7 macro-benchmarks (3-tier applications) • Developed during Fault-Tolerant Distributed Systems class • Enterprise applications: online gaming, e-commerce • Use CORBA or EJB • 336 configurations Available at: http://www.ece.cmu.edu/~tdumitra/FT_traces/

  5. Fault-Free vs. Faulty Unpredictability (MEAD Trace). [Chart: average recovery time (s) vs. number of clients (1–22), annotated with the maximum fault-free latency.]

  6. Fault-Free vs. Faulty Unpredictability (FTDS Trace). [Chart: recovery time (s) for projects 1–7, broken down into fault detection, fail-over, and request processing; the maximum fault-free latency is 13.6 s.]

  7. Outline • Can we predict the maximum latency of FT middleware? • When do high latencies occur and how high are they? • How common are the high latencies? • Do most requests have bounded latencies?

  8. MEAD Architecture. [Diagram: replicated client and server objects running over CORBA on separate hosts; on each host the MEAD Replicator sits between the ORB and the group-communication layer, with an interface to the application/CORBA (modified system calls) and an interface to group communication. Tunable mechanisms: replication style, number of replicas, replicated state.] Active replication: all replicas process requests. Passive replication: the primary replica processes requests.
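
To make the two replication styles concrete, here is a minimal Python sketch that assumes a toy server whose replicated state is a single counter. The class and method names are illustrative and are not MEAD's actual API.

```python
# Minimal sketch of active vs. passive replication (illustrative only).

class Replica:
    """A single server replica holding replicated state (a counter)."""
    def __init__(self):
        self.state = 0

    def process(self, request):
        self.state += request          # update replicated state
        return self.state              # reply to the client


class ActiveGroup:
    """Active replication: every replica processes every request;
    duplicate replies are suppressed and a single reply is returned."""
    def __init__(self, n_replicas):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def invoke(self, request):
        replies = [r.process(request) for r in self.replicas]
        return replies[0]              # suppress duplicates, return one reply


class PassiveGroup:
    """Passive replication: only the primary processes requests;
    its state is transferred to the backup replicas."""
    def __init__(self, n_replicas):
        self.primary = Replica()
        self.backups = [Replica() for _ in range(n_replicas - 1)]

    def invoke(self, request):
        reply = self.primary.process(request)
        for b in self.backups:         # state transfer to backups
            b.state = self.primary.state
        return reply


if __name__ == "__main__":
    active, passive = ActiveGroup(3), PassiveGroup(3)
    print(active.invoke(5), passive.invoke(5))   # both groups reply consistently
```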

  9. Applications from the FTDS Trace • Su-Duel-Ku: competitive Sudoku • FTEX: electronic stock exchange • Mafia: online game • Ticket Center: online ticketing • Blackjack: online casino • eJBay: online auctioning • Park’n Park: parking-lot management. [Diagram classifies each application by middleware (CORBA or EJB) and replication style (active or passive).]

  10. Architecture of FTDS Applications

  11. Outline • Can we predict the maximum latency of FT middleware? • When do high latencies occur and how high are they? • How common are the high latencies? • Do most requests have bounded latencies?

  12. Example of Unpredictability. The maximum latency can be orders of magnitude larger than the average. [Charts: latency (μs) over time (s) and the probability density function of latency, showing a long tail.]
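
As a rough illustration of how the maximum can dwarf the average, the short Python sketch below computes both statistics over a synthetic, heavy-tailed latency sample. The numbers are invented for illustration and are not taken from the traces.

```python
import random

# Synthetic, heavy-tailed latency sample in microseconds: mostly fast requests
# plus a few very slow ones. The numbers are made up for illustration only.
random.seed(0)
latencies_us = [random.gauss(800, 50) for _ in range(10_000)]
latencies_us += [random.uniform(10_000, 20_000) for _ in range(20)]

average = sum(latencies_us) / len(latencies_us)
worst = max(latencies_us)
print(f"average = {average:.0f} us, max = {worst:.0f} us, "
      f"max/average = {worst / average:.1f}x")
```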

  13. Unpredictability in the MEAD Trace. [3-D chart: average latency (μs, log scale) vs. request size (16 B–64 KB) and request rate (0–5000 req/s).]

  14. Unpredictability in the MEAD Trace. [3-D charts: average and maximum latency (μs, log scale) vs. request size (16 B–64 KB) and request rate (0–5000 req/s).]

  15. Average and Maximum Latency. [Scatter plots of maximum vs. average latency (s): a linear-scale plot for SuDuelKu, FTEX, Park’n Park, and Ticket Center, and a log-log plot for MEAD.]

  16. Outline • Can we predict the maximum latency of FT middleware? • When do high latencies occur and how high are they? • How common are the high latencies? • Do most requests have bounded latencies?

  17. Statistical Analysis of Unpredictability. [Chart: probability density function of latency (μs), showing a long tail.]

  18. Correlation with Message Size (MEAD). [Charts: maximum z-score (up to ~300) and percentage of outliers (up to ~1.5%) vs. size of reply messages (16 B–64 KB).]
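
The outlier measurements on this and the following slides are reported as z-scores. The Python sketch below shows one plausible way to compute the maximum z-score and the percentage of outliers; the 3-sigma cut-off is an assumption, since the exact threshold used in the study is not stated here.

```python
import statistics

def z_scores(latencies):
    """Standardize each latency: (x - mean) / standard deviation."""
    mean = statistics.mean(latencies)
    stdev = statistics.stdev(latencies)
    return [(x - mean) / stdev for x in latencies]

def outlier_stats(latencies, threshold=3.0):
    """Return the maximum z-score and the fraction of latencies whose
    z-score exceeds the threshold (the 3-sigma cut-off is an assumption)."""
    zs = z_scores(latencies)
    outliers = [z for z in zs if z > threshold]
    return max(zs), len(outliers) / len(zs)

# Synthetic sample (microseconds): 1% of the requests are very slow.
sample = [1_000] * 990 + [50_000] * 10
max_z, fraction = outlier_stats(sample)
print(f"max z-score = {max_z:.1f}, outliers = {fraction:.1%}")
```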

  19. Time in Kernel and User Mode (MEAD) • ~25% of the time spent in kernel mode for 16 KB and 64 KB replies • ~10% in kernel mode for 16 B, 256 B, and 4 KB replies

  20. Number and Size of Outliers (FTDS). [Charts: maximum z-score (up to ~150) and percentage of outliers (up to ~3%) for each FTDS project: SuDuelKu, FTEX, eJBay, Mafia, Ticket Center, Blackjack, Park’n Park.]

  21. Correlation with Number of Clients (FTDS). [Charts for SuDuelKu: maximum z-score (up to ~60) and percentage of outliers (up to ~6%) vs. number of clients (1–10).]

  22. Correlation with Request Rate (FTDS). [Charts for FTEX: maximum z-score (up to ~60) and percentage of outliers (up to ~6%) vs. request rate (5–25 req/s).]

  23. Outline • Can we predict the maximum latency of FT middleware? • When do high latencies occur and how high are they? • How common are the high latencies? • Do most requests have bounded latencies?

  24. Outlier Distribution (MEAD). [Histogram: number of experiments vs. percentage of outliers per experiment (0%–6%).]

  25. Outlier Distribution (Comparison). [Probability densities of outliers per experiment (0%–6%) for Ticket Center, eJBay, Park’n Park, Blackjack, FTEX, Mafia, and SuDuelKu.]

  26. Isolating the Unpredictability (MEAD)

  27. Isolating the Unpredictability (MEAD) The “haircut” effect of removing 1% of the highest latencies

  28. The Magical 1% Unpredictability seems to be confined to 1% of the remote invocations.
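
A minimal sketch of the "haircut" from the previous slide: sort the latencies, drop the top 1%, and compare the maximum before and after. The sample is synthetic and only illustrates the effect.

```python
def haircut(latencies, trim=0.01):
    """Sort the latencies, drop the top `trim` fraction, and return the
    maximum before and after the cut."""
    ordered = sorted(latencies)
    keep = int(len(ordered) * (1.0 - trim))
    return ordered[-1], ordered[keep - 1]

# Synthetic sample: the worst 1% of requests dominates the maximum latency.
sample = [100] * 9_900 + [10_000] * 100
raw_max, trimmed_max = haircut(sample)
print(f"max = {raw_max}, max after removing the top 1% = {trimmed_max}")
```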

  29. Magical 1%. [Charts: average latency, 99th-percentile latency, and maximum latency (s) per experiment for MEAD, SuDuelKu, Park’n Park, Blackjack, Mafia, and FTEX.]

  30. Outline • Can we predict the maximum latency of FT middleware? • When do high latencies occur and how high are they? • How common are the high latencies? • Do most requests have bounded latencies?

  31. Bounds for the 99th Percentile. [Chart: latency ranges, 99th percentiles, and confidence intervals, plotted as z-scores of latency (0–240), for MEAD, Ticket Center, Park’n Park, Mafia, eJBay, FTEX, Blackjack, and SuDuelKu.]
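
One standard way to bound the 99th percentile with high confidence is a bootstrap confidence interval, sketched below in Python. The slides do not state which estimation method the authors used, so treat this as an assumption, not their procedure.

```python
import random

def nearest_rank(sorted_sample, q):
    """Nearest-rank percentile of an already-sorted sample (0 <= q <= 1)."""
    idx = min(len(sorted_sample) - 1, int(round(q * (len(sorted_sample) - 1))))
    return sorted_sample[idx]

def p99_confidence_interval(latencies, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap a (1 - alpha) confidence interval for the 99th-percentile
    latency by resampling the observed latencies with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        nearest_rank(sorted(rng.choices(latencies, k=len(latencies))), 0.99)
        for _ in range(n_boot)
    )
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic latency sample (microseconds), purely for illustration.
rng = random.Random(1)
sample = [rng.gauss(1000, 100) for _ in range(2000)]
low, high = p99_confidence_interval(sample)
print(f"99th percentile is in [{low:.0f}, {high:.0f}] us with ~95% confidence")
```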

  32. Trends for the 99th Percentile (MEAD). [3-D chart: 99th-percentile latency (μs, log scale) vs. request size (16 B–64 KB) and request rate (0–5000 req/s).]

  33. Summary • Can we predict the maximum latency of FT middleware? • Not always; maximum usually not correlated with average • When do high latencies occur and how high are they? • Usually not correlated with configuration parameters, OS metrics • Comparable with recovery time after crash faults • How common are the high latencies? • Confined to 1% of remote invocations • Do most requests have bounded latencies? • 99% of requests have a z-score < 10

  34. Implications of the Magical 1% • Predictable maximum latencies are hard to achieve • Cannot eliminate high latencies by carefully configuring the system • Statistical predictability is easy to achieve • 99th percentile latency bounded with high confidence • Confirmed for different • Applications • Programming languages • Middleware technologies • Replication mechanisms • Operating systems • Not confirmed for WANs, wireless networks • Statistical predictability is relevant for many enterprise applications

  35. Thank You! For more information: http://www.ece.cmu.edu/~tdumitra

  36. MEAD Trace vs. FTDS Trace

  37. Experimental Setup • MEAD test bed: Emulab, 100 Mb/s LAN, Pentium III at 850 MHz; parameters varied: replication style (active, passive), replication degree (1, 2, 3 replicas), number of clients (1–22), think time (0, 0.5, 2, 8, 32 ms), reply size (16 B, 256 B, 4 KB, 16 KB, 64 KB) • FTDS test bed: undergraduate cluster, 100 Mb/s LAN, Pentium IV at 2.4 GHz; parameters varied: clients (1, 4, 7, 10), think time (0, 20, 40 ms), reply size (original, 256 B, 512 B, 1 KB)
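
For reference, the MEAD parameter grid above can be enumerated in a few lines of Python. The full cross-product is larger than the 1200 configurations in the trace, so presumably not every combination was run.

```python
from itertools import product

# Parameter grid of the MEAD test bed listed above. This only sketches how the
# configuration space is spanned; the trace itself contains 1200
# configurations, so presumably not every combination was run.
replication_styles = ["active", "passive"]
replication_degrees = [1, 2, 3]
numbers_of_clients = list(range(1, 23))          # 1 to 22 clients
think_times_ms = [0, 0.5, 2, 8, 32]
reply_sizes = ["16 B", "256 B", "4 KB", "16 KB", "64 KB"]

configurations = list(product(replication_styles, replication_degrees,
                              numbers_of_clients, think_times_ms, reply_sizes))
print(f"full cross-product: {len(configurations)} configurations")
```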

  38. Sources of Unpredictability. [Diagram: the request/reply path between client and server applications, passing through the ORB, the interceptors (interc_hi, interc_lo), the Replicator, and group communication on each side.]

  39. Passive Replication. [Diagram: passively replicated client and server object groups; requests and responses pass through the primary replicas, and the primary's state is transferred to the backup replicas.]

  40. Active Replication. [Diagram: actively replicated client and server groups; duplicate invocations and duplicate responses are suppressed.]
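
The duplicate suppression shown on this slide can be sketched with per-invocation identifiers. The code below is illustrative only and is not the MEAD implementation.

```python
class DuplicateSuppressor:
    """Track invocation (or response) identifiers and deliver only the
    first copy of each, suppressing duplicates from the other replicas.
    Illustrative only; this is not the MEAD implementation."""
    def __init__(self):
        self.seen = set()

    def first_delivery(self, request_id):
        if request_id in self.seen:
            return False            # duplicate: suppress
        self.seen.add(request_id)
        return True                 # first copy: deliver

server_side = DuplicateSuppressor()
# Three client replicas send the same invocation (id 7); only one is delivered.
print([server_side.first_delivery(7) for _ in range(3)])   # [True, False, False]
```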

  41. Correlation with Number of Clients (MEAD). [Charts: maximum z-score (up to ~400) and percentage of outliers (up to ~1%) vs. number of clients (1–22).]

  42. Minor Page Faults (MEAD)

  43. Outlier Distribution (MEAD). [Histogram, repeated from slide 24: number of experiments vs. percentage of outliers per experiment (0%–6%).]
