This study evaluates how realistic the goal of non-stop, 24/7 web availability actually is. It reviews traditional uptime metrics such as "nines" and argues that availability must be measured from the end-user's perspective rather than as server uptime alone. The experiment ran real web transactions for more than six months against a variety of sites and found that the large majority of errors originate at the local (client) end, often involving hardware faults and human error. The talk also considers the impact on user satisfaction, response times, and error-management strategies such as retrying failed requests.
Measuring End-User Availability on the Web: Practical Experience
Matthew Merzbacher (visiting research scientist)
Dan Patterson (undergraduate)
Recovery-Oriented Computing (ROC)
University of California, Berkeley
http://roc.cs.berkeley.edu
E-Commerce Goal
• Non-stop availability
  • 24 hours/day
  • 365 days/year
• How realistic is this goal?
• How do we measure availability?
  • To evaluate competing systems
  • To see how close we are to optimum
The State of the World
• Uptime measured in “nines”
  • Four nines == 99.99% uptime (just under an hour of downtime per year)
  • Does not include scheduled downtime
• Manufacturers advertise six nines
  • Roughly 30 s of unscheduled downtime per year
  • May be true in a perfect world
  • Not true in practice on the real Internet
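The downtime figures above are easy to verify; here is a minimal Python sketch (the names are illustrative, not from the talk) that converts a “nines” figure into the yearly downtime budget it allows:

```python
# Convert an availability target expressed in "nines" into the downtime
# it permits over one (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s


def downtime_per_year(nines: int) -> float:
    """Allowed downtime in seconds for the given number of nines."""
    availability = 1 - 10 ** (-nines)     # e.g. 4 nines -> 0.9999
    return SECONDS_PER_YEAR * (1 - availability)


for n in (3, 4, 5, 6):
    secs = downtime_per_year(n)
    print(f"{n} nines: {secs / 60:7.1f} min/year ({secs:10.1f} s)")
```

Four nines works out to about 53 minutes per year (“just under an hour”), and six nines to roughly 31 seconds, matching the figures on the slide.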
Measuring Availability
• Measuring “nines” of uptime is not sufficient
  • Reflects unrealistic operating conditions
• Must capture the end-user’s experience
  • Server + network + client
  • Client machine and client software
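The slides do not give an explicit formula, so the following is an assumed, minimal definition of end-user availability: the fraction of attempted end-to-end transactions (client, network, and server together) that actually succeed.

```python
# Assumed end-user availability metric: successful end-to-end
# transactions divided by attempted transactions.
def end_user_availability(successes: int, attempts: int) -> float:
    if attempts <= 0:
        raise ValueError("no transactions attempted")
    return successes / attempts


# Example: 970 of 1000 hourly probes succeeded -> 0.97 measured availability.
print(f"{end_user_availability(970, 1000):.4f}")
```

This differs from server uptime because a transaction counts as failed whenever any link in the chain fails, including the client’s own machine and software.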
Existing Systems
• Topaz, Porvio, SiteAngel
  • Measure response time, not availability
  • Monitor service-level agreements
• NetCraft
  • Measures availability, not performance or end-user experience
• We measured end-user experience and located common problems
Experiment
• “Hourly” small web transactions
• From two relatively proximate sites
  • (Mills CS, Berkeley CS)
• To a variety of sites, including
  • Internet retailer (US and international)
  • Search engine
  • Directory service (US and international)
• Ran for 6+ months
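The talk describes the probes only at this level of detail. A plausible shape for such a client-side probe is sketched below in Python; the target URLs, timeout, and log format are assumptions for illustration, not the authors’ actual harness.

```python
# Hypothetical end-to-end probe: fetch each target URL once, record
# success/failure, an error label, and the response time. An hourly
# cron job appending to a CSV log would approximate the experiment.
import csv
import time
import urllib.error
import urllib.request

TARGETS = [            # placeholder URLs, not the sites used in the study
    "https://www.example.com/",
    "https://search.example.org/?q=test",
]
TIMEOUT_S = 30         # assumed per-request timeout


def probe(url: str) -> tuple[bool, str, float]:
    """Return (success, error label, elapsed seconds) for one transaction."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
            resp.read()                      # pull the whole page, like a browser
            return True, "", time.monotonic() - start
    except urllib.error.HTTPError as exc:    # server answered with an error status
        return False, f"http {exc.code}", time.monotonic() - start
    except urllib.error.URLError as exc:     # DNS, connect, TLS failures
        return False, str(exc.reason), time.monotonic() - start
    except OSError as exc:                   # local socket / network-down errors
        return False, str(exc), time.monotonic() - start


with open("probe_log.csv", "a", newline="") as log:
    writer = csv.writer(log)
    for target in TARGETS:
        ok, err, elapsed = probe(target)
        writer.writerow([int(time.time()), target, int(ok), f"{elapsed:.3f}", err])
```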
Types of Errors
• Local: 82%
• Network (medium): 11%
• Network (severe): 4%
• Server: 2%
• Corporate: 1%
Client Hardware Problems Dominate User Experience
• System-wide crashes
• Administration errors
• Power outages
• And many, many more…
• Many, if not most, caused or aggravated by human error
Does Retry Help?
• [Chart: retry success rates; green indicates > 80% success on retry, red < 50%]
What Guides Retry?
• Uniqueness of the data
• Importance of the data to the user
• Loyalty of the user to the site
• Transience of the information
• And more…
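A client-side retry policy consistent with these observations might look like the sketch below; the attempt count, backoff, and the list of “looks local” error markers are assumptions, chosen to reflect the study’s finding that retry rarely helps when the failure is on the local end.

```python
# Hypothetical retry policy: retry remote failures with a simple backoff,
# but give up immediately on errors that look local to the client, since
# retrying the remote site cannot fix those.
import time

LOCAL_HINTS = ("network is unreachable", "no route to host")  # assumed markers


def fetch_with_retry(fetch, url: str, attempts: int = 3, backoff_s: float = 5.0):
    """Call fetch(url) up to `attempts` times; fetch returns (ok, error, elapsed)."""
    for i in range(attempts):
        ok, error, elapsed = fetch(url)
        if ok:
            return True, elapsed
        if any(hint in error.lower() for hint in LOCAL_HINTS):
            break                            # local problem: stop retrying
        time.sleep(backoff_s * (i + 1))      # linear backoff between attempts
    return False, None
```

Combined with the probe() sketch above, fetch_with_retry(probe, url) would retry transient network or server failures but fail fast when the client’s own connectivity is the problem.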
Conclusion
• Experiment modeled the end-user experience
• Vast majority (81%) of errors were on the local end
• Almost all errors were in the “last mile” of service
• Retry doesn’t help for local errors
  • The user may be aware of the problem and therefore less frustrated by it