
Cache Storage For the Next Billion


Presentation Transcript


  1. Cache Storage For the Next Billion
  Students: Anirudh Badam, Sunghwan Ihm
  Research Scientist: KyoungSoo Park
  Presenter: Vivek Pai
  Collaborator: Larry Peterson

  2. The Next Billion
  • Developing regions are not all alike
  • Many people have stable food, clean water, and reasonable power
  • Connectivity, however, is bad
  • Growing middle class with a desire for education & technology
  • These people are the next billion

  3. Bad Networking & Options
  • Africa is often backhauled through Europe
  • Satellite latency is not fun
  • Ghana: 2 Mbps costs $6,000/month!
  • Emerging option: disk
  • A 1 TB disk now costs $200
  • Even its latency is better than satellite
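To make the economics concrete, here is a back-of-the-envelope sketch using the slide's figures ($6,000/month for a 2 Mbps link, $200 for a 1 TB disk). The 100% link utilization and 30-day month are illustrative assumptions, not measurements from any deployment.

```python
# Back-of-the-envelope: leased/satellite bandwidth vs. shipped disks.
# Inputs are the slide's figures plus assumed full utilization.

LINK_MBPS = 2                 # Ghana example from the slide
LINK_COST_PER_MONTH = 6000    # USD
DISK_TB = 1
DISK_COST = 200               # USD

SECONDS_PER_MONTH = 30 * 24 * 3600

# Bytes the link can move in a month if fully saturated (assumption).
link_bytes_per_month = LINK_MBPS * 1e6 / 8 * SECONDS_PER_MONTH
link_tb_per_month = link_bytes_per_month / 1e12

print(f"Link: {link_tb_per_month:.2f} TB/month for ${LINK_COST_PER_MONTH}")
print(f"Link cost: ${LINK_COST_PER_MONTH / link_tb_per_month:,.0f} per TB delivered")
print(f"Disk cost: ${DISK_COST / DISK_TB:,.0f} per TB shipped (plus shipping)")
```

Even fully saturated, the 2 Mbps link moves well under one terabyte per month, so a $200 disk shipped periodically is more than an order of magnitude cheaper per byte.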

  4. Enter the Tiny Laptops
  • Problem: memory in the 256 MB range

  5. Making Storage Work
  • Populate disk with content
  • Preloaded HTTP cache
  • Preloaded WAN accelerator cache
  • Preloaded Web sites – Wikipedia, etc.
  • Ship disk to schools
  • Update as needed
  • Pull cache updates on demand during peak hours
  • Push updates off-peak, overnight

  6. Deployment Scenarios
  • Special servers per school
  • 2 for redundancy
  • Average school size: 100 students
  • At $100/laptop, $10K/school
  • Problems
  • 2 servers @ $5K doubles per-school cost
  • Servers don't ride laptop commodity curves
  • Solution: no servers, just laptops
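A quick check of the per-school arithmetic, using the slide's round numbers for laptop and server prices:

```python
STUDENTS_PER_SCHOOL = 100
LAPTOP_COST = 100        # USD, the "$100 laptop" target
SERVER_COST = 5000       # USD each
SERVERS_PER_SCHOOL = 2   # for redundancy

laptop_total = STUDENTS_PER_SCHOOL * LAPTOP_COST     # $10,000
server_total = SERVERS_PER_SCHOOL * SERVER_COST      # $10,000

print(f"Laptops: ${laptop_total:,}   Servers: ${server_total:,}")
print(f"Servers add {server_total / laptop_total:.0%} on top of the laptop budget")
```

The two redundant servers cost as much as all of the laptops combined, which is exactly the "doubles per-school cost" objection above.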

  7. Goal: 1 TB Cache Store on a 256 MB Laptop
  • Why caching?
  • Improves Web access
  • Improves WAN access
  • Problem
  • Large disks are really slow
  • Disk storage requires an index
  • In-memory indices optimize disk access

  8. Memory Index Sizing
  • Squid: popular HTTP cache
  • 72 bytes/object
  • Web objects average 8 KB each
  • 1 TB = 125M objects
  • 125M objects = 9 GB of RAM just for the index
  • Commercial caches: better RAM usage
  • 32 bytes/object
  • 1 TB disk = 4 GB of RAM
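The index-size arithmetic behind these bullets, using the slide's own figures (8 KB average object, 72 bytes/object for Squid, 32 bytes/object for a commercial cache):

```python
DISK_BYTES = 1e12          # 1 TB cache
AVG_OBJECT_BYTES = 8000    # ~8 KB average Web object, matching the slide's 125M count

objects = DISK_BYTES / AVG_OBJECT_BYTES   # ~125 million objects

for name, bytes_per_object in [("Squid", 72), ("commercial cache", 32)]:
    index_bytes = objects * bytes_per_object
    print(f"{name}: {objects / 1e6:.0f}M objects x {bytes_per_object} B "
          f"= {index_bytes / 1e9:.1f} GB of RAM for the index alone")
```

Either way, the index alone dwarfs the 256 MB of RAM available on the target laptops.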

  9. Revisiting Cache Indexing
  • Seek reduction is important
  • Most objects are small
  • Access is largely random
  • High insert rate
  • Assume the hit rate is 50%
  • Assume the cacheable rate is 50%
  • Insert rate = 25% of request rate
  • High delete rate
  • Caches are largely full
  • If insert rate = 25%, delete rate = 25%
  • Deletion using LRU, etc.
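The insert-rate reasoning, spelled out with the slide's assumed rates (50% hit rate, 50% cacheable):

```python
hit_rate = 0.50        # slide assumption: half of cacheable requests hit the cache
cacheable_rate = 0.50  # slide assumption: half of all requests are cacheable

# A request triggers a disk insert only if it is cacheable AND misses.
insert_rate = cacheable_rate * (1 - hit_rate)   # 0.25 of all requests

# With the cache essentially full, each insert evicts roughly one object (LRU, etc.).
delete_rate = insert_rate                       # 0.25 of all requests

print(f"insert rate: {insert_rate:.0%} of requests, delete rate: {delete_rate:.0%}")
```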

  10. Restarting the Design
  • Eliminate the in-memory index
  • Treat disk like memory
  • Optimize data structures for locality
  • Use location-sensitive algorithms
  • Measure performance
  • Now consider what to add
  • For each addition, measure performance
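A minimal sketch of the "eliminate the in-memory index / treat disk like memory" idea: hash the URL directly to a fixed-size slot on disk and pay one seek per lookup. This is an illustration in the spirit of the most basic policy, not HashCache's actual code; the slot size, table size, and overwrite-on-collision behavior are assumptions for the sketch.

```python
import hashlib

SLOT_SIZE = 8 * 1024   # assumed fixed on-disk slot, sized for an average Web object
NUM_SLOTS = 1 << 16    # assumed table size for the sketch (64K slots = 512 MB file)
HDR = 260              # 256-byte URL field + 4-byte body length

def slot_offset(url: str) -> int:
    """Hash the URL straight to a byte offset: the disk itself is the index."""
    h = int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")
    return (h % NUM_SLOTS) * SLOT_SIZE

def store(disk, url: str, body: bytes) -> None:
    # One seek + one write; a collision simply overwrites (cache semantics allow loss).
    body = body[:SLOT_SIZE - HDR]
    rec = url.encode()[:256].ljust(256, b"\0") + len(body).to_bytes(4, "big") + body
    disk.seek(slot_offset(url))
    disk.write(rec)

def lookup(disk, url: str) -> bytes | None:
    # One seek + one read; compare the stored URL to detect collisions.
    disk.seek(slot_offset(url))
    block = disk.read(SLOT_SIZE)
    if block[:256].rstrip(b"\0") != url.encode()[:256]:
        return None
    size = int.from_bytes(block[256:260], "big")
    return block[260:260 + size]

# Usage sketch against an ordinary file standing in for the cache partition.
with open("cache.dat", "w+b") as disk:
    disk.truncate(NUM_SLOTS * SLOT_SIZE)          # sparse file; zero bytes of RAM index
    store(disk, "http://example.com/a", b"hello")
    print(lookup(disk, "http://example.com/a"))   # b'hello'
```

The point of the design exercise is that this zero-RAM baseline works; memory is then spent only where measurement shows it buys back seeks.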

  11. What This Yields
  • HashCache family
  • One basic storage engine
  • Pluggable algorithms & indexing
  • HashCache proxy
  • Web proxy using the HashCache engine
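One way to picture "one basic storage engine with pluggable algorithms & indexing" is a small interface that the proxy codes against while indexing policies vary underneath. The method names and class layout below are my own illustration, not HashCache's actual API.

```python
from abc import ABC, abstractmethod

class CacheStore(ABC):
    """Assumed storage-engine interface; policies differ in how (or whether)
    they keep an in-memory index, not in what they expose to the proxy."""

    @abstractmethod
    def get(self, url: str) -> bytes | None: ...

    @abstractmethod
    def put(self, url: str, body: bytes) -> None: ...

class NoIndexStore(CacheStore):
    """Hash URL -> disk slot directly (zero RAM per object), as sketched above."""

class SmallIndexStore(CacheStore):
    """Keep a few bits per object in RAM to avoid wasted seeks on misses."""

# The proxy only ever sees CacheStore, so the policy can be chosen per deployment
# based on how much RAM and disk a given machine has.
```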

  12. Performance Comparison
  [chart: performance comparison]

  13. Index Bits Per Object
  [bar chart; values shown: 576, 240]

  14. Index Bits Per Object
  [bar chart; values shown: 576, 240, 39, 31, 11, 0, 0]

  15. HashCache Memory
  [chart: HashCache memory usage]

  16. Storage Limits w/2 GB Index
  [chart: maximum disk size addressable with a 2 GB index]
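The relationship behind this chart follows from the figures on slide 8: with a fixed RAM budget for the index, the largest disk a cache can address is (RAM / bytes-per-object) × average object size. The per-object costs below are the earlier Squid and commercial-cache numbers; a design that needs no per-object RAM, which is the point of HashCache, has no such ceiling.

```python
INDEX_RAM = 2e9            # 2 GB of RAM reserved for the index (slide title)
AVG_OBJECT_BYTES = 8000    # ~8 KB average object, as on slide 8

def max_disk_tb(bytes_per_object: float) -> float:
    objects = INDEX_RAM / bytes_per_object
    return objects * AVG_OBJECT_BYTES / 1e12

print(f"Squid-style (72 B/obj):  {max_disk_tb(72):.2f} TB of disk indexable")
print(f"Commercial (32 B/obj):   {max_disk_tb(32):.2f} TB of disk indexable")
```

With 2 GB of index RAM, neither conventional design comes close to indexing a 1 TB disk.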

  17. Beyond Diminishing Returns
  • HTTP cacheability has an upper limit
  • Beyond that, revalidating stored items helps
  • Revalidation on demand, or in the background
  • Uncached content is still cacheable
  • Wide-area accelerators
  • Must still contact servers, though
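Revalidation here is the standard HTTP conditional request: the cache re-asks the origin with If-None-Match or If-Modified-Since and keeps its stored body if the origin answers 304 Not Modified. A minimal sketch using Python's standard library; the host, path, and validator values are placeholders.

```python
import http.client

def revalidate(host: str, path: str,
               etag: str | None, last_modified: str | None) -> bool:
    """Return True if the cached copy is still fresh (origin answered 304)."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified

    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("GET", path, headers=headers)
    resp = conn.getresponse()
    resp.read()      # drain the body (empty on 304) so the connection can be reused
    conn.close()
    return resp.status == 304

# Example with a placeholder validator; on-demand revalidation happens at request
# time, while a background pass could walk popular objects during off-peak hours.
# fresh = revalidate("example.com", "/", None, "Mon, 01 Jan 2024 00:00:00 GMT")
```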

  18. Why WAN Acceleration?
  • Lots of slowly-changing data
  • Wikipedia
  • News sites
  • “Customized” sites
  • WAN acceleration middleboxes
  • Custom protocol between boxes
  • Standard protocols to the rest of the net
  • Less desirable than caches for the Web

  19. WAN Acceleration Dilemma
  • WAN accelerators use chunks
  • Transit stream broken into chunks
  • Small chunks = high compression
  • Also lots of small objects
  • Large chunks = high performance
  • But worse for compression
  • Memory & disk important
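The chunk-size tradeoff can be seen with a toy content-defined chunker: a rolling checksum decides boundaries, and the average chunk size is set by how many low bits of the checksum must be zero. A smaller target means more, finer chunks (better duplicate detection, but many more index entries and seeks); a larger target means the reverse. This is a generic Rabin-style sketch, not the accelerator's actual chunking code, and the window size and hash are arbitrary choices.

```python
import random

def chunk(data: bytes, mask_bits: int, window: int = 48) -> list[bytes]:
    """Split data where a rolling checksum has mask_bits low zero bits.
    Expected chunk size is roughly window + 2**mask_bits bytes."""
    mask = (1 << mask_bits) - 1
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling = ((rolling << 1) + b) & 0xFFFFFFFF   # toy rolling hash
        if i - start >= window and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    chunks.append(data[start:])
    return chunks

random.seed(0)
data = random.randbytes(1 << 20)   # 1 MB of pseudo-random sample data
for bits in (7, 10, 13):           # ~128 B, ~1 KB, ~8 KB average chunks
    parts = chunk(data, bits)
    print(f"mask {bits:2d} bits -> {len(parts):5d} chunks, "
          f"avg {len(data) // len(parts):6d} bytes")
```

The same megabyte turns into thousands of tiny chunks or a few hundred large ones, which is exactly the compression-versus-seek dilemma the slide describes; that is why the index and disk layout matter so much here.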

  20. Merging WAN Acceleration & HashCache
  • Easily index a huge number of chunks
  • Small chunks OK
  • Large chunks better
  • Store chunks redundantly
  • Optimize for performance & compression
  • Communicate tradeoffs to the cache layer
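One hedged reading of "store chunks redundantly" and "communicate tradeoffs to the cache layer": index the same bytes at both a coarse and a fine granularity, so a lookup can first try one large chunk (one seek, good throughput) and fall back to small chunks (better compression of partial matches). The two-level scheme below is my own illustration of that tradeoff, not the system's actual design; a dictionary stands in for the on-disk store.

```python
import hashlib, random

def sha(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

class TwoLevelChunkIndex:
    """Index the same data both as large chunks and as small chunks."""

    def __init__(self, small: int = 1024, large: int = 64 * 1024):
        self.small, self.large = small, large
        self.table: dict[str, bytes] = {}   # stand-in for the on-disk chunk store

    def add(self, data: bytes) -> None:
        # Redundant storage: every byte is reachable via a large and a small chunk.
        for size in (self.large, self.small):
            for i in range(0, len(data), size):
                piece = data[i:i + size]
                self.table[sha(piece)] = piece

    def get(self, digest: str) -> bytes | None:
        return self.table.get(digest)

random.seed(0)
idx = TwoLevelChunkIndex()
idx.add(random.randbytes(200_000))
print(len(idx.table), "chunks indexed across both granularities")
```

The extra disk spent on redundancy is cheap next to the bandwidth it saves, which matches the slide's "optimize for performance & compression" framing.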

  21. Deployments
  • Two cache instances deployed
  • Both in Africa
  • Shared machines, multiple services
  • Working with OLPC on deployment
  • Working on licensing
  • Hopefully resolved this year
  • Goal: all-in-one server for schools

  22. Longer Term Goals
  • Effort started around server consolidation
  • Virtualization is nice, except for memory
  • Many apps are very page-fault sensitive
  • Extracting & sharing components is desirable
  • More work in developing regions
  • Even within the US: poor, rural, etc.
  • Customization for school-like workloads
  • More work on peak/off-peak behavior
