Finding Common Ground: The Story of a DB and an OS

Finding Common Ground: The Story of a DB and an OS Remzi H. Arpaci-Dusseau www.cs.wisc.edu/~remzi

Databases and OS’s • Databases and OS’s: Long-time enemies • Why? [Stonebraker81] • Isn’t it time to heal the wounds? • DeWitt - “No” • Naughton - “Probably not” • Well, at least we both care about storage! • So what’s happening in the world of storage? • Networks!

Network-Attached Storage • Why bother? • Scalable Bandwidth • Highly Available • Simple/Reliable • Expandable • Specializable • What form will it take? • Disk vendors: Disks add CPU, network • Machine vendors: Specialized PCs

New World, New Problems • Goal: Build a network storage system that... • Is easy to manage • Is easy to scale • Performs “well” • Implications • Plug-and-play: Add a new disk, utilize it to full capacity, with no human intervention • BUT, the new disk might be different from the old ones (classic RAID algorithms don’t like that)

Disks: Complexity reigns • Additional problem: Complex disk drives • Multiple zones: Outer tracks > inner • Failure masking: Stop using bad blocks • Not fail-fast: Sputter then stop • Worse yet: Add complex networking! • Conclusion: In large collections of disks, can no longer expect predictable behavior • What’s a network storage system to do? • Be Adaptive!

Solution? WiND • Wisconsin Network Disks • Distributed systems technology meets storage • with Professor A. Arpaci-Dusseau, B. Forney, S. Muthukrishnan, F. Popovici, and J. Bent • Key Software Components • SToRM: On-line adaptive layout • Clouds: Cost-aware caching • GALE: Off-line selective replication • Core technology: Information architecture • Enables scalable adaptation across all layers

Outline • Motivation • WiND Overview • Adaptive layout • Caching • Long-term reorganization • Information architecture • Conclusions

Adaptive Layout • The Problem: Classic RAIDs don’t work! • Time(StripeWrite) = Max(T(D0), T(D1), …) • Parallel performance dependency • Always runs at rate of slow disk • [Figure: RAID-0 (striping)]
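The stripe-time formula on this slide can be worked through numerically. The sketch below is only an illustration: the disk rates and block size are made-up values, and the function names are hypothetical, not part of WiND.

```python
def stripe_write_time(block_bytes, disk_rates):
    """Seconds to write one stripe that puts one equal-sized block on each disk.

    disk_rates are in bytes/second. The stripe completes only when the
    slowest disk completes: Time(StripeWrite) = max over disks of T(Di).
    """
    return max(block_bytes / rate for rate in disk_rates)


def effective_bandwidth(block_bytes, disk_rates):
    """Aggregate bytes/second delivered by the whole stripe."""
    total_bytes = block_bytes * len(disk_rates)
    return total_bytes / stripe_write_time(block_bytes, disk_rates)


# Assumed rates: three disks at 20 MB/s and one at 10 MB/s.
rates = [20e6, 20e6, 20e6, 10e6]
bw = effective_bandwidth(64 * 1024, rates)
print(f"striped: {bw / 1e6:.0f} MB/s, sum of disk rates: {sum(rates) / 1e6:.0f} MB/s")
```

With these assumed numbers the stripe delivers four times the slow disk's rate, well short of the sum of the individual rates, which is the parallel performance dependency the slide names.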

SToRM: Adaptive Layout • Example: Adaptive RAID-0 • Approach: Adjust layout per disk according to perceived rate of operation • Key: How to obtain information about performance of remote disks?
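One way to read "adjust layout per disk according to perceived rate" is to give each disk a share of blocks proportional to its rate. The sketch below assumes exactly that proportional policy; the slides do not say which policy SToRM uses, and `adaptive_stripe` is a hypothetical name.

```python
def adaptive_stripe(total_blocks, disk_rates):
    """Split total_blocks across disks in proportion to their perceived rates.

    Returns a list of block counts, one per disk, summing to total_blocks.
    """
    total_rate = sum(disk_rates)
    ideal = [total_blocks * rate / total_rate for rate in disk_rates]
    counts = [int(share) for share in ideal]  # round each share down
    leftover = total_blocks - sum(counts)
    # Give the blocks lost to rounding to the disks with the largest
    # fractional remainders.
    by_remainder = sorted(
        range(len(disk_rates)),
        key=lambda i: ideal[i] - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Assumed rates (MB/s): the slow disk gets half as many blocks as the others.
print(adaptive_stripe(7, [20, 20, 20, 10]))
```

Under this policy every disk finishes its part of the write at roughly the same time, so the slow disk no longer sets the pace for the whole stripe.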

SToRM Status • Basic prototype infrastructure in place • Linux kernel module on client • Simple storage server (replacement: NeST) • Runs on PC cluster with Gigabit Ethernet • Easy stuff works: • Classic RAID-0 is in place • Coming soon: Adaptive RAID-0 and more • Challenges: Meta-data minimization, proper server-side interface, support all RAID levels

Adaptive Caching w/ Clouds • Problem: Classic caching algorithms assume uniform replacement cost • Related work in theory, web, databases • How to apply to storage system? • Key: How to get cost information? • Which block to replace?

Solution: Clouds • Clouds: Flexible caching infrastructure • Client-side • Server-side • Cooperative • Two lines of investigation • Not just LRU caches anymore • Take cost into account for replacement • Streaming I/O support • Can caches mask the performance of a slow disk?
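To make "take cost into account for replacement" concrete, here is a minimal sketch of a GreedyDual-style policy, a known cost-aware algorithm from the caching literature. It is swapped in for illustration only; the slides do not say which algorithm Clouds uses, and the class and block names are hypothetical.

```python
class CostAwareCache:
    """GreedyDual-style cache: blocks that are expensive to refetch live longer."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.inflation = 0.0  # rises to the credit of each evicted block
        self.credit = {}      # block id -> current credit

    def access(self, block, fetch_cost):
        """Touch a block; return True on a hit, False on a miss."""
        hit = block in self.credit
        if not hit and len(self.credit) >= self.capacity:
            # Evict the block with the lowest credit and raise the
            # inflation value to that credit, which ages everything else.
            victim = min(self.credit, key=self.credit.get)
            self.inflation = self.credit[victim]
            del self.credit[victim]
        # On a hit or an insert, (re)set credit to inflation + fetch cost.
        self.credit[block] = self.inflation + fetch_cost
        return hit


cache = CostAwareCache(capacity=2)
cache.access("slow-disk-block", fetch_cost=10.0)  # assumed costly to refetch
cache.access("fast-disk-block-1", fetch_cost=1.0)
cache.access("fast-disk-block-2", fetch_cost=1.0)
print(sorted(cache.credit))
```

In this run the cheap block is evicted and the expensive one stays, whereas plain LRU would have evicted the block from the slow disk because it was touched least recently.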

Clouds: Status • Approach to problem: Simulation • Infrastructure up and running • Simple models used to confirm correctness • What’s next? • Compare basic caching algorithms with cost-aware counterparts • Add support for streaming workloads • Implement in cluster prototype

Off-line Adaptation • Problem: Adaptation can be short-sighted • Example • SToRM lays data out according to current rate • System characteristics (“climate”) change • Read performance suffers • Solution: Re-arrange data in background • Move data into new layout to match “climate” • Replicate data to add flexibility + reliability

Solution: GALE • Long-term optimization engine • Use simple rule-based system to enact off-line optimizations • Example: • If layout(X) does not match climate(Disks), and access(X) is frequent, and load(Disks) is low, replicate(X) • Key: Gathering climate & load information • Status: What you see on this slide...
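The example rule on this slide can be written as a small predicate. This is a sketch under stated assumptions: the slide marks GALE as a design only, so the data fields, the definition of "matches", and all thresholds below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    layout_share: list        # fraction of X's blocks stored on each disk
    disk_rates: list          # current measured rate of each disk ("climate")
    accesses_per_hour: float  # how often X is accessed
    disk_load: float          # mean disk utilization, 0.0 to 1.0


def layout_matches_climate(obs, tolerance=0.10):
    """True if each disk's share of X is close to its share of total rate."""
    total_rate = sum(obs.disk_rates)
    return all(
        abs(share - rate / total_rate) <= tolerance
        for share, rate in zip(obs.layout_share, obs.disk_rates)
    )


def should_replicate(obs, hot_threshold=100.0, idle_threshold=0.30):
    """Rule: layout(X) mismatches climate, X is hot, and the disks are idle."""
    return (
        not layout_matches_climate(obs)
        and obs.accesses_per_hour >= hot_threshold
        and obs.disk_load <= idle_threshold
    )


# X was laid out evenly, but one disk is now three times faster than the other.
stale = Observation([0.5, 0.5], [30e6, 10e6], accesses_per_hour=500, disk_load=0.1)
print(should_replicate(stale))
```

Keeping each condition as a separate, named check mirrors the rule-based style the slide describes and makes it easy to add further rules.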

Information Architecture • Problem: Key to all of WiND is efficient access to remote state information • Examples • How fast is that disk? • How much will it cost to get a remote block? • What’s needed? • Infrastructure for efficient collection of remote state information

Information Architecture • Taxonomy • Null: Don’t use information (e.g., RAID-0) • Parasitic: Sneak info into existing messages • Explicit: Add queries for remote state • Implicit: Infer via observation • Key: Hide details of information-gathering techniques from algorithms • IPIs (Information Programming Interfaces) • Choose dynamically among best options
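The idea of hiding the information-gathering technique from the algorithm can be sketched with a common interface and two interchangeable sources. Everything here is an assumption made for illustration: the slides do not define the IPI API, and the class names and the moving-average estimator are hypothetical.

```python
from abc import ABC, abstractmethod


class RateSource(ABC):
    """Hypothetical IPI: algorithms ask for a disk's rate, not how it was found."""

    @abstractmethod
    def rate(self, disk):
        """Return the estimated rate of the disk in bytes/second."""


class ExplicitSource(RateSource):
    """Explicit: ask the remote side directly through a query function."""

    def __init__(self, query_fn):
        self.query_fn = query_fn

    def rate(self, disk):
        return self.query_fn(disk)


class ImplicitSource(RateSource):
    """Implicit: infer the rate from transfers the client already observed."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest sample
        self.estimate = {}

    def observe(self, disk, nbytes, seconds):
        sample = nbytes / seconds
        old = self.estimate.get(disk)
        # Exponentially weighted moving average of observed throughput.
        self.estimate[disk] = (
            sample if old is None else self.alpha * sample + (1 - self.alpha) * old
        )

    def rate(self, disk):
        return self.estimate[disk]


def slowest_disk(source, disks):
    """An algorithm that works unchanged with any RateSource."""
    return min(disks, key=source.rate)


implicit = ImplicitSource()
implicit.observe("d0", 1000, 1.0)
implicit.observe("d0", 2000, 1.0)
implicit.observe("d1", 500, 1.0)
explicit = ExplicitSource({"d0": 1500.0, "d1": 500.0}.get)
print(slowest_disk(implicit, ["d0", "d1"]), slowest_disk(explicit, ["d0", "d1"]))
```

Because `slowest_disk` depends only on the interface, a system could switch between sources at run time, in the spirit of "choose dynamically among best options".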

Future Directions • Specialized File Systems • SA-NFS (Striped, Adaptive NFS) • RiverFS (Parallel FS for database query processing primitives) • Applications • Database query-processing primitives • Web caches and proxies • Standard NFS workloads

Conclusions • Storage systems -> Distributed systems • Need to treat them differently • Unpredictability will be the norm • Complex drives + networks -> Complex behaviors • Solution: WiND • Adaptation via information • http://www.cs.wisc.edu/wind
