Storage Research Meets The Grid • Remzi Arpaci-Dusseau
ADSL • Where gray-box techniques meet storage systems • [Diagram: "Gray Box" meets "Storage"]
The Who, How, and What of ADSL • Who: Andrea and Remzi Arpaci-Dusseau • And of course a bunch of students • How: Gray-box Techniques • Assume system is a “gray box” • Leverage knowledge of its implementation to: • Gain more information • Control its behavior • What: Storage Systems • Smarter disks and RAIDs
Semantically-Smart Disks • Problem: Most disks don't know much • The block-based SCSI interface limits their knowledge • And what a waste of potential! • Modern RAIDs have substantial processing power and memory • A semantically-smart disk system • Figures out how the file system is using it • Exploits that knowledge to build new functionality into storage
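The inference step can be sketched as follows. This is a minimal illustration of the idea, not the actual ADSL implementation: the disk uses knowledge of a simplified, ext2-like on-disk layout to guess what role each block plays. The layout constants are assumptions chosen for the example.

```python
# Hypothetical sketch: a "semantically-smart" disk observes block addresses
# and, using assumed knowledge of a simplified ext2-like layout, infers the
# semantic role of each block. The constants below are illustrative only.

SUPERBLOCK = 0                              # assumed: superblock at block 0
INODE_TABLE_START, INODE_TABLE_END = 4, 260 # assumed inode-table region

def classify_block(block_num):
    """Infer the semantic role of a block from its address alone."""
    if block_num == SUPERBLOCK:
        return "superblock"
    if INODE_TABLE_START <= block_num < INODE_TABLE_END:
        return "inode"
    return "data"
```

With a classifier like this, the storage system could, for example, treat inode writes differently from bulk data writes, which is the kind of functionality the slide alludes to.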
Trend that Drives This Session: Data Demands on the Rise • Focus of original batch queueing systems: CPU • “cycle stealing” • Compute clusters • Distributed supercomputer • But data demands of jobs are on the rise… • Input, output, temp files and checkpoints • Modern science is increasingly data centric
Focus of this talk: Traditional storage vs. Grid storage • Most aspects of modern storage systems are designed with a certain domain in mind • Local-area environment, presence of an admin, etc. • The Grid changes almost every assumption • Wide area, no admin, etc. • Conclusion: Must reexamine how to build storage systems from the ground up
Outline • Introduction • Traditional vs. Grid Storage • Data reliability • Management • Caching and Overlap • Evaluation • Conclusions
Data Reliability: Traditional • All data is treated equally, and all of it is sacred • Most users tolerate some data loss (the 30-second delay before a flush to disk) • But losing one byte after a flush is catastrophic • Strong implications for design: backup + disaster recovery
Data Reliability: Grid • Different types of I/O, treat them accordingly • Einstein's matter-energy equivalence: E = mc² • Grid analogy: data-computation equivalence • E(M) = C • Knowledge is key: if you can refetch M, you can recompute C
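The equivalence suggests a simple policy: a derived output need not be replicated as carefully as primary data when regenerating it (refetch the inputs, rerun the computation) is cheaper than storing an extra copy. A minimal sketch of that decision, with all cost figures assumed for illustration:

```python
# Hypothetical sketch of the data-computation equivalence: output C can be
# regenerated from input M, so replicate C only when regeneration costs more
# than an extra copy. All cost values are assumed, illustrative numbers.

def replication_needed(refetch_cost, recompute_cost, replicate_cost):
    """Replicate the output only if refetching M and recomputing C
    would cost more than simply keeping another copy of C."""
    return refetch_cost + recompute_cost > replicate_cost

# Example: cheap-to-refetch input, cheap recomputation -> skip replication
# replication_needed(refetch_cost=1, recompute_cost=2, replicate_cost=50)
```

The units (time, dollars, network bytes) do not matter for the sketch as long as they are consistent; the point is that knowledge of how to regenerate data changes what must be protected.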
Management: Traditional • Storage administrators control system • Performance tuning • Problem fixing • User handling • Human intelligence can be applied to makethings run smoothly
Management: Grid • No administrator to help out • Though the system may have to live within administrative limitations • System must automatically handle problems • Tune to the environment • Deal with failures • Give reasonable feedback to users upon errors and other problem scenarios
Buffering and Overlap: Traditional • Used throughout systems for performance • Important cache: client-side • NFS: memory • AFS: disk (and memory) • Caches are managed transparently • Overlap (disk → memory, across the network) is also transparent • Result: operations can run as if they were local • [Diagram: client cache ($) and server cache ($)]
Buffering and Overlap: Grid • Used throughout for performance and reliability • Many more levels of cache • Not just clients/servers • Caches are managed both transparently and non-transparently • Overlap is more complex too (multiple users, resources) • Have to deal with more issues: failure, cost differentials • [Diagram: caches ($) at each level between remote sites and the home site, connected by a WAN]
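The deeper Grid hierarchy can be sketched as a read-through lookup that falls through successive cache levels and promotes hits into the faster levels above. The level names and contents here are illustrative assumptions, not part of any particular system:

```python
# Hypothetical sketch of a multi-level Grid cache: a read falls through
# local memory, local disk, and finally the home site across the WAN.
# On a hit, the value is promoted into the faster levels (read-through
# caching with promotion). Level names and data are assumed examples.

class CacheLevel:
    def __init__(self, name, store=None):
        self.name = name
        self.store = store if store is not None else {}

    def get(self, key):
        return self.store.get(key)

def read(levels, key):
    """Search each level in order; on a hit, fill the faster levels
    above it so later reads are served locally."""
    for i, level in enumerate(levels):
        value = level.get(key)
        if value is not None:
            for upper in levels[:i]:      # promote into faster caches
                upper.store[key] = value
            return value, level.name
    raise KeyError(key)
```

A first read of a file stored only at the home site is served across the WAN; a second read of the same file hits in local memory, which is exactly the overlap benefit the slide describes, complicated in the Grid by failures and cost differences between levels.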
Evaluation: Traditional • Traditional storage metrics: a myopic focus • May miss the "big picture" • One example: availability • Defined as "uptime" of the system • What's good: "five nines" of availability (up 99.999% of the time) • Implication: systems are engineered for enterprise use (and thus over-engineered for many uses)
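The "five nines" figure implies a concrete downtime budget, which a few lines of arithmetic make explicit:

```python
# The arithmetic behind "five nines": 99.999% uptime still permits a small,
# fixed downtime budget per year. These numbers follow directly from the
# availability figure on the slide.

MINUTES_PER_YEAR = 365 * 24 * 60                     # 525,600 minutes
downtime_minutes = MINUTES_PER_YEAR * (1 - 0.99999)  # about 5.26 min/year
```

Engineering a system to stay within a five-minute annual downtime budget is what drives the enterprise-grade (and, for many Grid workloads, unnecessary) over-engineering the slide points to.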
Evaluation: Grid • Grid metrics can focus on what's important for Grid jobs: job throughput • Instead of availability, measure the impact of failure on the aspect of the system that matters most • Result: an end-to-end perspective for evaluating the merit of new approaches in the Grid space
Summary • The Grid changes storage systems • Makes some things harder (caching, overlap, failures) • Makes other things easier (better understanding of workloads and metrics) • How to make it all work? • Exploit knowledge of workloads and systems to reduce difficult problems to tractable ones
The Data-Centric Lineup • Lots of exciting work going on at Wisconsin in this space! • First session: • John Bent - "Batch-Pipelined Workloads" • Doug Thain - "Migratory File Services" • Second session: • Joseph Stanley - "NeST" • Tevfik Kosar - "Stork" • George Kola - "DiskRouter" • Guest speaker: Arie Shoshani - "Coscheduling Storage and CPUs"