
THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM


Presentation Transcript


  1. THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM J. Wilkes, R. Golding, C. Staelin, T. Sullivan HP Laboratories, Palo Alto, CA

  2. INTRODUCTION • must protect data against disk failures: they are too frequent and too hard to recover from • possible solutions: • for small numbers of disks: mirroring • for larger numbers of disks: RAID

  3. RAID • Typical RAID Organizations • Level 3: bit- or byte-level interleaving with a dedicated parity disk • Level 5: block interleaved with parity blocks distributed over all disks
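
As a rough illustration of the parity idea behind RAID levels 3 and 5 (this sketch and its helper names are not from the slides): the parity block of a stripe is the byte-wise XOR of its data blocks, so any single lost block can be rebuilt from the survivors.

    from functools import reduce

    def parity(blocks):
        # Parity block = byte-wise XOR of all data blocks in the stripe.
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    def rebuild(surviving_blocks, parity_block):
        # A single missing block is the XOR of the parity block with all survivors.
        return parity(surviving_blocks + [parity_block])

    stripe = [b"AAAA", b"BBBB", b"CCCC"]                    # data blocks of one stripe
    p = parity(stripe)
    assert rebuild([stripe[0], stripe[2]], p) == stripe[1]  # lost block recovered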

  4. LIMITATIONS OF RAID (I) • Each RAID level performs well for a narrow range of workloads • Too many parameters to configure: data- and parity-layout, stripe depth, stripe width, cache sizes, write-back policies, ...

  5. LIMITATIONS OF RAID (II) • Changing from one layout to another or adding capacity requires unloading and reloading the data • Spare disks remain unused until a failure occurs

  6. A BETTER SOLUTION • A managed storage hierarchy: • mirror active data • store less active data in RAID 5 • This requires locality of reference: • the active subset must be rather stable (found to be true in several studies)

  7. IMPLEMENTATION LEVEL • The storage hierarchy could be implemented: • Manually: can use the most knowledge but cannot adapt quickly • In the file system: offers the best balance of knowledge and implementation freedom but is specific to a particular file system • Through a smart array controller: easiest to deploy (HP AutoRAID)

  8. MAJOR FEATURES (I) • Mapping of host block addresses to physical disk locations • Mirroring of write-active data • Adaptation to changes in the amount of data stored: starts using RAID 5 when the array becomes full • Adaptation to workload changes: newly active data are promoted to mirrored storage, less active data are demoted to RAID 5 • Hot-pluggable disks, fans, power supplies and controllers

  9. MAJOR FEATURES (II) • On-line storage capacity expansion: newly added space is used to mirror more data • Can mix and match disk capacities • Controlled fail-over: can have dual controllers (primary/standby) • Active hot spares: used for more mirroring • Simple administration and setup: appears to the host as one or more logical units • Log-structured RAID 5 writes

  10. RELATED WORK (I) • Storage Technology Corporation Iceberg: • also uses redirection but based on RAID 6 • handles variable size records • emphasis on very high reliability

  11. RELATED WORK (II) • Floating parity scheme from IBM Almaden: • Relocates parity blocks and uses distributed sparing • Work on log-structured file systems and cleaning policies at U.C. Berkeley

  12. RELATED WORK (III) • The whole literature on hierarchical storage systems • Schemes for compressing inactive data • Use of non-volatile memory (NVRAM) for optimizing writes: allows reliable delayed writes

  13. OVERVIEW • [Controller block diagram: the host computer attaches through a 20 MB/s SCSI controller; the controller contains a processor with RAM and control logic, parity logic, a DRAM read cache and an NVRAM write cache; two 10 MB/s internal buses connect to the disks]

  14. PHYSICAL DATA LAYOUT • Data space on disks is broken up into large Physical EXTents (PEXes): • Typical size is 1 MB • PEXes can be combined to form Physical Extent Groups (PEGs) containing at least three PEXes on three different disks • PEGs can be assigned to the mirrored storage class or to the RAID 5 storage class • Segments are the units of contiguous space on a disk (128 KB in the prototype)
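
A minimal Python sketch of these layout structures, assuming the prototype sizes quoted above (the class and field names are hypothetical, not taken from the paper):

    from dataclasses import dataclass, field

    PEX_SIZE = 1 * 1024 * 1024        # 1 MB Physical EXTent (prototype value)
    SEGMENT_SIZE = 128 * 1024         # 128 KB contiguous segment on a disk

    @dataclass
    class PEX:                        # a 1 MB chunk of space on one disk
        disk: int
        offset: int                   # starting byte offset on that disk

    @dataclass
    class PEG:                        # Physical Extent Group
        storage_class: str            # "mirrored" or "raid5"
        pexes: list = field(default_factory=list)

        def is_valid(self):
            # A PEG needs at least three PEXes on three different disks.
            return len({pex.disk for pex in self.pexes}) >= 3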

  15. LOGICAL DATA LAYOUT • The logical allocation and migration unit is the Relocation Block (RB) • Size in the prototype was 64 KB: • Smaller RBs require more mapping information but larger RBs increase migration costs after small updates • Each PEG holds a fixed number of RBs

  16. MAPPING STRUCTURES • Map addresses from virtual volumes to PEGs, PEXes and physical disk addresses • Optimized for quickly finding the physical address of an RB given its logical address: • Each logical unit has a virtual device table listing all RBs in the logical unit and pointing to their PEGs • Each PEG has a PEG table listing all RBs in the PEG and the PEXes used to store them
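
A hypothetical lookup path following the two tables just described; the table layouts and names here are assumptions for illustration only:

    def resolve(virtual_device_tables, peg_tables, lun, rb_index):
        # Virtual device table: one per logical unit, maps each RB to its PEG.
        peg_id, slot = virtual_device_tables[lun][rb_index]
        # PEG table: maps each RB slot in the PEG to the PEX (disk, offset) holding it.
        disk, pex_offset, rb_offset = peg_tables[peg_id][slot]
        return disk, pex_offset + rb_offset            # physical disk address of the RB

    # Toy example: RB 0 of LUN 0 lives in PEG 7, slot 2.
    vdt = {0: {0: (7, 2)}}
    pegs = {7: {2: (3, 1_048_576, 131_072)}}           # disk 3, PEX at 1 MB, RB at +128 KB
    assert resolve(vdt, pegs, 0, 0) == (3, 1_179_648)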

  17. NORMAL OPERATIONS (I) • Requests are sent to the controller in SCSI Command Descriptor Blocks (CDBs): • Up to 32 CDBs can be active simultaneously and 2048 more can be queued • Long requests are broken into 64 KB segments

  18. NORMAL OPERATIONS (II) • Read requests: • First test whether the data are already in the read cache or in the non-volatile write cache • Otherwise allocate space in the cache and issue one or more requests to the back-end storage classes • Write requests return as soon as the data are updated in the non-volatile write cache: • The cache uses a delayed-write policy
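
A toy Python sketch of these request paths, assuming simple dictionary caches and a placeholder back end (none of these names come from the paper):

    read_cache, write_cache = {}, {}      # stand-ins for the DRAM and NVRAM caches

    def backend_read(addr):               # placeholder fetch from mirrored or RAID 5 storage
        return b"\0" * 65536

    def host_read(addr):
        # Check the read cache, then the non-volatile write cache, before going to disk.
        if addr in read_cache:
            return read_cache[addr]
        if addr in write_cache:
            return write_cache[addr]
        data = backend_read(addr)         # allocate cache space and fill it from the back end
        read_cache[addr] = data
        return data

    def host_write(addr, data):
        # Delayed-write policy: the request completes once the data sit in NVRAM.
        write_cache[addr] = data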

  19. NORMAL OPERATIONS (III) • Flushing data from the cache can involve: • A back-end write to the mirrored storage class • Promotion from RAID 5 to mirrored storage before the write • Mirrored reads and writes are straightforward

  20. NORMAL OPERATIONS (IV) • RAID 5 reads are straightforward • RAID 5 writes can be done: • On a per-RB basis: requires two reads and two writes • In batched (log-structured) writes: more complex but cheaper
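
A sketch of the per-RB case, the classic RAID 5 small-write penalty of two reads (old data and old parity) followed by two writes (new data and new parity); the read_block/write_block callbacks are hypothetical:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_small_write(read_block, write_block, data_addr, parity_addr, new_data):
        old_data = read_block(data_addr)          # read 1: old data block
        old_parity = read_block(parity_addr)      # read 2: old parity block
        new_parity = xor(xor(old_parity, old_data), new_data)
        write_block(data_addr, new_data)          # write 1: new data block
        write_block(parity_addr, new_parity)      # write 2: updated parity block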

  21. BACKGROUND OPERATIONS • Triggered when the array has been idle for some time • Include: • Compaction of empty RB slots • Migration between storage classes (using an approximate LRU algorithm) • Load balancing between disks
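
The slides describe the migration policy only as an approximate LRU; the toy sketch below shows the general idea of demoting the least recently used RBs during idle time (the class and method names are mine):

    from collections import OrderedDict

    class MigrationTracker:
        def __init__(self, mirrored_capacity):
            self.mirrored_capacity = mirrored_capacity
            self.lru = OrderedDict()                       # rb_id, kept in access order

        def touch(self, rb_id):
            self.lru[rb_id] = True
            self.lru.move_to_end(rb_id)                    # most recently used at the end

        def idle_work(self, demote_to_raid5):
            # Run when the array has been idle for a while.
            while len(self.lru) > self.mirrored_capacity:
                rb_id, _ = self.lru.popitem(last=False)    # least recently used RB
                demote_to_raid5(rb_id)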

  22. MONITORING • System also includes: • An I/O logging tool and • A management tool for analyzing the array performance

  23. PERFORMANCE RESULTS (I) • HP AutoRAID configuration with: • 16 MB of controller data cache • Twelve 2.0 GB Seagate Barracuda disks (7200 rpm) • Compared with: • A Data General RAID array with 64 MB front-end cache • Eleven individual disk drives implementing disk striping but without any redundancy

  24. PERFORMANCE RESULTS (II) • Results of an OLTP database workload: • AutoRAID was better than the RAID array and comparable to the set of non-redundant drives • But the whole database fit in mirrored storage! • Micro-benchmarks: • AutoRAID is always better than the RAID array but has lower I/O rates than the set of drives

  25. SIMULATION RESULTS (I) • Increasing the disk speed improves the throughput: • Especially if density remains constant • Transfer rates matter more than rotational latency • 64KB seems to be a good size for the Relocation Blocks: • Around the size of a disk track

  26. SIMULATION RESULTS (II) • The best heuristic for selecting which mirrored copy to read is shortest queue • Allowing write-cache overwrites has a HUGE impact on performance • RBs demoted to RAID 5 should use existing holes when the system is not too loaded
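
A sketch of the shortest-queue heuristic for mirrored reads: the read goes to whichever copy sits on the disk with the fewest outstanding requests (function names here are illustrative):

    def pick_mirror_copy(copies, queue_depth):
        # copies: disks holding a copy of the RB; queue_depth(disk) -> pending I/Os
        return min(copies, key=queue_depth)

    depths = {0: 4, 3: 1}
    assert pick_mirror_copy([0, 3], depths.get) == 3   # disk 3 has the shorter queue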

  27. SUMMARY (I) • The system is very easy to set up: • Dynamic adaptation is a big win but it will not work for all workloads • Software is what makes AutoRAID, not the hardware • Being auto-adaptive makes AutoRAID hard to benchmark

  28. SUMMARY (II) • Future work includes: • System tuning, especially: • Idle-period detection • Front-end cache management algorithms • Developing better techniques for synthesizing traces
