
Flexible Storage Allocation

  1. Flexible Storage Allocation A. L. Narasimha Reddy Department of Electrical and Computer Engineering Texas A & M University Students: Sukwoo Kang (now at IBM Almaden) John Garrison

  2. Outline • Big Picture • Part I: Flexible Storage Allocation • Introduction and Motivation • Design of Virtual Allocation • Evaluation • Part II: Data Distribution in Networked Storage Systems • Introduction and Motivation • Design of User-Optimal Data Migration • Evaluation • Part III: Storage Management across diverse devices • Conclusion

  3. Storage Allocation • Allocate entire storage space at the time of file system creation • Storage space owned by one operating system cannot be used by another • [Figure: partitions of 30 GB (Windows NT, NTFS), 50 GB (Linux, ext2), and 98 GB (AIX, JFS); actual allocations of 70 GB and 50 GB; one file system is running out of space]

  4. Big Picture • Memory systems employ virtual memory for several reasons • Current storage systems lack such flexibility • Current file systems allocate storage statically at the time of their creation • Storage allocation: disk space cannot be shared flexibly across multiple file systems

  5. File Systems with Virtual Allocation • When a file system is created with X GB, allow it to actually occupy only Y GB, where Y << X • Remaining space is kept as one common available pool • As the file system grows, storage space is allocated on demand • [Figure: a 100 GB common storage pool backing Windows NT (NTFS), Linux (ext2), and AIX (JFS) file systems of 30 GB, 50 GB, and 98 GB; actual allocations shown as 10 GB, 10 GB, 60 GB, and 40 GB]

  6. Our Approach to Design • Employ an allocate-on-write policy • Storage space is allocated when the data is written • All data is written to disk sequentially, based on the time at which it is written to the device • Once data is written, it can be accessed from the same location, i.e., data is updated in place • [Figure: physical block addresses laid out on the physical disk]

  7. Allocate-on-write Policy • Storage space is allocated in units of extents when data is written • An extent is a fixed-size group of file system blocks • Retains more spatial locality • Reduces the mapping information that must be maintained • [Figure: an extent written to the physical disk at t = t']
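
To make the extent idea concrete, here is a minimal sketch (not from the original slides) of how a logical file system block number maps to an extent index and an offset within that extent; the 4 KB block size and 16-block extent are arbitrary example values.

```c
#include <stdio.h>
#include <stdint.h>

#define FS_BLOCK_SIZE  4096u                               /* bytes per file system block (example) */
#define EXTENT_BLOCKS  16u                                  /* file system blocks per extent (example) */
#define EXTENT_SIZE    (FS_BLOCK_SIZE * EXTENT_BLOCKS)      /* 64 KB extents */

int main(void)
{
    uint64_t logical_block = 1234567;                       /* a logical file system block number */

    /* Which fixed-size extent does this block fall into, and where inside it? */
    uint64_t extent_index = logical_block / EXTENT_BLOCKS;
    uint64_t block_in_ext = logical_block % EXTENT_BLOCKS;

    printf("block %llu -> extent %llu, offset block %llu (extent size %u bytes)\n",
           (unsigned long long)logical_block,
           (unsigned long long)extent_index,
           (unsigned long long)block_in_ext,
           EXTENT_SIZE);
    return 0;
}
```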

  8. Allocate-on-write Policy • Data is written to disk sequentially, based on write time • Further writes to the same data are updated in place • VA (Virtual Allocation) requires an additional data structure • [Figure: Extent 0 written at t = t' and Extent 1 written at t = t'' (where t'' > t'), laid out sequentially on the physical disk]

  9. Block Map • The block map keeps a mapping between logical storage locations and real (physical) storage locations • [Figure: block map translating logical locations to Extent 0, Extent 1, and Extent 2 on the physical disk]
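
A minimal user-space sketch of the block map idea, assuming a simple array indexed by logical extent number (the actual prototype, described on slide 18, uses an in-memory hash table); the UNALLOCATED sentinel and struct layout are illustrative, not the authors' on-disk format.

```c
#include <stdio.h>
#include <stdint.h>

#define MAX_EXTENTS  1024
#define UNALLOCATED  UINT64_MAX   /* sentinel: logical extent has no physical home yet */

/* Logical extent number -> physical extent number on the underlying disk. */
static uint64_t block_map[MAX_EXTENTS];
static uint64_t next_free_extent = 0;   /* allocate-on-write cursor: next sequential slot */

static void block_map_init(void)
{
    for (size_t i = 0; i < MAX_EXTENTS; i++)
        block_map[i] = UNALLOCATED;
}

/* Return the physical extent for a logical extent, allocating sequentially on first write. */
static uint64_t block_map_lookup_or_alloc(uint64_t logical_extent)
{
    if (block_map[logical_extent] == UNALLOCATED)
        block_map[logical_extent] = next_free_extent++;   /* allocate-on-write */
    return block_map[logical_extent];
}

int main(void)
{
    block_map_init();
    /* Writes arrive for logical extents 7, 3, 7: physical placement follows write time. */
    printf("logical 7 -> physical %llu\n", (unsigned long long)block_map_lookup_or_alloc(7));
    printf("logical 3 -> physical %llu\n", (unsigned long long)block_map_lookup_or_alloc(3));
    printf("logical 7 -> physical %llu (update in place)\n",
           (unsigned long long)block_map_lookup_or_alloc(7));
    return 0;
}
```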

  10. Block Map Hardening • The block map is maintained in memory and regularly written to disk for hardening against system failures • VA metadata represents the on-disk copy of the block map • [Figure: VA metadata region followed by Extent 0, Extent 1, and Extent 2 on the physical disk]

  11. On-disk Layout & Storage Expansion • When capacity is exhausted or reaches the storage expansion threshold, a physical disk can be expanded onto other available storage resources • The file system is unaware of the actual space allocation and expansion • [Figure: physical disk holding VA metadata, FS metadata, and Extents 0-4; after expansion past the storage expansion threshold, Extents 5-7 reside on a virtual disk]
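
A small sketch of the expansion check, assuming an 80% storage expansion threshold; the real threshold value and the mechanism behind expand_to_new_device() are not specified on the slide, so both are placeholders.

```c
#include <stdio.h>
#include <stdint.h>

#define EXPANSION_THRESHOLD 0.80   /* assumed value; the real threshold is a tunable */

static void expand_to_new_device(void)
{
    /* Placeholder: in VA this would map further extents onto another available
       storage resource; the file system above never notices the change. */
    printf("expanding virtual disk onto additional storage\n");
}

static void maybe_expand(uint64_t allocated_extents, uint64_t total_extents)
{
    double used = (double)allocated_extents / (double)total_extents;
    if (used >= EXPANSION_THRESHOLD)
        expand_to_new_device();
    else
        printf("usage %.0f%%, below threshold\n", used * 100.0);
}

int main(void)
{
    maybe_expand(700, 1000);   /* 70% used: no expansion yet */
    maybe_expand(850, 1000);   /* 85% used: expand */
    return 0;
}
```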

  12. Write Operation • [Figure: write path. A write request flows from the application through the file system and buffer/page cache layer to the block I/O layer (VA), which searches the VA block map, allocates a new extent and updates mapping information when needed, hardens VA metadata, writes to disk (VA metadata, FS metadata, Extents 0-3), and returns an acknowledgement]
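
The write path can be sketched as follows, reusing the simple array block map from the earlier sketch; this is a user-space illustration under those assumptions, not the kernel-module code.

```c
#include <stdio.h>
#include <stdint.h>

#define MAX_EXTENTS  1024
#define UNALLOCATED  UINT64_MAX

static uint64_t block_map[MAX_EXTENTS];
static uint64_t next_free_extent;

/* Handle one write to a logical extent: allocate-on-write on a miss,
   update in place on a hit, and note where hardening would occur. */
static void va_write(uint64_t logical_extent, const void *data, size_t len)
{
    if (block_map[logical_extent] == UNALLOCATED) {
        /* Miss: allocate the next sequential physical extent and record the mapping. */
        block_map[logical_extent] = next_free_extent++;
        /* Hardening point: the updated map entry must reach the on-disk VA metadata
           in a controlled order with respect to the file system's own (meta)data. */
        printf("allocated logical %llu -> physical %llu\n",
               (unsigned long long)logical_extent,
               (unsigned long long)block_map[logical_extent]);
    }
    /* Hit or freshly allocated: write the data to the mapped physical extent in place. */
    printf("write %zu bytes to physical extent %llu\n",
           len, (unsigned long long)block_map[logical_extent]);
    (void)data;
}

int main(void)
{
    for (size_t i = 0; i < MAX_EXTENTS; i++) block_map[i] = UNALLOCATED;
    char buf[4096] = {0};
    va_write(5, buf, sizeof buf);   /* first write: allocate */
    va_write(5, buf, sizeof buf);   /* second write: update in place */
    return 0;
}
```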

  13. Read Operation • [Figure: read path. A read request flows from the application through the file system and buffer/page cache layer to the block I/O layer (VA), which searches the VA block map to locate the data on disk (VA metadata, FS metadata, Extents 0-3)]
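
Correspondingly, a read only needs the map translation. A minimal sketch, assuming that reads of never-written extents return zeroes; how VA actually handles such reads is not stated on the slide.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXTENTS  1024
#define UNALLOCATED  UINT64_MAX

static uint64_t block_map[MAX_EXTENTS];

/* Translate a logical extent to its physical location and "read" it. */
static void va_read(uint64_t logical_extent, void *buf, size_t len)
{
    if (block_map[logical_extent] == UNALLOCATED) {
        memset(buf, 0, len);          /* assumption: unwritten data reads back as zeroes */
        return;
    }
    printf("read %zu bytes from physical extent %llu\n",
           len, (unsigned long long)block_map[logical_extent]);
}

int main(void)
{
    for (size_t i = 0; i < MAX_EXTENTS; i++) block_map[i] = UNALLOCATED;
    block_map[5] = 0;                 /* pretend logical extent 5 was written earlier */
    char buf[4096];
    va_read(5, buf, sizeof buf);      /* mapped: read from physical extent 0 */
    va_read(9, buf, sizeof buf);      /* unmapped: returns zeroes */
    return 0;
}
```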

  14. Allocate-on-write vs. Other Work • Key difference from log-structured file systems (LFS) • Only allocation is done at the end of the log • Updates are done in place after allocation • LVM still ties up storage at the time of file system creation

  15. Design Issues • VA Metadata Hardening (File System Integrity) • Must keep a certain update ordering between VA metadata and FS (meta)data • [Figure: extent-based policy example (with Ext2) and file-system-based policy example (with Ext3 ordered mode); legend: I (inode), B (data block), V (VA block map); A → B means B is allocated to A]

  16. Design Issues (cont.) • Extent size • Larger extent size: reduces block map size and retains more spatial locality, but can cause data fragmentation • Reclaiming allocated storage space of deleted files • Needed to continue providing the benefits of virtual allocation • Without reclamation, virtual allocation can degenerate into static allocation • Interaction with RAID • RAID remaps blocks to physical devices to provide device characteristics • VA remaps blocks for flexibility • Need to resolve the performance impact of the interaction between VA's extent size and RAID's chunk size (see the sketch below)
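
One way to reason about the VA/RAID interaction is to size extents as a multiple of the full RAID-5 data stripe, so that an extent write covers whole stripes and avoids read-modify-write of parity. The sketch below works through that arithmetic under assumed parameters; the paper's actual resolution of this issue may differ.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Example RAID-5 configuration: 4 data disks + 1 parity, 64 KB chunk size. */
    uint32_t chunk_size   = 64 * 1024;
    uint32_t data_disks   = 4;
    uint32_t stripe_width = chunk_size * data_disks;        /* 256 KB of data per stripe */

    /* Assumed policy: round the desired extent size up to a whole number of stripes
       so that each extent write covers full stripes and avoids read-modify-write. */
    uint32_t desired_extent = 300 * 1024;                    /* e.g. what VA would like to use */
    uint32_t extent_size = ((desired_extent + stripe_width - 1) / stripe_width) * stripe_width;

    printf("stripe width: %u KB, chosen extent size: %u KB (%u stripes)\n",
           stripe_width / 1024, extent_size / 1024, extent_size / stripe_width);
    return 0;
}
```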

  17. Spatial Locality Observations & Issues • Metadata and data separation • Data clustering: reduce seek distance • Multiple file systems • Data placement policy • Allocate hot data in the high data-rate region of the disk • Allocate hot data in the middle of the partition

  18. Implementation & Experimental Setup • Virtual allocation prototype • Kernel module for Linux 2.4.22 • Employs an in-memory hash table to speed up VA block map lookups • Setup • A 3 GHz Pentium 4 processor, 1 GB main memory • Red Hat Linux 9 with a 2.4.22 kernel • Ext2 file system and Ext3 file system • Workloads • Bonnie++ (large-file workload) • Postmark (small-file workload) • TPC-C (database workload)

  19. VA Metadata Hardening • Compare EXT2 and VA-EXT2-EX • Compare EXT3 and VA-EXT3-EX, VA-EXT3-FS • [Chart: measured differences of -7.3, -3.3, -1.2, +4.9, +8.4, +9.5]

  20. Reclaiming Allocated Storage Space • Reclaim operation for deleted large files • How to keep track of deleted files? • Employed a stackable file system that maintains a duplicated block bitmap • Alternatively, could employ the “Life or Death at Block-Level” (OSDI’04) work
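
A rough sketch of the duplicated-bitmap idea: the stackable layer keeps its own copy of the block bitmap, and blocks that were set in the copy but are now clear in the file system's bitmap belonged to deleted files and can be reclaimed. The bitmap sizes and bit layout here are illustrative, not the prototype's format.

```c
#include <stdio.h>
#include <stdint.h>

#define BITMAP_WORDS 4   /* 4 x 64 = 256 blocks, just for the example */

/* Find blocks that were allocated earlier (our duplicated copy) but are now free
   in the file system's bitmap: those belonged to deleted files and can be reclaimed. */
static void reclaim_deleted(const uint64_t dup[], const uint64_t fs[])
{
    for (int w = 0; w < BITMAP_WORDS; w++) {
        uint64_t freed = dup[w] & ~fs[w];   /* was allocated, no longer allocated */
        for (int b = 0; b < 64; b++)
            if (freed & (1ULL << b))
                printf("reclaim block %d\n", w * 64 + b);
    }
}

int main(void)
{
    uint64_t dup_bitmap[BITMAP_WORDS] = { 0xFFULL, 0, 0, 0 };  /* blocks 0-7 were in use */
    uint64_t fs_bitmap[BITMAP_WORDS]  = { 0x0FULL, 0, 0, 0 };  /* blocks 4-7 since deleted */
    reclaim_deleted(dup_bitmap, fs_bitmap);                    /* reclaims blocks 4-7 */
    return 0;
}
```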

  21. VA with RAID-5 • Large-file workload • Small-file workload • Large-file workload with NVRAM • Used Ext2 with software RAID-5 + VA • NVRAM-X%: NVRAM equal to X% of total VA metadata size • [Chart legend: VA-RAID-5 NO-HARDEN, VA-RAID-5 NVRAM-17%, VA-RAID-5 NVRAM-4%, VA-RAID-5 NVRAM-1%]

  22. Data Placement Policy (Postmark) • VA NORMAL partition: same data rate across the partition • VA ZCAV partition: hot data is placed in the high data-rate region of the partition • VA-NORMAL: start allocation from the outer cylinders • VA-MIDDLE: start allocation from the middle of the partition

  23. Multiple File Systems • Used Postmark • VA-HALF: the 2nd file system is created after 40% of the 1st file system is written • VA-FULL: after 80% is written • VA-7GB: 2 x 3.5 GB partitions, 30% utilization • VA-32GB: 2 x 16 GB partitions, 80% utilization

  24. Real-World Deployment of Virtual Allocation • Prototype built

  25. VA in Networked Storage Environment • The flexible allocation provided by VA raises the issue of balancing locality against load balance

  26. Part II: Data Distribution • Locality-based approach • Use data migration (e.g. HP AutoRAID) • Employ “hot” data migration from the slower device (remote disk) to the faster device (local disk) • Load-balancing-based approach (striping) • Exploit multiple devices to support the required data rates (e.g. Slice-OSDI’00) • [Figure: hot data vs. cold data placement]

  27. User-Optimal Data Migration • Locality is exploited first • Data is migrated from Disk B to Disk A • Load balancing is also considered • If the load on Disk A is too high, data is migrated from Disk A to Disk B

  28. Migration Decision Issues • Where to migrate: use I/O request response time • When to migrate: migration threshold; initiate migration from Disk A to Disk B only when the threshold condition is met (see the sketch below) • How to migrate: limit the number of concurrent migrations (migration token) • What data to migrate: active data
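
The threshold condition itself appeared only as a formula on the original slide. As an illustration only, here is one plausible form: migrate from Disk A to Disk B when A's observed response time exceeds B's by more than a fixed margin and a migration token is available. The margin, the token count, and the way response times are measured are assumptions, not the paper's definition.

```c
#include <stdio.h>
#include <stdbool.h>

#define MIGRATION_THRESHOLD_MS 5.0   /* assumed margin, not from the paper */

static int migration_tokens = 2;     /* limit on concurrent migrations (assumed count) */

/* Decide whether to migrate a piece of active data from Disk A to Disk B. */
static bool should_migrate_a_to_b(double resp_a_ms, double resp_b_ms)
{
    /* Where to migrate: compare observed I/O request response times.
       When to migrate: only when A is slower than B by more than the threshold.
       How to migrate: only if a migration token can be taken. */
    if (resp_a_ms > resp_b_ms + MIGRATION_THRESHOLD_MS && migration_tokens > 0) {
        migration_tokens--;          /* token is returned when the migration completes */
        return true;
    }
    return false;
}

int main(void)
{
    printf("migrate? %s\n", should_migrate_a_to_b(18.0, 6.0) ? "yes" : "no");  /* yes */
    printf("migrate? %s\n", should_migrate_a_to_b(7.0, 6.0)  ? "yes" : "no");  /* no  */
    return 0;
}
```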

  29. Design Issues • Allocation policy • Striping with user-optimal migration: will improve data access locality • Sequential allocation with user-optimal migration: will improve load balancing • Multi-user environment • Each user migrates data in a user-selfish manner • Migrations will tend to improve the performance of all users over longer periods of time

  30. Evaluation • Implemented as a kernel block device driver • Evaluated using the SPECsfs benchmark • Configuration • [Figure: SPECsfs performance curves for multi-user and single-user environments]

  31. Single-User Environment • Configuration: (Allocation Policy)-(Migration Policy) • STR (Striping), SEQ (Seq. Alloc.), NOMIG (No migration), MIG (User-Optimal migration) • Striping with user-optimal migration • Seq. allocation with user-optimal migration

  32. Single-User Environment (cont.) • Comparison between migration systems • Migration based on locality: hot data (remote → local), cold data (local → remote)

  33. Multi-User Environment - Striping • Server A: Load from 100 to 700 • Server B: Load from 50 to 350

  34. Multi-User Environment – Seq. Allocation • Server A: Load from 100 to 1100 • Server B: Load from 30 to 480

  35. Storage Management Across Diverse Devices • Flash storage becoming widely available • More expensive than hard drives • Faster random accesses • Low power consumption • In laptops now • In hybrid storage systems soon • Manage data across different devices • Match application needs to device characteristics • Optimize for performance, power consumption

  36. Motivation • VFS allows many file systems underneath • VFS maintains a 1-to-1 mapping from namespace to storage • Can we provide different storage options for different files for a single user? • /user1/file1 → storage system 1, /user2/file2 → storage system 2, …

  37. Normal File System Architecture

  38. Umbrella File System

  39. Example Data Organization

  40. Motivation – Policy-Based Storage • User or system administrator choice • Allow different types of files on different devices • Reliability, performance, power consumption • Layered architecture • Leverage benefits of underlying file systems • Map applications to file systems and underlying storage • Policy decisions can depend on namespace and metadata • Example: files not touched in a week → slow storage system

  41. Rules Structure • Provided at mount time • User specified • Based on inode values (metadata) and filenames (namespace) • Provides array of branches

  42. Umbrella File System • Sits under VFS to enforce policy • Policy enforced at open and close times • Policy also enforced periodically (less often) • UmbrellaFS acts as a “router” for files • Not only based on namespace, but also metadata

  43. Inode Rules Structure

  44. Inode Rules • Provided in order of precedence • First match wins • Compare the inode value to the rule • At file creation some inode values are indeterminate • Pass over those rules (see the sketch below)
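
A user-space sketch of first-match inode rule evaluation, with a hypothetical rule structure (field selector, threshold, branch index); skipping rules whose inode field is still indeterminate at file creation is modeled with a simple validity flag. None of these names come from UmbrellaFS itself.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical rule: route a file to a branch when an inode field reaches a value. */
struct inode_rule {
    enum { RULE_SIZE, RULE_UID } field;
    long long threshold;
    int branch;                 /* index into the array of underlying branches */
};

struct file_attrs {
    long long size;  bool size_valid;   /* size may be indeterminate at creation */
    long long uid;   bool uid_valid;
};

/* Rules are provided in order of precedence: the first applicable match wins. */
static int pick_branch(const struct inode_rule *rules, int nrules,
                       const struct file_attrs *a, int default_branch)
{
    for (int i = 0; i < nrules; i++) {
        bool valid = (rules[i].field == RULE_SIZE) ? a->size_valid : a->uid_valid;
        if (!valid)
            continue;           /* indeterminate at file creation: pass over this rule */
        long long v = (rules[i].field == RULE_SIZE) ? a->size : a->uid;
        if (v >= rules[i].threshold)
            return rules[i].branch;
    }
    return default_branch;
}

int main(void)
{
    struct inode_rule rules[] = {
        { RULE_SIZE, 1 << 20, 1 },   /* files >= 1 MB -> branch 1 (e.g. hard disk) */
        { RULE_UID,  1000,    2 },   /* otherwise, regular users -> branch 2 */
    };
    struct file_attrs f = { .size = 0, .size_valid = false, .uid = 1000, .uid_valid = true };
    printf("new file goes to branch %d\n", pick_branch(rules, 2, &f, 0));   /* branch 2 */
    return 0;
}
```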

  45. Filename Rules Structure

  46. Filename Rules • Once the first filename rule is triggered, all are checked • Similar to longest-prefix matching • Double index based on • Path matching • Filename matching • Example: rules /home/*/*.bar and /home/jgarrison/foo.bar; file /home/jgarrison/foo.bar • The file matches the second rule more closely (path length 3 and 7 characters of file name vs. path length 3 and 4 characters of file name) (see the sketch below)
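
A sketch of the slide's scoring idea, reduced to the filename half of the double index: each candidate rule is scored by how many characters of the file name it matches literally, and the best score wins (the path components are assumed to have matched already, as in the example above). The matcher below handles only exact names and "*<suffix>" patterns and is an illustration, not the exact UmbrellaFS algorithm.

```c
#include <stdio.h>
#include <string.h>

struct fname_rule { const char *pattern; int branch; };

/* Return how many literal characters of `pat` match the file name `s` when `pat`
   is either an exact name or a "*<suffix>" pattern; return -1 if it does not match. */
static int match_chars(const char *pat, const char *s)
{
    if (strcmp(pat, "*") == 0) return 0;
    const char *star = strchr(pat, '*');
    if (!star) return strcmp(pat, s) == 0 ? (int)strlen(s) : -1;    /* exact literal */
    const char *suf = star + 1;                                      /* "*<suffix>" form */
    size_t sl = strlen(suf), ln = strlen(s);
    if (ln >= sl && strcmp(s + ln - sl, suf) == 0) return (int)sl;
    return -1;
}

/* Pick the rule whose pattern matches the file name with the most literal characters. */
static int pick_branch(const struct fname_rule *rules, int n, const char *fname, int dflt)
{
    int best = -1, best_branch = dflt;
    for (int i = 0; i < n; i++) {
        int chars = match_chars(rules[i].pattern, fname);
        if (chars > best) { best = chars; best_branch = rules[i].branch; }
    }
    return best_branch;
}

int main(void)
{
    /* Last path components of the slide's two rules: "*.bar" and "foo.bar". */
    struct fname_rule rules[] = { { "*.bar", 1 }, { "foo.bar", 2 } };
    /* "foo.bar" matches the exact rule with 7 characters vs. 4 for "*.bar". */
    printf("branch %d\n", pick_branch(rules, 2, "foo.bar", 0));   /* prints branch 2 */
    return 0;
}
```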

  47. Evaluation • Overhead • Throughput • CPU Limited • I/O Limited • Example Improvement

  48. UmbrellaFS Overhead

  49. CPU Limited Benchmarks

  50. I/O Limited Benchmarks
