“Five minute rule ten years later and other computer storage rules of thumb”
Presentation Transcript

  1. “Five minute rule ten years later and other computer storage rules of thumb” • Authors: Jim Gray, Goetz Graefe • Reviewed by: Nagapramod Mandagere, Biplob Debnath

  2. Outline • Problem Statement • Motivation • Importance and Relevance • Main Contributions and Validation • Key Ideas • Illustrations • New Metrics • Assumptions • Re-write Today • Questions

  3. Problem Statement • Broader Problem: Viewing developments over a long period of time to extract important technology trends. • Specific Instance: Inferring rules of thumb for buffer replacement policies in a number of settings, including RAID environments. • Given: Trends over time for parameters such as memory cost, disk cost, and tape cost • Find: Rules of thumb for deciding where to store data and when to evict data from the memory buffer • Objectives: Simple rules, extensible rules • Constraints: Hierarchical Storage Model

  4. Typical Database Administrator's Dilemma • The performance isn’t good. Am I doing something wrong? • Should I cache on the client? • Should I cache this data in memory? • Should I store data back on disk? (local or network disk) • Should I move data to tape?

  5. Importance & Relevance • Parameters change at different rates • Seeks/second & disk capacity: 10x to 100x • Disk MB/K$ & DRAM MB/K$: 1000x

  6. Importance & Relevance • The location of data is very important • Main Memory: Very fast, expensive, limited size • Disk Storage: Much slower than main memory, inexpensive, close to unlimited size • Tape Storage: Slowest, dirt cheap, unlimited capacity • How can one decide what data resides where? • The system learns from data access patterns and adapts (admins hate to give up control) • The administrator controls data locality using experience or historical performance information (rules of thumb)

  7. Main Contributions & Validation • The Five minute rule • Randomly accessed buffer pages can be replaced if unused for more than 5 minutes. • Sequentially accessed buffer pages can be replaced if unused for more than 1 minute. • Metrics for storage performance characterization • Cost/Access • Maps: Megabyte accesses per second • Scan: Time it takes to sequentially read or write all the data in the device • Validation Methodology • Examples • Random access • One pass sort • Two pass sort • Trends observed over a period of time

  8. Key Ideas • Tradeoff between the cost of RAM and the cost of disk accesses. • The tradeoff is that caching pages in the extra memory can save disk IOs. • The break-even point is met when the rent on the extra memory for cache ($/page/sec) exactly matches the savings in disk accesses per second ($/disk_access/sec).
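The break-even condition on this slide can be written as a formula. The derivation below is a sketch that equates the rent on one cached page with the cost of the disk accesses it saves; the symbol T (seconds between references to a page) is introduced here for illustration, and the other names are the parameters listed on the next slide.

```latex
% Rent for keeping one page in RAM:
%   PricePerMBofDRAM / PagesPerMBofRAM
% Disk cost of serving a page referenced once every T seconds:
%   (PricePerDiskDrive / AccessesPerSecondPerDisk) * (1 / T)
% Setting the two equal and solving for T gives the break-even interval:
\[
T_{\text{break-even}}
  = \frac{\mathit{PagesPerMBofRAM}}{\mathit{AccessesPerSecondPerDisk}}
    \times
    \frac{\mathit{PricePerDiskDrive}}{\mathit{PricePerMBofDRAM}}
  \quad \text{(seconds)}
\]
```

Pages referenced more often than once every T seconds are cheaper to keep in memory; pages referenced less often are cheaper to fetch from disk.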

  9. Illustration – Typical System in 1997 • For a system with the following characteristics • PagesPerMBofRAM = 128 pages/MB (8KB pages) • AccessesPerSecondPerDisk = 64 accesses/sec/disk • PricePerDiskDrive = 2000 $/disk (9GB + controller) • PricePerMBofDRAM = 15 $/MB_DRAM • The break-even inter-reference interval is about 266 seconds, roughly 5 minutes
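The arithmetic behind the 1997 figure can be checked with a short Python sketch. The function name `break_even_interval_seconds` is made up for this example; the inputs are the four values on the slide, combined as memory rent per page against disk cost per access per second.

```python
def break_even_interval_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_dram):
    """Reference interval (seconds) at which caching a page in RAM costs
    the same as re-reading it from disk on every reference."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_dram
    return technology_ratio * economic_ratio


# 1997 values taken from the slide.
interval_1997 = break_even_interval_seconds(
    pages_per_mb_ram=128,          # 8 KB pages
    accesses_per_sec_per_disk=64,
    price_per_disk=2000,           # dollars, 9 GB drive plus controller
    price_per_mb_dram=15,          # dollars
)
print(f"break-even: {interval_1997:.1f} s ({interval_1997 / 60:.1f} min)")
```

The result is a little under 267 seconds, which the slide rounds to five minutes.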

  10. Illustration • One pass algorithms • Read data once and never reference it again • No need to cache the data in RAM • The system needs only enough buffer memory to allow data to stream from disk to main memory • Typically, two or three one-track buffers (~100 KB) per disk are adequate to buffer disk operations and allow the device to stream data to the application

  11. Illustration • Two pass algorithms • Sequential operations that read a large dataset and then revisit parts of the data • Examples: database join, cube, rollup, and sort operators • Sorting needs two passes when memory is smaller than the data set • Inter-reference time is typically about a minute (sequential data access)

  12. Illustration – Two Pass Sort • A one pass sort needs a larger amount of memory • Its memory requirement grows faster with input file size than that of a two pass sort • For files bigger than memory, a one pass sort is impossible and two passes are needed
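To make the memory comparison concrete, here is a minimal Python sketch under a simplified model that is an assumption of this example, not the slide's own formula. Pass one writes sorted runs of size M; pass two merges them with one I/O buffer of size B per run. Output buffers and double buffering are ignored. Merging file/M runs then needs (file/M) × B ≤ M, so M is about sqrt(file × B), while a one pass sort needs M equal to the file size. The function names are hypothetical.

```python
import math


def two_pass_sort_memory_bytes(file_size_bytes, buffer_size_bytes):
    """Approximate memory needed to sort in two passes (simplified model:
    one merge buffer per run, no output or double buffering)."""
    return math.sqrt(file_size_bytes * buffer_size_bytes)


def passes_needed(file_size_bytes, memory_bytes, buffer_size_bytes):
    """Count passes over the data: 1 if the file fits in memory, otherwise
    one run-formation pass plus as many merge passes as the fan-in requires."""
    if file_size_bytes <= memory_bytes:
        return 1
    fan_in = memory_bytes // buffer_size_bytes
    if fan_in < 2:
        raise ValueError("memory must hold at least two merge buffers")
    runs = math.ceil(file_size_bytes / memory_bytes)
    passes = 1  # run formation
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    return passes


file_size = 2**30      # 1 GiB input file (illustrative)
buffer_size = 2**16    # 64 KiB merge buffers (illustrative)
memory = two_pass_sort_memory_bytes(file_size, buffer_size)
print(f"one pass needs {file_size / 2**20:.0f} MiB, "
      f"two passes need about {memory / 2**20:.0f} MiB")
print("passes with that memory:", passes_needed(file_size, int(memory), buffer_size))
```

In this model the one pass requirement grows linearly with file size while the two pass requirement grows with its square root, which is why the gap widens for large inputs.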

  13. Disk vs Tape tradeoff • How does the tradeoff change between tape and disk? • Tape: larger access penalty (slower access) but lowest cost • Solution: a larger break-even interval and a bigger page size

  14. New Metrics • Motivated by data flow applications that stream huge amounts of data, such as data mining and multimedia applications • New Metrics • Kaps: kilobyte accesses per second • Maps: megabyte accesses per second • Scan: time taken to sequentially read or write all data on a device • These metrics, combined with rent costs, provide a price/performance metric
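The three metrics can be estimated from a few device parameters. The Python sketch below assumes a simple access-time model (fixed positioning time plus transfer time) and uses hypothetical device numbers chosen only for illustration; neither the model nor the numbers come from the slides.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def accesses_per_second(request_bytes, positioning_time_s, transfer_bytes_per_s):
    """Random accesses per second for requests of a given size, assuming each
    access pays a fixed positioning delay plus the transfer time."""
    return 1.0 / (positioning_time_s + request_bytes / transfer_bytes_per_s)


def scan_time_seconds(capacity_bytes, transfer_bytes_per_s):
    """Time to read or write the whole device sequentially."""
    return capacity_bytes / transfer_bytes_per_s


# Hypothetical disk: 10 ms positioning, 10 MB/s transfer, 9 GB capacity.
positioning = 0.010
bandwidth = 10 * MB
capacity = 9 * GB

kaps = accesses_per_second(KB, positioning, bandwidth)
maps = accesses_per_second(MB, positioning, bandwidth)
scan = scan_time_seconds(capacity, bandwidth)
print(f"Kaps = {kaps:.1f}/s, Maps = {maps:.2f}/s, Scan = {scan / 60:.1f} min")
```

Kaps is dominated by positioning time and Maps by transfer time, so the two respond differently to improvements in seek speed and bandwidth.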

  15. Assumptions • All disk storage has the same cost/performance characteristics. The paper assumes that the disk storage system is homogeneous and does not consider the more recent shift towards hierarchical/heterogeneous storage systems. • The tradeoff considers only performance; security and fault tolerance are assumed to be uniform throughout.

  16. Re-write • Re-evaluate the rules of thumb using more recent costs and more recent trends in storage systems, such as heterogeneous/hierarchical storage • Take SAN and NAS characteristics into account

  17. Questions? • Does the five minute rule still hold today? • No (with reservations) • If one increases the page size towards the megabyte range, the five minute rule still applies. • Pages/MB of RAM = 16 (64 KB pages) • Accesses/sec/disk = 64 • Price/disk drive = $400 • Price/MB of RAM = $0.1 • Break-even point ~ 1000 s • Further evidence: Jim Gray (keynote at FAST 2004)
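The effect of page size on the break-even point can be explored with the prices on this slide. This is a rough Python sketch: it keeps accesses/sec/disk fixed at 64 for every page size, as the slide does, which ignores the longer transfer time of large pages. The page sizes swept are illustrative, and 16 pages per MB corresponds to 64 KB pages.

```python
PRICE_PER_DISK = 400.0         # dollars, from the slide
PRICE_PER_MB_DRAM = 0.1        # dollars, from the slide
ACCESSES_PER_SEC_PER_DISK = 64.0


def break_even_seconds(page_size_kb):
    """Break-even reference interval for a given page size, assuming the
    disk access rate does not depend on page size (a simplification)."""
    pages_per_mb = 1024 / page_size_kb
    return (pages_per_mb / ACCESSES_PER_SEC_PER_DISK) * (
        PRICE_PER_DISK / PRICE_PER_MB_DRAM)


results = {kb: break_even_seconds(kb) for kb in (8, 64, 256, 1024)}
for kb, seconds in results.items():
    print(f"{kb:5d} KB pages: {seconds:7.1f} s ({seconds / 60:5.1f} min)")
```

Under these assumptions, 64 KB pages give the slide's figure of about 1000 s, 8 KB pages give a far longer interval, and the interval returns to a few minutes only as pages grow to hundreds of kilobytes.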