350 likes | 878 Vues
Storage and File Structure. DBMS and Storage/File Structure. Why do we need to know about storage/file structure Many database technologies are developed to utilize the storage architecture/hierarchy Data in the database needs to be organized and stored/retrieved efficiently.
E N D
Storage and File Structure Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
DBMS and Storage/File Structure • Why do we need to know about storage/file structure • Many database technologies are developed to utilize the storage architecture/hierarchy • Data in the database needs to be organized and stored/retrieved efficiently Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Storage Hierarchy Volatile primary storage Cache unit price Memory Flash Memory Secondary storage Magnetic Disk Non-volatile speed Tertiary storage Optical Disk Magnetic Tape Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Primary Storage (Volatile) • Cache • Speed: 7 to 20 ns (1 nanosecond = 10–9 seconds) • Capacity: • A typical PC level 2 cache 64KB-2 MB. • Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB. • Main memory: • Speed: 10s to 100s of nanoseconds; • Capacity: Up to a few Gigabytes widely used currently • per-byte costs have decreased roughly factor of 2 every 2 to 3 years) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Secondary Storage (Non-volatile) • Flash memory • Speed: • read speed similar to main memory, write is much slower • Capacity: 32M to 512M currently • Forms: SmartMedia, memory stick, secure digital, BIOS • Cost: roughly same as main memory • Magnetic-disk • Capacities: up to roughly 100 GB currently • Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Tertiary Storage (Non-volatile) • Optical storage • CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms • CD-RW, DVD-RW, and DVD-RAM • Reads and writes are slower than with magnetic disk • Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks available for storing large volumes of data Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Tertiary Storage (Non-volatile) • Tape storage • non-volatile, used primarily for backup (to recover from disk failure), and for archival data • sequential-access– much slower than disk • very high capacity (40 to 300 GB tapes available) • Tape jukeboxes available for storing massive amounts of data • hundreds of terabytes (1 terabyte = 109 bytes) to even a petabyte (1 petabyte = 1012 bytes) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Between Memory and Disk • The permanent residency of database is mostly on disk • In database, cost is usually measured by the number of disk I/O • But disks are too slow and we need memory to be the buffers … but memory is volatile • this introduced a number of issues Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Lets talk about disk Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Disk Subsystem Disk interface standards families • ATA (AT adaptor) range of standards • SCSI (Small Computer System Interconnect) range of standards • Several variants of each standard (different speeds and capabilities) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Disk Speed Discuss ways to improve disk reading speed Rotation time/latency milliseconds Data-transfer rate (4-8MB/sec) • Typical numbers: • 16,000 tracks per platter • sectors per track: 200 – 400 • 512 bytes per sector • 4-16KB per block • 5,400 - 15,000 r p m Seek time (milliseconds) Access time = seek time + latency Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
RAID • RAID: Redundant Arrays of Independent Disks • high capacity and high speed by using multiple disks in parallel, and • high reliability by storing data redundantly Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Mean time to failure (MTTF) • Average time the disk is expected to run continuously without any failure. • Typically 3 to 5 years (1 year = 8,760 hours) • MTTF = 30,000 to 1,200,000 hours for a new disk • an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on an average one will fail every 1200 hours • When number of disks increase, the chance of some disk failure increase proportionally Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Parallelism • Two main goals of parallelism in a disk system: 1. Load balance multiple small accesses to increase throughput 2. Parallelize large accesses to reduce response time. • Basic strategy: Stripping • Compare and contrast bit stripping and byte stripping Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Redundancy • store extra information that can be used to rebuild information lost in a disk failure • Basic strategy: mirroring, parity • Mean time to data loss depends on mean time to failure, and mean time to repair • E.g. MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500*106 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes) Data Parity 10010010 1 Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
RAID levels Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Choice of RAID Levels • Level 1 provides much better write performance than level 5 • Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes • Level 1 preferred for high update environments such as log disks • Level 1 had higher storage cost than level 5 • disk drive capacities increasing rapidly (50%/year) whereas disk access times have decreased much less (x 3 in 10 years) • I/O requirements have increased greatly, e.g. for Web servers • When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity • so there is often no extra monetary cost for Level 1! • Level 5 is preferred for applications with low update rate,and large amounts of data • Level 1 is preferred for all other applications Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Buffer Management • Database can not fit entirely in memory, needs memory as a buffer for speed reasons • LRU is used in many OS • Spatial and temporal locality due to loops • Database has a more predictable behavior • Example: join Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
DBMS Buffer Management Strategies • Pinned block – memory block that is not allowed to be written back to disk. • Toss-immediate strategy – frees the space occupied by a block as soon as the final tuple of that block has been processed • Most recently used (MRU) strategy – system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block. • Statistical information based • E.g., the data dictionary is frequently accessed. Heuristic: keep data-dictionary blocks in main memory buffer • Buffer managers also support forced output of blocks for the purpose of recovery Yan Huang - CSCI5330 Database Implementation –Storage and File Structure
Exercises Yan Huang - CSCI5330 Database Implementation –Storage and File Structure