Lecture #6

Lecture #6 Storing Data on Disks and Files

Agenda • Disk and files • Disk Space Management • Buffer Management • Record formats • Page Formats • What is Indexing? • Different indexing methods • How to create indexes using SQL

The Memory Hierarchy

Disks and Files • DBMS stores information on (“hard”) disks. • This has major implications for DBMS design! • READ: transfer data from disk to main memory (RAM). • WRITE: transfer data from RAM to disk. • Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

Why Not Store Everything in Main Memory? • Costs too much. • Same amount of money will buy you say either 128MB of RAM or 20GB of disk. • Main memory is volatile. • We want data to be saved between runs. (Obviously!) • Typical storage hierarchies: • Main memory (RAM) for currently used data (primary storage) . • Disk for the main database (secondary storage). • Tapes for archiving older versions of data (tertiary storage).

Disks • Secondary storage device of choice. • Main advantage over tapes: • random access vs.sequential. • Data is stored and retrieved in units : • called disk blocks or pages. • Unlike RAM, time to retrieve a disk page varies depending upon location on disk. • Therefore, relative placement of pages on disk has major impact on DBMS performance!

Components of a Disk • The platters spin (say, 90 rps). • The arm assembly is moved in or out to position a head on a desired track. • Tracks under heads make a cylinder(imaginary!). • Only one head reads/writes at any one time.

Accessing a Disk Page • Time to access (read/write) a disk block: • seek time (moving arms to position disk head on track) • rotational delay (waiting for block to rotate under head) • transfer time (actually moving data to/from disk surface) • Seek time and rotational delay dominate. • Seek time varies from about 1 to 20msec • Rotational delay varies from 0 to 10msec • Transfer rate is about 1msec per 4KB page • Lower I/O cost: reduce seek/rotation delays!

Arranging Pages on Disk • `Next’ block concept: • blocks on same track, followed by • blocks on same cylinder, followed by • blocks on adjacent cylinder • Blocks in a file should be arranged sequentially on disk (by `next’), to minimize seek and rotational delay. • For a sequential scan, pre-fetchingseveral pages at a time is a big win!

RAID (Redundant Array of Independent Disks) • Disk Array: Arrangement of several disks that gives abstraction of a single large disk. • Goals: Increase performance and reliability. • Two main techniques: • Data striping: Data is partitioned; Size of a partition is called the striping unit. Partitions are distributed over several disks. • Redundancy: More disks => more reliable. Redundant information allows reconstruction of data if a disk fails.

RAID 0

RAID 1

RAID 1+0

RAID 3

RAID 4

RAID 5

RAID 6

RAID Levels • Level 0: No redundancy • distributes data across multiple disks in a way that gives improved speed at any given instant. If one disk fails, however, all of the data on the array will be lost, • Utilization 100% • Level 1: Mirrored (two identical copies) • Each disk has a mirror image (check disk) • Parallel reads, a write involves two disks. • Maximum transfer rate = transfer rate of one disk • Utilization 50%

Choosing RAID Levels • RAID Level 0: data loss is not an issue • RAID Level 0+1: • small storage subsystems, the cost of mirroring is moderate • High percentage of writes • RAID Level 3 • Large transfer requests of several contiguous blocks • RAID Level 5 • A good general-purpose solution • Good performance for large as well as small requests

Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Structure of a DBMS These layers must consider concurrency control and recovery • A typical DBMS has a layered architecture. • Disk Storage hierarchy, RAID • Disk Space Management Roles, Free blocks • Buffer Management Buffer Pool, Replacement policy • Files and Access Methods File organization (heap files, sorted file, indexes) File and page level storage (collection of pages or records) Index Files System Catalog Data Files

Disk Space Management • Lowest layer of DBMS software manages space on disk. • Higher levels call upon this layer to: • allocate/de-allocate a page • read/write a page • Higher levels don’t need to know how this is done, or how free space is managed.

DISK SPACE MANAGEMENT • The lowest level of software in the DBMS architecture called the disk space manager, manages space on disk. Abstractly, the disk space manager supports the concept of a page as a unit of data, and provides commands to allocate or deallocate a page and read or write a page.

Disk Space Managment • Keeping Track of Free Blocks • A database grows and shrinks as records are inserted and deleted over time. The disk space manager keeps track of which disk blocks are in use, in addition to keeping track of which pages are on which disk blocks.

DB Buffer Management in a DBMS Page Requests from Higher Levels • Data must be in RAM for DBMS to operate on it! • Table of <frame#, pageid> pairs is maintained. BUFFER POOL disk page free frame MAIN MEMORY DISK choice of frame dictated by replacement policy

When a Page is Requested ... • If requested page is not in buffer pool: • Choose a frame for replacement • If frame is dirty, write it to disk • Read requested page into chosen frame • Pin the page and return its address. • If requests can be predicted (e.g., sequential scans) • pages can be pre-fetched(several pages at a time)!

More on Buffer Management • Requestor of page must unpin it, and indicate whether page has been modified: • dirtybit is used for this. • Page in pool may be requested many times, • a pin count is used. • A page is a candidate for replacement iffpin count = 0.

Buffer Replacement Policy • Frame is chosen for replacement by a replacement policy: • Least-recently-used (LRU), Clock, MRU etc. • Policy can have big impact on # of I/O’s; depends on access pattern. • Sequential flooding: Nasty situation caused by LRU + repeated sequential scans. • # buffer frames < # pages in file means each page request causes an I/O. • MRU much better in this situation (but not in all situations, of course).

DBMS vs. OS File System OS does disk space & buffer mgmt already! So why not let OS manage these tasks? • Differences in OS support: Portability issues • Some limitations, e.g., files don’t span multiple disk devices. • Buffer management in DBMS requires ability to: • pin a page in buffer pool, • force a page to disk (important for implementing CC & recovery), • adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.

Review previous lecture • DBMS has to store data somewhere • Choices: • Main memory • Expensive – compared to secondary and tertiary storage • Fast – in memory operations are fast • Volatile – not possible to save data from one run to its next • Used for storing current data • Secondary storage (hard disk) • Less expensive – compared to main memory • Slower – compared to main memory, faster compared to tapes • Persistent – data from one run can be saved to the disk to be used in the next run • Used for storing the database • Tertiary storage (tapes) • Cheapest • Slowest – sequential data access • Used for data archives

DBMS stores data on hard disks • This means that data needs to be • read from the hard disk into memory (RAM) • Written from the memory onto the hard disk • Because I/O disk operations are slow query performance depends upon how data is stored on hard disks • The lowest component of the DBMS performs storage management activities • Other DBMS components need not know how these low level activities are performed

Basics of Data storage on hard disk • A disk is organized into a number of blocks or pages • A page is the unit of exchange between the disk and the main memory • A collection of pages is known as a file • DBMS stores data in one or more files on the hard disk

Database Tables on Hard Disk • Database tables are made up of one or more tuples (rows) • Each tuple has one or more attributes • One or more tuples from a table are written into a page on the hard disk • Larger tuples may need more than one page! • Tuples on the disk are known as records • Records are separated by record delimiter • Attributes on the hard disk are known as fields • Fields are separated by field delimiter

Page Formats • Page : abstraction is used for I/O • Record : data granularity for higher level of DBMS • How to arrange records in pages? • Identify a record: • <page_id, slot_number>, where slot_number = rid • Most cases, use <page_id, slot_number> as rid. • Alternative approaches to manage slots on a page • How to support insert/deleting/searching?

Records Formats: Fixed Length Record • Information about field types same for all records in a file • Stored record format in systemcatalogs. + Finding i’th field does not require scan of record, just offset calculation. F1 F2 F3 F4 L1 L2 L3 L4 Base address (B) Address = B+L1+L2

Page Formats: Fixed Length Records Slot 1 Slot 1 Slot 2 Slot 2 • Record id = <page id, slot #>. • Note: In first alternative, moving records for free space management changes rid; may not be acceptable if existing external references to the record that is moved. Free Space . . . . . . Slot N Slot N Slot M N . . . 1 1 1 M 0 M ... 3 2 1 number of records number of slots PACKED UNPACKED, BITMAP

4 $ $ $ $ Record Formats: Variable Length • Two alternative formats (# fields is fixed): F1 F2 F3 F4 Fields Delimited by Special Symbols Field Count F1 F2 F3 F4 Array of Field Offsets + Second offers direct access to i’th field + efficient storage of nulls ; - small directory overhead.

Page Formats: Variable Length Records • Slot directory = {<record_offset, record_length>} Offset of record from start of data area Rid = (i,N) Length = 20 Page i Rid = (i,2) Length = 16 Rid = (i,1) Length = 24 N Pointer to start of free space 20 16 24 N . . . 2 1 # slots SLOT DIRECTORY

Page Formats: Variable Length Records • Slot directory = {<record_offset, record_length>} • Dis/Advantages: + Moving: rid is not changed + Deletion: offset = -1 (rid changed? Can we delete slot? Why?) + Insertion: Reuse deleted slot. Only insert if none available. • Free space? Free space pointer? Recycle after deletion?

System Catalogs • Meta information stored in system catalogs. • For each index: • structure (e.g., B+ tree) and search key fields • For each relation: • name, file name, file structure (e.g., Heap file) • attribute name and type, for each attribute • index name, for each index • integrity constraints • For each view: • view name and definition • Plus statistics, authorization, buffer pool size, etc. • Catalogs are themselves stored as relations!

Attr_Cat(attr_name, rel_name, type, position)

File Organization & Indexing

File Organization • The physical arrangement of data in a file into records and pages on the disk • File organization determines the set of access methods for • Storing and retrieving records from a file • Therefore, ‘file organization’ synonymous with ‘access method’ • We study three types of file organization • Unordered or Heap files • Ordered or sequential files • Hash files • We examine each of them in terms of the operations we perform on the database • Insert a new record • Search for a record (or update a record) • Delete a record

Unordered Or Heap File • Records are stored in the same order in which they are created • Insert operation • Fast – because the incoming record is written at the end of the last page of the file • Search (or update) operation • Slow – because linear search is performed on pages • Delete Operation • Slow – because the record to be deleted is first searched for • Deleting the record creates a hole in the page • Periodic file compacting work required to reclaim the wasted space

Ordered or Sequential File • Records are sorted on the values of one or more fields • Ordering field – the field on which the records are sorted • Ordering key – the key of the file when it is used for record sorting • Search (or update) Operation • Fast – because binary search is performed on sorted records • Update the ordering field? • Delete Operation • Fast – because searching the record is fast • Periodic file compacting work is, of course, required • Insert Operation • Poor – because if we insert the new record in the correct position we need to shift all the subsequent records in the file • Alternatively an ‘overflow file’ is created which contains all the new records as a heap • Periodically overflow file is merged with the main file • If overflow file is created search and delete operations for records in the overflow file have to be linear!

Hash File • Is an array of buckets • Given a record, r a hash function, h(r) computes the index of the bucket in which record r belongs • h uses one or more fields in the record called hash fields • Hash key - the key of the file when it is used by the hash function • Example hash function • Assume that the staff last name is used as the hash field • Assume also that the hash file size is 26 buckets - each bucket corresponding to each of the letters from the alphabet • Then a hash function can be defined which computes the bucket address (index) based on the first letter in the last name.

Lecture #6

Lecture #6

Presentation Transcript

LECTURE

Lecture 25 Lecture 26

Lecture

Lecture

Lecture VIII Lecture IX

Lecture

Lecture 10 Lecture 10 Lecture 11 Lecture 11 Lecture 11 Lecture 11

Lecture S1: Sample Lecture

Lecture