Chapter 10 Storage and File Structure. 1st Semester, 2016 Sanghyun Park. Outline. Overview of Physical Storage Media Magnetic Disks and Flash Storage File Organization Organization of Records in Files Data Dictionary Storage. Classification of Physical Storage Media.
Chapter 10 Storage and File Structure 1st Semester, 2016 Sanghyun Park
Outline • Overview of Physical Storage Media • Magnetic Disks and Flash Storage • File Organization • Organization of Records in Files • Data Dictionary Storage
Classification of Physical Storage Media • Speed with which data can be accessed • Cost per unit of data • Reliability • Data loss on power failure or system crash • Physical failure of the storage device • Can differentiate storage into: • Volatile storage: loses contents when power is switched off • Non-volatile storage: • Contents persist even when power is switched off • Includes secondary and tertiary storage, as well as battery-backed main memory
Storage Hierarchy (2/2) • Primary storage • Fastest media but volatile • Cache, main memory • Secondary storage • Next level in hierarchy, non-volatile, moderately fast access time • Also called on-line storage • Flash memory, magnetic disks • Tertiary storage • Lowest level in hierarchy, non-volatile, slow access time • Also called off-line storage • Optical disk, magnetic tape
Optimization of Disk-Block Access (1/2) • Block • A contiguous sequence of sectors from a single track • Data is transferred between disk and main memory in blocks • Sizes range from 512 bytes to several kilobytes • Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized (e.g., elevator algorithm) [Figure: requests R1 to R6 located between the inner track and the outer track]
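The elevator algorithm mentioned on this slide can be sketched as follows. This is a minimal illustration, not a disk driver: the function names, the track numbers, and the convention that larger track numbers lie further outward are all assumptions made for the example. The arm keeps moving in one direction, serving every pending request it passes, and then reverses.

```python
def elevator_order(pending, head, moving_out=True):
    """Order pending track requests as one elevator sweep would serve them."""
    if moving_out:
        # Serve everything at or beyond the head first, then sweep back.
        first = sorted(t for t in pending if t >= head)
        second = sorted((t for t in pending if t < head), reverse=True)
    else:
        first = sorted((t for t in pending if t <= head), reverse=True)
        second = sorted(t for t in pending if t > head)
    return first + second


def arm_travel(order, head):
    """Total number of tracks the arm crosses when serving requests in this order."""
    travel = 0
    for track in order:
        travel += abs(track - head)
        head = track
    return travel


requests = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical track numbers
swept = elevator_order(requests, head=53)
print(swept)
print(arm_travel(requests, 53), "tracks crossed in arrival order")
print(arm_travel(swept, 53), "tracks crossed in elevator order")
```

For this made-up request queue, the sweep crosses fewer than half as many tracks as serving the requests in arrival order.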
Optimization of Disk-Block Access (2/2) • To reduce block-access time, we can organize blocks on disk in a way that corresponds to how data will be accessed • For example, we can store related information on the same or nearby cylinders • However, blocks may get scattered all over the disk by insertion or deletion (fragmentation) • Some systems have utilities for de-fragmentation
Flash Storage (1/2) • There are two types of flash memory, called NAND and NOR flash • NAND flash has a much higher storage capacity for a given cost, and is widely used for data storage in devices such as cameras, music players, cell phones, and increasingly laptop computers • Has a lower cost per byte than main memory, in addition to being non-volatile • Requires page-at-a-time read (page: 512 bytes to 4 KB) • Transfer rate is around 20 MB/sec • Solid state disks: use multiple flash storage devices to provide a higher transfer rate of 100 to 200 MB/sec
Flash Storage (2/2) • Once written, a flash page cannot be directly overwritten; instead, it has to be erased and rewritten subsequently • Erase operation is performed on a number of pages, called an erase block, at once, and takes about 1 to 2 milliseconds • There is a limit to how many times a flash page can be erased, typically around 100,000 to 1,000,000 times; wear leveling spreads erases evenly across erase blocks to cope with this limit • Flash memory systems reduce the impact of these problems using a software layer called FTL (Flash Translation Layer)
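To make the role of the FTL more concrete, here is a toy sketch. The class `TinyFTL`, its methods, and its policies are hypothetical and far simpler than a real controller. It illustrates three ideas from the slide: a page is never overwritten in place, erasure happens one whole erase block at a time, and new writes are steered toward the least-erased block as a crude form of wear leveling.

```python
class TinyFTL:
    """Toy flash translation layer: out-of-place writes plus whole-block erase."""

    def __init__(self, num_blocks=2, pages_per_block=2):
        self.ppb = pages_per_block
        self.flash = [[None] * pages_per_block for _ in range(num_blocks)]
        self.next_free = [0] * num_blocks    # first unwritten page of each block
        self.erase_count = [0] * num_blocks  # wear of each erase block
        self.mapping = {}                    # logical page -> (block, page)

    def read(self, logical):
        block, page = self.mapping[logical]
        return self.flash[block][page]

    def write(self, logical, data):
        # Never overwrite in place: the old copy simply becomes stale.
        self.mapping.pop(logical, None)
        block = self._block_with_space()
        page = self.next_free[block]
        self.flash[block][page] = data
        self.next_free[block] += 1
        self.mapping[logical] = (block, page)

    def _block_with_space(self):
        free = [b for b in range(len(self.flash)) if self.next_free[b] < self.ppb]
        if not free:
            free = [self._reclaim()]
        # Crude wear leveling: prefer the block that was erased the fewest times.
        return min(free, key=lambda b: self.erase_count[b])

    def _reclaim(self):
        # Garbage-collect the block that holds the fewest live pages.
        live = {b: {} for b in range(len(self.flash))}
        for logical, (b, p) in self.mapping.items():
            live[b][logical] = self.flash[b][p]
        victim = min(live, key=lambda b: len(live[b]))
        if len(live[victim]) == self.ppb:
            raise RuntimeError("no stale pages to reclaim")
        self.flash[victim] = [None] * self.ppb  # erase the whole block at once
        self.erase_count[victim] += 1
        self.next_free[victim] = 0
        for logical, data in live[victim].items():  # copy live pages back
            page = self.next_free[victim]
            self.flash[victim][page] = data
            self.mapping[logical] = (victim, page)
            self.next_free[victim] += 1
        return victim


ftl = TinyFTL(num_blocks=2, pages_per_block=2)
for logical, data in [(0, "a"), (0, "b"), (1, "c"), (0, "d"), (1, "e")]:
    ftl.write(logical, data)
print(ftl.read(0), ftl.read(1), ftl.erase_count)
```

Five writes to two logical pages fill all four physical pages, so the last write forces one block erase. The caller only ever sees logical page numbers.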
File Organization • A database is stored as a collection of files; each file is organized logically as a sequence of records • Each file is also logically partitioned into fixed-length storage units called blocks, which are the units of both storage allocation and data transfer • One approach: • Assume record size is fixed • Each file has records of one particular type only • Different files are used for different relations • This approach is easiest to implement; we will consider variable-length records later
Fixed-Length Records • Simple approach • Store record i starting from byte n × i, where n is the size of each record • Record access is simple but records may cross blocks • Deletion of record i: alternatives: • Move records i + 1, ..., m to i, ..., m – 1 • Move record m to i • Do not move records, but link all free records on a free list
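The addressing rule for fixed-length records, and the reason a record may cross a block boundary, can be checked with a few lines of arithmetic. The sketch assumes records are numbered from 0, and the 53-byte record size and 4096-byte block size are illustrative values, not taken from the slides.

```python
def record_offset(i, n):
    """Byte offset at which record i starts, for records of n bytes numbered from 0."""
    return n * i


def crosses_block(i, n, block_size):
    """True if record i starts in one block and ends in the next."""
    first_byte = record_offset(i, n)
    last_byte = first_byte + n - 1
    return first_byte // block_size != last_byte // block_size


RECORD_SIZE = 53   # hypothetical record size in bytes
BLOCK_SIZE = 4096  # hypothetical block size in bytes

print(record_offset(77, RECORD_SIZE))              # where record 77 starts
print(crosses_block(0, RECORD_SIZE, BLOCK_SIZE))   # record 0 fits in the first block
print(crosses_block(77, RECORD_SIZE, BLOCK_SIZE))  # record 77 straddles two blocks
```

Because 4096 is not a multiple of 53, some record has to straddle each block boundary, and reading it costs two block accesses. A common remedy is to store only as many whole records per block as fit and leave the remaining bytes unused.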
Free Lists • Store the address of the first deleted record in the file header • Use this first record to store the address of the second deleted record, and so on • Can think of these stored addresses as pointers • More space-efficient representation: reuse space for normal attributes of free records to store pointers
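A free list can be sketched as below. The class and its names are hypothetical, and slot numbers stand in for record addresses. The "file header" holds the slot of one deleted record, and each deleted slot reuses its own space to hold the pointer to the next deleted slot. In this sketch the most recently deleted slot goes to the front of the chain, which is one of several possible orderings.

```python
FREE = object()  # marker that tags a slot as deleted


class FixedLengthFile:
    """Toy file of fixed-length record slots with a free list of deleted slots."""

    def __init__(self):
        self.slots = []        # slot i holds a record, or (FREE, next_free_slot)
        self.free_head = None  # file header field: first slot on the free list

    def insert(self, record):
        if self.free_head is None:
            self.slots.append(record)  # no deleted slot to reuse: grow the file
            return len(self.slots) - 1
        slot = self.free_head
        _, self.free_head = self.slots[slot]  # follow the pointer kept in the free slot
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        # Reuse the deleted record's own space to store the free-list pointer.
        self.slots[slot] = (FREE, self.free_head)
        self.free_head = slot


f = FixedLengthFile()
for name in ["a", "b", "c", "d"]:
    f.insert(name)
f.delete(1)
f.delete(3)
print(f.insert("e"), f.insert("f"), f.insert("g"))
```

The first two inserts reuse the deleted slots. Only the third, with the free list empty again, extends the file.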
Variable-Length Records • Variable-length records arise in database systems in several ways: • Storage of multiple record types in a file • Record types that allow variable lengths for one or more fields • Record types that allow repeating fields (used in some older data models) • Byte-string representation • Store each record as a string of consecutive bytes • Attach an end-of-record (⊥) control character to the end of each record • Not easy to reuse space occupied formerly by a deleted record • There is no space, in general, for records to grow longer • Thus, the basic byte-string representation is not usually used for implementing variable-length records
Variable-Length Records: Slotted Page Structure • Slotted page header contains: • Number of record entries • End of free space in the block • Location and size of each record • Records can be moved around within a page to keep them contiguous with no empty space between them; the entry in the header must be updated • Pointers should not point directly to the record; instead they should point to the entry for the record in the header
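The slotted page structure can be illustrated with a small sketch. The class name, the 32-byte page size, and the choice to keep the header in a Python list (rather than inside the block's own bytes) are simplifying assumptions. Records are packed from the end of the block toward the front, each header entry stores a record's location and size, and deletion compacts the remaining records while their slot numbers stay valid.

```python
class SlottedPage:
    """Toy slotted page; the header lives in Python fields for simplicity."""

    def __init__(self, size=32):
        self.data = bytearray(size)
        self.free_end = size  # end of free space; records grow down from the block end
        self.slots = []       # header entries: (offset, length), or None if deleted

    def insert(self, record):
        if len(record) > self.free_end:  # header size is ignored in this sketch
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1  # callers keep the slot number, not the offset

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        # Collect the surviving records, then repack them contiguously.
        live = [(s, self.read(s)) for s, entry in enumerate(self.slots)
                if entry is not None and s != slot]
        self.slots[slot] = None
        self.free_end = len(self.data)
        for s, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.slots[s] = (self.free_end, len(record))  # update the header entry


page = SlottedPage()
a = page.insert(b"alpha")
b = page.insert(b"be")
c = page.insert(b"gamma!")
page.delete(b)
print(page.read(a), page.read(c), page.free_end)
```

After the deletion, the record in slot `c` has moved within the page, yet `page.read(c)` still finds it. That is why external pointers should reference the header entry and not the record's byte offset.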
Organization of Records in Files • So far, we have studied how records are represented in a file structure. Given a set of records, the next question is how to organize them in a file • Heap – a record can be placed anywhere in the file where there is space • Sequential – store records in sequential order, based on the value of the “search key” of each record • Hashing – a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed • Records of each relation may be stored in a separate file. In a multitable clustering file organization, records of several different relations can be stored in the same file • Motivation: store related records on the same block to minimize I/O
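Of the organizations listed on this slide, hashing is the easiest to show in a few lines. In the sketch below, the number of blocks, the byte-sum hash function, and the sample records are all made up for illustration. The hash of one attribute decides which block a record goes to, so a lookup on that attribute only has to read a single block.

```python
NUM_BLOCKS = 4  # hypothetical number of blocks in the file

blocks = [[] for _ in range(NUM_BLOCKS)]


def block_for(value, num_blocks=NUM_BLOCKS):
    """Deterministic toy hash: sum of the UTF-8 bytes modulo the block count."""
    return sum(value.encode("utf-8")) % num_blocks


def insert(record, attr="dept_name"):
    blocks[block_for(record[attr])].append(record)


def lookup(value, attr="dept_name"):
    # Only the one block selected by the hash function is searched.
    return [r for r in blocks[block_for(value)] if r[attr] == value]


insert({"name": "A", "dept_name": "Music"})
insert({"name": "B", "dept_name": "Physics"})
insert({"name": "C", "dept_name": "Music"})
print(lookup("Music"))
```

A heap file would instead append each record wherever space is available, and a sequential file would keep records in search-key order.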
Sequential File Organization (1/2) • Suitable for applications that require sequential processing of the entire file • The records in the file are ordered by a search-key • A search key is any attribute or set of attributes; it need not be the primary key, or even a superkey • To permit fast retrieval of records in search-key order, we chain together records by pointers
Sequential File Organization (2/2) • Deletion – use pointer chains • Insertion – locate the position where the record is to be inserted • If there is free space in the block, insert there • If no free space, insert the record in an overflow block • In either case, pointer chain must be updated • Need to reorganize the file from time to time to restore sequential order
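The pointer-chain idea can be sketched as follows. This is a simplified model with hypothetical names. Every inserted record is placed at the physical end of the file, standing in for an overflow block, and only the chain is updated; the case where the target block still has free space is not modeled. Reorganization then rewrites the records in search-key order.

```python
class SequentialFile:
    """Toy sequential file: physical slots plus a pointer chain in key order."""

    def __init__(self, sorted_keys):
        self._load(sorted_keys)

    def _load(self, sorted_keys):
        # Each slot is [key, next_slot]; initially physical order equals key order.
        self.slots = [[k, i + 1] for i, k in enumerate(sorted_keys)]
        if self.slots:
            self.slots[-1][1] = None
        self.head = 0 if self.slots else None

    def scan(self):
        """Return the keys in search-key order by following the pointer chain."""
        keys, cur = [], self.head
        while cur is not None:
            key, nxt = self.slots[cur]
            keys.append(key)
            cur = nxt
        return keys

    def insert(self, key):
        # Locate the record that should precede the new one in the chain.
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] < key:
            prev, cur = cur, self.slots[cur][1]
        self.slots.append([key, cur])  # physically out of order ("overflow")
        new = len(self.slots) - 1
        if prev is None:
            self.head = new
        else:
            self.slots[prev][1] = new

    def reorganize(self):
        """Rewrite the file so that physical order matches key order again."""
        self._load(self.scan())


f = SequentialFile([10, 20, 40])
f.insert(30)
print([key for key, _ in f.slots], f.scan())
f.reorganize()
print([key for key, _ in f.slots])
```

Before reorganization the physical order and the chain order differ, which is the situation in which a sequential scan starts to incur extra block accesses.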
Multitable Clustering File Organization (1/2) • Simple file structure stores each relation in a separate file • Can instead store several relations in one file using a multitable clustering file organization • E.g., multitable clustering organization of department and instructor: • Good for queries joining department and instructor relations • Bad for queries involving only department • Results in variable size records
Multitable Clustering File Organization (2/2) [Figures: the department relation, the instructor relation, and the multitable clustering of department and instructor]
Data Dictionary Storage (1/2) • Data dictionary (also called system catalog) stores metadata, that is, data about data, such as: • Information about relations • Names of relations • Names and types of attributes of each relation • Names and definitions of views • Integrity constraints • User and accounting information, including passwords • Statistical and descriptive data • Number of tuples in each relation • Physical file organization information • How relation is stored (sequential/hash/…) • Physical location of relation • Information about indices (Chapter 11)