1 / 22

Chapter 10 Storage and File Structure

Chapter 10 Storage and File Structure. 1 st Semester, 2016 Sanghyun Park. Outline. Overview of Physical Storage Media Magnetic Disks and Flash Storage File Organization Organization of Records in Files Data Dictionary Storage. Classification of Physical Storage Media.

ejulio
Télécharger la présentation

Chapter 10 Storage and File Structure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 10Storage and File Structure 1st Semester, 2016Sanghyun Park

  2. Outline • Overview of Physical Storage Media • Magnetic Disks and Flash Storage • File Organization • Organization of Records in Files • Data Dictionary Storage

  3. Classification of Physical Storage Media • Speed with which data can be accessed • Cost per unit of data • Reliability • Data loss on power failure or system crash • Physical failure of the storage device • Can differentiate storage into: • Volatile storage: loses contents when power is switched off • Non-volatile storage: • Contents persist even when power is switched off • Includes secondary and tertiary storage,as well as battery backed-up main-memory

  4. Storage Hierarchy (1/2)

  5. Storage Hierarchy (2/2) • Primary storage • Fastest media but volatile • Cache, main memory • Secondary storage • Next level in hierarchy, non-volatile, moderately fast access time • Also called on-line storage • Flash memory, magnetic disks • Tertiary storage • Lowest level in hierarchy, non-volatile, slow access time • Also called off-line storage • Optical disk, magnetic tape

  6. Magnetic Hard Disk Mechanism

  7. Optimization of Disk-Block Access (1/2) • Block • A contiguous sequence of sectors from a single track • Data is transferred between disk and main memory in blocks • Sizes range from 512 bytes to several kilobytes • Disk-arm-scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized (e.g., elevator algorithm) R5 R2 R3 R1 R4 R6 Inner track Outer track

  8. Optimization of Disk-Block Access (2/2) • To reduce block-access time, we can organize blocks on disk in a way that corresponds to how data will be accessed • For example, we can store related information on the same or nearby cylinders • However, blocks may get scattered all over the disk by insertionor deletion (fragmentation) • Some systems have utilities for de-fragmentation

  9. Flash Storage (1/2) • There are two types of flash memory, called NAND and NOR flash • NAND flash has a much higher storage capacity for a given cost, and is widely used for data storage in devices such as cameras, music players, cell phones, and increasingly laptop computers • Has a lower cost per byte than main memory, in addition to being non-volatile • Requires page-at-a-time read (page: 512 bytes to 4KB) • Transfer rate is around 20 MB/sec • Solid state disks: use multiple flash storage devices to provide higher transfer rate of 100 to 200 MB/sec

  10. Flash Storage (2/2) • Once written, a flash page cannot be directly overwritten; instead,it has be erased and rewritten subsequently • Erase operation is performed on a number of pages, called an erase block, at once, and takes about 1 to 2 milliseconds • There is a limit to how many times a flash page can be erased, typically around 100,000 to 1,000,000 times (wear leveling) • Flash memory systems reduce the impact of these problems using a software layer called FTL (Flash Translation Layer)

  11. File Organization • A database is stored as a collection of files, each file is organized logically as a sequence of records • Each file is also logically partitioned into fixed-length storage units called blocks, which are the units of both storage allocation and data transfer • One approach: • Assume record size is fixed • Each file has records of one particular type only • Different files are used for different relations • This approach is easiest to implement; we will consider variable length records later

  12. Fixed-Length Records • Simple approach • Store record i starting from byte n  i, where n is the size of each record • Record access is simple but records may cross blocks • Deletion of record i:alternatives: • Move records i + 1, . . ., mto i, . . . , m – 1 • Move record m to i • Do not move records, but link all free records on afree list

  13. Free Lists • Store the address of the first deleted record in the file header • Use this first record to store the address of the second deleted record, and so on • Can think of these stored addresses as pointers • More space efficient representation: reuse space for normal attributes of free records to store pointers

  14. Variable-Length Records • Variable-length records arise in database systems in several ways: • Storage of multiple record types in a file • Record types that allow variable lengths for one or more fields • Record types that allow repeating fields (used in some older data models) • Byte-string representation • Store each record as a string of consecutive bytes • Attach an end-of-record () control character to the end of each record • Not easy to reuse space occupied formerly by a deleted record • There is no space, in general, for records to grow longer • Thus, the basic byte-string representation is not usually used for implementing variable-length records

  15. Variable-Length Records:Slotted Page Structure • Slotted page header contains: • Number of record entries • End of free space in the block • Location and size of each record • Records can be moved around within a page to keep them contiguous with no empty space between them; entry in the header must be updated • Pointers should not point directly to record — instead they should point to the entry for the record in header

  16. Organization of Records in Files • So far, we have studied how records are represented in a file structure; Given a set of records, the next question is how to organize them in a file • Heap – a record can be placed anywhere in the file where there is space • Sequential – store records in sequential order, based on the value of the “search key” of each record • Hashing – a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed • Records of each relation may be stored in a separate file.In a multitable clustering file organization records of several different relations can be stored in the same file • Motivation: store related records on the same block to minimize I/O

  17. Sequential File Organization (1/2) • Suitable for applications that require sequential processing of the entire file • The records in the file are ordered by a search-key • A search key is any attribute or set of attributes; it need not be the primary key, or even a superkey • To permit fast retrieval of records in search-key order, we chain together records by pointers

  18. Sequential File Organization (2/2) • Deletion – use pointer chains • Insertion – locate the position where the record is to be inserted • If there is free space in the block, insert there • If no free space, insert the record in an overflow block • In either case, pointer chain must be updated • Need to reorganize the filefrom time to time to restoresequential order

  19. Multitable Clustering File Organization (1/2) • Simple file structure stores each relation in a separate file • Can instead store several relations in one file using a multitable clustering file organization • E.g., multitable clustering organization of department and instructor: • Good for queries joining department and instructor relations • Bad for queries involving only department • Results in variable size records

  20. Multitable Clustering File Organization (2/2) department instructor multitable clustering of department and instructor

  21. Data Dictionary Storage (1/2) Data dictionary (also called system catalog) stores metadata; that is data about data, such as • Information about relations • Names of relations • Names and types of attributes of each relation • Names and definitions of views • Integrity constraints • User and accounting information, including passwords • Statistical and descriptive data • Number of tuples in each relation • Physical file organization information • How relation is stored (sequential/hash/…) • Physical location of relation • Information about indices (Chapter 11)

  22. Data Dictionary Storage (2/2)

More Related