1 / 10

File Organizations

File Organizations. Chapter 8. “How index-learning turns no student pale Yet holds the eel of science by the tail.” -- Alexander Pope (1688-1744). Files of Records. Page or block is OK when doing I/O, but higher levels of DBMS operate on records , and files of records .

ophira
Télécharger la présentation

File Organizations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. File Organizations Chapter 8 “How index-learning turns no student pale Yet holds the eel of science by the tail.” -- Alexander Pope (1688-1744)

  2. Files of Records • Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records. • FILE: A collection of pages, each containing a collection of records. Must support: • insert/delete/modify record • read a particular record (specified using record id) • scan all records (possibly with some conditions on the records to be retrieved)

  3. Unordered (Heap) Files • Simplest file structure contains records in no particular order. • As file grows and shrinks, disk pages are allocated and de-allocated. • To support record level operations, we must: • keep track of the pages in a file • keep track of free space on pages • keep track of the records on a page • There are many alternatives for keeping track of this.

  4. Heap File Implemented as a List • The header page id and Heap file name must be stored someplace. • Each page contains 2 `pointers’ plus data. Data Page Data Page Data Page Full Pages Header Page Data Page Data Page Data Page Pages with Free Space

  5. Data Page 1 Header Page Data Page 2 Data Page N DIRECTORY Heap File Using a Page Directory • The entry for a page can include the number of free bytes on the page. • The directory is a collection of pages; linked list implementation is just one alternative. • Much smaller than linked list of all HF pages!

  6. Alternative File Organizations Many alternatives exist, each ideal for some situation , and not so good in others: • Heap files:Suitable when typical access is a file scan retrieving all records. • Sorted Files:Best if records must be retrieved in some order, or only a `range’ of records is needed. • Hashed Files:Good for equality selections. • File is a collection of buckets. Bucket = primary page plus zero or moreoverflow pages. • Hashing functionh: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.

  7. Example: Stored Database index structure Index on B# Block#= B# mod 10 Index on Author Relation Book 1 11 51 20 30 0 W, … (1, C,W) (20, Y,W) (51, C, B) 1 (11, Y,W) (30, Z, B) Relation Checkout (101,…) (200,…) (500,…) 0 storage structure Index on since Relation Person Block#= ssn mod 10 1

  8. Cost Model for Our Analysis We ignore CPU costs, for simplicity: • B: The number of data pages • R: Number of records per page • D: (Average) time to read or write disk page • Measuring number of page I/O’s ignores gains of pre-fetching blocks of pages; thus, even I/O cost is only approximated. • Average-case analysis; based on several simplistic assumptions. • Count read and writes as a single access each • Good enough to show the overall trends!

  9. Assumptions in Our Analysis • Single record insert and delete. • Heap Files: • Equality selection on key; exactly one match. • Insert always at end of file. • Sorted Files: • Files compacted after deletions. • Selections on sort field(s). • Hashed Files: • No overflow buckets, 80% page occupancy.

  10. Cost of Operations • Several assumptions underlie these (rough) estimates!

More Related