1 / 50

Data Storage and Access Methods

Data Storage and Access Methods. Min Song IS698. Database Design Process. Application 1. Application 2. Application 3. Application 4. External Model. External Model. External Model. External Model. Application 1. Conceptual requirements. Application 2. Conceptual

Antony
Télécharger la présentation

Data Storage and Access Methods

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Storage and Access Methods Min Song IS698

  2. Database Design Process Application 1 Application 2 Application 3 Application 4 External Model External Model External Model External Model Application 1 Conceptual requirements Application 2 Conceptual requirements Conceptual Model Logical Model Internal Model Application 3 Conceptual requirements Application 4 Conceptual requirements PhysicalDesign

  3. Physical Database Design • Many physical database design decisions are implicit in the technology adopted • Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages -- thus constraining the range of possible physical implementations. • We will be concerned with some of the possible physical implementation issues

  4. Physical Database Design • The primary goal of physical database design is data processing efficiency • We will concentrate on choices often available to optimize performance of database services • Physical Database Design requires information gathered during earlier stages of the design process

  5. Physical Design Information • Information needed for physical file and database design includes: • Normalized relations plus size estimates for them • Definitions of each attribute • Descriptions of where and when data are used • entered, retrieved, deleted, updated, and how often • Expectations and requirements for response time, and data security, backup, recovery, retention and integrity • Descriptions of the technologies used to implement the database

  6. Physical Design Decisions • There are several critical decisions that will affect the integrity and performance of the system • Storage Format • Physical record composition • Data arrangement • Indexes • Query optimization and performance tuning

  7. Storage Format • Choosing the storage format of each field (attribute). The DBMS provides some set of data types that can be used for the physical storage of fields in the database • Data Type (format) is chosen to minimize storage space and maximize data integrity

  8. Objectives of data type selection • Minimize storage space • Represent all possible values • Improve data integrity • Support all data manipulations • The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and can support the required data manipulations (e.g. numerical or string operations)

  9. Access Data Types • Numeric (1, 2, 4, 8 bytes, fixed or float) • Text (255 max) • Memo (64000 max) • Date/Time (8 bytes) • Currency (8 bytes, 15 digits + 4 digits decimal) • Autonumber (4 bytes) • Yes/No (1 bit) • OLE (limited only by disk space) • Hyperlinks (up to 64000 chars)

  10. Access Numeric types • Byte • Stores numbers from 0 to 255 (no fractions). 1 byte • Integer • Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes • Long Integer (Default) • Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes • Single • Stores numbers from -3.402823E38 to –1.401298E–45 for negative values and from 1.401298E–45 to 3.402823E38 for positive values. 4 bytes • Double • Stores numbers from –1.79769313486231E308 to –4.94065645841247E–324 for negative values and from 1.79769313486231E308 to 4.94065645841247E–324 for positive values. 15 8 bytes • Replication ID • Globally unique identifier (GUID) N/A 16 bytes

  11. Designing Physical Records • A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit • Fixed Length and variable fields

  12. Data Storage • Storing Data: Disks • Buffer manager • Representing relational data in a disk

  13. The Memory Hierarchy Main Memory = Disk Cache • Processor Cache: • access time 10 nano’s • 512K • Volatile • 256M-1G • Access time: • 10-100 nanoseconds Disk Tape • Persistent • 10-100 GB storage • speed: • Rate=5-10 MB/S • Access time= • 10-15 msecs. • 1.5 MB/S transfer rate • 280 GB typical • capacity • Only sequential access • Not for operational • data

  14. Main Memory • Fastest, most expensive (excluding cache) • Today: 512MB are common even on PCs • Many databases could fit in memory • New industry trend: Main Memory Database • E.g TimesTen • Main issue is volatility

  15. Secondary Storage • Disks • Slower, cheaper than main memory • Persistent !!! • The unit of disk I/O = block • Typically 1 block = 4k • A disk block is also called a disk page or simply a page • Used with a main memory buffer

  16. Block • Blocking factor (bfr) for a file is the average number of records stored in a disk block. • Suppose the block size of a database system is 2000 bytes. Customer table has an average record length of 190 bytes. Assume the overhead of a block for the data is 100 bytes. • What is the blocking factor?

  17. Tracks Arm movement Arm assembly The Mechanics of Disk Cylinder Mechanical characteristics: • Rotation speed (5400RPM) • Number of platters (1-30) • Number of tracks (<=10000) • Number of sectors (256/track) • Number of bytes / sector (29=512) • Block size (212=4096) Spindle Disk head Sector Platters

  18. Important Disk Access Characteristics • Block access time = Disk latency + transfer time • Disk latency = seek time + rotational latency • Seek time = time for the head to reach the right track • 10ms – 40ms • Rotational latency = rotation time to get to the right sector • Time for one rotation = 10ms • Average rotation latency = 10ms/2 • Transfer time = typically 5-10MB/s • Disks read/write one block at a time (typically 4kB)

  19. Representing Data Elements • Relational database elements: CREATE TABLE Product ( pid INT PRIMARY KEY, name CHAR(20), description VARCHAR(200), maker CHAR(10) REFERENCES Company(name)) • A tuple is represented as a record

  20. Record Formats: Fixed Length F3 F4 F1 F2 • Information about field types same for all records in a file; stored in systemcatalogs. • Finding i’th field requires scan of record. • Note the importance of schema information! L3 L4 L1 L2 Address = B+L1+L2 Base address (B)

  21. Record Header To schema length F3 F4 F1 F2 L3 L4 L1 L2 header timestamp • Need the header because: • The schema may change • for a while new+old may coexist • Records from different relations may coexist

  22. Variable Length Records Other header information header F3 F4 F1 F2 L3 L4 L1 L2 length Place the fixed fields first: F1, F2 Then the variable length fields: F3, F4 Null values take 2 bytes only Sometimes they take 0 bytes (when at the end)

  23. Records With Referencing Fields Other header information header F3 F1 F2 L3 L1 L2 length E.g. to represent one-many or many-many relationships

  24. Storing Records in Blocks • Blocks have fixed size (typically 4k) BLOCK R4 R3 R2 R1

  25. Spanning Records Across Blocks block header block header • When records are very large • Or even medium size: saves space in blocks R1 R2 R3 R2

  26. BLOB • Binary large objects • Supported by modern database systems • E.g. images, sounds, etc. • Storage: attempt to cluster blocks together

  27. Modifications: Insertion • File is unsorted • add it to the end • File is sorted: • Is there space in the right block ? • Yes: we are lucky, store it there • Is there space in a neighboring block ? • Look 1-2 blocks to the left/right, shift records • If anything else fails, create overflow block

  28. Overflow Blocks Blockn-1 Blockn Blockn+1 • After a while the file starts being dominated by overflow blocks: time to reorganize Overflow

  29. Modifications: Deletions • Free space in block, shift records • Maybe be able to eliminate an overflow block

  30. Modifications: Updates • If new record is shorter than previous, easy  • If it is longer, need to shift records, create overflow blocks

  31. Physical Addresses • Each block and each record have a physical address that consists of: • The host • The disk • The cylinder number • The track number • The block within the track • For records: an offset in the block • sometimes this is in the block’s header

  32. Logical Addresses • Logical address: a string of bytes (10-16) • More flexible: can blocks/records around • But need translation table:

  33. Main Memory Address • When the block is read in main memory, it receives a main memory address • Buffer manager has another translation table

  34. Designing Physical/Internal Model • Overview • terminology • Access methods

  35. DBMS External Model Internal Model Access Methods Physical Design • Internal Model/Physical Model User request Interface 1 Interface 2 Operating System Access Methods Interface 3 Data Base

  36. Physical Design • Interface 1: User request to the DBMS. The user presents a query, the DBMS determines which physical DBs are needed to resolve the query • Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database. • Interface 3: The internal model access methods and OS access methods access the physical records of the database.

  37. Physical File Design • A Physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records • Pointers - a field of data that can be used to locate a related field or record of data • Access Methods - An operating system algorithm for storing and locating data in secondary storage • Pages - The amount of data read or written in one disk input or output operation

  38. Internal Model Access Methods • Many types of access methods: • Physical Sequential • Indexed Sequential • Indexed Random • Inverted • Direct • Hashed • Differences in • Access Efficiency • Storage Efficiency

  39. Physical Sequential • Key values of the physical records are in logical sequence • Main use is for “dump” and “restore” • Access method may be used for storage as well as retrieval • Storage Efficiency is near 100% • Access Efficiency is poor (unless fixed size physical records)

  40. Indexed Sequential • Key values of the physical records are in logical sequence • Access method may be used for storage and retrieval • Index of key values is maintained with entries for the highest key values per block(s) • Access Efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow • Storage Efficiency depends on size of index and volatility of database

  41. Data File Block 1 Block 2 Block 3 Adams Becker Dumpling Actual Value Address Block Number Dumpling Harty Texaci ... 1 2 3 … Getta Harty Mobile Sunoci Texaci Index Sequential

  42. 001 003 . . 150 251 . . 385 Key Value Key Value Key Value Address Address Address Key Value Address 455 480 . . 536 785 805 536 678 150 385 3 4 5 6 1 2 385 678 805 7 8 9 … 605 610 . . 678 705 710 . . 785 791 . . 805 Indexed Sequential: Two Levels

  43. Indexed Random • Key values of the physical records are not necessarily in logical sequence • Index may be stored and accessed with Indexed Sequential Access Method • Index has an entry for every data base record. These are in ascending order. The index keys are in logical sequence. Database records are not necessarily in ascending sequence. • Access method may be used for storage and retrieval

  44. Becker Harty Actual Value Address Block Number Adams Becker Dumpling Getta Harty 2 1 3 2 1 Adams Getta Dumpling Indexed Random

  45. F | | P | | Z | B | | D | | F | H | | L | | P | R | | S | | Z | Devils Hawkeyes Hoosiers Minors Panthers Seminoles Aces Boilers Cars Flyers Btree

  46. Inverted • Key values of the physical records are not necessarily in logical sequence • Access Method is better used for retrieval • An index for every field to be inverted may be built • Access efficiency depends on number of database records, levels of index, and storage allocated for index

  47. Student name Course Number CH 145 101, 103,104 CH145 cs201 ch145 ch145 cs623 cs623 Adams Becker Dumpling Getta Harty Mobile Actual Value Address Block Number CH 145 CS 201 CS 623 PH 345 1 2 3 … CS 201 102 CS 623 105, 106 Inverted

  48. Direct • Key values of the physical records are not necessarily in logical sequence • There is a one-to-one correspondence between a record key and the physical address of the record • May be used for storage and retrieval • Access efficiency always 1 • Storage efficiency depends on density of keys • No duplicate keys permitted

  49. Hashing • Key values of the physical records are not necessarily in logical sequence • Many key values may share the same physical address (block) • May be used for storage and retrieval • Access efficiency depends on distribution of keys, algorithm for key transformation and space allocated • Storage efficiency depends on distibution of keys and algorithm used for key transformation

  50. Factor Storage space Sequential retrieval on primary key Random Retr. Multiple Key Retr. Deleting records Adding records Updating records Sequential No wasted space Very fast Impractical Possible but needs a full scan can create wasted space requires rewriting file usually requires rewriting file Hashed more space needed for addition and deletion of records after initial load Impractical Very fast Not possible very easy very easy very easy Comparative Access Methods Indexed No wasted space for data but extra space for index Moderately Fast Moderately Fast Very fast with multiple indexes OK if dynamic OK if dynamic Easy but requires Maintenance of indexes

More Related