1 / 163

CS157B - Fall 2004

CS157B - Fall 2004. Entity-Relationship Data Model. Entity Sets a collection of similar entities Attributes properties of the entities in an entity set Relationships connection among two or more entity sets Constraints Keys : uniquely identify an entity within its entity set

anne-cash
Télécharger la présentation

CS157B - Fall 2004

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS157B - Fall 2004

  2. Entity-Relationship Data Model • Entity Sets • a collection of similar entities • Attributes • properties of the entities in an entity set • Relationships • connection among two or more entity sets • Constraints • Keys : uniquely identify an entity within its entity set • Single-value constraints : requirement that the value in a certain context be unique • Referential integrity constraints : existence test • Domain constraints : range test • General constraints : data set constraints

  3. E/R - Cont. • Constraints Are of the Schema • Should be part of the DB design • Must be carefully constructed • Can be troublesome if major redesign is required • Subclasses • Special "isa" relationship • Similar to OO's definition • Weak Entity Sets • Some of its key may be composed of attributes which belong to another entity set • Need additional qualifier

  4. The Relational Data Model • Relation • Represented by a two-demensional table • Attribute • Represented as columns in a table • Schema • The name of a relation and the set of attributes of this relation • A database schema consists of one of more relation schemas • Tuple • A row in the table • Domain • Each attribute of a relation is an elementary type, thus each tuple must be atomic

  5. Relational - Cont • Converting Subclass Structures to Relations • E/R-Style Conversion • Object-Oriented Approach • Null Values • Functional Dependencies (FD) • An unique-value constraint in a relational schema design • Keys of Relations • {A(1), ..., A(n)} is a key of relation R if • no distinct tuples agree on all of the {...} • no other subset of {} determines all other attributes of R, must be minimal • If more than one key, then designate one of the keys as the primary key • Superkey : Composed by a set of attributes

  6. Relational - Cont • Design of Relational Database Schemas • Anomalies • Redundancy : Many repeated information in each tuple • Update Anomalies : One tuple update does not cause other tuples to be updated • Deletion Anomalies : Lose other information as a side effect when a set of values becomes empty • Decomposing Relations • Splitting the attributes of R to make the schemas of two new relations • Boyce-Codd Normal Form (BCNF) • The left side of every nontrivial FD must be a superkey • Recursively decompose into smaller sets of tables until they all satisfy the BCNF rule • Must be able to join all set of tables into the original tuple

  7. Relational - Cont • Third Nomal Form (3NF) • A relation R is in 3NF if: whenever A(1)...A(n) ->B is a nontrivial FD, either {A(1)...A(n)} is a superkey, or B is a member of some key • May allow minimal redundancy in the end • Multivalued Dependencies (MVD) • A statement that two sets of attributes in a relation have sets of values that appear in all possible combinations • Example of MVD in p.118 figure 3.29 • No BCNF violation • Fourth Normal Form (4NF) • A relation R is in 4NF if whenever A(a)..A(n)->->B(1)..B(m) is a nontrivial MVD, {A(1)...A(n)} is a superkey • Can be decomposed similar to the BCNF decomposition algorithm

  8. Other Data Models • Review of Object-Oriented Concepts • Type System • Record structures • Collection types • Reference types • Classes and Objects • Contain attributes and methods • Object Identity • OID • Methods • Functions within a class • Hierarchies • Superclass and subclass relationship • Introduction to ODL • Object-Oriented Design • Three main properties: attributes, relationships, and methods

  9. Other Data Models - Cont • Class Declarations • e.g. class <name> { <list of properties> } • Attributes in ODL • can be simple or complex type • Relationships in ODL • e.g. in Movie: relationship Set<Star> stars; • Inverse Relationships • e.g. in Star: relationship Set<Movie> starredIn; • Methods in ODL • signatures • Types in ODL • Atomic types: • integer • float • character • character string • boolean • enumerations

  10. Other Data Models - Cont • Structure types • e.g. class name • Structures - similar to a tuple of values • Collection types • Set - distinct unordered elements • Bag - unordered elements • List - ordered elements • Array - e.g. Array<char,10> • Dictionary - e.g. Dictionary(Keytype, Rangetype) • Additional ODL Concepts • Multiway Relationships in ODL • ODL supports only binary relationships • Use several binary, many-one relationship instead • Subclasses in ODL • Inherits all the properties of its superclass

  11. Other Data Models - Cont • Multiple Inheritance in ODL • Needs to resolve name conflicts from multiple superclass • Keys in ODL • Optional because of the existence of an OID • Can be declared with one or more attributes • ODL Designs to Relational Designs • Many issues • Possibly no key in ODL • Relational can't handle structure and collection types directly • Convert any ODL type constructor can lead to a BCNF violation • Representing ODL Relationships • one relation for each inverse pairs

  12. Other Data Models - Cont • Object-Relational Model • O/R Features • Structured types for attributes • Methods • Identifiers for tuples (like OID) • References • Compromise Between Pure OO and Relational • Semistructured Data • "Schemaless" - the schema is attached to the data itself • Represented in a collection of nodes • Interior node describes the data with a label • Leaf node contains the data of any atomic type • Legacy-database problem

  13. Relational Algebra • Algebra of Relational Operations • Bags rather than sets can be more efficient depending on the operation such as an union of two relations which contain duplicates • Components • Variables that stand for relations and Constants which are finite relations • Expressions of relational algebra are referred to as queries • Set Operations on Relations • R u S, the union of R and S (distinct) • R n S, the intersection of R and S (distinct) • R - S, the difference of R and S, is the set of elements that are in R but not in S; different from S - R • Tuples must be in the same order and attribute types must be the same

  14. Relational Algebra - Cont • Projection • The projection operator is used to produce from a relation R a new relation that has only some of R's columns • e.g. Views • Selection • The selection operator produces a new relation R with a subset of R's tuples; the tuples in the resulting relation are those that satisfy some condition C that involves the attributes of R • e.g. SQL select statement • Cartesian Product • A cross-product of two sets R and S • e.g. R x S ; if R has 2 tuples and S has 3 tuples, the result has 6 tuples • Natural Joins • Use matching attributes

  15. Relational Algebra - Cont • Theta-Joins • Join two relations with a condition denoted by 'C' • Combining Operations to Form Queries • Multiple queries can be combined into a complex query • e.g. AND, OR, (), ... • Renaming • Control the names of the attributes used for relations that are constructed by applying relational-algebra operations • Dependent and Independent Operations • Some relational expression can be "rewritten" in a different expression • e.g. R n S = R - (R - S) • e.g. Glass half-empty or half-full?

  16. Relational Algebra - Cont • Extended Operators of Relational Algebra • Duplicate Elimination • This operator converts a bag to a set • e.g. SQL keyword DISTINCT • Aggregation Operators • SUM(), AVG(), MIN(), MAX(), COUNT() • Grouping • This operator allows us to group a relation and/or aggregate some columns • Extended Projection • This allows expressions as part of an attribute list • Sorting Operator • This operator is anomalous, in that it is the only operator in our relational algrebra whose result is a list of tuples, rather than a set • Outerjoin • Takes care of the dangling tuples; denote with special "null" symbol

  17. SQL • Simple Queries in SQL • select A(...) from R where C • Projection & Selection in SQL • select title, length from Movie where studioName = 'Disney' and year = 1990 • String Comparison • Bit Strings (bit data) • e.g. B'011' or X'7ff' • Lexicographic order • e.g. 'A' < 'B'; 'a' < 'b'; 'a' > 'B'? • Depends on encoding scheme or standard • (e.g. ASCII, UTF8) • LIKE keyword • "s like p" denotes where s is a string and p is a pattern • Special character % • Dates and Times • DATE '2002-02-04' • TIME '19:00:30.5'

  18. SQL - Cont • Null Values • 3 cases: unknown, inapplicable, withheld • NOT a constant • Test expression for IS NULL • Truth-Value "unknown" • Pitfalls regarding nulls • e.g. select * from Movie • where length <= 120 or length > 120; • Ordering the Output • ORDER BY <list of attributes> • Can be ASC or DESC, default is ASC • Queries with > 1 Relation • e.g. select name from Movie, MovieExec where title = 'Star Wars' and producerC# = cert# • Disambiguating Attributes • select MovieStar.name, MovieExec.name from MovieStar, MovieExec where MovieStar.address = MovieExec.address;

  19. SQL - Cont • UNION, INTERSECT, EXCEPT Keywords • Same logic as the set operators of u, n, and - • Subqueries • Can return a single constant or relations in the WHERE clause • Can have relations appear in the FROM clauses • Scalar Value • An atomic value that can appear as one component of a tuple (e.g. constant, attribute) • e.g. select name from MovieExec where cert#=(select producerC# from Movie where title='Star Wars'); • Conditions Involving Relations • If R is a relation, then EXISTS R is a condition that is true if R is not empty

  20. SQL - Cont • s IN R is true if s is equal to one of the values in R; s NOT IN R is true if s is equal to no value in R • s > ALL R is true if s is greater than every value in unary relation R • s > ANY R is true if s is greater than at least one value in unary relation R • EXISTS, ALL, ANY operators can be negated by putting NOT in front of the entire expression • Subqueries in FROM Clauses • Can substitute a R in the FROM clause with a subquery • e.g. select name from MovieExec, (select producerC# from Movie, StarsIn where title = movieTitle and year = movieYear and starname = 'Harrison Ford'

  21. SQL - Cont • Cross Joins • Known as Cartesian product or just product • e.g. Movie CROSS JOIN StarsIn; • Natural Joins • The join condition is that all pairs of attributes from the two relations having a common name are equated, and no other conditions • One of each pair of equated attributes is projected out • Outerjoins • e.g. MovieStar NATURAL FULL OUTER JOIN MovieExec; • Full-Relation Operations • Eliminating Duplicates • Use the DISTINCT keyword in SELECT • Performance consideration

  22. SQL - Cont • Duplicates in U, I, and D • By default, UID operations convert bags to sets • Use keyword ALL after UNION, INTERSECT EXCEPT keywords to prevent the elimination of duplicates • Grouping and Aggregation in SQL • Use the special GROUP BY clause • Aggregation Operators • SUM, AVG, MIN, MAX are used by applying them to a scalar-valued expression, typically a column name, in a SELECT clause • COUNT(*) is used to counts all the tuples in R that is constructed from the FROM clause and WHERE clause of the query • HAVING Clause • Use in conjunction with GROUP BY to narrow the aggregated list

  23. SQL - Cont • Database Modifications (DML) • Insertion • Insert into R(A1...An) values (V1,...Vn); • e.g. insert into Studio(name) values('S1'); • Can insert multiple tuples with subquery • e.g. insert into Studio(name) select distinct studioName from Movie where studioName not in (select name from studio); • Deletion • Delete from R where <condition>; • Can delete multiple tuples with 1 delete statement depending on <condition> • Updates • Update R set <new-value assignment> where <condition>; • Can update multiple tuples with 1 update statement depending on <condition>

  24. SQL - Cont • DDL in SQL • Data Types • Character strings, fixed or variable length • Bit strings, fixed or variable length • Boolean: true, false, unknown • Integer or int; shortint • Floating-point numbers • e.g. decimal(n,d) where n is total number of digits with d is the decimal point from the right; • 1234.56 can be described as decimal(6,2) • Dates and times can be represented by the data types DATE and TIME respectively • Table Declarations • Use the keywords CREATE TABLE followed by the R name and list of As and their types • e.g. create table MovieStar(name char(30), address varchar(256), gender char(1), birthday DATE);

  25. SQL - Cont • Modifying Relation Schemas • Drop table MovieStar; • Alter table MovieStar add phone char(16); • Alter table MovieStar drop birthdate; • Default Values • Use the DEFAULT keyword to set default values for a column • e.g. alter table MovieStar add phone char(16) default 'unlisted'; • Indexes • Allow faster access to data • e.g. create index YearIndex on Movie(year); • Can be one or more attributes • e.g. create index KeyIndex on Movie(title,year); • Delete the index using drop index statement • Selection of Indexes • Selection vs IUD performance

  26. SQL - Cont • View Definitions • View does not contains any physical data • "virtual relation" • Declaring Views • Create view <view-name> as <view-definition> • Querying Views • Use the normal select syntax with a view name in place of the table name • Renaming Attributes • Can map table attribute name from the base table to a new name in a view definition • Modifying Views • Updatable views are useful in special cases; selective IUDs

  27. Constraints & Triggers • Keys & Foreign Keys • Primary Keys • Each relation can have only one primary key • Primary key attribute(s) can not be NULL • Two ways to specify the primary key • 1) create table MovieStar( • name char(30) primary key, • address varchar(255), • gender char(1), • birthdate date); • 2) create table MovieStar( • name char(30), • address varchar(255), • gender char(1), • birthdate date, • primary key(name, birthday)); • Unique Keys • Each relation can have >1 unique keys • Declared the same way as primary key • NULL is allowed

  28. Constraints & Triggers - Cont • Enforcing Key Constraints • During insertion or update to the relation • Foreign-Key • The "referenced" attribute(s) must be declared unique or the primary key for their relation; it must not have a NULL value • create table Studio( • name char(30) primary key, • address varchar(255), • presC# int references MovieExec(cert#) • ); • create table Studio( • name char(30) primary key, • address varchar(255), • presC# int, • foreign key (presC#) references MovieExec(cert#) • );

  29. Constraints & Triggers - Cont • Maintaining Referential Integrity • Reject Violating Modifications (Default) • Insert or update Studio tuple whose presC# value is not NULL and is not the cert# component of any MovieExec tuple • Delete a MovieExec tuple and its cert# component appears as the presC# component of one or more Studio tuples • Update a MovieExec tuple cert# value; but the old cert# is the value of presC# of some movie studio in Studio • Cascade Policy • When deleting the MovieExec tuple for the president of a studio, then it will delete the referencing tuple from Studio • By changing the cert# value for a MovieExec tuple from c1 to c2 and there was some Studio tuple with c1 as the value of its presC# component, then it will update this presC# component to have the value c2

  30. Constraints & Triggers - Cont • Set-Null Policy • Can handle the delete and update problem by setting the presC# to NULL • e.g. create table Studio ( • name char(30) primary key, • address varchar(255), • presC# int references MovieExec(cert#) • on delete set null • on update cascade • ); • Deferring Checking of Constraints • Do selective insert to default the presC# to null • Insert tuple into MovieExec with new cert# • Update the Studio tuple with matching presC# • Use keyword DEFERRABLE and DEFERRED to delay the checking until the whole tranaction is "committed" • Reverse the DEFERRED case with keyword IMMEDIATE

  31. Constraints & Triggers - Cont • Constraints on Attributes and Tuples • Not-Null Constraints • Use the NOT NULL keywords in create table statement for any attribute • Attribute-Based Constraints • Use the CHECK keyword in create table statement • Limit the value for an attribute • e.g. gender char(1) check (gender in ('F','M')) • Tuple-Based Constraints • Use the CHECK keyword in create table statement • Can compose of complex expression of multiple attributes • Constraints Modification • Naming Constraints • In order to change, it must have a name • Use the CONSTRAINT keyword

  32. Constraints & Triggers - Cont • Altering Constraints on Tables • Can use ALTER TABLE to add or drop a constraint • Can use SET CONSTRAINTS to set it for deferred or immediate • Schema-Level Constraints and Triggers • Assertions (General Constraint) • A boolean-valued SQL expression that must be true at all times • create assertion <name> check (<condition>) • e.g. • create assertion RichPres check (not exists (select * from Studio, MovieExec where presC# = cert# AND netWorth < 10000000)); • Event-Condition-Action Rules (ECA Rules) • Triggers are awakened by certain events • The "action" will be preform only if C = true

  33. Constraints & Triggers - Cont • Triggers in SQL • create trigger NetWorthTrigger • after update of netWorth ON MovieExec • referencing • old row as OTuple, • new row as NTuple • for each row • when (OTuple.netWorth > NTuple.netWorth) • update MovieExec • set netWorth = OTuple.netWorth • where cert# = NTuple.cert#; • Default is "for each statement" • Besides update, can use insert and delete • Action can be "before" or "after" the event • Use BEGIN...END for multiple statements • Instead-Of Triggers • Not part of SQL-99 • Replace event with new defined operations • Very powerful when used on a view

  34. System Aspects of SQL • SQL Programming Environment • Host language + Embedded SQL • v • Preprocessor • v • Host language + Function calls • v • Host-language compiler <= SQL Library • v • Object-code program • Impedance Mismatch Problem • Different data model between SQL statements and programming langauges • SQL/Host Language Interface • Use EXEC SQL keywords in front of an SQL statement • Use shared (host) variables for SQL stmt • Check SQLSTATE for SQL errors

  35. System Aspects of SQL - Cont • The DECLARE Section and Its Usage • Shared variables are declared between two embedded SQL statements. • e.g. • EXEC SQL BEGIN DECLARE SECTION; • char studioName[50], studioAddr[256]; • char SQLSTATE[6]; • EXEC SQL END DECLARE SECTION; • A shared variable can be used within the SQL statement by placing a colon in front it. • e.g. • EXEC SQL INSERT INTO • Studio(name, address) • VALUES (:studioName, :studioAddr); • Single-Row Select Statement • e.g. • EXEC SQL SELECT netWorth • INTO :presNetWorth • FROM Studio, MovieExec • WHERE presC# = cert# AND • Studio.name = :studioName;

  36. System Aspects of SQL - Cont • Cursors • Allow programs to "fetch" multiple rows from a relation • Here are the steps for using a cursor • EXEC SQL DECLARE <cursor> CURSOR FOR <query> • EXEC SQL OPEN <cursor> • EXEC SQL FETCH FROM <cursor> INTO <list-of-variables> • If SQLSTATE is "02000", then goto close <cursor>; otherwise fetch next row • EXEC SQL CLOSE <cursor> • Row Modification with Cursor • Use the WHERE CURRENT OF keywords • e.g. • EXEC SQL DELETE FROM MovieExec • WHERE CURRENT OF execCursor; • EXEC SQL UPDATE MovieExec • SET netWorth = 2 * netWorth • WHERE CURRENT OF execCursor;

  37. System Aspects of SQL - Cont • Concurrent Update of Tuple • Use keywords INSENSITIVE CURSOR to ignore new changes which may affect the current cursor • Use Keywords FOR READ ONLY to signal that this cursor does not allow any modification • Scrollable Cursors • Allow a set of movements within a cursor • Dynamic SQL • Flexibility to enter SQL statement at run time • Use EXEC SQL EXECUTE IMMEDIATE or • ( EXEC SQL PREPARE ... and • EXEC SQL EXECUTE ... ) • e.g. • EXEC SQL BEGIN DECLARE SECTION; • char *query; • EXEC SQL END DECLARE SECTION; • /* Allocate memory pointed to by query • and fill in the SQL statement */ • EXEC SQL EXECUTE IMMEDIATE :query;

  38. System Aspects of SQL - Cont • Procedures Stored in the Schema • Persistent Stored Modules (PSM) • Can build module to handle complex computations which cannot be expressed using SQL • PSM Functions & Procedures • CREATE PROCEDURE <name> (<param>) • local declarations • procedure body; • Procedure parameter can be input-only, output-only, or both • CREATE FUNCTION <name> (<param>) RETURNS <type> • local declarations • function body; • Function parameter can only be input as PSM forbids side-effects in functions

  39. System Aspects of SQL - Cont • Statements in PSM • Call statement • CALL <proc name> (<arg list>); • e.g. EXEC SQL CALL Foo(:x, 3); • RETURN <expression>; • DECLARE <name> <type>; • SET <variable> = <expression>; • BEGIN ... END • IF <condition> THEN • <statement list> • ELSEIF <condition> THEN • <statement list> • ELSEIF • ... • ELSE <statement list> • END IF; • SELECT <attr> INTO <var> FROM <table> • WHERE <condition> • LOOP <statement list> END LOOP;

  40. System Aspects of SQL - Cont • FOR <loop name> AS <cursor name> CURSOR FOR • <query> • DO • <statement list> • END FOR; • Support WHILE and REPEAT loops • Exception Handler in PSM • DECLARE <where to go> HANDLER FOR <condition list> <statement> • <where to go> can be: • CONTINUE - executing the handler statement and then execute the next statement after the one which cause the exception • EXIT - execute the handler statement and then control leaves the BEGIN...END block in which the handler is declared • UNDO - same as EXIT except that any changes to the DB or local variables that were made by the statements of the block are "undone"

  41. System Aspects of SQL - Cont • SQL Environment • Schema • A collection of tables, views, assertions, triggers, PSM modules, etc • CREATE SCHEMA <name> <declarations> • Use SET SCHEMA to change schema name • Catalog • A collection of schemas • CREATE CATALOG <catalog name> • Use SET CATALOG to change the current catalog • Cluster • A collection of catalogs • Can be view as a set of all catalogs accessible to a user • Client/Server • Both client and server can be on the different or the same machine

  42. System Aspects of SQL - Cont • Connection • CONNECT TO <server name> AS <connection name> AUTHORIZATION <name and password> • SET CONNECTION <name>; • DISCONNECT <name>; • Call-Level Interface (CLI) • In C, each CLI program must include sqlcli.h where it contains all the function, structure, constant, and type definitions • 4 kinds of records: SQLHENV, SQLHDBC, SQLHSTMT, and SQLHDESC. • Use SQLAllocHandle(hType, hIn, hOut) • Processing Statements • Use SQLPrepare(sh, st, sl); & SQLExecute(sh); • or use SQLExecDirect(sh, st, sl); • Use SQLFetch(sh) from a query result

  43. System Aspects of SQL - Cont • Use SQLBindCol(sh, colNo, colType, pVar, varSize, varInfo) for column binding • Can use SQLGetData(...) in place of SQLBindCol(...) to extract data from a query • Passing Parameters to Query • e.g. • SQLPrepare(myStmt, "INSERT INTO Studio(name, address) VALUES (?, ?)", SQL_NTS); • SQLBindParameter(myStmt, 1, ..., studioName, ...); • SQLBindParameter(myStmt, 2, ..., studioAddr, ...); • SQLExecute(myStmt); • Transactions in SQL • Serializability • Multiple selects followed by multiple updates to the same tuple; e.g. chooseSeat() • Use locks to handle this problem

  44. System Aspects of SQL - Cont • Atomicity • Single user transaction may have multiple updates to different tables; e.g. transfer from account A to account B • Only "commit" after all the changes are made • Transaction • A collection of one or more operations on the database that must be executed atomically • Use START TRANSACTION to begin • Use SQL COMMIT to commit • Use SQL ROLLBACK to abort and undo the changes prior to the start of the transaction • Can set the transaction to READ ONLY • Dirty Reads (Uncommitted Reads) • Data read that were "dirty" or uncommitted • Isolation Levels • Serializable, uncommitted read, committed read, and repeatable-read

  45. System Aspects of SQL - Cont • Security & User Authorization in SQL • Privileges • SQL defines nine types of privileges: SELECT, INSERT, DELETE, UPDATE, REFERENCES, USAGE, TRIGGER, EXECUTE, and UNDER • Authorization Checking • First at connect time • Second at statement time • Additional checks with modules • Grant & Revoke • GRANT <privilege list> ON <database element> TO <user list> • Allow other user to perform certain actions • REVOKE <privilege list> ON <database element> FROM <user list> • Disallow a previously granted privilege

  46. Data Storage • Megatron 2002 Database System • Store relation in ASCII text file • Store the schema also in ASCII file • Obvious problems: • Tuple layout on disk is not flexible; any small change may shuffle the whole file • Searching is expensive; must read the whole file • Query-processing is by brute force; nested loop to examine all possibilities • No memory buffering, every query requires direct access to disk • No concurrency control • No reliability; e.g. no crash recovery • The Memory Hierarchy • Cache Memory • Fast access to and from processor or I/O controller

  47. Data Storage - Cont • Main Memory • Random access (RAM) • Both OS and applications reside in RAM • Virtual Memory • Allows each application to have their own private memory space which mapped to physical memory (RAM) or disk memory • A page is a memory block used by main memory to/from disk • Secondary Storage • Much slower than main memory • Two type of disk I/O • Disk read means moving a block from disk to main memory • Disk write means moving a block from main memory to disk • Most DBMS will manage disk blocks itself, rather than relying on the OS file manager

  48. Data Storage - Cont • Volatile and Nonvolatile Storage • Main memory is typically volatile; thus when the power is off, the content is gone • Flash memory are nonvolatile but it is very expensive and currently not used in main memory • An alternative is to use "RAM disk" combine with a battery backup to the power supply • Disks • Disk Components • Head, platter (2 surfaces each), cylinder, tracks, sectors, gap • Disk Controller • Controls the movement of the disk head(s) to a specific track and preforms reads and writes • Tranfers data to and from main memory

  49. Data Storage - Cont • Effective Use of Secondary Storage • CS studies of algorithm often assumes that the data are always in main memory; this is not a valid assumption for DBMS • I/O Model of Computation • Dominance of I/O cost • If a block needs to be moved between disk and main memory, then the time taken to perform the read/write is much larger than the time for manipulating that data in main memory; thus the I/O time is a good approximation of the total time • Similar to Big O notation for algorithm study • Sorting Data in Secondary Storage • If we need to sort 1.64 billion bytes and a disk block is configured to handle 16384 bytes, then 100000 blocks are required to read each tuple once from disk • Quicksort is one of the fastest algorithm but its assumption is all entries are in memory

  50. Data Storage - Cont • Two-Phase, Multiway Merge-Sort (TPMMS) • Consists of 2 phases • Phase 1: Sort main-memory-sized pieces of the data, so every record is part of a sorted list that just fits in the availabe main memory; the results are a set of sorted sublists on disk which we merge in the next phase • Phase 2: Merge all the sorted sublists into a single sorted list • Example: • If we have 100 MB of main memory using 16384 size block sorting 1.64 billion bytes, we can fit 6400 blocks at a time in main memory; thus the results from phase 1 will have 16 sorted sublists • If merge two sublists at a time, we need 8 disk I/O's performed on it • The better approach is to read the first block from each of the sorted list into main-memory buffer. Find smallest element into a output buffer and flush/reload when necessary.

More Related