400 likes | 534 Vues
CS 540 Database Management Systems. Schema Design and Views. Database Implementation. User Requirements. Schema Design. Physical Storage. Conceptual Design. Schema. Relational Model. Entity Relationship(ER) Model. Files and Indexes. E-R Model to Relational Schema. Person. name.
E N D
CS 540 Database Management Systems Schema Design and Views
Database Implementation User Requirements Schema Design Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
E-R Model to Relational Schema Person name ssn address Address Name SSN 21 Kings St. John 222000111 234 2nd St. John 222000111 31 Kings St. Charles 111222333 2 Harrison St.Charles 111222333
Designing for High Quality Data: Normalization • Translate E-R model to relational schema S • Transform S to another relational schema S’ such that: • S’ contains all the information available in S. • S’ contains minimal amount of redundancy. • S’ does not have incomplete information problem.
Example FDs Address Name SSN SSN Name SSN Address 21 Kings St. John 222000111 234 2nd St. John 222000111 31 Kings St. Charles 111222333 2 Harrison St.Charles 111222333
Normal Forms All attributes are atomic. • ✔ We have at least one key
Boyce-Codd Normal Form • Relation R is in BCNF, if and only if: • For each non-trivial FD A B, A is a super-key of R. • If a set of attributes determines another attribute, it determines all attributes. • Every attribute depends only on super-keys.
Example Address Name SSN SSN Name Is SSN a super-key of Person? No Thus, our relation is not in BCNF. 21 Kings St. John 222000111 234 2nd St. John 222000111 31 Kings St. Charles 111222333 2 Harrison St.Charles 111222333
Decompose Person into BCNF SSN Name 222000111 John SSN Name 111222333 Charles SSN Address 22200011121 Kings St. 222000111234 2nd St. 11122233331 Kings St. 1112223332 Harrison St.
BCNF Removes Update, Insertion, and Deletion Anomalies SSN Name 222000111 John 111222333 Charles SSN Address 22200011121 Kings St. 222000111234 2nd St. 11122233331 Kings St. 1112223332 Harrison St.
BCNF Decomposition • Pick an FD A B that violates the BCNF condition in relation R. • Select the largest possible B. • Decompose R to R1 and R2 • Repeat until there is no BCNF violation left. Other AttributesA A B R1 R2
Example Person (SSN, Name, Street, City, State, Zip) • Violation SSN Name Person (SSN, Name) PersonAddr (SSN, Street, City, State, Zip) • Violation City State Person (SSN, Name) PersonAddr(SSN, Street, City, Zip) Location (City, State)
The Complete Algorithm (R, U) • Compute U+ using FD closure Algorithm. • Find the keys for R using U+. • Pick FD A B in U+ that violates BCNF. • Update B to A+ • Replace R with R1= (A,B ) and R2 = (A, others). • Compute keys and FDs for R1 and R2 from U+. • Repeat the above step for R1 and R2until there is no violation left.
BCNF Decomposition is not Unique • Person (SSN, TaxID, Address) SSN TaxID, TaxIDSSN • Both FDs violate BCNF condition. • Decompose based on SSN TaxID: Person(SSN, TaxID), PersonAddr (SSN, Address) • Decompose based on TaxIDSSN: Person(TaxID, SSN) PersonAddr(TaxID, Address)
Does BCNF satisfy the ideal normalization properties? • Minimizing redundancy in R. ✔ • Eliminating incomplete information in R. ✔ • Preserve the information stored in R: • We must recover the tuples in R from R’s BCNF: lossless decomposition. • We must recover all FDs in R’s BCNF: dependency preserving.
Lossless Decomposition • If (R1, R2) is a decomposition of R, then Join (R1, R2)= Rfor all instances of R, R1, R2. • Example: R( A, B, C): R1 (A, B), R2 (A, C) We get bogus tuples, not a lossless decomposition!
BCNF Decomposition is Lossless • R( A, B, C), A B: R1 (A, B), R2 (A, C) The join does not produce bogus tuples. Check section 3.4.1. for the proof.
Dependency Preserving Decomposition • The same FDs in the original and decomposed relations. • Example: FDs: SSN Name; Address, Name SSN Address Name SSN SSN Name SSN Name SSN Address No FD! This decomposition is not dependency preserving!
What will happen? SSN Name SSN Address The instance complies with all FDs in each relation. The instance violates FD: Address, Name SSN • 222000111 John • 111222333 John • 222000111 21 Kings St. • 111222333 21 Kings St. Address Name SSN 21 Kings St. John 222000111 21 Kings St. John 222000111 21 Kings St. John 111222333 21 Kings St. John 111222333
FD Preservation • Given FD A B, If a decomposition puts A in one relation and B in another one, it is not dependency preserving. • BCNF does not always preserve dependencies. • Example: SSN Name; Address, Name SSN Address Name SSN SSN Name SSN Name SSN Address No FD!
3rd Normal Form • A relation R is in 3rd normal form if for each non-trivial FD AB in R • A is a super-key or • B is a part of a key. • Every attribute depends on a key or is in a key. • Every non-key attribute must depend on a key, the whole key, and nothing but the key ( E. Codd ) .
3rd Normal Form FDs: SSN Name; Address, Name SSN • 3NF but not BCNF. • 3NF is a lossless decomposition and preserves dependencies. Address Name SSN
Minimal Basis • The FD sets U1 and U2 are equivalent if and only if U1+ = U2 +. • U2 is a minimal basis for U1 iff: • U2 and U1 are equivalent. • The FDs in U2 have only one attribute in their right hand side. • If we remove an FD from U2 or an attribute from an FD in U2, U2 will not be equivalent to U1. • It is not necessarily unique. • Check Section 3.2.7 for the algorithm.
3NF Synthesizing Algorithm Input: relation R and set of FDs U. Output: Normalized schema S • Find a minimal basis M for U. • For each FD AB in M, if AB is not covered by any relation in S, add Ri = (AB) to S. • If none of the relations in S contain a super-key for R, add a relation to S whose attributes form a key for R.
3NF versus BCNF • BCNF eliminates more redundancies than 3NF. • Normalization must be dependency preserving, unless there is a strong reason. • Try BCNF, but if it is not dependency preserving use 3NF.
Other Normal Forms • 4th normal form deals with a different type of dependencies. • Important and easy to learn. • 5th normal form and higher. • Rarely used!
Normalization Drawbacks • Normalization improves the data quality, but there are other objectives in database design. • Performance • Normalized schemata require more joins slower running time. • Readability of schema • Normalization may put related attributes in different relations. • Hard to maintain the database • Let’s be design independent. (very cool!)
Relational Database Management User Requirements Views Physical Storage Conceptual Design Schema Relational Model Entity Relationship(ER) Model Files and Indexes
Views • Virtual relations defined based on base relations and other views. CREATE VIEW <name> AS <query>;
Views • Create view Fans(name, addr, phone) that contains information about people who like at least one beer. CREATE VIEW Fans AS SELECT name, addr, phone FROM Likes, Drinkers WHERE Drinkers.name=Likes.drinker;
Views • Using Fans(name, addr, phone), find the phone numbers of people who like beer. SELECT phone FROM Fans; • Using Fans(name, addr, phone), find the phone numbers of Oregonians who like beer. SELECT phone FROM Fans WHERE addr LIKE ‘%,OR,%’;
Views • Using views, we can query databases more easily. • Users do not know all relations in a large schema. • Each user generally works with a subset of database • DBAs create some views for each user. • As opposed to base relations, they are not generally stored in the database. • RDBMS stores only their definitions.
Executing Queries that Contain Views • RDBMS interprets the queries as if their views are real relations. • Then, it finds the definitions of the views and rewrite the queries.
View Updating • We can update views as if they are base relations. • But, the modification must be meaningful!
View Updating • A view of beers and their minimum prices: CREATE VIEW BeerMinPriceAS SELECT beer, MIN(price) AS minPrice FROM Sells GROUP BY beer; UPDATE BeerMinPrice SET minPrice = 3.0; Not meaningful
View Updating CREATE VIEW Oregonians AS SELECT * FROM Drinkers WHERE addr LIKE ‘%,OR,%’ INSERT INTO Oregonians VALUES(‘John’, ‘12 Main St.,Corvallis,OR,97331’, ‘541-656-9990’)
View Updating CREATE VIEW OregoniansAddr AS SELECT addr,phone FROM Drinkers WHERE addr LIKE ‘%,OR,%’; INSERT INTO OregoniansAddr VALUES(‘12 Main St.,Corvallis,OR,97331’, ‘503-656-9990’); Name is NULL
Views • Create view FanAddr(addr, phone) that contains address and phone number of people who like at least one beer. CREATE VIEW FansAddrAS SELECT addr, phone FROM Likes, Drinkers WHERE Drinkers.name=Likes.drinker; INSERT INTO FansAddr VALUES(‘12 Main St.,Corvallis,OR,97331’, ‘541-656-9990’) No value for name and drinker
View Updating Rules • Generally speaking: • The SELECT attribute list must contain all attributes that cannot be set to NULL by default. • The view definition must contain only one relation R. • The WHERE clause must not contain a subquey involving R.