1 / 77

C20.0046: Database Management Systems Lecture #8

C20.0046: Database Management Systems Lecture #8. Matthew P. Johnson Stern School of Business, NYU Spring, 2005. Roadmap. Want to remove redundancy/anomalies Convert to BCNF Find FDs – closure alg Check if each FD A B is ok If A contains a key

harva
Télécharger la présentation

C20.0046: Database Management Systems Lecture #8

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. C20.0046: Database Management SystemsLecture #8 Matthew P. Johnson Stern School of Business, NYU Spring, 2005 M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  2. Roadmap • Want to remove redundancy/anomalies • Convert to BCNF • Find FDs – closure alg • Check if each FD AB is ok • If A contains a key • If not, decompose into R1(A,B), R2(A,rest) • Because AB, this will be lossless • Could check by joining R1 and R2 • Would get no rows not in original M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  3. Normal Forms • First Normal Form = all attributes are atomic • As opposed to set-valued • Assumed all along • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  4. Name SSN Mailing-address SSN Phone Michael 123 NY 123 212-111-1111 Hilary 456 DC 123 917-111-1111 Bill 789 Chappaqua 456 202-222-2222 456 914-222-2222 789 914-222-2222 789 212-333-3333 Decomposition example Break the relation into two: • The anomalies are gone • No more redundant data • Easy to for Bill to move • Okay for Bill to lose all phones M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  5. Boyce-Codd Normal Form • Name/phone example is not BCNF: • {ssn,phone} is key • FD: ssn  name,mailing-address holds • Violates BCNF: ssn is not a superkey • Its decomposition is BCNF • Only superkeys  anything else M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  6. BCNF motivation • Two big ideas: • Only a key field can determine other fields • Key values are unique •  no FD-caused redundancy • Slogan: “Every FD must contain the key, the whole key and nothing but the key.” • More accurate: “Every FD must contain (on the left) a key, a whole key, and maybe other fields. M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  7. Design examples • Consider situation: • Entities: Parts, Suppliers, Departments • Relship: Contracts(P,S,D,id,quant) • Draw E/R • New rule: no department can buy multiple parts from the same supplier (why?) • Translate to FD • Normalize M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  8. Design examples • Consider situation: • Entities: Emp(ssn,name,lot), Dept(id,name,budg) • Relship: Works(E,D,since) • Draw E/R • New info: in each dept, everyone parks in same lot • Translate to FD • Normalize M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  9. BCNF Decomposition • Larger example: multiple decompositions • {Title, Year, Studio, President, Pres-Address} • FDs: • Title Year  Studio • Studio  President • President  Pres-Address •  Studio  President, Pres-Address (why?) • No many-many this time • Problem cause: transitive FDs: • Title,year  studio  president M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  10. BCNF Decomposition • Illegal: As  Bs, where As don’t include key • Decompose: Studio  President, Pres-Address • As = {studio} • Bs = {president, pres-address} • Cs = {title, year} • Result: • Studios(studio, president, pres-address) • Movies(studio, title, year) • Is (2) in BCNF? Is in (1) BCNF? • Key: Studio • FD: President  Pres-Address • Q: Does president  studio? If so, president is a key • But if not, it violates BCNF M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  11. BCNF Decomposition • Studios(studio, president, pres-address) • Illegal: As  Bs, where As don’t include key •  Decompose: President  Pres-Address • As = {president} • Bs = {pres-address} • Cs = {studio} • {Studio, President, Pres-Address} becomes • {President, Pres-Address} • {Studio, President} M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  12. Decomposition algorithm example • R(N,O,R,P) F = {N  O, O  R, R  N} • Key: N,P • Violations of BCNF: N  O, OR, N OR • which kinds of violations are these? • Pick N  OR (on board) • Can we rejoin? (on board) • What happens if we pick N  O instead? • Can we rejoin? (on board) M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  13. BCNF and two-att relations • Must a two-attribute relation be in BCNF? • Case 1: there are no non-trivial FDs • Case 2: A  B but not B  A • Case 3: B  A but not A  B • Case 4: Both A  B and B  A • Note that relations may have multiple keys • BCNF requires a key on the left, not all keys M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  14. Lossless BCNF decomposition • Consider simple relation: R(A,B,C) • Only FD: A  B (assume C!A) Key: A,C • Also goes through if CA • BCNF violation: no key on the left • Thus: Decomposition to BCNF: • Create R1(A,B) and R2(A,C) • Could this be lossy? • We will join R1 and R2 on A to find out M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  15. Lossless BCNF decomposition • Suppose R contains the rows: (b,a,c) and (b’,a,c’) • In projection onto (B,A): • (b,a,c)  (b,a), (b’,a,c’)  (b’,a) • In projection onto (A,C): • (b,a,c)  (a,c), (b’,a,c’)  (a,c’) • In joining, (b’,a) and (a,c) become (b’,a,c), • and (b,a) and (a,c’) become (b,a,c’) • Q: Is/must/can this be correct? • A: Yes! A  B, so b = b’ • So this was lossless • We assumed C!A, but argument also goes through when CA • Moral: BCNF decomp alg really is lossless M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  16. BCNF summary • BCNF decomposition is lossless • Can reproduce original by joining • Saw: Every 2-attribute relation is in BCNF • Final set of decomposed relations might be different • depends on order of bad FDs chosen • Saw: But all results will be in BCNF M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  17. A problem with BCNF • Relation: R(Title, Theater, Neighboorhood) • FDs: • Title,N’hood  Theater • Assume a movie shouldn’t play twice in same neighborhood • Theater  N’hood • Keys: • {Title, N’hood} • {Theater, Title} M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  18. R1 R2 Theater N’hood Theater Title Angelica Village Angelica Aviator Angelica Life Aquatic A problem with BCNF • BCNF violation: Theater  N’hood • Decompose: • {Theater, N’Hood} • {Theater, Title} • Resulting relations: M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  19. (R’) Theater N’hood Title Angelica Village Life Aquatic Angelica Village Aviator Film Forum Village Life Aquatic Problem - continued • Suppose we add new rows to R1 and R2: • Their join: R1 R2 A and B could not enforce FD Title,N’hood  Theater M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  20. Third normal form: motivation • There are some situations in which • BCNF is not dependency-preserving, and • Efficient checking for FD violation on updates is important • In these cases BCNF is too severe a req. • Solution: define a weaker normal form, called Third Normal Form • in which FDs can be checked on individual relations without performing a join (no inter-relational FDs) • to which relations can be converted, preserving both data and FDs M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  21. Third Normal Form • BCNF decomposition is not dependency-preserving! • We now define the (weaker) Third Normal Form • Turns out: this example was already in 3NF A relation R is in 3rd normal form if : For every nontrivial dependency A1, A2, ..., An Bfor R, {A1, A2, ..., An } is a super-key for R, or B is part of a key, i.e., B is prime Tradeoff: BCNF = no FD anomalies, but may lose some FDs 3NF = keeps all FDs, but may have some anomalies M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  22. BCNF: vices and virtues • Be clear on the problem just described v. the arg. that BCNF decomp is data-lossless • BCNF decomp does not lose data • Resulting relations can be rejoined to obtain the original • But: it can can lose dependencies • After decomp, now legal to add rows whose corresponding rows would be illegal in (rejoined) original M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  23. Recap: goals of normalization • When we decompose a relation R with FDs F into R1..Rn we want: • lossless-join decomposition – no data lost • no/little redundancy: the relations Ri should be in either BCNF or at least 3NF • Dependency preservation: if Fi be the set of dependencies in F+ that include only attributes in Ri: • F is the “sum” of the FDs of the new relations • (F1  F2  F3  …  Fn)+ = F+ • Otherwise checking updates for violation of FDs may require computing joins, which is expensive M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  24. Dependency preservation • Saw that last req. didn’t hold in move-theater example • Did it hold in R(N,O,R,P) example? (on board) M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  25. Testing for 3NF • For each dependency X  Y, use attribute closure to check if X is a superkey • If X is not a superkey, verify that each attribute in Y is prime • This test is rather more expensive, since it involves finding candidate keys • Testing for 3NF is NP-complete • Interestingly, decomposition into 3NF can be done in polynomial time •  Testing for 3NF is harder than decomposing into 3NF! • Optimization: need to check only FDs in F, need not check all FDs in F+ (why?) M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  26. 3NF Example • R = (J, K, L) • F = (JK  L, L  K) • Two candidate keys: JK and JL • R is in 3NF • JK  L JK is a superkey • L  K K is prime • BCNF decomposition yields • R1 = (L,K), R2 = (L,J) • testing for JK  L requires a join • There is some redundancy in R M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  27. BCNF and 3NF Comparison • Example of problems due to redundancy in 3NF • R = (J, K, L) • F = (JK  L, L  K) • A schema that is in 3NF but not BCNF has the problems of: • redundancy (e.g., the relationship between l1 and k1) • need to use null values (if allowed!), e.g. to represent the relationship between l2 and k2 when there is no corresponding value for attribute J M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  28. Comparison of BCNF and 3NF • It is always possible to decompose a relation into relations in 3NF such that: • the decomposition is lossless • the dependencies are preserved • It is always possible to decompose a relation into relations in BCNF such that: • the decomposition is lossless • but it may not be possible to preserve dependencies • But may eliminate more redundancy M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  29. The Normal Forms (so far) • 1NF: every attribute has an atomic value • 2NF: no longer used • 3NF: for each FD X  Y either • it is trivial, or • X is a superkey, or • Y is a part of some key • BCNF: • 3NF and third 3NF option disallowed M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  30. Distinguishing examples • 1NF but not 2NF: R(Name, SSN ,Mailing- address,Phone) • Key: SSN,Phone • Partial: ssn  name, address • 3NF but not BCNF: R(Title, Theater, N’hood) • Title,N’hood  Theater • Prime-on-right: Theater  N’hood M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  31. Design Goals • Goal for a relational database design is: • No redundancy • Lossless Join • Dependency Preservation • If we cannot achieve this, we accept one of • dependency loss • use of more expensive inter-relational methods to preserve dependencies • data redundancy due to use of 3NF • Interesting: SQL does not provide a direct way of specifying FDs other than superkeys • can specify FDs using assertions, but they are expensive to test M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  32. 3NF • 3NF means we may have anomalies • Example: TEACH(student, teacher, subject) • student, subject  teacher (students not allowed in the same subject with two teachers) • teacher  subject (each teacher teaches one subject) • Subject is prime, so this is 3NF • But we have anomalies: • Insertion: cannot insert a teacher until we have a student taking his subject • If we convert to BCNF, we lost student, subject  teacher M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  33. BCNF and over-normalization • What is the problem? • Schema overload – trying to capture two meanings: • 1) subject X can be taught by teacher Y • 2) student Z takes subject W from teacher V • What to do? • 3NF has anomalies, normalizing to BCNF loses FDs • One soln: keep the 3NF TEACH and another (BCNF) relation SUBJECT-TAUGHT (teacher, subject) • Still (more!) redundancy, but no more insert and delete anomalies M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  34. Normalization Review • Q: What’s required for BCNF? • Q: How do we fix a non-BCNF relation? • Q: What’s the loophole for 3NF? M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  35. Normalization Review • Q: If AsBs violates BCNF, what do we do? • Q: In this case, could the decomposition be lossy? • Q: How do we combine two relations? • Q: Can BCNF decomp. lose FDs? • Q: Can 3NF decomp. lose FDs? M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  36. Name SSN Jobs Streets Citys Michael 123 Mayor 111 East 60th Street New York Michael 123 Mayor 222 Brompton Road London Michael 123 CEO 111 East 60th Street New York Michael 123 CEO 222 Brompton Road London Hilary 456 Senator 333 Some Street Chappaqua Hilary 456 Senator 444 Embassy Row Washington Hilary 456 First Lady 333 Some Street Chappaqua Hilary 456 First Lady 444 Embassy Row Washington Hilary 789 Lawyer 333 Some Street Chappaqua Hilary 789 Lawyer 444 Embassy Row Washington New topic: MVDs • Consider this relation • People ~ their jobs ~ their residences • Person-address/city: many-many • Person-job: many-many • Address/city-job: independent M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  37. Redundancy in BCNF • Lots of redundancy! • Key? All fields • None determined by others! • Non-trivial FDs? None! •  In BCNF? Yes! • Now what? • New concept, leading to another normal form: • Multivalued dependencies M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  38. MVD definition • As  Bs if, when As are held fixed values in Bs are independent of values in rest • More precisely: if t1 and t3 agree on As, we then can find t2 such that t2, t2, t3 agree on As t2, t1 agree of Bs t2, t3 agree on Cs t1 | | | | t2 | | | | t3 M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  39. Name Streets Citys Jobs t1 Hilary 333 Some Street Chappaqua Senator t3 Hilary 444 Embassy Row Washington Lawyer t2 Hilary 333 Some Street Chappaqua Lawyer MVD example • Claim: name  streets,cities • If true: can pick arbitrary t1, t3 and find a t2 • We pick: first and last of Hilary’s tuples: • Now: if true, can find another Hilary row with street/address of t1 and job of t3 M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  40. MVD example • Now: if true, can find another Hilary row with street/address of t1 and job of t3 • Sure enough: t2 t2 M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  41. MVD rules • No splitting rule: • In the example, name  streets,cities • Do we have name  streets? • No: 444 Embassy Row doesn’t go with Chappaqua • NB: City doesn’t determine street – could have >1 house • But city, street aren’t independent t1 t3 M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  42. MVD rules • Trivial dependencies: • As  Bs iff As  BsAi • Transitive rule: • As  Bs, Bs  Cs  As  Cs • Complementation rule: • As  Bs  As  rest • Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  43. MVDs and FDs • MVD is a generalization of FD • Every FD is an MVD • Pf: Suppose As  Bs Pick t1, t3 that agree on As. Must find a t2. Let t2 be t3. Then 1) t2 agrees on As with both 2) t2 agrees on Bs with t1 (why?) 3) t2 agrees on rest with t3 (why?) QED M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  44. Fourth Normal Form • 4NF: like BCNF, but with MVDs not FDs • An MVD As  Bs is nontrivial if • No Bs are As • Some attributes left over (why?) • 4NF: for every nontrivial MVD As  Bs, As is a superkey • In example name  streets,cities, but name isn’t a superkey M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  45. Decomposition to 4NF • Again, analogous to BCNF • If we can find As  Bs for R where As isn’t a superkey, replace R with R1(As,Bs) and R2(As,rest) • Running example: name  streets,cities •  People(name,streets,cities,jobs) becomes Residences(name,street,city) and Employment(name,job) M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  46. 4NF: another construal • In nontrivial As  Bs, As must be superkey • After df of 4NF, text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123). • We know: FDs are* MVDs but not vice versa • So: Why does this follow? Is it true? • Yes. As is a superkey  As  everything •  As  Bs  the MVD is an FD • Two kinds of MVDs: FDs and “true” MVDs • 4NF eliminates exactly the true ones * The typo swapping these was fixed. M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  47. Summary of normal forms M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  48. Next topic: relational algebra • Set operations: union, intersection, difference • Projection, selection • Cartesian Product • Joins: natural joins, theta joins • Combining operations to form queries • Dependent and independent operations M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  49. What is relational algebra? • An algebra for relations • “High-school” algebra is an algebra for numbers • Formalism for constructing expressions • Operations • Operands: Variables, Constants, expressions • Expressions: • Vars & constants • Operators applied to expressions M.P. Johnson, DBMS, Stern/NYU, Spring 2005

  50. Why do we care about relational algebra? • Why construct expressions on relations? • The exprs are the form questions about the take • The relations these exprs cash out to are the answers to our questions • First proof of RDBMS/RA concept: System R (1979) • Modern implementation of RA: SQL M.P. Johnson, DBMS, Stern/NYU, Spring 2005

More Related