1 / 29

CS 405G: Introduction to Database Systems

CS 405G: Introduction to Database Systems. 17. Normal Forms. Today’s Topic. Functional Dependency. Normalization Decomposition BCNF. Could be anything, e.g. d. Must be “b”. Functional dependencies.

keiji
Télécharger la présentation

CS 405G: Introduction to Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 405G: Introduction to Database Systems 17. Normal Forms

  2. Today’s Topic • Functional Dependency. • Normalization • Decomposition • BCNF Chen Qian @ University of Kentucky

  3. Could be anything, e.g. d Must be “b” Functional dependencies • A functional dependency (FD) has the form X->Y, where X and Y are sets of attributes in a relation R • X->Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y • t1[X] = t2[X]  t1[Y] = t2[Y] Chen Qian @ University of Kentucky

  4. FD examples Address (street_address, city, state, zip) • street_address, city, state->zip • zip->city, state • zip, state->zip? • This is a trivial FD • Trivial FD: LHS RHS • zip->state, zip? • This is non-trivial, but not completely non-trivial • Completely non-trivial FD: LHS ∩ RHS = ? Chen Qian @ University of Kentucky

  5. Attribute closure • Given R, a set of FDs F that hold in R, and a set of attributes Z in R:The closure of Z (denoted Z+) with respect to F is the set of all attributes {A1, A2, …} functionally determined by Z (that is, Z ->A1A2 …) • Algorithm for computing the closure • Start with closure = Z • If X->Y is in F and X is already in the closure, then also add Y to the closure • Repeat until no more attributes can be added Chen Qian @ University of Kentucky

  6. A more complex example WorkOn(EID, Ename, email, PID, Pname, Hours) • EID->Ename, email • email->EID • PID -> Pname • EID, PID->Hours (Not a good design, and we will see why later) Chen Qian @ University of Kentucky

  7. Example of computing closure • F includes: • EID->Ename, email • email->EID • PID -> Pname • EID, PID->Hours • { PID, email }+ = ? • closure = { PID, email } • email->EID • Add EID; closure is now { PID, email, EID } • EID->Ename, email • Add Ename, email; closure is now { PID, email, EID, Ename } • PID -> Pname • Add Pname; close is now{ PID, Pname, email, EID, Ename } • EID, PID->hours • Add hours; closure is now all the attributes in WorksOn Chen Qian @ University of Kentucky

  8. Using attribute closure Given a relation R and set of FD’s F • Does another FD X->Y follow from F? • Compute X+ with respect to F • If YX+, then X->Y follow from F • Is K a super key of R? • Compute K+ with respect to F • If K+ contains all the attributes of R, K is a super key • Is a super key K a key of R? • Test where K’ = K – { a | a K} is a superkey of R for all possible a Chen Qian @ University of Kentucky

  9. Rules of FD’s • Armstrong’s axioms • Reflexivity: If YX, then X->Y • Augmentation: If X->Y, then (X,Z)-> (Y,Z) for any Z • Transitivity: If X->Y and Y->Z, then X->Z • Rules derived from axioms • Splitting: If X->YZ, then X->Y and X->Z • Combining: If X->Y and X->Z, then X->YZ Chen Qian @ University of Kentucky

  10. Using rules of FD’s Given a relation R and set of FD’s F • Does another FD X->Y follow from F? • Use the rules to come up with a proof • Example: • F includes:EID->Ename, email; email->EID; EID, PID->Hours, Pid -> Pname • PID, email->hours? email->EID (given in F) PID, email -> PID, EID (augmentation) PID, EID->hours (given in F) PID, email->hours (transitivity) Chen Qian @ University of Kentucky

  11. Example of redundancy • WorkOn (EID, Ename, email, PID, hour) • We say X ->Y is a partial dependency if there exist a X’ X such that X’->Y • e.g. EID, PID->Ename • Otherwise, X ->Y is a full dependency • e.g. EID, PID ->hours Chen Qian @ University of Kentucky

  12. Database Normalization • Database normalization relates to the level of redundancy in a relational database’s structure. • The key idea is to reduce the chance of having multiple different version of the same data. • Well-normalized databases have a schema that reflects the true dependencies between tracked quantities. • Any increase in normalization generally involves splitting existing tables into multiple ones, which must be re-joined each time a query is issued. Chen Qian @ University of Kentucky

  13. Normalization • A normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. • A normal form is acertification that tells whether a relation schema is in a particular state Chen Qian @ University of Kentucky

  14. Normal Forms • Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. • 3NF is widely considered to be sufficient. • Normalizing beyond 3NF can be tricky with current SQL technology as of 2005 • Full normalization is considered a good exercise to help discover all potential internal database consistency problems. Chen Qian @ University of Kentucky

  15. First Normal Form ( 1NF ) • NF is to characterize a relation (not an attribute, a key, etc…) • We can only say “this relation or table is in 1NF” • A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. Chen Qian @ University of Kentucky

  16. First Normal Form ( 1NF ) http://en.wikipedia.org/wiki/First_normal_form Chen Qian @ University of Kentucky

  17. More examples. • http://en.wikipedia.org/wiki/First_normal_form Chen Qian @ University of Kentucky

  18. 2nd Normal Form • An attribute A of a relation R is a nonprimary attributeif it is not part of any key in R, otherwise, A is a primary attribute. • R is in (general) 2nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R X , Y ->Z, W Y -> Z (X , Y,W) (Y,Z)  Chen Qian @ University of Kentucky

  19. 2nd Normal Form • Note about 2nd Normal Form • by definition,every nonprimary attribute is functionally dependent on every key of R • In other words, R is in its 2nd normal form if we could not find a partial dependency of a nonprimary key to a key in R. Chen Qian @ University of Kentucky

  20. Redundancy Example • If a key will result a partial dependency of a nonprimary attribute. • e.g. EID, PID->Ename • In this case, the attribute (Ename) should be separated with its full dependency key (EID) to be a new table. • So, to check whether a table includes redundancy. Try every nonprimary attribute and check whether it fully depends on any key. Chen Qian @ University of Kentucky

  21. Second normal Form ( 2NF ) • 2NF prescribes full functional dependency on the primary key. • It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key. • It requires that there are no non-trivial functional dependencies of a non-key attribute on a part (subset) of a candidate key. • A table is said to be in the 2NF if and only if it is in the 1NF and every non-key attribute is irreducibly dependent on the primary key Chen Qian @ University of Kentucky

  22. 2NF Example • The PART_NUMBER and SUPPLIER_NAME form the composite primary key. • SUPPLIER_ADDRESS is only dependent on the SUPPLIER_NAME, and therefore this table breaks 2NF. Chen Qian @ University of Kentucky

  23. Decomposition • Decomposition eliminates redundancy • To get back to the original relation, use natural join. Decomposition Foreign key Chen Qian @ University of Kentucky

  24. Decomposition • Decomposition may be applied recursively Chen Qian @ University of Kentucky

  25. Unnecessary decomposition • Fine: join returns the original relation • Unnecessary: no redundancy is removed, and now EID is stored twice-> Chen Qian @ University of Kentucky

  26. Bad decomposition • Association between PID and hours is lost • Join returns more rows than the original relation Chen Qian @ University of Kentucky

  27. Lossless join decomposition • Decompose relation R into relations S and T • attrs(R) = attrs(S) attrs(T) • S = πattrs(S) ( R ) • T = πattrs(T) ( R ) • The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = ST • Any decomposition gives RST (why?) • A lossydecomposition is one with RST Chen Qian @ University of Kentucky

  28. Loss? But I got more rows-> • “Loss” refers not to the loss of tuples, but to the loss of information • Or, the ability to distinguish different original tuples Chen Qian @ University of Kentucky

  29. Questions about decomposition • When to decompose • How to come up with a correct decomposition (i.e., lossless join decomposition) Chen Qian @ University of Kentucky

More Related