290 likes | 439 Vues
CS 405G: Introduction to Database Systems. 17. Normal Forms. Today’s Topic. Functional Dependency. Normalization Decomposition BCNF. Could be anything, e.g. d. Must be “b”. Functional dependencies.
E N D
CS 405G: Introduction to Database Systems 17. Normal Forms
Today’s Topic • Functional Dependency. • Normalization • Decomposition • BCNF Chen Qian @ University of Kentucky
Could be anything, e.g. d Must be “b” Functional dependencies • A functional dependency (FD) has the form X->Y, where X and Y are sets of attributes in a relation R • X->Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y • t1[X] = t2[X] t1[Y] = t2[Y] Chen Qian @ University of Kentucky
FD examples Address (street_address, city, state, zip) • street_address, city, state->zip • zip->city, state • zip, state->zip? • This is a trivial FD • Trivial FD: LHS RHS • zip->state, zip? • This is non-trivial, but not completely non-trivial • Completely non-trivial FD: LHS ∩ RHS = ? Chen Qian @ University of Kentucky
Attribute closure • Given R, a set of FDs F that hold in R, and a set of attributes Z in R:The closure of Z (denoted Z+) with respect to F is the set of all attributes {A1, A2, …} functionally determined by Z (that is, Z ->A1A2 …) • Algorithm for computing the closure • Start with closure = Z • If X->Y is in F and X is already in the closure, then also add Y to the closure • Repeat until no more attributes can be added Chen Qian @ University of Kentucky
A more complex example WorkOn(EID, Ename, email, PID, Pname, Hours) • EID->Ename, email • email->EID • PID -> Pname • EID, PID->Hours (Not a good design, and we will see why later) Chen Qian @ University of Kentucky
Example of computing closure • F includes: • EID->Ename, email • email->EID • PID -> Pname • EID, PID->Hours • { PID, email }+ = ? • closure = { PID, email } • email->EID • Add EID; closure is now { PID, email, EID } • EID->Ename, email • Add Ename, email; closure is now { PID, email, EID, Ename } • PID -> Pname • Add Pname; close is now{ PID, Pname, email, EID, Ename } • EID, PID->hours • Add hours; closure is now all the attributes in WorksOn Chen Qian @ University of Kentucky
Using attribute closure Given a relation R and set of FD’s F • Does another FD X->Y follow from F? • Compute X+ with respect to F • If YX+, then X->Y follow from F • Is K a super key of R? • Compute K+ with respect to F • If K+ contains all the attributes of R, K is a super key • Is a super key K a key of R? • Test where K’ = K – { a | a K} is a superkey of R for all possible a Chen Qian @ University of Kentucky
Rules of FD’s • Armstrong’s axioms • Reflexivity: If YX, then X->Y • Augmentation: If X->Y, then (X,Z)-> (Y,Z) for any Z • Transitivity: If X->Y and Y->Z, then X->Z • Rules derived from axioms • Splitting: If X->YZ, then X->Y and X->Z • Combining: If X->Y and X->Z, then X->YZ Chen Qian @ University of Kentucky
Using rules of FD’s Given a relation R and set of FD’s F • Does another FD X->Y follow from F? • Use the rules to come up with a proof • Example: • F includes:EID->Ename, email; email->EID; EID, PID->Hours, Pid -> Pname • PID, email->hours? email->EID (given in F) PID, email -> PID, EID (augmentation) PID, EID->hours (given in F) PID, email->hours (transitivity) Chen Qian @ University of Kentucky
Example of redundancy • WorkOn (EID, Ename, email, PID, hour) • We say X ->Y is a partial dependency if there exist a X’ X such that X’->Y • e.g. EID, PID->Ename • Otherwise, X ->Y is a full dependency • e.g. EID, PID ->hours Chen Qian @ University of Kentucky
Database Normalization • Database normalization relates to the level of redundancy in a relational database’s structure. • The key idea is to reduce the chance of having multiple different version of the same data. • Well-normalized databases have a schema that reflects the true dependencies between tracked quantities. • Any increase in normalization generally involves splitting existing tables into multiple ones, which must be re-joined each time a query is issued. Chen Qian @ University of Kentucky
Normalization • A normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. • A normal form is acertification that tells whether a relation schema is in a particular state Chen Qian @ University of Kentucky
Normal Forms • Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. • 3NF is widely considered to be sufficient. • Normalizing beyond 3NF can be tricky with current SQL technology as of 2005 • Full normalization is considered a good exercise to help discover all potential internal database consistency problems. Chen Qian @ University of Kentucky
First Normal Form ( 1NF ) • NF is to characterize a relation (not an attribute, a key, etc…) • We can only say “this relation or table is in 1NF” • A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. Chen Qian @ University of Kentucky
First Normal Form ( 1NF ) http://en.wikipedia.org/wiki/First_normal_form Chen Qian @ University of Kentucky
More examples. • http://en.wikipedia.org/wiki/First_normal_form Chen Qian @ University of Kentucky
2nd Normal Form • An attribute A of a relation R is a nonprimary attributeif it is not part of any key in R, otherwise, A is a primary attribute. • R is in (general) 2nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R X , Y ->Z, W Y -> Z (X , Y,W) (Y,Z) Chen Qian @ University of Kentucky
2nd Normal Form • Note about 2nd Normal Form • by definition,every nonprimary attribute is functionally dependent on every key of R • In other words, R is in its 2nd normal form if we could not find a partial dependency of a nonprimary key to a key in R. Chen Qian @ University of Kentucky
Redundancy Example • If a key will result a partial dependency of a nonprimary attribute. • e.g. EID, PID->Ename • In this case, the attribute (Ename) should be separated with its full dependency key (EID) to be a new table. • So, to check whether a table includes redundancy. Try every nonprimary attribute and check whether it fully depends on any key. Chen Qian @ University of Kentucky
Second normal Form ( 2NF ) • 2NF prescribes full functional dependency on the primary key. • It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key. • It requires that there are no non-trivial functional dependencies of a non-key attribute on a part (subset) of a candidate key. • A table is said to be in the 2NF if and only if it is in the 1NF and every non-key attribute is irreducibly dependent on the primary key Chen Qian @ University of Kentucky
2NF Example • The PART_NUMBER and SUPPLIER_NAME form the composite primary key. • SUPPLIER_ADDRESS is only dependent on the SUPPLIER_NAME, and therefore this table breaks 2NF. Chen Qian @ University of Kentucky
Decomposition • Decomposition eliminates redundancy • To get back to the original relation, use natural join. Decomposition Foreign key Chen Qian @ University of Kentucky
Decomposition • Decomposition may be applied recursively Chen Qian @ University of Kentucky
Unnecessary decomposition • Fine: join returns the original relation • Unnecessary: no redundancy is removed, and now EID is stored twice-> Chen Qian @ University of Kentucky
Bad decomposition • Association between PID and hours is lost • Join returns more rows than the original relation Chen Qian @ University of Kentucky
Lossless join decomposition • Decompose relation R into relations S and T • attrs(R) = attrs(S) attrs(T) • S = πattrs(S) ( R ) • T = πattrs(T) ( R ) • The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = ST • Any decomposition gives RST (why?) • A lossydecomposition is one with RST Chen Qian @ University of Kentucky
Loss? But I got more rows-> • “Loss” refers not to the loss of tuples, but to the loss of information • Or, the ability to distinguish different original tuples Chen Qian @ University of Kentucky
Questions about decomposition • When to decompose • How to come up with a correct decomposition (i.e., lossless join decomposition) Chen Qian @ University of Kentucky