1 / 35

Normalization

Normalization. Normalization. Normalization is the process of efficiently organizing data in a database with two goals in mind First goal: eliminate redundant data for example, storing the same data in more than one table Second Goal: ensure data dependencies make sense

fay
Télécharger la présentation

Normalization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization

  2. Normalization • Normalization is the process of efficiently organizing data in a database with two goals in mind • First goal: eliminate redundant data • for example, storing the same data in more than one table • Second Goal: ensure data dependencies make sense • for example, only storing related data in a table

  3. Benefits of Normalization Bad database designs results in: redundancy: inefficient storage. anomalies: data inconsistency, difficulties in maintenance • Less storage space • Quicker updates • Less data inconsistency • Clearer data relationships • Easier to add data • Flexible Structure

  4. First Normal Form (1NF) • A database schema is in First Normal Form if all tables are flat Student Student Takes Course May needto add keys

  5. Functional Dependencies • A form of constraint • hence, part of the schema • Finding them is part of the database design • Also used in normalizing the relations • Warning: this is the most abstract, and “hardest” part of the database design.

  6. Functional dependency between A and B Functional Dependencies Definition: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm Formally: A1, A2, …, An B1, B2, …, Bm

  7. Examples • EmpID  Name, Phone, Position • Position  Phone • but Phone  Position EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer

  8. In General • To check A  B, erase all other columns

  9. Example EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position  Phone

  10. Functional Dependencies • Important concept in differentiating good database designs from bad ones • FD is a generalization of the notion of keys • An FD is a set of attributes whose values uniquely determine the values of the remaining attributes. Emp(eno, ename, sal) key FDs:eno => ename Dept(dno, dname, floor) eno => sal Works-in(eno,dno, hours) (eno,dno) => hours dno => dname dno => floor

  11. Functional Dependencies • Trivial dependencies:  =>   =>  if    • Closure • we need to consider all FDs • some are implied by others; e.g., FDs are transitive; if A=>B and B=>C, then A=>C • Given F = set of FDs, we want to find F’ (the closure of all FDs logically implied by F)

  12. Inference Rules for FD’s A1, A2, …, An B1, B2, …, Bm Splitting rule and Combining rule Is equivalent to A1, A2, …, An B1 A1, A2, …, An B2 . . . . . A1, A2, …, An Bm

  13. Inference Rules for FD’s(continued) Trivial Rule A1, A2, …, An Ai where i = 1, 2, ..., n Why ?

  14. Inference Rules for FD’s(continued) Transitive Closure Rule A1, A2, …, An B1, B2, …, Bm If and B1, B2, …, Bm  C1, C2, …, Cp A1, A2, …, An C1, C2, …, Cp then Why ?

  15. Closure of a set of FDs • It is not suff. to consider just the given set of FDs • We need to consider all FDs that hold • Given F, more FDs can be inferred • Such FDs are said to be logically implied by F • F+ is the set of all FDs logically implied by F • We can compute F+using formal defn. of FD • If F were large, this process would be lengthy & cumbersome • Axioms or Rules of Inference provide simpler technique • Armstrong;s Axioms

  16. Inference Rules for FDs Armstrong's inference rules: IR1. (Reflexive) If Y  X, then X Y IR2. (Augmentation) If X Y, then XZ YZ (Notation: XZ stands for X U Z) IR3. (Transitive) If X Y and Y Z, then X Z IR1, IR2, IR3 form a sound & complete set of inference rules Never generates any wrong FD Generate all FDs that hold

  17. Inference Rules for FDs Some additional inference rules that are useful: Decomposition: If XYZ, then XY & XZ Union: If XY & XZ, then XYZ Psuedotransitivity: If XY & WYZ,then WXZ • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

  18. Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • By union rule

  19. 2NF – Example - 1 • Inventory (Item, Supplier, Cost, Supplier Address) • We first check if Cost is fully functionally dependent upon the ENTIRE Primary-Key • If I know just Item, can I find out Cost? • No. We can have > 1 supplier for the same product. • If I know just Supplier, and I find out Cost? • No. We need to know what the Item is as well. • So, Cost is fully functionally dependent upon the ENTIRE Primary-Key

  20. 2NF – Example - 2 • Inventory (Item, Supplier, Cost, Supplier Address) • We then check if Supplier Address is fully functionally dependent upon the ENTIRE Primary-Key • If I know just Item, can I find out Supplier Address? • No. We can have > 1 supplier for the same product. • If I know just Supplier, and I find out Supplier Address? • Yes. The supplier’s address does not depend on the Item. • So, Supplier Address is NOT fully functionally dependent upon the ENTIRE Primary-Key  NOT 2NF

  21. So putting things together The above relation is now in 2NF since the relation has no non-key attributes.

  22. Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1

  23. Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.

  24. A Table That Is In 3NF But Not In BCNF

  25. The Decomposition of a Table Structure to Meet BCNF Requirements

  26. Sample Data for a BCNF Conversion

  27. Decomposition into BCNF

  28. BCNF • Based on FDs that take into account all candidate keys of a relation • For a relation with only 1 CK, 3NF & BCNF are equivalent • A relation is said to be in BCNF if every determinant is a CK • Is PLOTS in BCNF? • NO

  29. BCNF vs 3NF • BCNF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R • 3NF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R • N.b., no subset of a key is a key

  30. 3NF Schema For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office

  31. 3NF Schema For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office

  32. Lossless decomposition Account -> Office No non-trivial FDs BCNF vs 3NF 3NF has some redundancy BCNF does not Unfortunately, BCNF is not dependency preserving, but 3NF is For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office

  33. Closure of a set of FDs • Given a set of FDs F on a relation R, it may be possible that several other FDs must also hold for R • For Example, R=(A,B,C) & FDs, AB & BC hold in R, then FD AC also holds on R • For a given value of A, there can be only one corresponding value of B, & for that value of B, there can be only one corresponding value for C • The closure of F is the set of all FDs that can be inferred from F, & is denoted by F+

  34. Problem 1 • Consider the relation R(A,B,C) with functional dependencies ABC and CB. • Is R in 2NF? • Is R in 3NF? • Is R in BCNF?

More Related