Chapter 7: Relational Database Design

1. Chapter 7: Relational Database Design

5. Closure of a set of FDs The set of all FDs implied by a given set F of FDs is called the closure of F, denoted as F + . Armstrong�s Axioms, can be applied repeatedly to infer all FDs implied by a set of FDs.

7. Armstrong�s Axioms is sound and complete. Sound: they generate only FDs in F+. Complete: repeated application of these rules will generate all FDs in F+. The proof of soundness is straight forward, but completeness is harder to prove.

8. Proof of Armstrong�s Axioms (soundness) Notation: We use t[X] for ?X [ t ] for any tuple t. Reflexivity: If Y ? X, then X ? Y Assume ? t1, t2 such that t1[X] = t2[X] then t1[ Y ] = t2[ Y ] since Y ? X Hence X ? Y

9. Augmentation: if X ? Y, then XZ ? YZ Assume ? t1, t2 such that t1 [ XZ ] = t2 [ XZ] t1 [Z] = t2 [Z], since Z ? XZ ------ (1) t1 [X] = t2 [X], since X ? XZ t1 [Y] = t2 [Y], definition of X ? Y ------ (2) t1 [YZ] = t2 [ YZ ] from (1) and (2) Hence, XZ ? YZ

10. Transitivity: If X ?Y and Y ? Z, then X ?Z. Assume ? t1, t2 such that t1 [X] = t2 [X] Then t1 [Y] = t2 [Y], definition of X ? Y Hence, t1 [Z] = t2 [Z], definition of Y ?Z Therefore, X ?Z

11. Additional rules Sometimes, it is convenient to use some additional rules while reasoning about F+. These additional rules are not essential in the sense that their soundness can be proved using Armstrong�s Axioms.

12. To show correctness of the union rule: X ? Y and X ? Z , then X ? YZ ( union ) Proof: X ? Y � (1) ( given ) X ? Z � (2) ( given ) XX ? XY � (3) ( augmentation on (1) ) X ? XY � (4) ( simplify (3) ) XY ? ZY � (5) ( augmentation on (2) ) X ? ZY � (6) ( transitivity on (4) and (5) )

13. To show correctness of the decomposition rule: if X ? YZ , then X ? Y and X ? Z (decomposition) Proof: X ? YZ � (1) ( given ) YZ ? Y � (2) ( reflexivity ) X ? Y � (3) ( transitivity on (1), (2) ) YZ ? Z � (4) ( reflexivity ) X ? Z � (5) ( transitivity on (1), (4) )

14. R = ( A, B, C ) F = { A ? B, B ? C } F+ = { A ? A, B ? B, C ? C, AB ? AB, BC ? BC, AC ? AC, ABC ? ABC, AB ? A, AB ? B, BC ? B, BC ? C, AC ? A, AC ? C, ABC ? AB, ABC ? BC, ABC ? AC, ABC ? A, ABC ? B, ABC ? C, A ? B, � (1) ( given ) B ? C, � (2) ( given ) A ? C, � (3) ( transitivity on (1) and (2) ) AC ? BC, � (4) ( augmentation on (1) ) AC ? B, � (5) ( decomposition on (4) ) A ? AB, � (6) ( augmentation on (1) ) AB ? AC, AB ? C, B ? BC, A ? AC, AB ? BC, AB ? ABC, AC ? ABC, A ? BC, A ? ABC }

15. Attribute Closure Computing the closure of a set of FDs can be expensive In many cases, we just want to check if a given FD X ? Y is in F+. X - a set of attributes F - a set of functional dependencies

19. Relational Database Design Given a relation schema, we need to decide whether it is a good design or we need to decompose it into smaller relations. Such a decision must be guided by an understanding of what problems arise from the current schema. To provide such guidance, several normal forms have been proposed. If a relation schema is in one of these normal forms, we know that certain kinds of problems cannot arise.

21. First Normal Form Every field contains only atomic values No lists or sets. Implicit in our definition of the relational model. Second Normal Form every non-key attribute is fully functionally dependent on the ENTIRE primary key. Mainly of historical interest.

22. Boyce-Codd Normal Form (BCNF)

23. Intuitively, in a BCNF relation, the only nontrivial dependencies are those in which a key determines some attributes. Each tuple can be thought of as an entity or relationship, identified by a key and described by the remaining attributes

25. In general, suppose X ? A violates BCNF, then one of the following holds X is a subset of some key K: we store ( X, A ) pairs redundantly. X is not a subset of any key: there is a chain K ? X ? A ( transitive dependency )

26. Third Normal Form The definition of 3NF is similar to that of BCNF, with the only difference being the third condition. Recall that a key for a relation is a minimal set of attributes that uniquely determines all other attributes. A must be part of a key (any key, if there are several). It is not enough for A to be part of a superkey, because this condition is satisfied by every attribute.

27. Suppose that a dependency X ? A causes a violation of 3NF. There are two cases: X is a proper subset of some key K. Such a dependency is sometimes called a partial dependency. In this case, we store (X,A) pairs redundantly. X is not a proper subset of any key. Such a dependency is sometimes called a transitive dependency, because it means we have a chain of dependencies K ? X?A.

29. Motivation of 3NF By making an exception for certain dependencies involving key attributes, we can ensure that every relation schema can be decomposed into a collection of 3NF relations using only decompositions. Such a guarantee does not exist for BCNF relations. It weaken the BCNF requirements just enough to make this guarantee possible. Unlike BCNF, some redundancy is possible with 3NF. The problems associate with partial and transitive dependencies persist if there is a nontrivial dependency X?A and X is not a superkey, even if the relation is in 3NF because A is part of a key.

32. Decomposition Decomposition is a tool that allows us to eliminate redundancy. It is important to check that a decomposition does not introduce new problems. A decomposition allows us to recover the original relation? Can we check integrity constraints efficiently?

36. Lossless Join Decomposition The relation schemas { R1, R2, �, Rn } is a lossless-join decomposition of R if: for all possible relations r on schema R, r = ?R1( r ) ? R2( r ) � ? Rn( r )

41. Example R = ( A, B, C ) F = { A ? B } R = { A, B } + { A, C } is a lossless join decomposition R = { A, B } + { B, C } is not a lossless join decomposition Also, consider the previous relation �Student� Please also read the example in P.620 of your textbook.

43. Dependency Preservation R - a relation schema F - set of functional dependencies on R { R1, R2 } � a decomposition of R. Fi - the set of dependencies in F+ involves only attributes in Ri. Fi is called the projection of F on the set of attributes of Ri. dependency is preserved if Intuitively, a dependency-preserving decomposition allows us to enforce all FDs by examining a single relation instance on each insertion or modification of a tuple.

50. Normalization Consider algorithms for converting relations to BCNF or 3NF. If a relation schema is not in BCNF it is possible to obtain a lossless-join decomposition into a collection of BCNF relation schemas. Dependency-preserving is not guaranteed. 3NF There is always a dependency-preserving, lossless-join decomposition into a collection of 3NF relation schemas.

51. BCNF Decomposition It is a lossless join decomposition. But not necessary dependency preserving

56. Canonical Cover A minimal and equivalent set of functional dependency

62. A canonical cover Fc of a set of functional dependency F must have the following properties. Every functional dependency in Fc contains no extraneous attributes in (ones that can be removed from without changing Fc+). So A is extraneous in if and logically implies Fc.

63. Every functional dependency in Fc contains no extraneous attributes in (ones that can be removed from without changing Fc+). So A is extraneous in if and logically implies Fc. Each left side of a functional dependency in Fc is unique. That is there are no two dependencies and in Fc such that .

67. 3NF Synthesis Algorithm Note: result is lossless-join and dependency preserving

70. BCNF VS 3NF always possible to decompose a relation into relations in 3NF and the decomposition is lossless dependencies are preserved always possible to decompose a relation into relations in BCNF and the decomposition is lossless may not be possible to preserve dependencies

74. Design Goals Goal for a relational database design is: BCNF lossless join Dependency preservation If we cannot achieve this, we accept: 3NF lossless join Dependency preservation

75. Multivalued Dependencies There are database schemas in BCNF that do not seem to be sufficiently normalized Consider a database classes(course, teacher, book)such that (c,t,b) ? classes means that t is qualified to teach c, and b is a required textbook for c The database is supposed to list for each course the set of teachers any one of which can be the course�s instructor, and the set of books, all of which are required for the course (no matter who teaches it).

76. There are no non-trivial functional dependencies and therefore the relation is in BCNF Insertion anomalies � i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted (database, Sara, DB Concepts) (database, Sara, Ullman) Multivalued Dependencies (Cont.)

77. Therefore, it is better to decompose classes into: Multivalued Dependencies (Cont.)

78. Multivalued Dependencies (MVDs) Let R be a relation schema and let ? ? R and ? ? R. The multivalued dependency ? ?? ? holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[?] = t2 [?], there exist tuples t3 and t4 in r such that: t1[?] = t2 [?] = t3 [?] = t4 [?] t3[?] = t1 [?] t3[R � ?] = t2[R � ?] t4 [?] = t2[?] t4[R � ?] = t1[R � ?]

79. MVD (Cont.) Tabular representation of ? ?? ?

80. 4th Normal Form No multi-valued dependencies

81. 4th Normal Form Note: 4th Normal Form violations occur when a triple (or higher) concatenated key represents a pair of double keys

82. 4th Normal Form

83. 4th Normal Form

84. 4th Normal Form INSTR-BOOK-COURSE(InstrID, Book, CourseID) COURSE-BOOK(CourseID, Book) COURSE-INSTR(CourseID, InstrID)

85. 4NF(No multivalued dependencies)

86. Example Let R be a relation schema with a set of attributes that are partitioned into 3 nonempty subsets. Y, Z, W We say that Y ?? Z (Y multidetermines Z)if and only if for all possible relations r(R) < y1, z1, w1 > ? r and < y2, z2, w2 > ? r then < y1, z1, w2 > ? r and < y2, z2, w1 > ? r Note that since the behavior of Z and W are identical it follows that Y ?? Z if Y ?? W

87. Example (Cont.) In our example: course ?? teacher course ?? book The above formal definition is supposed to formalize the notion that given a particular value of Y (course) it has associated with it a set of values of Z (teacher) and a set of values of W (book), and these two sets are in some sense independent of each other. Note: If Y ? Z then Y ?? Z Indeed we have (in above notation) Z1 = Z2The claim follows.

88. Use of Multivalued Dependencies We use multivalued dependencies in two ways: 1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies 2. To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies. If a relation r fails to satisfy a given multivalued dependency, we can construct a relations r? that does satisfy the multivalued dependency by adding tuples to r.

89. Theory of MVDs From the definition of multivalued dependency, we can derive the following rule: If ? ? ?, then ? ?? ? That is, every functional dependency is also a multivalued dependency The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. We can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies. We can manage with such reasoning for very simple multivalued dependencies, which seem to be most common in practice For complex dependencies, it is better to reason about sets of dependencies using a system of inference rules (see Appendix C).

90. Fourth Normal Form A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form ? ?? ?, where ? ? R and ? ? R, at least one of the following hold: ? ?? ? is trivial (i.e., ? ? ? or ? ? ? = R) ? is a superkey for schema R If a relation is in 4NF it is in BCNF

91. Restriction of Multivalued Dependencies The restriction of D to Ri is the set Di consisting of All functional dependencies in D+ that include only attributes of Ri All multivalued dependencies of the form ? ?? (? ? Ri) where ? ? Ri and ? ?? ? is in D+

92. 4NF Decomposition Algorithm result: = {R};done := false;compute D+;Let Di denote the restriction of D+ to Ri while (not done) if (there is a schema Ri in result that is not in 4NF) then begin let ? ?? ? be a nontrivial multivalued dependency that holds on Ri such that ? ? Ri is not in Di, and ?????; result := (result - Ri) ? (Ri - ?) ? (?, ?); end else done:= true; Note: each Ri is in 4NF, and decomposition is lossless-join

93. Example R =(A, B, C, G, H, I) F ={ A ?? B B ?? HI CG ?? H } R is not in 4NF since A ?? B and A is not a superkey for R Decomposition a) R1 = (A, B) (R1 is in 4NF) b) R2 = (A, C, G, H, I) (R2 is not in 4NF) c) R3 = (C, G, H) (R3 is in 4NF) d) R4 = (A, C, G, I) (R4 is not in 4NF) Since A ?? B and B ?? HI, A ?? HI, A ?? I e) R5 = (A, I) (R5 is in 4NF) f)R6 = (A, C, G) (R6 is in 4NF)

94. Further Normal Forms Join dependencies generalize multivalued dependencies lead to project-join normal form (PJNF) (also called fifth normal form) A class of even more general constraints, leads to a normal form called domain-key normal form. Problem with these generalized constraints: are hard to reason with, and no set of sound and complete set of inference rules exists. Hence rarely used

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Presentation Transcript

Chapter 7 Relational Database Design by ER- and EER-to-Relational Mapping

Chapter 3

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Chapter 5: Other Relational Languages

Chapter 15 Database Design Theory: Normalization Algorithms

Chapter 6 Relational Database Design

Chapter 5:

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Chapter 7 Relational Database Design by ER- and EER-to-Relational Mapping

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Chapter 3 Section 3.4 Relational Database Operators (Relational Algebra)

Database Design Techniques for Clinical Research

Chapter 13: Query Processing

Chapter 9 Relational Database Design by ER- and EER-to-Relational Mapping

Chapter 3 The Relational Database Model

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Relational Database Design

Chapter 8: Application Design and Development