L02: Logical Database Design

L02: Logical Database Design Introduction Functional Dependencies Normal Forms Normalization by decomposition

Life Cycle of Database Applications Analyze the company situation Define problems & constraints Define objectives; scope and boundaries Database initial study Database design Conceptual design; DBMS selection Logical & physical design Implement. & loading Install the DBMS; create the DB; load & convert the data Testing & evaluation Test, fine-tune, evaluate the DB and its application programs Operation Produce the required information flow Maint. & evolution Introducing changes; make enhancements

Database Design Procedures Determine end user views, outputs, and transaction-processing requirements Data & requirement analysis Make the semantics clear Conceptual modeling Identifying main processes, insert, update, and delete rules, validate reports, queries, views, integrity, sharing, and security Data model verification DBMS software selection Translate the conceptual schema into definitions of tables, views, and so on Logical design Physical design Define storage structures and access paths for optimum performance

Overview of Relational Database Design • Requirement Analysis • Conceptual design: make the semantics clear • What are the entities and relationships in the enterprise? • What information about these entities and relationships should we store in the database? • What are the integrity constraints or business rules that hold? • A database `schema’ in the ER Model can be represented pictorially (ER diagrams). • Can map an ER diagram into a relational schema. • Logical design: (Normalization) Check relational schema for redundancies and related anomalies. • Physical Database Design and Tuning: Consider typical workloads and further refine the database design.

Logical Design of Relational Databases • Logical design of relational database requires that we find a “good” collection of relation schemas. A bad design may lead to • Repetition of information. • Inability to represent certain information. • Design goals: • Avoid redundant data • Ensure that relationships among attributes are represented • Facilitate the checking of updates for violation of database integrity constraints

The Evils of Redundancy • Redundancyis at the root of several problems associated with relational schemas: • redundant storage, insert/delete/update anomalies • Input of our design phase: An initial schema. • Consider relation obtained from Hourly_Emps: • Hourly_Emps (eid, name, office, rating, hrly_wages, hrs_worked) • Notation: We will denote this relation schema by listing the attributes: ENORWH • This is really the setof attributes {E,N,O,R,W,H}. • Sometimes, we will refer to all attributes of a relation by using the relation name. (e.g., Hourly_Emps for ENORWH)

Anomalies • IC: the hourly_wage is determined by rating. • Update anomaly: Can we change W in just the 1st tuple of ENORWH? • Insertion anomaly:What if we want to insert an employee and don’t know the hourly wage for his rating? • Deletion anomaly:If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Decomposition – The Basic Approach Hourly_Emps2 • Main refinement technique: decomposition(replacing ENORWH with ENORH and RW) • Decomposition should be used judiciously: • Is there reason to decompose a relation? • What problems (if any) does the decomposition cause? • Integrity constraints, in particularfunctional dependencies, can be used to identify schemas with such problems and to suggest refinements. Wages

Database Design Theory • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies (not going to cover in this course)

Functional Dependencies • A functional dependencyX  Yholds over relation R if, for every allowable instance rof R: • t1 r, t2 r, X(t1) = X (t2) implies  Y(t1) = Y (t2) • i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are setsof attributes.) • A functional dependency is a generalization of the notion of a key. • Constraints on the set of legal relations. • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes

Functional Dependencies (Cont.) • 4 1 5 3 7 • Example: Consider r(A,B) with the instance of r. On this instance, A B does NOT hold, but B  A does hold. • An FD is a statement about allallowable relations. • Must be identified based on semantics of application. • Given some allowable instance r1of R, we can check if it violates some FD f, but we cannot tell if fholds over R! • A functional dependency is trivial if it is satisfied by all instances of a relation • E.g. • Sid, Sname Sname • SnameSname • In general,   is trivial if   

Use of Functional Dependencies • We use functional dependencies to: • test relations to see if they are legal under a given set of functional dependencies. • If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. • specify constraints on the set of legal relations • We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. • Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Sailors may, by chance, satisfy SnameSid.

Armstrong’s Axioms • Given some FDs, we can usually infer additional FDs: • ssn  did, did  lot implies ssn  lot • An FD fis implied bya set of FDs Fif fholds whenever all FDs in Fhold. • F+ = closure of Fis the set of all FDs that are implied by F. • Armstrong’s Axioms (X, Y, Z are sets of attributes): • Reflexivity:If X  Y, then X  Y • Augmentation: If X  Y, then XZ  YZ for any Z • Transitivity:If X  Y and Y  Z, then X  Z • These aresoundand completeinference rules for FDs!

Reasoning About FDs • Couple of additional rules (that follow from AA): • Union:If X  Y and X  Z, then X  YZ • Decomposition:If X  YZ, then X  Y and X  Z • Pseudo-transitivity:If X Y and ZY T, then X ZT • Example: Contracts(cid, sid, jid, did, pid, qty, value),and: • C is the key: C  CSJDPQV • Project purchases each part using single contract:JP  C • Dept purchases at most one part from a supplier:SD  P JP  C, C  CSJDPQV imply JP  CSJDPQV SD  P implies SDJ  JP SDJ  JP, JP  CSJDPQV imply SDJ  CSJDPQV

Closure of Attribute Sets • Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!) • Typically, we just want to check if a given FD X  Y is in the closure of a set of FDs F. An efficient check: • Compute attribute closureof X (denoted X+ ) wrt F: • Set of all attributes A such that X  A is in F+ • There is a linear time algorithm to compute this. • Check if Y is in X+ • Does F = {A  B, B  C, C D  E } imply A  E? • i.e, is A  E in the closureF+ ? Equivalently, is E inA+ ?

Algorithm to Compute Closure • Algorithm to compute +, the closure of  under F result:= ; while (changes to result) do for each    in F do begin if   result then result :=result  ; end R = (A, B, C, G, H, I)F = ( A  B A  C CG  H CG  I B  H } (AG+)1. Result= AG2. Result= ABCG (A  C; A  B and A  AGB)3. Result= ABCGH (CG  H and CG  AGBC)4. Result=ABCGHI (CG  I and CG  AGBCH) Is AG a candidate key?

Uses of Attribute Closure • There are several uses of the attribute closure algorithm: • Testing for superkey: • To test if  is a superkey, we compute +, and check if + contains all attributes of R. • Testing functional dependencies • To check if a functional dependency    holds (or, in other words, is in F+), just check if   +. • That is, we compute + by using attribute closure, and then check if it contains . • Is a simple and cheap test, and very useful • Computing closure of F • For each   R, we find the closure +, and for each S  +, we output a functional dependency   S.

Closure of a Set of Functional Dependencies • Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. • E.g. If A  B and B  C, then we can infer that A  C • The set of all functional dependencies logically implied by F is the closure of F. • We denote the closure of F by F+.

Procedure for Computing F+ F+ = Frepeat for each FD f in F+ apply reflexivity and augmentation rules on f add the resulting functional dependencies to F+ for each pair of FD f1 and f2 in F+ if f1 and f2 can be combined using transitivity then add the resulting FD to F+until F+ does not change any further

Example • R = (A, B, C, G, H, I)F = { A  B, A  C, CG  H, CG  I, B  H} • some members of F+ • A  H • by transitivity from A  B and B  H • AG  I • by augmenting A  C with G, to get AG  CG and then transitivity with CG  I • CG  HI • from CG  H and CG  I : “union rule” can be inferred from • definition of functional dependencies, or • Augmentation of CG  I to infer CG  CGI, augmentation ofCG  H to infer CGI  HI, and then transitivity

Minimal Cover • Sets of functional dependencies may have redundant dependencies that can be inferred from the others • Eg: A  C is redundant in: {A  B, B  C, A  C} • Parts of a functional dependency may be redundant • E.g. on RHS: {A  B, B  C, A  CD} can be simplified to {A  B, B  C, A  D} • E.g. on LHS: {A  B, B  C, AC  D} can be simplified to {A  B, B  C, A  D} • Intuitively, a minimal cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies or having redundant parts of dependencies

Minimal Cover for a Set of FDs • Minimal coverG for a set of FDs F: • Closure of F = closure of G. • Right hand side of each FD in G is a single attribute. • If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes. • Intuitively, every FD in G is needed, and as small as possiblein order to get the same closure as F. • Example: F: {A  B, ABCD  E, EF  GH, ACDF  EG} has minimal cover G: {A  B, ACD  E, EF  G, EF  H}

Extraneous Attributes • Consider a set F of functional dependencies and the functional dependency   in F. • Attribute A is extraneous in  if A   and F logically implies (F – {})  {( – A) }. • Attribute A is extraneous in  if A   and the set of functional dependencies (F – {})  {( – A)} logically implies F. • Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one • Example: Given F = {A  C, AB  C } • B is extraneous in AB  C because A  C logically implies AB  C. • Example: Given F = {A  C, AB  CD} • C is extraneous in AB  CD since A  C can be inferred even after deleting C

Testing if an Attribute is Extraneous • Consider a set F of functional dependencies and the functional dependency   in F. • To test if attribute A   is extraneous in  • compute (a – {A})+ using the dependencies in F • if (a – {A})+ contains A, A is extraneous • To test if attribute A   is extraneous in  • compute + using only the dependencies in F’ = (F – {})  {( – A)}, • check that + contains A; if it does, A is extraneous

Computing a Minimal Cover • Rewrite FDs in F into G so that all FDs in G have a single attribute on the right side • For each FD in G, check each attribute in the left side to see if it can be deleted while preserving equivalence to F+. • Check each remaining FD in G to see if it can be deleted while preserving equivalence to F+.

Minimal Cover – An Example F: {A B, ABCD  E, EF  G, EF  H, ACDF  EG} • Rewrite F into G G: {A B, ABCD  E, EF  G, EF  H, ACDF  E, ACDF  G} • Since A B, B in ABCD  E can be deleted G: {A B, ACD  E, EF  G, EF  H, ACDF  E, ACDF  G} • Since A B, ACD  E, EF  G, ACDF  G is redundant • Since ACD  E, ACDF  E is redundant • No more extraneous attributes at the left side and no more redundant FDS Minimal cover: {A B, ACD  E, EF  G, EF  H}

Minimal Cover – Another Example F = {A  BC, B  C, A  B, AB  C} G = {A  B, A  C, B  C, A  B, AB  C} G = {A  B, A  C, B  C, AB  C} AB  C is redundant, G = {A  B, A  C, B  C} A  C is logically implied by A  B and B  C, A  C is redundant The minimal cover is: {A  B, B  C}

Summary • Functional dependency (FD) theory forms the base of database design theory. We will see how they are used in database design. • There are other forms dependencies, such as multivalued dependency (MVD), inclusion dependency, etc. • In practical applications, FD seems sufficient to derive good schema

Normal Forms • Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed! • If a relation is in a certain normal form(BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/minimized. This can be used to help us decide whether decomposingthe relation will help. • Role of FDs in detecting redundancy: • Consider a relation R with 3 attributes, ABC. • No FDs hold: There is no redundancy here. • Given A  B: Several tuples could have the same A value, and if so, they all have the same B value!

First Normal Form • Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values complicate storage and encourage redundant (repeated) storage of data • We assume all relations are in first normal form

KEY dept price item Second Normal Form R: Sales ( dept, item, price) FD: dept, item  price; item  price R is in 1NF A 2NF relation is a 1NF relation whose nonprime attributes are fully and functionally dependent on the keys Sales (dept, item) Iteminfo (item, price) Anomalies? Yes. Caused by non-full functional dependency

KEY price item discount Third Normal Form Iteminfo (item, price,discount) item  price price  discount Relation R with FDs Fis in 3NFif, for all X  A in F+ • A  X, or • X contains a key for R, or • A is part of some key for R. Anomalies? Iteminfo (item, price) Discnt(price,discount) Yes. Caused by transitive dependency

What Does 3NF Achieve? • If 3NF violated by X  A, one of the following holds: • X is a subset of some key K • We store (X, A) pairs redundantly. • X is not a proper subset of any key. • There is a chain of FDs K  X  A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value. • But: even if a relation is in 3NF, these problems could arise. • e.g., Reserves SBDC, SBD is a key, if C  S, SDBC is in 3NF, but for each reservation of sailor S, same (S, C) pair is stored.

KEY Salary-bracket price status Third Normal Form – Another Example SalaryInfo(salary-bracket, status, bonus) Salary-bracket, status  bonus; bonus  status

Boyce-Codd Normal Form (BCNF) • Relationn R with FDs Fis in BCNFif, for all X  A in F+ • A  X (called a trivialFD), or • X contains a key for R. • In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints. • No dependency in R that can be predicted using FDs alone. • If we are shown two tuples that agree upon the X value, we cannot infer the A value in one tuple different from the A value in the other. • If example relation is in BCNF, the 2 tuples must be identical (since X is a key).

Property_id Country_name Area Lot# BCNF – Examples • R = (A, B, C)F = { A  B, B  C } Is R in BCNF? Property F = { P  CAL, CL  AP, A  C }

Decomposition of a Relation Scheme • Suppose that relation R contains attributes A1 ... An. A decompositionof R consists of replacing R by two or more relations such that: • Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and • Every attribute of R appears as an attribute of one of the new relations. • Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R. • E.g., Can decompose ENORWHinto ENORHand RW.

Example Decomposition • Decompositions should be used only when needed. • SNLRWH has FDs E  ENORWH and R  W • Second FD causes violation of 3NF; W values repeatedly associated with R values. Easiest way to fix this is to create a relation RW to store these associations, and to remove W from the main schema: • i.e., we decompose ENORWH into ENORH and RW • The information to be stored consists of ENORWH tuples. If we just store the projections of these tuples onto ENORH and RW, arethere any potential problems that we should be aware of?

Problems with Decompositions • There are three potential problems to consider: • Some queries become more expensive. • e.g., How much did sailor Joe earn? (salary = W*H) • Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation! • Fortunately, not in the ENORWH example. • Checking some dependencies may require joining the instances of the decomposed relations. • Fortunately, not in the ENORWH example. • Tradeoff: Must consider these issues vs. redundancy.

Lossless Join Decompositions • Decomposition of R into X and Y is lossless-joinw.r.t. a set of FDs F if, for every instance rthat satisfies F: • x(r) y(r) = r • It is always true that r  x(r) y(r) • In general, the other direction does not hold! If it does, the decomposition is lossless-join. • Definition extended to decomposition into 3 or more relations in a straightforward way. • It is essential that all decompositions used to deal with redundancy be lossless! (Avoids Problem (2).)

More on Lossless Join • The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains: • X  Y  X, or • X  Y  Y • In particular, the decomposition of R into UV and R - V is lossless-join if UV holds over R.

Losslessness Test – Case 1 Case 1: R is decomposed into R1 and R2. The decomposition is lossless if at least one of the following FDs is contained in F+. R1 R2  R1–R2 or R1 R2  R2–R1 Example: R (A, B, C) FD: A  B Decomposition #1: R1 (A, B), R2 (A, C) Decomposition #2: R1 (A, B), R2 (B, C)

Losslessness Test – Case 2 • Given a set of relations and FD • Construct a table: • columns – attributes, rows – relations • (i, j) = ‘a’ if Aj is an attribute of Ri • For each FD XY, if column X contains an ‘a’, mark column Y as ‘a’. • Repeat the above until no change • If there is at least one row with all ‘a’s, the decomposition is lossless, not otherwise.

Example (Losslessness Test , Case 2) R (A, B, C, D) R1 (A, B), R2 (B, D), R3 (A, B,C), R4 (B, C, D) FD: A  C, B  C, CD  B, C  D A  C B  C C  D

Dependency Preserving Decomposition • Consider CSJDPQV, C is key, JP  C and SD  P. • BCNF decomposition: CSJDQV and SDP • Problem: Checking JP  C requires a join! • Dependency preserving decomposition(Intuitive): • If R is decomposed into X, Y and Z, and we enforce the FDs that hold on X, on Y and on Z, then all FDs that were given to hold on R must also hold. (Avoids Problem (3).) • Projection of set of FDs F:If R is decomposed into X, ... projection of F onto X (denoted FX ) is the set of FDs U  V in F+(closure of F )such that U, V are in X.

Dependency Preserving Decompositions (Contd.) • Decomposition of R into X and Y is dependencypreservingif (FXunion FY )+ = F + • i.e., if we consider only dependencies in the closure F +that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F +. • Important to consider F +, not F, in this definition: • ABC, A  B, B  C, C  A, decomposed into AB and BC. • Is this dependency preserving? Is C  A preserved????? • Dependency preserving does not imply lossless join: • ABC, A  B, decomposed into AB and BC. • And vice-versa! (Example?)

R1 R2 ? Dependency Preserving -- Example R (A, B, C) FD: AB  C, C  B R1 (A, C), R2 (B, C) Dependency preserving? lossless? R1 { (a1, c1), (a1, c2)} R2 { (b1, c1), (b1, c2) }

L02: Logical Database Design

L02: Logical Database Design

Presentation Transcript

Chapter 8 Physical Database Design

Relational Database Design by ER-to-Relational Mapping

Logical Fallacies

Chapter 7: Relational Database Design

Chapter 2

Chapter 2

Database Systems: Design, Implementation, and Management Tenth Edition

Database Systems: Design, Implementation, and Management Tenth Edition

Spring 2011 Instructor: Hassan Khosravi

DataBase Data Modeling Using the Entity-Relationship Model

Access Chapter 1

Chapter 8

Database Systems

Chapter 8: Relational Database Design

Software School of Hunan University 2006.09

Business Analysis ITEC-455 Spring 2010

B. Information Technology (Hons.) CMPB245: Database Design

Chapter 7: Relational Database Design

Chapter 3: SQL

Database Design Specialist

Database Management Systems