EECS 647: Introduction to Database Systems

EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2007 Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus

Administrative • Sample midterm questions will be posted at the class websites later today. • You may pick up an answering key for homework 2 • PL/SQL functionality has been enabled in the EECS postgresql server. Luke Huan Univ. of Kansas

Review • Functional dependencies • X!Y: X “determines” Y • If two rows agree on X, they must agree on Y • A generalization of the key concepts y1 Luke Huan Univ. of Kansas

Review • A FD X!Y is anon-key FDif X is not a superkey • Non-key functional dependencies represent a source of redundancy y1 … Luke Huan Univ. of Kansas

Review • How to deal with non-key FD? • Decomposition • A relation R is “broken down” to a set of relations R1, R2, …, Rn • Such that: • attr(R1)  attr(R2)  …  attr(Rn) = attr(R) • And? • Lossless join • And? • Dependency preservation join (we are going to discuss this today) Luke Huan Univ. of Kansas

Review • What is a lossless join? • R = R1*R2 *… * Rn • By nature join, we can restore the data! Luke Huan Univ. of Kansas

Review • Is this a lossless join? • R = (Sid, Sname, Cid, Cname) • F = {Sid  Sname, Cid  Cname) • R1 = (Sid, Sname), R2 = (Cid, Cname) Luke Huan Univ. of Kansas

Review • How to achieve a lossless join • “foreign key” • R = R1  R2 • R1 R2  R1 • A combination of attributes in R2 that is a key of R1 • R1 R2  R2 • A combination of attributes in R1 that is a key of R2 Luke Huan Univ. of Kansas

Review • Why the foreign key mechanism guarantees a lossless join? • Well, thinking about the following relation X  Y Luke Huan Univ. of Kansas

Review • Do you have to use the “foreign key” mechanism to achieve lossless join in binary decomposition? • Yes. See EN 11.1.4 Luke Huan Univ. of Kansas

Review • How to remove the redundancy that is introduced by non-key FDs in a relation R? • Pick up a non-key FD X  Y, decompose R into R1 and R2, where • R1 has attributes X[Y • R2 has attributes X[Z, where Z = attr(R) – X – Y • Need to do this recursively? • When to stop? Luke Huan Univ. of Kansas

StudentID (email, EID) StudentGrade’ (email, Ename, PID, hours) StudentName (email, Ename) Grade (email, PID, hours) F = {EID!Ename, email; email!EID; EID, PID!hours} Exercise WorkOn (EID, Ename, email, PID, hours) Pick up email!EID pickup email!Ename Luke Huan Univ. of Kansas

Review • A relation R is in Normal Form if • For every non-trivial FD X!Y in R, X is a super key • That is, all FDs follow from “key ! other attributes” • BCNF applies binary decomposition recursively. • At each step, the lossless join property is maintained • Guarantee to remove ALL redundancies caused by FD Boyce-Codd Luke Huan Univ. of Kansas

key A X key A X key X A Three Types of non-key DF X A Partial dependency X A Transitive dependency I X A Transitive dependency II Luke Huan Univ. of Kansas

Summary • Philosophy behind BCNF (and 4NF):Data should depend on the key, the whole key, and nothing but the key! • Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement! Luke Huan Univ. of Kansas

Next • Dependency-preserving decomposition • 3NF (BCNF is too much) • Multivalued dependencies: another source of redundancy • 4NF (BCNF is not enough) Luke Huan Univ. of Kansas

Motivation for 3NF • Address (street_address, city, state, zip) • street_address, city, state!zip • zip!city, state • Keys • {street_address, city, state} • {street_address, zip} • BCNF? • Violation: zip!city, state Luke Huan Univ. of Kansas

To decompose or not to decompose Address1 (zip, city, state) Address2 (street_address, zip) • FD’s in Address1 • zip!city, state • FD’s in Address2 • None! • Hey, where is street_address, city, state!zip? • Cannot check without joining Address1 and Address2 back together • Problem: Some lossless join decomposition is not dependency-preserving • Dilemma: Should we get rid of redundancy at the expense of making constraints harder to enforce? Luke Huan Univ. of Kansas

Preserving Dependency • Projection of set of FDs F : If R is decomposed into X and Y the projection of F on X (denoted FX ) is the set of FDs U  V in F+(closure of F , not just F)such that all of the attributes U, V are in X. (same holds for Y of course) • Decomposition of R into X and Y is dependencypreserving if (FX FY ) + = F + • i.e., if we consider only dependencies in the closure F + that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F +. Luke Huan Univ. of Kansas

An Example • Important to consider F + in this definition: • R = ABC, F = {A  B, B  C, C  A} • R is decomposed into AB and BC. • Is this dependency preserving? • Is C  A preserved????? • FAB contains A B and B  A; • FBC contains B  C and C  B • So, (FAB FBC)+ contains C  A Luke Huan Univ. of Kansas

Why Preserving FDs? • Another way to view lossless join: we lost one (or more) key FD. EID, PID  hours Luke Huan Univ. of Kansas

Dependency-preserving Decomposition • General idea: • Keep all functional dependency and at least one key during natural join! • Input: a relation R and a set of FDs F • Output: a set of join-lossless, dep.-pres. relations Step 1: for each FD X  Y, create a relation R’ = XY Step 2: if there is no key included in any of the created relations, add a relation with a key Step 3: remove all relations R’ which is a subset of some other relations • GuaranteeLossless-Join, Dep. Pres. Decomp! Luke Huan Univ. of Kansas

An Example • R = ABCDEFGH • FDs: A  B, ACD  E, EF  GH • R is decomposed into • R1 = AB, R2 = ACDE, R3 = EFGH • No key yet! • Add R4 = ACDF Luke Huan Univ. of Kansas

Another Example • Address (street_address, city, state, zip) • FDs F • street_address, city, state!zip • zip!city, state • Keys • {street_address, city, state} • {street_address, zip} • Decomposeto Address1(street_address, city, state, zip), and Address2(zip, city, state) • Address2 is part of Address1, remove Addresss2 Luke Huan Univ. of Kansas

A X A X A X 3NF • R is in Third Normal Form (3NF) if for every non-trivial FD X!A (where A is single attribute), either • X is a superkey of R, or • A is a member of at least one key of R • Intuitively, BCNF decomposition on X!A would “break” the key containing A Partial dependency Transitive dependency I Transitive dependency II 2NF 3NF BCNF 2NF  3NF  BCNF Luke Huan Univ. of Kansas

Notes about 3NF • Address (street_address, city, state, zip) • street_address, city, state!zip • zip!city, state • Is Address in 3NF? • means a set of attributes participate at least one key • Yes • Tradeoff: • Can enforce all original FD’s on individual decomposed relations • Might have some redundancy due to FD’s Luke Huan Univ. of Kansas

Minimal Cover for a Set of FDs • Minimal coverG for a set of FDs F: • Closure of F = closure of G. • Right hand side of each FD in G is a single attribute. • If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes. • Intuitively, every FD in G is needed, and ``as small as possible’’ in order to get the same closure as F. • e.g., A  B, ABCD  E, EF  GH, ACDF  EG has the following minimal cover: • A  B, ACD  E, EF  G and EF  H

Dependency-preserving 3NF • Input: a relation R and a set of FDs F • Output: a set of join-lossless, dep.-pres. relations in 3NF Step 1: compute a minimal cover F’ of F Step 2: combine all FD X  {A1}, X  {A2} … X  {An} to a single FD X  {A1, A2 ,…, An} Step 3: for each FD X  Y, create a relation R’ = XY Step 4: if there is no key included in any of the created relations add a key Step 5: remove all relations R’ which is a subset of some other relations • Minimal Cover impliesLossless-Join, Dep. Pres. Decomp in 3NF!!! • (in EN 11.2.3, p. 389) Luke Huan Univ. of Kansas

BCNF = no redundancy? • Student (SID, CID, club) • Suppose your classes have nothing to do with the clubs you join • FD’s? • None • BCNF? • Yes • Redundancies? • Tons! Luke Huan Univ. of Kansas

Multivalued dependencies • A multivalued dependency (MVD) has the formX)Y, where X and Y are sets of attributes in a relation R • X)Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two new rows that are also in R Must in the relation also Luke Huan Univ. of Kansas

MVD examples Student (SID, CID, club) • SID)CID • SID)club • Intuition: given SID, CID and club are “independent” • SID, CID)club • Trivial: LHS [ RHS = all attributes of R • SID, CID)SID • Trivial: LHS ¶ RHS • In general when two independent 1:N relations AB, AC are mixed together in R=ABC, a MVD may occur Luke Huan Univ. of Kansas

Complete MVD + FD rules • FD reflexivity, augmentation, and transitivity • MVD complementation:If X)Y, then X)attrs(R) – X – Y • MVD augmentation:If X)Y and VµW, then XW)YV • MVD transitivity:If X)Y and Y)Z, then X )Z – Y • Replication (FD is MVD):If X!Y, then X)Y • Coalescence:If X)Y and ZµY and there is some W disjoint from Y such that W!Z, then X!Z Try proving things using these! Luke Huan Univ. of Kansas

4NF • A relation R is in Fourth Normal Form (4NF) if • For every non-trivial MVD X)Y in R, X is a superkey • That is, all FD’s and MVD’s follow from “key ! other attributes” (i.e., no MVD’s, and no FD’s besides key functional dependencies) • 4NF is stronger than BCNF • Because every FD is also a MVD Luke Huan Univ. of Kansas

4NF decomposition algorithm • Find a 4NF violation • A non-trivial MVD X)Y in R where X is not a superkey • Decompose R into R1 and R2, where • R1 has attributes X[Y • R2 has attributes X[Z (Z contains attributes not in X or Y) • Repeat until all relations are in 4NF • Almost identical to BCNF decomposition algorithm • Any decomposition on a 4NF violation is lossless Luke Huan Univ. of Kansas

4NF violation: SID)CID Enroll (SID, CID) Join (SID, club) 4NF 4NF 4NF decomposition example Student (SID, CID, club) Luke Huan Univ. of Kansas

3NF, BCNF, 4NF, and beyond • Of historical interests • 1NF: All column values must be atomic • 2NF: Slightly more relaxed than 3NF Luke Huan Univ. of Kansas

Summary • Philosophy behind BCNF, 4NF:Data should depend on the key, the whole key, and nothing but the key! • Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement! Luke Huan Univ. of Kansas

3NF • How to create join lossless decomposition? • Binary decomposition with foreign key • N-nary decomposition with FD preserving and a key • How to achieve FD preserving decompositions? • For each FD, create a relation • Minimal cover of FD guarantees 3NF Luke Huan Univ. of Kansas

EECS 647: Introduction to Database Systems

EECS 647: Introduction to Database Systems

Presentation Transcript

ICOM 5016 – Introduction to Database Systems

CS186: Introduction to Database Systems

Introduction to Database Systems

What is a Database System?

Chapter 1 Introduction to Database Systems

Introduction to Database Management

Introduction to Database Systems CSE444

Unit - 4

Introduction to Database Management Systems

Parallel Database Systems The Future of High Performance Database Systems

CIS 764 Database Systems Engineering

Chapter One (Introduction)

IMS1907 Database Systems

Lecture 11 Introduction to Relational Database

EECS 262a Advanced Topics in Computer Systems Lecture 1 Introduction/UNIX September 4 th , 2013

Introduction to Database Systems

CS 405G: Introduction to Database Systems

CS186 - Introduction to Database Systems Spring Semester 2006 Prof. Michael Franklin

Introduction to Computer Systems

Database Systems CS 311

CS 405G: Introduction to Database Systems