More about Functional Dependencies and Normalization
This text covers the concepts of functional dependencies and normalization in database design. A functional dependency occurs when one attribute's value determines another's, crucial for normalization, which eliminates anomalies like deletion and insertion issues. Normalization involves organizing data into normal forms to minimize redundancy. The document explains various normal forms, from 1NF to 4NF, and illustrates how to convert relations to reduce anomalies effectively, alongside when de-normalization may be beneficial for performance.
More about Functional Dependencies and Normalization
E N D
Presentation Transcript
Functional Dependencies • A functional dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s) • X Y if, and only if, whenever two rows agree on X, they must necessarily agree on Y • The attribute on the left side of the functional dependency is called the determinant • SID DormName, Fee • (CustomerNumber, ItemNumber, Quantity) Price • While a primary key is always a determinant, a determinant is not necessarily a primary key • Functional dependencies are fundamental to database “normalization”
Functional Dependencies EmpName,SSN,BirthDate,Address,DeptNo,DeptName While a primary key is always a determinant, a determinant is not necessarily a primary key. Here SSN is the primary key, and it is also true that within each row, DeptNo determines DeptName. (This is not a well designed schema, as we will see…)
Normalization • Normalization eliminates modification anomalies • Deletion anomaly: deletion of a row loses information about two or more entities • Insertion anomaly: insertion of a fact in one entity cannot be done until a fact about another entity is added • Anomalies can be removed by splitting the relation into two or more relations; each with a different, single theme • However, breaking up a relation may create referential integrity constraints • Normalization works through classes of relations called normal forms
Relation schemas suffering from update anomalies: EmpName,SSN,BirthDate,Address,DeptNo,DeptName ProjectNo,SSN,Hours,EmpName,ProjName,ProjLocation Why so?
Normal Forms • Any table of data is in 1NF if it meets the definition of a relation • A relation is in 2NF if all its non-key attributes are dependent on all of the key (no partial dependencies) • If a relation has a single attribute key, it is automatically in 2NF • A relation is in 3NF if it is in 2NF and has no transitive dependencies • A relation is in BCNF if every determinant is a candidate key • A relation is in 4NF if it is in BCNF and has no multi-value dependencies
2nd Normal Form Employee_Project relation ProjectNo|SSN|Hours|EmpName|ProjName|ProjLocation A relation is in 2NF if all its non-key attributes are dependent on all of the key. So, in 2nd Normal Form becomes 3 relations: Employee Project_Hours SSN|EmpName ProjectNo|SSN|Hours Project ProjectNo|ProjName|ProjLocation
Third Normal Form: EmpName,SSN,BirthDate,Address,DeptNo,DeptName A relation is in 3NF if it is in 2NF and has no transitive dependencies. So, in 3NF becomes two relations: EmpName,SSN,BirthDate,Address,DeptNo DeptNo,DeptName
Another BCNF example:Every determinant must be a candidate key Coursework relation in 3NF: StudentID | Course | Instructor In BCNF becomes 2 relations: Instructor-Course: Student-Course: Instructor | Course Student | Instructor
A 4NF example: EmployeeName,ProjectName,DependentName In 4NF becomes 2 relations: Employee-Project: Employee-Dependent: EmployeeName,ProjectName EmployeeName,DependentName
De-normalized Designs • When a normalized design is unnatural, awkward, or results in unacceptable performance, a de-normalized design is preferred • Example • Normalized relation • CUSTOMER (CustNumber, CustName, Zip) • CODES (Zip, City, State) • De-Normalized relations • CUSTOMER (CustNumber, CustName, City, State, Zip)