Normalization of Database Lecture - ANS Yong Choi School of Business CSUB
1NF Example Unnormalized Table PK
1NF Example (cont.) Conversion to 1NF PK
Another 1NF Example PK PK
2NF Example PK PK Each arrow shows a partial dependency
Example of 3NF PK: Cust_ID
Transitive dependency
• All attributes are functionally dependent on Cust_ID.
• Cust_ID -> Name
• Cust_ID -> Salesperson
• Cust_ID -> Region
• However, there is a transitive dependency.
• Region is functionally dependent on Salesperson.
• Salesperson -> Region
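The dependencies listed above can be checked against sample data. The following is a minimal sketch, assuming a small set of hypothetical rows (they are not the rows of the slide's table) and a hypothetical helper named `holds`. A check like this can only show that a dependency is not violated by the given rows; whether it is a real rule of the business still has to come from the requirements.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs is not violated in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later row with the same key but a different value violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, for illustration only.
SALES = [
    {"Cust_ID": 8023, "Name": "Anderson", "Salesperson": "Smith", "Region": "South"},
    {"Cust_ID": 9167, "Name": "Bancroft", "Salesperson": "Hicks", "Region": "West"},
    {"Cust_ID": 7924, "Name": "Hobbs", "Salesperson": "Smith", "Region": "South"},
    {"Cust_ID": 6837, "Name": "Tucker", "Salesperson": "Hernandez", "Region": "East"},
    {"Cust_ID": 7881, "Name": "Tillman", "Salesperson": "Faulb", "Region": "South"},
]

# Every non-key attribute depends on the primary key Cust_ID.
key_fds_hold = all(
    holds(SALES, ["Cust_ID"], [a]) for a in ("Name", "Salesperson", "Region")
)

# Salesperson -> Region also holds, which makes Cust_ID -> Region transitive.
transitive_fd_holds = holds(SALES, ["Salesperson"], ["Region"])

# The reverse direction does not hold here: two salespersons share a region.
region_determines_salesperson = holds(SALES, ["Region"], ["Salesperson"])

print(key_fds_hold, transitive_fd_holds, region_determines_salesperson)
```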
Problems with transitive dependency
• A new salesperson (Yong) assigned to the North region cannot be entered until a customer has been assigned to that salesperson (since a value for Cust_ID must be provided to insert a row in the relation).
• If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region.
• If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact.
Relations in 3NF
• (Salesperson, Region)
• (CustID, Name, Salesperson)
Now there are no transitive dependencies. Both relations are in 3NF.
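The decomposition into two 3NF relations can be sketched as two projections of the original table. The rows below are hypothetical, and `project` and `natural_join` are illustrative helpers rather than part of any library. The sketch also replays two of the anomalies described earlier, to show that they no longer occur.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (order preserved)."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right, on):
    """Join two lists of dict rows on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


# Hypothetical sample rows, for illustration only.
SALES = [
    {"Cust_ID": 8023, "Name": "Anderson", "Salesperson": "Smith", "Region": "South"},
    {"Cust_ID": 9167, "Name": "Bancroft", "Salesperson": "Hicks", "Region": "West"},
    {"Cust_ID": 7924, "Name": "Hobbs", "Salesperson": "Smith", "Region": "South"},
    {"Cust_ID": 6837, "Name": "Tucker", "Salesperson": "Hernandez", "Region": "East"},
]

customer = project(SALES, ["Cust_ID", "Name", "Salesperson"])
salesperson = project(SALES, ["Salesperson", "Region"])

# Joining the two projections gives back the original rows (no information lost).
rejoined = natural_join(customer, salesperson, "Salesperson")

# Insertion: Yong can be assigned to North before having any customer.
salesperson.append({"Salesperson": "Yong", "Region": "North"})

# Deletion: removing customer 6837 keeps Hernandez's region on record.
customer = [r for r in customer if r["Cust_ID"] != 6837]
hernandez_region = [
    r["Region"] for r in salesperson if r["Salesperson"] == "Hernandez"
]

print(len(salesperson), hernandez_region)
```

Smith's region is now stored in a single row, so reassigning Smith is a one-row change.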
Boyce-Codd Normal Form (BCNF)
• Special case of 3NF.
• A relation is in BCNF if it is in 3NF and there are no hidden dependencies, i.e., every determinant is a candidate key.
• The relation below is in 3NF but not in BCNF.
BCNF Major is functionally dependent on Advisor (Advisor -> Major). Don’t confuse this with a transitive dependency!
BCNF
• In Physics, the advisor Nasa is replaced by Einstein. This change must be made in two (or more) rows in the table.
• Suppose we want to insert a row with the information that Choi advises in MIS. This cannot be done until at least one student majoring in MIS is assigned Choi as an advisor.
• If student number 789 withdraws from school, we lose the information that Jackson advises in Music.
Conversion to BCNF Student Advisor
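One way to picture the conversion is with two tables in an in-memory SQLite database. This is a sketch under stated assumptions: the table and column names are hypothetical, the rows are made up for illustration, and each advisor is assumed to advise in exactly one major (Advisor -> Major), as the anomalies described earlier suggest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advisor (
        advisor TEXT PRIMARY KEY,   -- the determinant is now a key
        major   TEXT NOT NULL
    );
    CREATE TABLE student_advisor (
        student INTEGER NOT NULL,
        advisor TEXT NOT NULL REFERENCES advisor(advisor),
        PRIMARY KEY (student, advisor)
    );
""")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO advisor VALUES (?, ?)",
    [("Nasa", "Physics"), ("Mozart", "Music"),
     ("Darwin", "Biology"), ("Jackson", "Music")],
)
conn.executemany(
    "INSERT INTO student_advisor VALUES (?, ?)",
    [(123, "Nasa"), (123, "Mozart"), (456, "Darwin"),
     (789, "Jackson"), (999, "Nasa")],
)

# Insertion: Choi can be recorded as an MIS advisor before any student is assigned.
conn.execute("INSERT INTO advisor VALUES (?, ?)", ("Choi", "MIS"))

# Deletion: student 789 withdraws, but the fact that Jackson advises in Music stays.
conn.execute("DELETE FROM student_advisor WHERE student = ?", (789,))

jackson_major = conn.execute(
    "SELECT major FROM advisor WHERE advisor = ?", ("Jackson",)
).fetchone()[0]
choi_major = conn.execute(
    "SELECT major FROM advisor WHERE advisor = ?", ("Choi",)
).fetchone()[0]

# The original three-column view is still available through a join.
rows = conn.execute("""
    SELECT s.student, a.major, s.advisor
    FROM student_advisor AS s
    JOIN advisor AS a ON a.advisor = s.advisor
    ORDER BY s.student, a.major
""").fetchall()

print(jackson_major, choi_major, rows)
```

One trade-off of this design: the rule that a student has only one advisor per major is no longer enforced by a key in either table, so it would need a separate check if it matters.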
3NF and BCNF
• In practice, most relation schemas that are in 3NF are also in BCNF. A 3NF relation fails to be in BCNF only if it contains a hidden dependency X -> A in which the determinant X is not a candidate key.
• In general, it is best to have relation schemas in BCNF. If that is not possible, 3NF will do. However, 2NF and 1NF are not considered good relation schema designs.
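The search for a hidden dependency X -> A can be sketched with the standard attribute-closure computation: a dependency violates BCNF when the closure of its left side does not cover the whole relation. The function names are hypothetical, and the example dependencies assume the student/major/advisor relation with Advisor -> Major.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(schema) <= closure(lhs, fds)
    ]


# Assumed dependencies for the student/major/advisor example.
SCHEMA = {"Student", "Major", "Advisor"}
FDS = [
    (frozenset({"Student", "Major"}), frozenset({"Advisor"})),
    (frozenset({"Advisor"}), frozenset({"Major"})),
]

violations = bcnf_violations(SCHEMA, FDS)

# After decomposition, neither relation has a violating dependency.
advisor_ok = bcnf_violations({"Advisor", "Major"}, [FDS[1]]) == []
student_ok = bcnf_violations({"Student", "Advisor"}, []) == []

print(violations, advisor_ok, student_ok)
```

Here the only violation reported is Advisor -> Major, because the closure of {Advisor} is {Advisor, Major} and does not include Student.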
4NF
• A relation is in 4NF if it is already in BCNF and does not contain two multi-valued dependencies that are independent of each other.
• This is a different requirement from 1NF, which only rules out multi-valued attributes.
• e.g., Smith can cook and type. Smith speaks French, German, and Greek.
4NF PK E-Name ->-> Skill E-Name ->-> Language
4NF The values for Skill and the values for Language are independent. PK PK
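The effect of two independent multi-valued dependencies can be seen by counting rows. The sketch below uses the Smith example (skills: cook, type; languages: French, German, Greek). The attribute name `E_Name` and the helper `mvd_independent` are assumptions made for the illustration.

```python
from itertools import product


def mvd_independent(rows, key, a, b):
    """True if, for every key value, the (a, b) pairs form a full cross product."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], set()).add((r[a], r[b]))
    return all(
        pairs == set(product({x for x, _ in pairs}, {y for _, y in pairs}))
        for pairs in groups.values()
    )


skills = ["cook", "type"]
languages = ["French", "German", "Greek"]

# A single table must pair every skill with every language: 2 x 3 = 6 rows.
EMPLOYEE = [
    {"E_Name": "Smith", "Skill": s, "Language": l}
    for s, l in product(skills, languages)
]

# The 4NF decomposition stores each independent fact once: 2 + 3 = 5 rows.
emp_skill = sorted({(r["E_Name"], r["Skill"]) for r in EMPLOYEE})
emp_language = sorted({(r["E_Name"], r["Language"]) for r in EMPLOYEE})

print(len(EMPLOYEE), len(emp_skill), len(emp_language))
print(mvd_independent(EMPLOYEE, "E_Name", "Skill", "Language"))
```

If Smith learns one more language, the single table needs one new row per skill, while the decomposed design needs one new row in total.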
Faculty (A) relation Faculty (A) – normalized table
Faculty (A) relation
• Each FacultyNum has a well-defined set of StudentNums.
• Each FacultyNum has a well-defined set of CommitteeCodes.
• StudentNum and CommitteeCode are independent of each other.
Faculty (B) relation Faculty (B)
Faculty (B) relation
• Has a composite PK: FacultyNum, StudentNum, and CommitteeCode.
• Since there are no determinants other than the PK, the relation is in BCNF.
• Yet it contains much redundant data, which can easily lead to update anomalies because of the multi-valued dependencies.
Problems with Faculty (B) relation
• Changing the CommitteeCode for a faculty member requires more than one change.
• Suppose that a new faculty member, 555, does not yet serve on any committee. When FacultyNum 555 begins advising student 44332, there is a problem because CommitteeCode is part of the PK.
• If faculty member 444 no longer advises student 57384 and we delete the corresponding record from the relation, we lose the information that faculty member 444 serves on the Housing committee (HSG).
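These three problems go away if Faculty (B) is split into two relations, one per multi-valued dependency. The following is a sketch with hypothetical rows (only faculty 444, student 57384, committee HSG, faculty 555, and student 44332 come from the bullets above). The relation names `fac_student` and `fac_committee` are assumptions.

```python
# Hypothetical Faculty (B) rows: (FacultyNum, StudentNum, CommitteeCode).
FACULTY_B = {
    (123, 12805, "ADV"), (123, 12805, "PER"),
    (123, 24139, "ADV"), (123, 24139, "PER"),
    (444, 57384, "HSG"),
    (456, 24139, "CUR"), (456, 36273, "CUR"),
}

# Decompose into one relation per multi-valued dependency.
fac_student = {(f, s) for f, s, _ in FACULTY_B}
fac_committee = {(f, c) for f, _, c in FACULTY_B}

# Joining on FacultyNum reproduces Faculty (B) exactly.
rejoined = {
    (f, s, c)
    for f, s in fac_student
    for f2, c in fac_committee
    if f == f2
}
lossless = rejoined == FACULTY_B

# Update: rows touched when faculty 123 changes committee PER to another code.
rows_to_change_b = sum(1 for f, _, c in FACULTY_B if f == 123 and c == "PER")
rows_to_change_4nf = sum(1 for f, c in fac_committee if f == 123 and c == "PER")

# Insertion: faculty 555 can advise student 44332 without serving on a committee.
fac_student.add((555, 44332))

# Deletion: faculty 444 stops advising 57384, but the HSG membership remains.
fac_student.discard((444, 57384))
still_on_housing = (444, "HSG") in fac_committee

print(lossless, rows_to_change_b, rows_to_change_4nf, still_on_housing)
```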