Normalization • Normalization is a formalized procedure to eliminating redundancy from data by the progressive use of ‘non-lose decomposition’, which involves splitting records without losing information. • In reducing the data model to the state where each bit of information is only held in one place, the update process is much simpler, more efficient and inconsistencies in the database are impossible.
5NF 4NF 3NF 2NF 1NF Normalization (cont.) Redundancy Redundancy Redundancy
Normalization (cont.) • Normalization is based on the idea that an attribute may depend on another attribute in some way. • There are 2 different kinds of dependencies involved up to 5 NF • Functional dependency • Multivalued dependence
S#,P# QTY Functional Dependence QTY is functionally dependent on S#andP# S# and P# are the determinant of QTY
Functional Dependence • In a relation including attribute A and B, B is functional dependent on A if, for every valid occurrence, the value A determines the value B • An occurrence can not be used to show that a dependency is true, only that it is false • A and B can be composite • If B is ‘Functional Dependent on’ A, then A ‘is the determinant of B’ • All fields are functionally dependent on the primary key – or indeed any candidate key – be definition.
First Normal Form • A relation is in First Normal form if, and only if, it contains no multi-value or no repeating groups.
Repeat First NF (cont.)
Multi-value Problem Staff • Problem • Difficult to manipulate data • Redundancy • UPDATE ANOMALIES
P06 Speech Corpus Insert Staff We can not insert new project if the project has not assigned to any employee yet.
UPDATE ANOMALIES Staff Change ProjName from Voice Ordering to Speech Ordering need to change all in Database
DELETE Problem Staff Delete Employee E003 Somchay Project P03 Medical Image Processing was deleted also
Solution • Remove the repeating group • In case of multi-valued • Create new relation • Columns = Key + multi-valued • Take its determinant with it
PayCheck Employee Repeating group Employee
Multi-Valued Insert Project still has problem
Second Normal Form(2NF) • A relation is in first normal form if and only if • It is in 1NF • Every non-key attribute is dependent on all parts of the primary key.
KEY = ENO + ProjNo 2NF ? Staff Answer is No. Because ProjNo is dependent on ProjNo. (not all part of Key)
Problem We can not insert Project if have not yet assigned project to any employee
Solution • Remove the attribute involved • Take its determinant with it
Result PERSON_Proj Project PERSON PERSON(ENO,NAME,Dno,DeptName) PROJECT(ProjNo,ProjName) PERSON_PROJ(ENO,ProjNo)
Third Normal Form • A relation is in 3NF if, and only if: • It is in 2NF • Every non-key attribute is functionally dependent upon the key. (No non-key attribute is functional dependent on another non-key attribute) • Or non-key attribute no transitive dependent on key
Transitive dependent • R(A,B,C,D) ; A is Key, others are non- key • If A → B and B → C can say A → B → C (C transitive dependent on A)
3NF? PERSON_Proj Project PERSON Answer is No Because DeptName is dependent on Dno (has transitive dependent on key)
Solution • Remove the offending attributes • Take the determinant along
Result PERSON PERSON_Proj Project Department
Note • The third normal form is often reached in practice by inspection, in a single step. Its meaning seems intuitively clear; it represents a formalization of designer’s common sense. • This level of normalization is widely accepted as the initial target for a design which eliminates redundancy. • However, there are higher normal forms which, although less frequently invoked, highlight further redundancy problems which may affect the designer
Boyce-Codd Normal Form (BCNF) • A relation is in BCNF if, and only if, every determinant is a candidate key. • BCNF is a refinement to third normal form, and tightens its duration.
Multivalued Dependence • In a relation including attribute A, B and C, B is multivalued dependent on Aif the set of B values matching a given A+C value pair, depends only on the A value.