Chapter 5Normalization of Database Tables Database Systems: Design, Implementation, and Management Peter Rob & Carlos Coronel
In this chapter, you will learn: • What normalization is and what role it plays in database design • About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF • How normal forms can be transformed from lower normal forms to higher normal forms • That normalization and E-R modeling are used concurrently to produce a good database design • That some situations require denormalization to generate information efficiently
Database Tables and Normalization • Normalization • Process for evaluating and correcting table structures to minimize data redundancies • process for assigning attributes to tables. It • reduces data redundancies • helps eliminate data anomalies. • produces controlled redundancies to link tables • Normalization works through a series of stages called normal forms: • First normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) • Fourth normal form (4NF)
Database Tables and Normalization • Normalization • 2NF is better than 1NF; • 3NF is better than 2NF • For most business database design purposes, 3NF is as high as we need to go in normalization process • The highest level of normalization is not always most desirable.
Database Tables and Normalization • The Need for Normalization • Case of a Construction Company • Building project -- Project number, Name, Employees assigned to the project. • Employee -- Employee number, Name, Job classification • The company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee’s position. • Periodically, a report is generated. Table 5.1 • The easiest way to generate the required report might seem to be a table whose contents correspond to the reporting requirements. Figure 5.1
Database Tables and Normalization • Need for Normalization:Problems with the Figure 5.1 • The project number is intended to be a primary key, but it contains nulls. • The table displays data redundancies. • The table entries invitedata inconsistencies. • The data redundancies yield the following anomalies: • Update anomalies. (modify JOB_CLASS for Employee 105) • Insertion anomalies. (a new Employee not yet assigned) • Deletion anomalies. ( Employee 103 quits)
Database Tables and Normalization • Conversion to 1NF • Repeating groups – a group of multiple entries can exist for any single key attribute occurrence. • Repeating groups must be eliminatedAny project number (PROJ_NUM) can have a group of several data entries. • A relational table must not contain repeating groups.
Step 1: Eliminate the Repeating Groups – Repeating groups can be eliminated by adding the appropriate entry in at least the primary key column(s). • Step 2: Identify the Primary Key • Uniquely identifies attribute values (rows) • Combination of PROJ_NUM and EMP_NUM
Database Tables and Normalization • Step 3: Identify all Dependencies • Dependency Diagram • The primary key components are bold, underlined, and shaded in a different color. • The arrows above entities indicate all desirable dependencies ( dependencies based on PK ) • The arrows below the dependency diagram indicate less desirable dependencies – • partial dependencies ( dependencies based on only a part of PK ) • transitive dependencies ( nonprime attribute → nonprime attribute ) • Prime attribute = Key attribute • Nonprime attribute = Nonkey attribute
Database Tables and Normalization • EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR • PROJ_NUM → PROJ_NAME • JOB_CLASS → CHG_HOUR
Database Tables and Normalization • 1NF Definition • The term first normal form (1NF) describes the tabular format in which: • All the key attributes are defined. • There are no repeating groups in the table. Each row/col intersection can contain one and only one value, not set of values. • All attributes are dependent on the primary key. • All relational tables satisfy the 1NF requirements. • 1NF Drawback • Partial dependencies(EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR)→→ data redundancies→→ data anomalies
Database Tables and Normalization • Conversion to 2NF • Step 1: Identify All Key Components • Writing each key component on a separate line, and then • writing the original key on the last line and PROJ_NUM EMP_NUM PROJ_NUM, EMP_NUM • Step 2: Identify the Dependent Attributes • Writing the dependent attributes after each new key. PROJECT ( PROJ_NUM,PROJ_NAME) EMPLOYEE ( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN ( PROJ_NUM, EMP_NUM, HOURS)
Database Tables and Normalization • 2NF Definition • A table is in 2NF if: • It is in 1NF and • It includes nopartial dependencies; that is, no attribute is dependent on only portion of primary key. • A table whose primary key is notcompositemust automatically be in 2NF.BECAUSE a partial dependency can exist only if a table has a composite primary key • It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes. • 2NF Drawback • Transitive dependencies(JOB_CLASS → CHG_HOUR)→→ data redundancies→→ data anomalies
Database Tables and Normalization • Conversion to 3NF • Create a separate table with attributes in a transitive functional dependence relationship. • Step 1: Identify Each New Determinant JOB_CLASS • Step 2: Identify the Dependent Attributes JOB_CLASS → CHG_HOUR • Step 3: Remove the Dependent Attributes from Transitive Dependencies EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR) PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
Database Tables and Normalization • 3NF Definition • A table is in 3NF if: • It is in 2NF and • It contains notransitive dependencies.
Improving the Design • Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies • Normalizationcannot, by itself, be relied on to make good designs • It is valuable because its use helps eliminate data redundancies
Improving the Design • Issues to address in order to produce a good normalized set of tables: • Evaluate PK Assignments • Evaluate Naming Conventions • Refine Attribute Atomicity • Identify New Attributes • Identify New Relationships • Refine Primary Keys as Required for Data Granularity • Maintain Historical Accuracy • Evaluate Using Derived Attributes
Improving the Design • Adding relationships • Project’s manager • EMP_NUM as a FK in PROJECT. Project manager 3NF
Improving the Design • PK assignment • JOB_CODE • Naming conventions • JOB_CHG_HOUR • JOB_CLASS >> JOB_DESCRIPTION 2NF
Improving the Design 3NF • Attribute atomicity • EMP_NAME >> EMP_LNAME,EMP_FNAME,EMP_INITIAL • Adding attributes • EMP_HIREDATE JOB_CLASS
Improving the Design 3NF • Refining PKs • (EMP_NUM+PROJ_NUM) >> ASSIGN_NUM • Maintaining historical accuracy • ASSIGN_CHG_HOUR <<>> JOB_CHG_HOUR • Using derived attributes • ASSIGN_CHARGE = ASSIGN_HOURS × ASSIGN_CHG_HOUR
Surrogate Key Considerations • When primary key is considered to be unsuitable, designers use system-defined surrogate keys • The DBMS can be used to have the system assign the PK values (JOB_CODE)>> to ensure entity integrity
Limitations on system-defined surrogate keys • Data entries in Table 5.3 are inappropriatebecause they duplicate existing records • However, it does not prevent us from making the entries shown in Table 5.3. >> Multiple duplicate records problem • We still must ensure the uniqueness in JOB_DESCRIPTION through the use of a unique index.
Database Tables and Normalization • Boyce-Codd Normal Form (BCNF) • A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidatekey. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF can be violatedonly if the table contains more than one candidate key • BCNF is a special case of 3NF. • Figure 5.7 illustrates a table that is in 3NF but not in BCNF. • Figure 5.8 shows how the table can be decomposed to conform to the BCNF form.
A Table That Is In 3NF But Not In BCNF • A + B → C, D • C → B : Nottransitive dependencies (A nonkey attribute is the determinant of a key attribute) → → 3NF • C : Notcandidatekey → → Not In BCNF
The Decomposition of a Table Structure to meet BCNF Requirements • A + B → C, D • C → B Change the PK to A+C • A + C → B, D • C → B • A + C → D • C → B
The Boyce-Codd Normal Form (BCNF) • STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE • CLASS_CODE → STAFF_ID
Decomposition into BCNF Figure 5.9
Normalization and Database Design • Normalization should be part of the design process • E-R Diagram provides macro view • Normalization provides micro view of entities • Focuses on characteristics of specific entities • A micro view of the entities within the ER diagram • Difficult to separate normalization from E-R diagramming • Two techniques should be used concurrently
Normalization and Database Design • Database Design and Normalization Example:(Construction Company) • Summary of Operations: • The company manages many projects. • Each project requires the services of many employees. • An employee may be assigned to several different projects. • Some employees are not assigned to a project and perform duties not specifically related to a project. • Some employees are part of a labor pool, to be shared by allproject teams. • Each employee has a (single) primary job classification. This job classification determines the hourly billing rate. • Many employees can have the same job classification.
Normalization and Database Design • Two Initial Entities: • PROJECT (PROJ_NUM, PROJ_NAME) [3NF ] • EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR) No partial dep. Transitive dep. JOB_DESCRIPTION → JOB_CHG_HOUR [2NF ]
Normalization and Database Design • Three Entities After Transitive DependencyRemoved PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE) JOB ( JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR) EMPLOYEE
The Modified ERD For A Contracting Company Because the normalization process yields an additional entity (JOB), we modify the initial ERD. M Is held by
Normalization and Database Design • Creation of the Composite EntityASSIGNMENT ASSIGNMENT Is held by The Final ( Implementable) ERD for the Contracting Company
Normalization and Database Design • Attribute ASSIGN_HOUR is assigned to the composite entity ASSIGN. • “Manages” relationship is created between EMPLOYEE and PROJECT. PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM) ASSIGNMENT(ASSIGN_NUM, ASSIGN_DATE , ASSIGN_HOURS, ASSIGN_CHG_HOUR , ASSIGN_CHARGE,EMP_NUM, PROJ_NUM) EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE) JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR) Manages
Higher-Level Normal Forms • In some databases, multiple multivalued attributes exist • 4NF Definition • A table is in 4NFif it is in 3NF and has no multiple sets of multivalued dependencies.
Higher-Level Normal Forms • An employee can have multiple assignments and can also be involved in multiple service organization. • EMP_SERVICE (volunteer work) and EMP_ASSIGN (assigned project)each may have many different values. • The table contain two sets of multivalued dependencies. Independent 1
A Set of Tables in 4NF • The solution is to eliminate the problems caused by independentmultivalued dependencies.