Introduction to Database Design Methodology

Introduction to Database Design Methodology Chapter 7 Normalisation

Learning Goals • To learn the techniques of normalisation • To improve poor table structures by normalisation up to 3NF

Table of Contents • What is Normalisation? • Stages in Normalisation • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF)

Normalisation

What is Normalisation? • Normalisation is a technique in relational database design that can reduce data redundancy and eliminate anomalies in data update. • There are several stages in normalisation, which result in different normal forms of tables. • We will introduce these normal forms later. • Remark: Normalisation may not eliminate data redundancy; instead it produces controlled redundancy that lets us properly link database tables.

Uses of Normalisation in Database Design • Normalisation can play two main roles in database design: • As an alternative approach to design table structures in relational database. In this case, we can design the table structures without constructing the ER diagram. • As an approach for checking whether the data tables are well-designed and for refining the tables. In this case, normalisation can be used in conjunction with the ER modeling.

Different Approaches in Database Design • To understand how normalisation achieves its purposes in database design, let us consider some possible approaches in database design. • Two possible approaches in database design: • Top-down approach by using ER diagram • Bottom-up approach by using normalisation

Different Approaches in Database Design Top-down approach by using ER diagram • In previous chapters, we learnt how to design a database through the following steps: • Collect user requirements • Identify the entities, attributes and relationships. Construct ER diagram for the whole system • Transform the ER diagram into data tables • Refine the tables if necessary • This approach is sometimes known as top-down approach. • Remark:We call this a top-down approach because we have to design an overall structure (ER diagram) of the system in the initial steps before constructing the table structures for the database.

Different Approaches in Database Design Top-down approach by using ER diagram • It is in the last two steps that we can apply normalisation. • In translating ER diagram into a set of tables, we can check whether the tables obtained are well-designed by referring to the conditions on the normal forms. If they are not well-designed, we can apply normalisation to refine the tables.

Different Approaches in Database Design Top-down approach by using ER diagram Collection of user requirements Construction of ER Diagram Transforming into Tables Refinement of Tables Apply normalisation here

Different Approaches in Database Design Bottom-up approach by using normalisation • We may apply normalisation to design table structures directly without constructing the ER diagram. The basic steps are: • Collect user requirements • Find out all the necessary attributes for the system • Analyse the associations between attributes. Group the attributes into tables • Refine the tables through the normalisation process:UNF 1NF 2NF  3NF … (to be introduced later) • This approach is sometimes known as bottom-up approach. • Remark:We call this a bottom-up approach because we have not designed the overall system model in the initial steps. Instead, we start from pieces of attributes and construct the basic tables for the system. We then apply a series of steps to refine the tables. The overall structure / model of the system will appear after performing the normalisation process.

Different Approaches in Database Design Bottom-up approach by using normalisation Collection of user requirements e.g. song_title, artist name, artist_website, language, year_publish, composer, author, publisher, … Find out the attributes Apply normalisation here Group attributes into Tables 1NF Refinement of Tables 2NF Refinement of Tables 3NF

Different Approaches in Database Design Bottom-up approach by using normalisation • Remarks: • The above bottom-up approach becomes difficult for a large system with a large number of attributes. It is difficult to identify all the associations between attributes for a large system. • Therefore, bottom-up approach may not be a desirable approach in database design. • We may apply this approach either for a small system, or for a small sub-system within a large system.

Stages in Normalisation

Stages in Normalisation • Normalisation is a process of successively reducing the data redundancy and anomalies in tables by decomposing the tables into smaller, well-structured ones. • Normalisation is carried out in a series of stages. Each stage results in a normal form of table structures. • Each normal form agrees with some rules about the relationships between columns within a table. • Remark: The original table(s) without going through the normalisation process is known as unnormalised form (UNF).

Stages in Normalisation • The common stages (normal forms) in normalisation are: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) • The steps in normalisation should be carried out in correct order. A higher order form is accomplished on the basis of the lower order ones.

Stages in Normalisation Un-normalised Form (UNF) Remove multi-valued attributes 1NF Remove partial dependencies 2NF Remove transitive dependencies 3NF Remove remaining anomalies resulting from multiple candidate keys BCNF 4NF Remove multi-valued dependencies Remove remaining anomalies 5NF

Stages in Normalisation Remarks: • Referring to the ICT Curriculum and Assessment Guide, you only need to master the first 3 normal forms (1NF to 3NF). • Structurally, higher normal forms are better than the lower ones (e.g. 2NF is better than 1NF) as they have lower data redundancy and less anomalies. • Practically, higher normal forms are NOT always more desirable. • Generally, the higher the normal form, the more tables are constructed in the database. We may need to link up more tables to retrieve a specified output. Hence the database system may respond to users’ demands more slowly.

Stages in Normalisation • To facilitate our study of the different normal forms in normalisation, we will consider the example of pop song database in Chapter 6 again (Example 1). • A brief review of the problem: We would like to develop a database to store the information of pop songs. The relevant attributes are:song_title, artist_name, artist_website, language, year_publish, composer, author, publisher, … • We may include more or less number of attributes in different stages of our discussion. • We will adopt the approach of designing the table structures directly based on normalisation, i.e. without constructing the ER diagram.

First Normal Form (1NF)

First Normal Form (1NF) A table is in first normal form (1NF) if it satisfies the following rule(s): • There are no repeating groups in the table, i.e. there are no multi-valued attributes. • In other words, each row/column intersection can contain one and only one value, not a set of values. • All the key attributes (primary keys) are defined. • All attributes are dependent on the primary key. • Remark:Some books only mention the first rule as essential in 1NF.

First Normal Form (1NF) Example • Consider the following table for storing information of pop songs. Here, a song may be sung by more than one artist. • The column artist_name contains repeating groups. • Therefore, the table is not in 1NF Song (song_id, song_title, artist_name) Repeating groups

First Normal Form (1NF) Example (con’t) • The above table is sometimes represented as Song (song_id, song_title, (artist_name) ) Repeating groups

First Normal Form (1NF) Example (con’t) • OR, together with the artist id (art_id): Song (song_id, song_title, (art_id, artist_name) ) Repeating groups

First Normal Form (1NF) Example (con’t) • There are two basic methods to convert a table with repeating groups into 1NF. • Method 1: • Put the values of the repeating groups into separate rows. • Fill in the missing data in the new rows. • Insert additional column (attribute) to obtain the primary key again, if necessary.

First Normal Form (1NF) Example (con’t) • Basically, the resulting table from Method 1 is in 1NF. • However, it has many redundant data. Song (song_id, song_title, art_id, artist_name) Redundant data

First Normal Form (1NF) Example (con’t) • A better method to convert a table with repeating groups into 1NF is to decompose the table into smaller ones as follows. • Method 2: • Create a separate table to hold the column of repeating groups. Insert suitable primary key in the new table. • Remove the column with repeating groups from the original table. • Add suitable foreign key in the new table to link up with the original table. • Remark:Method 2 is a continuation of Method 1.

First Normal Form (1NF) Song (song_id, song_title) Example (con’t) Song (song_id, song_title, art_id, artist_name) Artist (song_id, art_id, artist_name)

First Normal Form (1NF) • The above steps in converting a table with repeating groups into 1NF can be summarized as follows: Song (song_id, song_title, (art_id, artist_name) ) Song (song_id, song_title) Artist (song_id,art_id, artist_name) Repeating groups

First Normal Form (1NF) Remarks: • General steps in converting table with repeating groups into 1NF Table ( Key1, ......, (Key2, ……), (Key3, …..) ) Table1 (Key1, ....) Table2 (Key1, Key2, ....) Table3 (Key1, Key3, ....) Repeating groups Repeating groups 1NF

First Normal Form (1NF) Remarks: • For nested repeating groups, Nested repeating groups Table ( Key1, ......, (Key2, ….., (Key3, …..) ) ) Table1 (Key1, ....) Table2 (Key1, Key2, ....) Table3 (Key1, Key2,Key3, ....) 1NF

Activity 1 Tasks 1 and 2

Tasks 1 and 2 Complete Tasks 1 and 2 in Activity 1 of Chapter 7.

Second Normal Form (2NF)

Problems with First Normal Form • Let us consider the 1NF of the pop song database again, with the artist website attribute (art_web) included: UNF: Song (song_id, song_title, (art_id, artist_name, art_web) ) 1NF: Song (song_id, song_title)Artist (song_id,art_id, artist_name, art_web) Artist (song_id, art_id, artist_name, art_web) Song (song_id, song_title)

Problems with First Normal Form Questions: • Are there any redundant data in the above tables? If yes, give an example. • Will the following anomalies occur in data update? If yes, give an example. • Modification • Insertion • Deletion

Problems with First Normal Form • The above shortcomings of the 1NF occur because there are some “partial dependencies” between the non-key attributes and the primary key in the Artist table. Artist (song_id, art_id, artist_name, art_web) Partial dependencies Partial dependencies

Problems with First Normal Form • In order to reduce the above data redundancy and to avoid anomalies, we should reduce the partial dependencies between the non-key attributes and the primary key in the data table. • This is the idea of second normal form (2NF) • Before we go on to introduce the 2NF, we should first understand the concept of functional dependency and partial dependency. • We are now going to introduce the formal definitions of the above dependencies.

Functional Dependency • Attribute B is (functionally) dependent on attribute A if each value of A determines exactly one value of B. (Written as A  B)E.g. student_id  student_name • A, B  C means that C is functionally dependent on A and B, i.e. A and B together determines C. E.g. class, class_no  student_name • A  B, C means that A determines B and C.E.g. student_id  student_name, dob

Functional Dependency Exercise • For the “pop song database”, which of the following dependencies are true? • song_id  song_title • song_id  artist_name • artist_id  artist_website • composer  song_title • author  composer

Full and Partial Dependency • For A, B  C, i.e. C is functionally dependent on A and B, • C is said to be fully (functionally) dependent on A and B if C is not functionally dependent on A, and C is not functionally dependent on B. E.g. For class, class_no  student_name,student_name is fully dependent on class and class_no • C is said to be partially dependent on A and B if C is not fully functionally dependent on A and B. • In general, for A  B (B is functionally dependent on A), where A is a set of attributes, • B is said to be fully (functionally) dependent on A if B is not functionally dependent on any proper subset of A. • B is said to be partially dependent on A if B is not fully functionally dependent on A.

Second Normal Form (2NF) A table is in second normal form (2NF) if it satisfies the following rule(s): • It is in 1NF • It includes no partial dependencies on the primary key; i.e. no attribute is (functionally) dependent on only part of the primary key. • Remark: • Partial dependencies on primary key only occur in table with composite primary key (primary key with two or more attributes). • A 1NF table with a single column primary key is automatically in at least 2NF.

Second Normal Form (2NF) To convert a 1NF table with partial dependencies on primary key into 2NF, we can apply the following steps: • Identify each attribute in the composite primary key on which other non-key attributes are dependent. • Create a new table for each attribute identified in step 1. The attribute will become the primary key of the new table. • Move the non-key attributes that are dependent on this primary key attribute from the old table to the new table.

Second Normal Form (2NF) Example • Consider the Artist table in previous Example again. • Clearly the table has partial dependencies. Partial dependencies Artist (song_id, art_id, artist_name, art_web) Partial dependencies

Second Normal Form (2NF) Example (con’t) • Transform the Artist table into 2NF by decomposing the table. Partial dependencies Artist ( song_id, art_id, artist_name, art_web ) Artist (art_id, artist_name, art_web) Song_Artist (song_id, art_id) Tables in 2NF • Remark:Here, we rename the resulting tables to make their names more meaningful.

Second Normal Form (2NF) Song_Artist Example (con’t) Partial dependencies Artist Table in 1NF Tables in 2NF

Activity 1 Tasks 3 and 4

Tasks 3 and 4 Complete Tasks 3 and 4 in Activity 1 of Chapter 7.

Third Normal Form (3NF)

Problems with Second Normal Form • Let us consider the 2NF of the pop song database again. This time we include the attributes: composer name (com_name), composer website (com_web), together with the composer id (com_id). • Here, we assume that each song is created by one composer only. • The tables in 2NF are: Song (song_id, song_title, com_id, com_name, com_web) Artist (art_id, artist_name, art_web) Song_Artist (song_id, art_id)

Introduction to Database Design Methodology