Normalisation

Normalisation • Normalisation is a process whereby the tables in a database are optimised to remove the potential for redundancy. Two main problems may arise if this is not done: • Repeated data makes a database bigger. • Multiple instances of the same values make maintaining the data more difficult and can create anomalies. Ms. D. Anderson

Objectives • Normalisation aims to remove this redundancy by applying rules in a series of stages, splitting tables and creating relationships between unique identifiers (keys), to ensure that the database table structure is efficient, but data can still be accurately manipulated. Ms. D. Anderson

Database Terminology • When we were looking at the physical database we use certain terms such as TABLE, COLUMN, ROW and FIELD. Discussion of database theory uses different terms: • Relation/ Entity - These are the same as a table. • Attributes - These are similar to columns in that they describe the type of data stored. • Domain - This is values within the same attribute (a collection of fields that exist in the same column). • Tuple - This is a record similar to a row. Ms. D. Anderson

Normalisation Stages • There are several stages of normalisation that a database structure can be subject to, each with rules that constrain the database further and each creating what is called a Normal Form. These are, in order: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) Ms. D. Anderson

To demonstrate the stages of normalisation we have a example scenario • The setting is a university or college. • Essentially, a student is enrolled onto a programme and may take several modules as part of this programme. We can assume here that a module is only delivered by one lecturer. • We start with a copy of the programme Ms. D. Anderson

Typical college or university data with regard to course details Ms. D. Anderson

Normalisation rules a) A relation is said to be in first normal form (1NF) if it has no repeating groups and no multi-valued attributes. b) A relation is said to be in second normal form (2NF) if every non-key attribute depends fully on the primary key. c) A relation is said to be in third normal form (3NF) if there are no dependencies on non-key fields Ms. D. Anderson

Initial Observations • The primary key for this data would be ‘Student No’. Student ‘Name’ could also have been a primary key except for the fact that names are not always unique. All too often we find that two students can have identical names (often in the same class). ‘Student No’ is always a safer approach as the registration department has access to a much greater set of data and can allocate a unique ‘Student No’ accordingly. • The table clearly shows that the fields Module, Module name and Lecturer form a repeating group for a single value of ‘Student No’. Ms. D. Anderson

1NF – first normal form To go to 1NF, we will therefore split the table into two parts: – (Student No, Student Name, Programme, Programme Duration); – (Student No, Module No, Module Name, Lecturer). Note that ‘Student No’ is an acceptable key for the non-repeating portion of the table. The second table requires the use of a composite key this time using both ‘Student No’ and ‘Module No’ to uniquely identify a row of data. Ms. D. Anderson

1NF Tables Ms. D. Anderson

2NF – second normal form To move to second normal form, we are interested in the removal of partial dependencies. As the first table has no composite key, then it cannot contain partial dependencies and hence we can say that this is currently in 2NF: – (Student No, Student Name, Programme, Programme Duration). The second table does contain partial dependencies that need to be eliminated. In this case both ‘Module Name’ and ‘Lecturer’ are dependent on the ‘Module No’ and not the ‘Student No’. To eliminate this dependency, we can again split the second table to form: – (Student No, Module No); – (Module No, Module Name, Lecturer). Ms. D. Anderson

2NF Tables Ms. D. Anderson

3NF – third normal form To move to third normal form we must remove any transitive dependencies, i.e. fields that are dependent on non-key fields. There is one such dependency in the first relationship where Programme ‘Duration’ depends on the Programme and not on the primary key ‘Student No’. • We can therefore express 3NF as follows: – Student (Student No, Student Name, Programme); – Programme (Programme, Programme Duration); – Student-Module (Student No, Module No); – Module (Module No, Module Name, Lecturer). Ms. D. Anderson

3NF Tables Module Table Programme Table Student Table Ms. D. Anderson

References • http://www.learningmatters.co.uk/sampleChapters/pdfs/Chapter5.pdf Ms. D. Anderson

Normalisation

Normalisation

Presentation Transcript

Normalisation

Normalisation 2

Normalisation

Normalisation

NORMALISATION

Normalisation

Normalisation

Database Normalisation

Normalisation 1

Normalisation

Normalisation

Normalisation

Normalisation

Normalisation

Normalisation

NORMALISATION

Normalisation

Normalisation

Normalisation

Normalisation