1 / 29

Normalization

Normalization. Stanislava Armstrong http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/lecture_slides.html. Last Lecture. Determine the outcome from executing each of the following statements on the Holidays table below:. SELECT Destination, COUNT(*) FROM Holiday WHERE ISNULL(Price) = TRUE

torin
Télécharger la présentation

Normalization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization Stanislava Armstrong http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/lecture_slides.html

  2. Last Lecture Determine the outcome from executing each of the following statements on the Holidays table below: SELECT Destination, COUNT(*) FROM Holiday WHERE ISNULL(Price) = TRUE GROUP BY Destination;

  3. Coursework • http://www.cs.nott.ac.uk/~saw/teaching/G64DBS/assessment.html • 3 parts: design, implement, use • There are 100 marks available • Electronic submission through cw submit • Due on Monday 6th Dec 2010, 5pm • Late submissions will be penalized by 5% per day • If you are working from home pay special attention to be consistent with the names of columns

  4. Normalization There are many forms of normalization: • First Normal Form • Second Normal Form • Third Normal Form • Boyce-Cod Normal Form • Fourth Normal Form • Fifth Normal Form • Domain/Key Normal Form • Sixth Normal Form

  5. Redundancy and Normalisation • Redundant data • Can be determined from other data in the database • Leads to various problems • INSERT anomalies • UPDATE anomalies • DELETE anomalies • Normalisation • Aims to reduce data redundancy • Redundancy is expressed in terms of dependencies • Normal Forms are hierarchical • Each normal form required the removal of a particular type of dependency in addition to those removed for the normal forms that comes before it.

  6. First Normal Form • In most definitions of the relational model • All data values should be atomic • This means that table entries should be single values, not sets or composite objects • A relation is said to be in first normal form (1NF) if all data values it contains are atomic • To convert to a 1NF relation, split up any non-atomic values

  7. Normalisation to 1NF

  8. Problems in 1NF • INSERT anomalies • Can't add a new kingdom without adding an Order_date for it • UPDATE anomalies • To change the king of Fairyland we have to change two rows • DELETE anomalies • If we remove the order from 2010-11-14 we will remove Trolland as well

  9. Functional Dependencies • Redundancy is often caused by a functional dependency • A functional dependency (FD) is a link between two sets of attributes in a relation • We can normalise a relation by removing undesirable FDs • A set of attributes, A, functionally determines another set, B, or: there exists a functional dependency between A and B (A  B), if whenever two rows of the relation have the same values for all the attributes in A, then they also have the same values for all the attributes in B.

  10. Properties of FDs • In any relation • The primary key FDs any set of attributes in the relation K X • K is the primary key, X is a set of attributes • Same for candidate keys • Any set of attributes is a FD on itself X X • Rules for FDs (Just for fun:) • Reflexivity: If B is a subset of A then A  B • Augmentation: If A  B then A U C  B U C • Transitivity: If A  B and B  C then A  C

  11. FDs and Normalisation • We define a set of 'normal forms' • Each normal form has fewer FDs than the last • Since FDs represent redundancy, each normal form has less redundancy than the last • Not all FDs cause a problem • We identify various sorts of FD that do • Each normal form removes a type of FD that is a problem • We will also need a way to remove FDs

  12. Representation Functional dependencies can also be represented in the following way: • {Execution_num}  {Execution_date} • {Kingdom}  {King} • Diagrammatically functional dependencies are represented by arrows which start from the set of columns, which functionally determines the set of columns where the arrow ends • The candidate keys in the relation are represented by a box which is drawn around them • Trivial FD are not represented on the diagram • FDs dependent on the whole candidate key are not represented on the diagram

  13. Partial FDs and 2NF • Partial FDs: • A FD, A  B is a partial FD, if some attribute of A can be removed and the FD still holds • Formally, there is some proper subset of A, C  A, such that C  B • Let us call attributes which are part of some candidate key, key attributes, and the rest non-key attributes. Second normal form: • A relation is in second normal form (2NF) if it is in 1NF and no non-key attribute is partially dependent on a candidate key • In other words, no C  B where C is a strict subset of a candidate key and B is a non-key attribute. • Normalizing to 2NF can be achieved by splitting the relation to two relations, where: • Given that the original relation is not in 2NF due to FD A  B • The first new relation will contain all of the columns contained in A and from B • The second relation will contain all of the columns which are in the original relation, but are not contained in eitherA or B, and the columns contained in A

  14. Example

  15. Problems Resolved in 2NF • Problems in 1NF • INSERT – Can't add a new kingdom without adding an Order_date for it • UPDATE – To change the king of Fairyland we have to change two rows • DELETE – If we remove the order from 2010-11-14 we will remove Trolland as well • In 2NF we have resolved all of those problems, but we have some other problems remaining

  16. Problems Remaining in 2NF • INSERT anomalies • Can’t assign a date to an execution number that is not yet assigned to an order • UPDATE anomalies • To change the Execution_date for execution 1 we have to change two rows

  17. Transitive FDs and 3NF • Transitive FDs: • A FD, A  C is a transitive FD, if there is some set B such that A  B and B  C are non-trivial FDs • A  B non-trivial means: B is not a subset of A • We have A  B  C • Third normal form • A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate key • To move a relationship from 2NF to 3NF: • Given the transitive FD A  B  C • We split the relation into two new relations • The first contains all of the columns contained in B and C • The second contains all of the columns which are not contained in A, B or C and the columns contained in A and B

  18. Example

  19. Problems Resolved in 3NF • Problems in 2NF • INSERT – Can’t assign a date to an execution number that is not yet assigned to an order • UPDATE – To change the Execution_date for execution 1 we have to change two rows • In 3NF all of these are resolved (for this relation – but 3NF can still have anomalies!)

  20. Problems Remaining in 3NF INSERT anomalies You can’t add melon without adding a kingdom UPDATE anomalies Changing the execution_num for the Fairlyland order, means changing two rows DELETE anomalies Deleting Fairlyland deletes apple

  21. Boyce-Codd Normal Form A relation is in Boyce-Codd normal form (BCNF) if for every FD A  B either B is contained in A (the FD is trivial), or A contains a candidate key of the relation, In other words: every determinant in a non-trivial dependency is a (super) key. The same as 3NF except in 3NF we only worry about non-key Bs If there is only one candidate key then 3NF and BCNF are the same

  22. Example

  23. Decomposition Properties Lossless: Data should not be lost or created when splitting up relations Dependency preservation: It is desirable that FDs are preserved when splitting up relations Normalisation to 3NF is always lossless and dependency preserving Normalisation to BCNF is lossless, but may not preserve all dependencies

  24. Normalisation and Design • Normalisation is related to DB design • A database should normally be in 3NF at least • If your design leads to a non-3NF DB, then you might want to revise it • When you find you have a non-3NF DB • Identify the FDs that are causing a problem • Think whether they will lead to any insert, update, or delete anomalies • Try to remove them

  25. Normalisation Summary First normal form All data values are atomic Second normal form In 1NF plus no non-key attribute is partially dependent on a candidate key Third normal form In 2NF plus no non-key attribute depends transitively on a candidate key Boyce-Codd Normal Form In 2NF plus no attribute depends on a non super key

  26. Denormalisation Normalisation Removes data redundancy Solves INSERT, UPDATE, and DELETE anomalies This makes it easier to maintain the information in the database in a consistent state However It leads to more tables in the database Often these need to be joined back together, which is expensive to do So sometimes (not often) it is worth ‘denormalising’

  27. Denormalisation You might want to denormalise if Database speeds are unacceptable (not just a bit slow) There are going to be very few INSERTs, UPDATEs, or DELETEs There are going to be lots of SELECTs that involve the joining of tables

  28. Exercise • Given the following schema determine what normal form it is in and transform it to BCNF if possible.

  29. Reading Material • The Manga Guide to Databases – chapter 3 • Database Systems – A Practical Approach to Design, Implementation and Management by Connolly and Begg –chapters 13 and 14 • Any other book – chapter on normalization

More Related