1 / 36

Normalisation

Normalisation. 5. Outline. Boyce-Codd Normal Form (BCNF) normalisation non-loss decomposition Heath’s theorem normalisation process semantic assumptions and FDs CKs decomposition normalisation vs dependency preservation a decomposition may yield to a better solution than another one

Patman
Télécharger la présentation

Normalisation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalisation 5

  2. Outline • Boyce-Codd Normal Form (BCNF) • normalisation • non-loss decomposition • Heath’s theorem • normalisation process • semantic assumptions and FDs • CKs • decomposition • normalisation vs dependency preservation • a decomposition may yield to a better solution than another one • either-or situations: normalise or preserve FDs

  3. 1

  4. 2NF and 3NF • optional • 2NF • a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys • 3NF (Zaniolo) • R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions: • X contains A • X contains a candidate key of R • A is contained in a candidate key of R • if either of the three is true for every FD X  A then R is in 3NF

  5. BCNF • a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant • informally • the determinant of each relevant FD is a CK

  6. Example • devise examples in class • relations in BCNF • relations not in BCNF

  7. BCNF • any relation can be non-loss decomposed into an equivalent set of BCNF relations • BCNF  3NF  2NF  1NF • BCNF is still not guaranteed to be free of any update anomalies

  8. 2

  9. Normalisation • the process of transforming a relation with redundancies into an “equivalent” set of relations that have less redundancies • “transformation”  projection • input :: one relation, say R • output :: many relations, say R1, …, Rn • “equivalent”  non-loss decomposition • R1 join R2 … join Rn = R • R1, …, Rn should have normal forms higher than or equal to that of R

  10. Non-loss decomposition • (Patient, Symptom, Doctor, Office, Diagnosis) • semantic assumptions • exercise

  11. Lossy Decomposition • (Patient, Symptom, Doctor, Office, Diagnosis) • semantic assumptions • exercise

  12. Heath’s theorem • can be used as the basis for normalisation • theorem • suppose • R = (A, B, C), where A, B and C are disjoint sets of attributes • AB • then • R = (A, B) join (A, C) • state “in English”

  13. Normalisation – rules of thumb • take as basis for normalisation/Heath’s theorem a “problem” FD • maximise B when applying Heath’s theorem, on the basis of AB • try to maintain a one-to-one correspondence with real life entities

  14. Normalisation • steps • semantic assumptions • FDs • CKs • decomposition

  15. Simple example • (M_id, M_name, Type, Value) • M_id  M_name • M_id  Type • M_id  Value • Type  Value • not BCNF • Heath’s theorem for Type  Value results • (Type, Value) • (M_id, M_name, Type) • both relations are now in BCNF

  16. 3

  17. Example (R) (project, task, max-budget, duration, payment-rate, contractor, contr-time) FDs: (project, task)  max_budget, duration (task, max_budget, duration)  payment_rate (project, task, contractor)  contr_time (project, task, max-budget, duration, payment-rate, contractor, contr-time)

  18. Example – decomposition for R • Heath’s theorem for R (the initial relation) based on • task, max_budget, duration  payment_rate • leads to: • R1 (task, max_budget, duration, payment_rate) • R2 (project, task, max_budget, duration, contractor, contr_time) • R1 is in BCNF • R2 is not in BCNF, due to • project, task  max_budget, duration

  19. Example – decomposition for R2 • Heath’s theorem for R2, based on • project, task  max_budget, duration • leads to • R21 (project, task, max_budget, duration) • R22 (project, task, contractor, contracted_time) • R21 is in BCNF • R22 is in BCNF

  20. Example – solution • (task, max_budget, duration, payment_rate) • (project, task, max_budget, duration) • (project, task, contractor, contracted_time)

  21. 4

  22. Decomposition – 2 or more solutions • in the normalisation process, it may be possible that a certain (non-loss) decomposition yields to a better solution than another one

  23. Decomposition – 2 solutions – example • Modules(M_id, M_name, Type, Value) • solution #1 • Modules_Descr(M_id, M_name, Type) • Type_Val(Type, Val) • solution #2 • Modules_Descr(M_id, M_name, Type) • Module_Val(M_id, Val) • are they both non-loss? (apply Heath’s theorem) • is there one better than the other?

  24. Solution #1 vs Solution #2 • updates • u1: insert the fact that a 3 semester module is worth 1.5cu • u2: modify 1 semester modules; they are not worth 0.5cu any longer, they are 0.75cu • u3: change the type of a module but forget to change its value • solution #2 • u1 and u2 are impossible or very difficult to perform • u3 is allowed • solution #1 • u1 and u2 are straightforward • u3 is not allowed

  25. Solution #1 vs Solution #2 • solution #1 • more expressive • certain facts cannot be expressed in solution #2; e.g. the value of a new type • updates can be independently performed on the two component relations (i.e. all constraints are properly expressed) • in solution #2: Type  Value is lost, so this constraint must be enforced by the user by procedural code • independent projections • updates can be performed independently on each projection, without the danger of ending with inconsistent data

  26. M-id Type M_name M-id Type M_name Value M_id Value Type Independent projections M_name M-id Type Value Solution #1 Solution #2 one transitive : intra one direct : lost all direct : intra all transitive : inter

  27. Independent projections - Risanen • R1 and R2 are two projections of R; R1 and R2 are independent if and only if • every FD in R is a logical consequence of the FDs in R1 and R2 • the common attributes of R1 and R2 for a candidate key for at least one of R1 or R2 • atomic relation • cannot be decomposed into independent projections

  28. Dependency preservation • R was decomposed (normalisation) into R1, …, Rn • S - the set of FDs for R • S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri) • S’ = S1 …  Sn (usually, S’  S) • the decomposition is dependency preserving if S’+ = S+

  29. 5

  30. Normalisation vs dependency preservation • there are cases when there is an either-or situation regarding the normalisation and the preserving of functional dependencies: • either the relation is normalised and some FDs are lost • or, some FDs are not lost (they are expressed in the original relation), but the relation is not in its higher normal form possible • in this case, no solution is better than the other • other criteria will have to be considered to judge better

  31. Patient Doctor Disease Normalisation vs dependency preservation: Example • a patient is treated by a single doctor for a certain disease • each doctor only treats one kind of disease • a doctor can treat more than one patient • is this relation BCNF? • can you identify update anomalies? • consider also (Patient, Disease, Doctor, Treatment) • with Patient, Disease  Treatment

  32. Possible decompositions non-loss? (choose PKs) non-loss? (choose PKs) Heath’s theorem (choose PKs)

  33. BCNF vs dependency preservation and do not enforce a FD existing in the original specification, namely: e.g. a patient can be given two doctors that treat the same disease (the system will not disallow this); the constraint would have to be maintained by procedural code

  34. BCNF vs dependency preservation • not every FD is expressible through normalisation • when the relation was in its original form (3NF) • (Patient, Disease)  Doctor was expressed • a doctor could not be assigned to more than one patient-disease • Doctor  Disease was not expressed • generated update anomalies • in BCNF (decomposed) • Doctor  Disease was expressed • (Patient, Disease)  Doctor was not expressed • generated update anomalies (refer to previous slide) • this latter FD would not have been expressed even if the decomposition in all three 2-attribute relations had been considered

  35. Conclusions • normal forms : formalisation of common sense • art  engineering • possibility for automation; difficult, because of non-determinism (more than one choices at one step) • BCNF • always achievable • notalways free of update anomalies, because it cannot always express all the FDs existing in the problem • there are higher normal forms (4NF, 5NF) • defined on the basis of other concepts (not FDs)

More Related