1 / 21

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. Normalization. It is a process that we can use to remove design flaws from a database A number of normal forms, which are sets of rules describing what we should and should not do in our table structure

Télécharger la présentation

Schema Refinement and Normal Forms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Schema Refinement and Normal Forms CS3754 Class Notes #7, John Shieh

  2. Normalization • It is a process that we can use to remove design flaws from a database • A number of normal forms, which are sets of rules describing what we should and should not do in our table structure • 3NF is sufficient to avoid the data redundancy problem of a designed relational database CS4753/2006F

  3. Problems caused by redundancy • Redundant Storage • Some information is stored repeatedly. • Update Anomalies • If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated. • Insertion anomalies • It may not be possible to store certain information unless some other, unrelated, information is stored. • Deletion Anomalies • It may not be possible to delete certain information without losing some other, unrelated, information. CS3754 Class Notes #7, John Shieh

  4. Redundant Storage • The hourly wages depend on rating levels. So, for example, hourly wage 10 for rating level 8 is repeated three times. • Update Anomalies • The hourly_wages in the first tuple could be updated without making a similar change in the second tuple. CS3754 Class Notes #7, John Shieh

  5. Insertion Anomalies • We cannot insert a tuple for an employee unless we know the hourly wage for the employee’s rating value. • Deletion Anomalies • If we delete all tuples with a given rating value (e.g. tuples of Smethurst and Guldu) we lose the association between the rating value and its hourly_wage value. CS3754 Class Notes #7, John Shieh

  6. Decompositions • Intuitively, redundancy arise when a relational schema forces an association between attributes that is not natural. • Functional dependencies can be used to identify such situations and suggest refinements to the schema. • The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relation. CS3754 Class Notes #7, John Shieh

  7. Functional dependency: - rating determines Hourly_wages A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas each of which contains a subset of attributes of R and together include all attributes in R CS3754 Class Notes #7, John Shieh

  8. Functional Dependencies • A functional dependency (FD) is a kind of IC that generalizes the concept of a key. • Let R be a relation schema, and X and Y be sets of nonempty sets of attributes in R. • An FD X Y exists, if in every relation instance for R, any two tuples that agree on the value of X also agree on the value of Y. • More formally • Let R be a relation schema and let X and Y be nonempty sets of attributes in R. An FD X Y exists in R if every instance of R preserves the FD X Y. • We say that an instance r of Rpreserves the FD X Y if the following holds for every pair of tuples t1 and t2 in r If t1.X = t2.X, then t1.Y = t2.Y The notation t1.X refers to the subset of fields of tuple t1 for the attributes in X CS3754 Class Notes #7, John Shieh

  9. Take Examples: course_ID  course_name is preserved? {student_ID, course_ID}  course_name is preserved ? if no two rows agree on  value, then    is trivially preserved. yes yes CS3754 Class Notes #7, John Shieh

  10. The table instance also preserves the following • student_ID  student_name • Student_ID, course_ID  {student_name, course_name} • student_ID, course_ID  • {student_ID, student_name, course_ID, course_Name} • student_name  student_name (a trivial dependency) • student_name, course_name  student_name (also trivial) • many more …. CS3754 Class Notes #7, John Shieh

  11. How do we know if a FD exist in R? • Can we check all instances of R to see if the FD is preserved? • Definitely, not possible! • Whether or not a functional dependency exists must be determined by assumptions given in advance, or common sense, not by individual relation instances. • Given an instance r of R, we can check if r preserves some • functional dependency f, but we cannot tell if f holds over R. • course_ID  student_name ? • Although it is preserved by this table, it does not fit the assumption. no CS3754 Class Notes #7, John Shieh

  12. The assumptions given in advance, or common sense, impose some constraints, and are called the semantics of a database • Assumptions given in advance impose explicit constraints; common sense imposes implicit constraints CS3754 Class Notes #7, John Shieh

  13. Example: • Application is to keep track of information about • employees in a company. • Information to be kept track of includes: • eid: employee’s id number • ename: employee name • address: address of the employee • sex: employee’s sex • dname: name of the department that the employee works for • dhname: department head’s name • dhsex: department head’s sex CS3754 Class Notes #7, John Shieh

  14. Let’s construct a relation schema as follows: • Which of the following dependencies are true? • eid  ename • ename  eid • eid  address • eid  sex • sex  address • dhname  dname • dhname  eid • dhsex  sex Employee eid ename address sex dname dhname dhsex  Assumptions: a:Employee’s id number is unique b:Each employee has a unique address c:Each employee works for only one dept. d:A person can be the head of at most one department e:All department heads have different names Implicit: common sense        CS3754 Class Notes #7, John Shieh

  15. is a superkey for relation schema R iff   attri(R) • where attri(R) denotes the set of all the attributes in schema R •  is a candidate key (or simply, key) for R iff •   attri(R), and •  is minimal, i.e., for any   ,   attri(R) • In other words,a candidate key is a minimal superkey (student_ID, course_ID) is a candidate key (and the only one) (student_ID, course_ID, course_name) is a superkey, but not a candidate key (student_ID, course_ID, student_name) is another non-candidate superkey (student_ID, course_ID, course_name, student_name) is also a non-candidate superkey CS3754 Class Notes #7, John Shieh

  16. Normal Forms CS3754 Class Notes #7, John Shieh

  17. Normal Form (NF) • 1NF: each attribute or column value must be atomic • 2NF: if a schema is 1NF, and if its all attributes that are not part of the primary key are fully functionally dependent on the primary key • 3NF: if a schema is 2NF, and all transitive dependencies have been removed Ex: employeeDept(employeeID, name, job, deptID, deptName) has to convert to employee(employeeID, name, job, deptID) Dept(deptID, deptName) CS4753/2006F

  18. 2NF • It means that each non-key attribute must be functionally dependent on all parts of the primary key (i.e., the combination of the composite attributes of the key). • Example: not 2NF Employee(employeeID, name, job, departmentID, skill) employeeID, skill  name, job, departmentID employeeID  name, job, departmentID (Note:  determine) • Break the table into two tables to become 2NF Employee(employeeID, name, job, departmentID) employeeSkills(employeeID, skill) CS4753/2006F

  19. 3NF • Example: 2NF but not 3NF Employee(employeeID, name, job, departmentID, departmentName) Here employeeID  departmentID employeeID  departmentName Also departmentID  departmentName, departmentID is not a key Therefore, employeeID  departmentName is a transitive dependency • Convert the schema to 3NF by breaking to two tables: Employee(employeeID, name, job, departmentID) Department(departmentID, departmentName) CS4753/2006F

  20. Normal Forms Defined Informally • 1st normal form • All attributes depend on the key • 2nd normal form • All attributes depend on the whole key • 3rd normal form • All attributes depend on nothing but the key CS4753/2006F

  21. SUMMARY OF NORMAL FORMS based on Primary Keys CS4753/2006F

More Related