1 / 104

Final Exam Revision 4

CS 157A. Final Exam Revision 4. Prof. Sin-Min Lee Department of Computer Science. Terminology Database – an organized collection of data Table – data organized in rows and columns Attribute – a variable or item Record – a collection of attributes

marciano
Télécharger la présentation

Final Exam Revision 4

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 157A Final Exam Revision 4 Prof. Sin-Min Lee Department of Computer Science

  2. Terminology • Database – an organized collection of data • Table – data organized in rows and columns • Attribute – a variable or item • Record – a collection of attributes • Domain – the range of values an attribute may take • Index/key – attribute(s) used to identify, organize, or order records in a database

  3. Attribute (or item or field) Record (or tuple) integer domain real domain alpha- numeric domain (a string) Common components of a database:

  4. Common Database Models: • Hierarchical • Network • Relational

  5. Data organized with parent-child connections in a tree-like structure Branches group successively more similar data Advantages: Logical structure, quick searches for related items Disadvantages: Significant effort required to create the tree structure. Slow searches across branches

  6. Data elements connected in a cross-linked structure Advantages: Quick searches, reduced (often no) duplication. Disadvantages: Significantly complex structuring – maintenance is difficult

  7. Relational Database Model Minimal row-column structure Items/records with specified domains (possible values) Advantages: Minimum structure, easy programming, flexible Disadvantages: Relatively slow, a few restrictions on attribute content

  8. Relational Databases Are Most Common • Flexible • Relatively easy to create and maintain • Computer speeds have overcome slow response in most applications • Low training costs • Inertia – many tools are available for RDBMS, large personnel pool

  9. Eight Fundamental Operations Restrict (query) – subset by rows Project – subset by columns Product – all possible combinations Divide – inverse of product

  10. Eight Fundamental Operations Union – combine top to bottom Intersect – row overlap Difference – row non-overlap Join (relate) – combine by a key column

  11. Main Operations with Relational Tables Query / Restrict Conditional selection Calculation and Assignment Sort rank based on attributes Relate/Join Temporarily combine two tables by an index

  12. Query / Restrict Operations with Relational Tables • Set Algebra • Uses operations less than (<), greater than • (>), equal to (=), and not equal to (<>). • Boolean Algebra • uses the conditions OR, AND, and NOT to select features. Boolean expressions are evaluated by assigning an outcome, True or False, to each condition.

  13. Query / RestrictOperations with Relational Tables • Each record is inspected and is added to the selected set if it meets one to several conditions • AND, OR and NOT may be applied alone or in combinations • AND typically decreases the number of records selected • OR typically increases the number of records selected • NOT Is the negation operation and is interpreted as meaning select those that do not meet the condition following the NOT.

  14. Query / Restrict – simple, AND

  15. Query / Restrict – OR, NOT

  16. Operation Order is Important in Query • (D OR E) AND F may not be the same as D OR (E AND F) • NOT (A and B) may not be the same as [ NOT (A) AND NOT (B)] • Typically need to clarify order with delimiters

  17. Relational Tables Relational tables have many advantages, but If improperly structured, they may suffer from: Poor performance Inconsistency Redundancy Difficult maintenance This is common because most users do not understand the concepts Normal Forms in relational tables.

  18. Relational Tables Relational tables have many advantages, but If improperly structured, table may suffer from: Poor performance Inconsistency Redundancy Difficult maintenance This is common because most users do not understand the concepts Normal Forms in relational tables.

  19. Problems caused by redundancy • Redundant Storage • Some information is stored repeatedly. • Update Anomalies • If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated. • Insertion anomalies • It may not be possible to store certain information unless some other unrelated information is stored. • Deletion Anomalies • It may not be possible to delete certain information without losing some other unrelated information.

  20. Redundant Storage • The rating value 8 corresponds to the hourly wage 10, and this association is repeated three times. • Update Anomalies • The hourly_wages in the first tuple could be updated without making a similar change in the second tuple.

  21. Insertion Anomalies • We cannot insert a full tuple for an employee unless we know the hourly wage for the employee’s rating value. • Deletion Anomalies • If we delete all tuples with a given rating value (e.g. tuples of Smethurst and Guldu) we lose the association between the rating value and its hourly_wage value.

  22. Decompositions • Intuitively, redundancy arise when a relational schema forces an association between attributes that is not natural. • Functional dependencies can be used to identify such situations and suggest refinements to the schema. • The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relations.

  23. Functional dependency: - rating determines Hourly_wages A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas, each of which contains a subset of attributes of R and which together include all attributes in R

  24. Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies the FDX Y If the following holds for every pair of tuples t1 and t2 in r If t1.X = t2.X, then t1.Y = t2.Y The notation t1.X refers to the projection of tuple t1 onto the attributes in X Functional Dependencies • A functional dependency (FD) is a kind of Integrity Constraint that generalizes the concept of a key. • An FD X Y essentially says that if two tuples agree on the values in attributes X, they must also agree on the values in attributes Y.

  25. Tables in Non-normal Form repeat columns, “dependent” data, empty cells by design

  26. 1st Normal Forms in Relational Tables Tables are in first normal form when there are no repeat columns Advantages: easy to code queries (can look in only one column) Disadvantages: slow searches, excess storage, cumbersome maintenance

  27. 2nd Normal Forms in Relational Tables 2NF if: it is in 1NF and if every non-key attribute is functionally dependent on the primary key What is a key? An item or set of items that may be used to uniquely identify every row What is functional dependency? If you know an item (or items) for a row, then you automatically know a second set of items for the row – this means the second set of items is functionally dependent on the item (or items)

  28. Keys Item(s) that uniquely identify a row STATE can be a key, but not REGION, SIZE, or POPULATION

  29. Keys Item(s) that uniquely identify a row Sometimes we need >1 column to form a key, e.g., Parcel-ID and Own-ID together may form a key

  30. Functional Dependency Knowing the value of an item (or items) means you know the values of other items in the row e.g., if we know the person’s name, then we know the address In our example, if we know the Parcel-ID, we know the Alderman, Township name, and other Township attributes: Parcel-ID - > Alderman Parcel-ID - > Thall_add Parcel-ID - > Tship-ID Parcel-ID - > Tship_name

  31. Moving from First Normal Form (1NF to Second Normal Form (2NF), we need to: Identify functional dependencies Place in separate tables, one key per table

  32. Normal Forms Summary No repeat columns (create new records such that there are multiple records per entry) Split the tables, so that all non-key attributes depend on a primary key. Split tables further, if there are transitive functional dependencies. This results in tables with a single, primary key per table.

  33. if any two rows never agree on  value, then •   is trivially preserved. e.g course_ID  course_name is not trivially preserved e.g. student_ID, course_ID  course_name is trivially preserved

  34. Normal Forms Are Good Because: It reduces total data storage Changing values in the database is easier It “insulates” information – it is easier to retain important data Many operations are easier to code

  35. The table instance satisfies the following • student_name  student_name (a trivial dependency) • student_name, course_name  student_name (also trivial) there are many trivial dependencies – R.H.S. subset of L.H.S. • student_ID, course_ID  (student_ID, student_name, course_ID, course_Name ) • student_ID, course_ID is a key

  36. is a superkey for R iff   R. where R is taken as the schema for relation R. •  is a candidate key for R iff •   R, and • for no  that is a proper subset of ,   R. (student_ID, course_ID) is a candidate key (student_ID, course_ID, course_name) is not a candidate key

  37. Reasoning about FDs F – a set of functional dependencies f – an individual functional dependency f is implied by F if whenever all functional dependencies in F are true, then f is true. For example, consider Workers(id, name, office, did, since) { id  did, did  office } implies id  office

  38. Closure of a set of FDs • The set of all FDs implied by a given set F of FDs is called the closure of F, denoted as F+. • Armstrong’s Axioms, can be applied repeatedly to infer all FDs implied by a set of FDs. Suppose X,Y, and Z are sets of attributes over a relation. (notation: XZ is X U Z) Armstrong’s Axioms • Reflexivity: if Y  X, then X  Y • Augmentation: if X  Y, then XZ  YZ • Transitivity: if X  Y and Y  Z, then X  Z

  39. reflexivity: student_ID, student_name  student_ID student_ID, student_name  student_name (trivial dependencies) augmentation: student_ID  student_name implies student_ID, course_name  student_name, course_name transitivity: course_ID  course_nameandcourse_name department_name Implies course_ID  department_name

  40. Armstrong’s Axioms is sound and complete. • Sound: they generate only FDs in F+. • Complete: repeated application of these rules will generate all FDs in F+. • The proof of soundness is straight forward, but completeness is harder to prove.

  41. Proof of Armstrong’s Axioms (soundness) Notation: We use t[X] for X[ t ] for any tuple t. (note that we used t.X before) Reflexivity:If Y  X, then X  Y Assume  t1, t2 such that t1[X] = t2[X] then t1[ Y ] = t2[ Y ] since Y  X Hence X  Y

  42. Augmentation: if X  Y, then XZ  YZ Assume  t1, t2 such that t1[ XZ] = t2[ XZ] t1[Z] = t2[Z], since Z XZ ------ (1) t1[X] = t2[X], since X XZ t1[Y] = t2[Y], definition of X  Y ------ (2) t1[YZ] = t2 [ YZ] from (1) and (2) Hence, XZ  YZ

  43. Transitivity: If X Y and Y  Z, then X Z. Assume  t1, t2 such that t1[X] = t2[X] Then t1[Y] = t2[Y], definition of X  Y Hence, t1[Z] = t2[Z], definition of Y Z Therefore, X Z

  44. Additional rules • Sometimes, it is convenient to use some additional rules while reasoning about F+. • These additional rules are not essential in the sense that their soundness can be proved using Armstrong’s Axioms. • Union: if X  Y and X  Z , then X  YZ. • Decomposition: if X  YZ, then X  Y and X  Z.

  45. To show the correctness of the union rule: X  Y and X  Z , then X  YZ ( union ) Proof: X  Y… (1) ( given ) X  Z… (2) ( given ) XX  XY … (3) ( augmentation on (1) ) X  XY … (4) ( simplify (3) ) XY  ZY… (5) ( augmentation on (2) ) X  ZY … (6) ( transitivity on (4) and (5) )

  46. To show the correctness of the decomposition rule: if X  YZ , then X  Y and X  Z (decomposition) Proof: X  YZ … (1) ( given ) YZ  Y… (2) ( reflexivity ) X  Y … (3) ( transitivity on (1), (2) ) YZ  Z… (4) ( reflexivity ) X  Z … (5) ( transitivity on (1), (4) )

  47. Note that A, B, C, are attributes We refer to the set {A,B} simply as AB Using reflexivity, we can generate all trivial dependencies R = ( A, B, C ) F = { A  B, B  C } F+ = { A  A, B  B, C  C, AB  AB, BC  BC, AC  AC, ABC  ABC, AB  A, AB  B, BC  B, BC  C, AC  A, AC  C, ABC  AB, ABC  BC, ABC  AC, ABC  A, ABC  B, ABC  C, A  B, … (1) ( given ) B  C, … (2) ( given ) A  C, … (3) ( transitivity on (1) and (2) ) AC  BC, … (4) ( augmentation on (1) ) AC  B, … (5) ( decomposition on (4) ) A  AB, … (6) ( augmentation on (1) ) AB  AC, AB  C, B  BC, A  AC, AB  BC, AB  ABC, AC  ABC, A  BC, A  ABC }

  48. Attribute Closure • Computing the closure of a set of FDs can be expensive • In many cases, we just want to check if a given FD X  Y is in F+. X - a set of attributes F - a set of functional dependencies X+ - closure ofX under F set of attributes functionally determined by X under F.

  49. Example: F = { A  B, B  C } A+ = ABC ….. A  X where X  ABC B+ = BC C+ = C AB+ = ABC

  50. Algorithm to compute closure of attributes X+ under F closure := X ; Repeat for eachU  VinFdo begin ifU  closure thenclosure := closure  V ; end Until (there is no change in closure)

More Related