1.04k likes | 1.33k Vues
CS 157A. Final Exam Revision 4. Prof. Sin-Min Lee Department of Computer Science. Terminology Database – an organized collection of data Table – data organized in rows and columns Attribute – a variable or item Record – a collection of attributes
E N D
CS 157A Final Exam Revision 4 Prof. Sin-Min Lee Department of Computer Science
Terminology • Database – an organized collection of data • Table – data organized in rows and columns • Attribute – a variable or item • Record – a collection of attributes • Domain – the range of values an attribute may take • Index/key – attribute(s) used to identify, organize, or order records in a database
Attribute (or item or field) Record (or tuple) integer domain real domain alpha- numeric domain (a string) Common components of a database:
Common Database Models: • Hierarchical • Network • Relational
Data organized with parent-child connections in a tree-like structure Branches group successively more similar data Advantages: Logical structure, quick searches for related items Disadvantages: Significant effort required to create the tree structure. Slow searches across branches
Data elements connected in a cross-linked structure Advantages: Quick searches, reduced (often no) duplication. Disadvantages: Significantly complex structuring – maintenance is difficult
Relational Database Model Minimal row-column structure Items/records with specified domains (possible values) Advantages: Minimum structure, easy programming, flexible Disadvantages: Relatively slow, a few restrictions on attribute content
Relational Databases Are Most Common • Flexible • Relatively easy to create and maintain • Computer speeds have overcome slow response in most applications • Low training costs • Inertia – many tools are available for RDBMS, large personnel pool
Eight Fundamental Operations Restrict (query) – subset by rows Project – subset by columns Product – all possible combinations Divide – inverse of product
Eight Fundamental Operations Union – combine top to bottom Intersect – row overlap Difference – row non-overlap Join (relate) – combine by a key column
Main Operations with Relational Tables Query / Restrict Conditional selection Calculation and Assignment Sort rank based on attributes Relate/Join Temporarily combine two tables by an index
Query / Restrict Operations with Relational Tables • Set Algebra • Uses operations less than (<), greater than • (>), equal to (=), and not equal to (<>). • Boolean Algebra • uses the conditions OR, AND, and NOT to select features. Boolean expressions are evaluated by assigning an outcome, True or False, to each condition.
Query / RestrictOperations with Relational Tables • Each record is inspected and is added to the selected set if it meets one to several conditions • AND, OR and NOT may be applied alone or in combinations • AND typically decreases the number of records selected • OR typically increases the number of records selected • NOT Is the negation operation and is interpreted as meaning select those that do not meet the condition following the NOT.
Operation Order is Important in Query • (D OR E) AND F may not be the same as D OR (E AND F) • NOT (A and B) may not be the same as [ NOT (A) AND NOT (B)] • Typically need to clarify order with delimiters
Relational Tables Relational tables have many advantages, but If improperly structured, they may suffer from: Poor performance Inconsistency Redundancy Difficult maintenance This is common because most users do not understand the concepts Normal Forms in relational tables.
Relational Tables Relational tables have many advantages, but If improperly structured, table may suffer from: Poor performance Inconsistency Redundancy Difficult maintenance This is common because most users do not understand the concepts Normal Forms in relational tables.
Problems caused by redundancy • Redundant Storage • Some information is stored repeatedly. • Update Anomalies • If one copy of such repeated data is updated, an inconsistency is created, unless all copies are similarly updated. • Insertion anomalies • It may not be possible to store certain information unless some other unrelated information is stored. • Deletion Anomalies • It may not be possible to delete certain information without losing some other unrelated information.
Redundant Storage • The rating value 8 corresponds to the hourly wage 10, and this association is repeated three times. • Update Anomalies • The hourly_wages in the first tuple could be updated without making a similar change in the second tuple.
Insertion Anomalies • We cannot insert a full tuple for an employee unless we know the hourly wage for the employee’s rating value. • Deletion Anomalies • If we delete all tuples with a given rating value (e.g. tuples of Smethurst and Guldu) we lose the association between the rating value and its hourly_wage value.
Decompositions • Intuitively, redundancy arise when a relational schema forces an association between attributes that is not natural. • Functional dependencies can be used to identify such situations and suggest refinements to the schema. • The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of ‘smaller’ relations.
Functional dependency: - rating determines Hourly_wages A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation schemas, each of which contains a subset of attributes of R and which together include all attributes in R
Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies the FDX Y If the following holds for every pair of tuples t1 and t2 in r If t1.X = t2.X, then t1.Y = t2.Y The notation t1.X refers to the projection of tuple t1 onto the attributes in X Functional Dependencies • A functional dependency (FD) is a kind of Integrity Constraint that generalizes the concept of a key. • An FD X Y essentially says that if two tuples agree on the values in attributes X, they must also agree on the values in attributes Y.
Tables in Non-normal Form repeat columns, “dependent” data, empty cells by design
1st Normal Forms in Relational Tables Tables are in first normal form when there are no repeat columns Advantages: easy to code queries (can look in only one column) Disadvantages: slow searches, excess storage, cumbersome maintenance
2nd Normal Forms in Relational Tables 2NF if: it is in 1NF and if every non-key attribute is functionally dependent on the primary key What is a key? An item or set of items that may be used to uniquely identify every row What is functional dependency? If you know an item (or items) for a row, then you automatically know a second set of items for the row – this means the second set of items is functionally dependent on the item (or items)
Keys Item(s) that uniquely identify a row STATE can be a key, but not REGION, SIZE, or POPULATION
Keys Item(s) that uniquely identify a row Sometimes we need >1 column to form a key, e.g., Parcel-ID and Own-ID together may form a key
Functional Dependency Knowing the value of an item (or items) means you know the values of other items in the row e.g., if we know the person’s name, then we know the address In our example, if we know the Parcel-ID, we know the Alderman, Township name, and other Township attributes: Parcel-ID - > Alderman Parcel-ID - > Thall_add Parcel-ID - > Tship-ID Parcel-ID - > Tship_name
Moving from First Normal Form (1NF to Second Normal Form (2NF), we need to: Identify functional dependencies Place in separate tables, one key per table
Normal Forms Summary No repeat columns (create new records such that there are multiple records per entry) Split the tables, so that all non-key attributes depend on a primary key. Split tables further, if there are transitive functional dependencies. This results in tables with a single, primary key per table.
if any two rows never agree on value, then • is trivially preserved. e.g course_ID course_name is not trivially preserved e.g. student_ID, course_ID course_name is trivially preserved
Normal Forms Are Good Because: It reduces total data storage Changing values in the database is easier It “insulates” information – it is easier to retain important data Many operations are easier to code
The table instance satisfies the following • student_name student_name (a trivial dependency) • student_name, course_name student_name (also trivial) there are many trivial dependencies – R.H.S. subset of L.H.S. • student_ID, course_ID (student_ID, student_name, course_ID, course_Name ) • student_ID, course_ID is a key
is a superkey for R iff R. where R is taken as the schema for relation R. • is a candidate key for R iff • R, and • for no that is a proper subset of , R. (student_ID, course_ID) is a candidate key (student_ID, course_ID, course_name) is not a candidate key
Reasoning about FDs F – a set of functional dependencies f – an individual functional dependency f is implied by F if whenever all functional dependencies in F are true, then f is true. For example, consider Workers(id, name, office, did, since) { id did, did office } implies id office
Closure of a set of FDs • The set of all FDs implied by a given set F of FDs is called the closure of F, denoted as F+. • Armstrong’s Axioms, can be applied repeatedly to infer all FDs implied by a set of FDs. Suppose X,Y, and Z are sets of attributes over a relation. (notation: XZ is X U Z) Armstrong’s Axioms • Reflexivity: if Y X, then X Y • Augmentation: if X Y, then XZ YZ • Transitivity: if X Y and Y Z, then X Z
reflexivity: student_ID, student_name student_ID student_ID, student_name student_name (trivial dependencies) augmentation: student_ID student_name implies student_ID, course_name student_name, course_name transitivity: course_ID course_nameandcourse_name department_name Implies course_ID department_name
Armstrong’s Axioms is sound and complete. • Sound: they generate only FDs in F+. • Complete: repeated application of these rules will generate all FDs in F+. • The proof of soundness is straight forward, but completeness is harder to prove.
Proof of Armstrong’s Axioms (soundness) Notation: We use t[X] for X[ t ] for any tuple t. (note that we used t.X before) Reflexivity:If Y X, then X Y Assume t1, t2 such that t1[X] = t2[X] then t1[ Y ] = t2[ Y ] since Y X Hence X Y
Augmentation: if X Y, then XZ YZ Assume t1, t2 such that t1[ XZ] = t2[ XZ] t1[Z] = t2[Z], since Z XZ ------ (1) t1[X] = t2[X], since X XZ t1[Y] = t2[Y], definition of X Y ------ (2) t1[YZ] = t2 [ YZ] from (1) and (2) Hence, XZ YZ
Transitivity: If X Y and Y Z, then X Z. Assume t1, t2 such that t1[X] = t2[X] Then t1[Y] = t2[Y], definition of X Y Hence, t1[Z] = t2[Z], definition of Y Z Therefore, X Z
Additional rules • Sometimes, it is convenient to use some additional rules while reasoning about F+. • These additional rules are not essential in the sense that their soundness can be proved using Armstrong’s Axioms. • Union: if X Y and X Z , then X YZ. • Decomposition: if X YZ, then X Y and X Z.
To show the correctness of the union rule: X Y and X Z , then X YZ ( union ) Proof: X Y… (1) ( given ) X Z… (2) ( given ) XX XY … (3) ( augmentation on (1) ) X XY … (4) ( simplify (3) ) XY ZY… (5) ( augmentation on (2) ) X ZY … (6) ( transitivity on (4) and (5) )
To show the correctness of the decomposition rule: if X YZ , then X Y and X Z (decomposition) Proof: X YZ … (1) ( given ) YZ Y… (2) ( reflexivity ) X Y … (3) ( transitivity on (1), (2) ) YZ Z… (4) ( reflexivity ) X Z … (5) ( transitivity on (1), (4) )
Note that A, B, C, are attributes We refer to the set {A,B} simply as AB Using reflexivity, we can generate all trivial dependencies R = ( A, B, C ) F = { A B, B C } F+ = { A A, B B, C C, AB AB, BC BC, AC AC, ABC ABC, AB A, AB B, BC B, BC C, AC A, AC C, ABC AB, ABC BC, ABC AC, ABC A, ABC B, ABC C, A B, … (1) ( given ) B C, … (2) ( given ) A C, … (3) ( transitivity on (1) and (2) ) AC BC, … (4) ( augmentation on (1) ) AC B, … (5) ( decomposition on (4) ) A AB, … (6) ( augmentation on (1) ) AB AC, AB C, B BC, A AC, AB BC, AB ABC, AC ABC, A BC, A ABC }
Attribute Closure • Computing the closure of a set of FDs can be expensive • In many cases, we just want to check if a given FD X Y is in F+. X - a set of attributes F - a set of functional dependencies X+ - closure ofX under F set of attributes functionally determined by X under F.
Example: F = { A B, B C } A+ = ABC ….. A X where X ABC B+ = BC C+ = C AB+ = ABC
Algorithm to compute closure of attributes X+ under F closure := X ; Repeat for eachU VinFdo begin ifU closure thenclosure := closure V ; end Until (there is no change in closure)