Relational Database Design

Relational Database Design

Relational Database Design • RDBMS design issues – Pitfalls and Normalization. • Overview of Normal Forms • Pitfalls in Relational Database Design • Functional Dependencies • Decomposition • Boyce-Codd Normal Form • Third Normal Form • Multivalued Dependencies and Fourth Normal Form • Overall Database Design Process

The Banking Schema • branch= (branch_name, branch_city, assets) • customer= (customer_id, customer_name, customer_street, customer_city) • loan = (loan_number, amount) • account= (account_number, balance) • employee = (employee_id. employee_name, telephone_number, start_date) • dependent_name = (employee_id, dname) • account_branch= (account_number, branch_name) • loan_branch = (loan_number, branch_name) • borrower = (customer_id, loan_number) • depositor= (customer_id, account_number) • cust_banker= (customer_id, employee_id, type) • works_for= (worker_employee_id, manager_employee_id) • payment= (loan_number, payment_number, payment_date, payment_amount) • savings_account= (account_number, interest_rate) • checking_account= (account_number, overdraft_amount)

Combine Schemas? • Suppose we combine borrowerand loan to get bor_loan= (customer_id, loan_number, amount ) • Result is possible repetition of information (L-100 in example below)

A Combined Schema Without Repetition • Consider combining loan_branchand loan loan_amt_br = (loan_number, amount, branch_name) • No repetition (as suggested by example below)

What About Smaller Schemas? • Suppose we had started with bor_loan.How would we know to split up (decompose) it into borrower and loan? • Write a rule “if there were a schema (loan_number, amount), then loan_number would be a candidate key” • Denote as a functional dependency: loan_numberamount • In bor_loan, because loan_number is not a candidate key, the amount of a loan may have to be repeated. This indicates the need to decompose bor_loan. • Not all decompositions are good. Suppose we decompose employee into employee1= (employee_id, employee_name) employee2= (employee_name, telephone_number, start_date) • The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.

A Lossy Decomposition

Pitfalls in Relational Database Design • Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to • Repetition of Information. • Inability to represent certain information. • Design Goals: • Avoid redundant data • Ensure that relationships among attributes are represented • Facilitate the checking of updates for violation of database integrity constraints.

RDBMS Design issues • So far we have assumed that attributes are grouped to form a relation schema by using the common senseof database designer or by mapping a schema defined by ER model. • We still need some formal measure of why one grouping of attributes into a relation schema may be better than another. • Unsatisfactory relation schemas that do not meet certain conditions – the normal form tests – are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. • Thus, the normalization procedure provides database designers with; • A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. • A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree.

First Normal Form • A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. • These criteria are basically concerned with ensuring that the table is a faithful representation of a relationand that it is free of repeating groups. • Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. • Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." • Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions).“ • Meaning a field should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same field.

First Normal Form • Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values complicate storage and encourage redundant (repeated) storage of data • Example: Set of accounts stored with each customer, and set of owners stored with each account • We assume all relations are in first normal form (and revisit this again!)

First Normal Form (Cont’d) • Atomicity is actually a property of how the elements of the domain are used. • Example: Strings would normally be considered indivisible • Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 • If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. • Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Goal — Devise a Theory for the Following • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies

Overview of Normal Forms • 1NF ( First Normal Form) To understand • 2NF • 3NF • BCNF Concept of FD’s ( Functional Dependency) required To understand • 4NF • 5NF Concept of MVD (Multi Valued Dependency) is required

Normal Forms & FDs: Review • Unnormalized– There are multivalued attributes or repeating groups • 1 NF – No composite attributes or repeating groups, • 2 NF – 1 NF plus no partial dependencies • 3 NF – 2 NF plus no transitive dependencies

Normalization • The basic objective of normalization is to reduce the various anomalies in the database. • Normalization can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of ; • Minimizing redundancy • Minimizing the insertion, deletion, and update anomalies. • Unsatisfactory relation schemas that do not meet certain conditions – the normal form tests – are decomposed into smaller relation schemas that meet the tests and hence possess the desirable properties. • Thus, the normalization procedure provides database designers with; • A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes. • A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree.

Normalization… • The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized. • Normal forms when considered in isolation from other factors, do not guarantee a good database design. • It is generally not sufficient to check separately that each relation schema in the database is, say, in BCNF or 3NF. • Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together should possess; • The Lossless join, • The dependency preservation property, which ensures that each functional dependency is represented in some individual relations resulting after decomposition.

RDBMS design • RDBMS design involves checking the current design through Normal Form Test. • If the design is not in desired Normal form, then • Decompose the Relations (Tables) into smaller ones • Fulfill the properties of decomposition. • Properties of decomposition • Functional dependency preservation • Identify the FDs in the given Relation schema • Apply Armstrong Axioms to find set of all FDs (Closure)

Overview • To understand • 2NF • 3NF FD’s required • BCNF FD’s & Closure of FDs required • To understand • 4NF • 5NF • MVD is required

First Normal Form • Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values complicate storage and encourage redundant (repeated) storage of data

Keys & Functional Dependency • A Key Attributes is special attribute which can uniquely identify the record for Entity • A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity. Ex. Super key : ( customer-id, Customer-name) • A candidate key of an entity set is a minimal super key • Customer-id is candidate key of customer • account-number is candidate key of account • Although several candidate keys may exist, one of the candidate keys is selected to be the primary key. • Functional Dependencies shows us Inter – relation among attributes

Functional Dependencies and Keys • A FD is a generalization of the notion of a key. • How ? • If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y X - Y • For Example in Relation schema Student (sid, name, supervisor_id, specialization), we write the functional dependency: {sid}  {name, supervisor_id, specialization} • The sid determines all attributes (i.e., the entire record) • If two tuples in the relation student have the same sid, then they must have the same values on all attributes. • In other words they must be the same tuple (since the relational modes does not allow duplicate records)

Functional Dependency Functional Dependency : • Inter – relation among attributes of an entity. • Let R be a relation on the relation scheme R, then R satisfies the functional – dependency X - Y • If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. • ( X determines Y • Y functionally dependent on X ) • FD’s can be used to group the attributes into Relation-scheme, which is in a particular Normal forms Approach of Normalization : • Step I : Test the given Relation scheme for any Normal form • Step II : If the given R-scheme is not in any N.F. (2NF/3NF) • Step III : Decompose the Relation-scheme into small Relation-schema. • Step IV : Test the decomposed schema for Normal forms.

FDs example

Functional Dependency • Inter – relation among attributes of an entity. • Let R be a relation on the relation scheme R, then R satisfies the functional – dependency X - Y • If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. X determines Y • Y functionally dependent on X • FDs allows us to express constraints that we can not express with Superkeys ! • FD’s can be used to group the attributes into Relation-scheme, which is in a particular Normal forms

FDs • Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. • For example, a specific instance of Loan may, by chance, satisfy amount customer_name. • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan = (customer_id, loan_number, amount , customer_name). We expect this functional dependency to hold: loan_numberamount but would not expect the following to hold: amount customer_name

How to identify FDs • Identification of correct FDs are key to efficient DBMS Design . • There are many commonly committed errors in identifying FDs. • R satisfies the functional – dependency X  Y • If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. X determines Y • While identifying FDs we should ensure that each attribute in X uniquely determines and not that some tuples satisfy it !

Common pitfalls in identifying FDs • Design Goals & Normalization • Design Anomalies • IdentifyingFDs : When & How ? • Procedure of FDs identifications ? • Pitfalls in FDs identification • Limitations with Normal Forms • Decompose the Relation Scheme • FDs to be taken care of while decomposition • Dependency preservation, Lossless join.

Procedure to identify FDs How to Identify the FDs in Scheme. • We start finding any two tuples with the same X value and then Y values in these tuples must be same. • Repeat these procedure until all such pairs of tuples with the same X values are examined with Y values. • Then verify that whether R in real world scenario satisfies X  Y or not ? • Should FDs be identified only by observing the tuples/data ? • Analyse the FDs and see whether it will work ?

Common pitfalls in FDs in the Schema Whether the FDs Course  Professor Professor  Course Satisfied in the following Schema ? Why so ? • If a given set of values for each attribute in X uniquely determines each of the values of the attributes in Y. • Then verify that whether R in real world scenario satisfies Professor -> Course or not ? • Can same value for Professor attribute may have more than one values for Course in real world scenario?

DBMS Design & Normalization : • Step I : Test the given Relation scheme for any Normal form • Step II : Identify & Analyse the FDs of the given Scheme. • Step III If the given R-scheme is not in any N.F. (2NF/3NF) then Step IV. • Step IV: Decompose the Relation-scheme into small Relation-schema. • Step V : Test the decomposed schema for Normal forms.

Normalization Using Functional Dependencies • When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want • Lossless-join decomposition: Otherwise decomposition would result in information loss. • No redundancy: The relations Ripreferably should be in either Boyce-Codd Normal Form or Third Normal Form. • Dependency preservation: Let Fibe the set of dependencies F+ that include only attributes in Ri. • Preferably the decomposition should be dependency preserving, that is, (F1 F2  …  Fn)+ = F+ • Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

Functional Dependencies (Cont.) • A functional dependency is trivial if it is satisfied by all instances of a relation • Example: • customer_name, loan_number customer_name • customer_name customer_name • In general,   is trivial if    • Loss less Join : so there is no information loss after decomposition • Dependency Preservation : Ensures that each functional dependency is represented in some relation after decomposition.

Closure of a Set of Functional Dependencies • Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. • E.g. If A  B and B  C, then we can infer that A  C • The set of all functional dependencies logically implied by F is the closure of F. • We denote the closure of F by F+. • We can find all ofF+by applying Armstrong’s Axioms: • if   , then   (reflexivity) • if  , then    (augmentation) • if  , and   , then   (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • from CG H and CG I : “union rule” can be inferred from • definition of functional dependencies, or • Augmentation of CG I to infer CG  CGI, augmentation ofCG H to inferCGI HI, and then transitivity

Functional Dependencies (Cont.) • K is a superkey for relation schema R if and only if K R • K is a candidate key for R if and only if • K R, and • for no   K,  R • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan= (customer_id, loan_number, amount ). We expect this functional dependency to hold: loan_numberamount but would not expect the following to hold: amount customer_name

1NF 1NF :- A Relation scheme is said to be in 1NF .if there are no composite attributes, and every attribute is having atomic or indivisible values. 2NF :- A Relation is said to be in 2NF; • 1. IF it is in 1NF and • 2.Non –key attribute are functionally-dependent on the key – attributes. further , if the key has more than one attribute, then no non key attributes should be functionally dependent upon a part of the key attributes.

Design Goals & Pitfalls with FDs • The basic objective of normalization is to reduce the various anomalies in the database. • Normalization can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of ; • Minimizing redundancy • Minimizing the insertion, deletion, and update anomalies. • How to identify FDs and prime / nonprime attributes? • An Attribute A in a relation is called a prime attribute if A is a part of any candidate key of the relation. • There are some pitfalls in FDs identifying !

Decomposition • The bad design suggests that we should decompose that table. • It is generally not sufficient to check separately that each relation schema in the database is, say, in 2NF or 3NF. • Rather, the process of normalization through decomposition must also confirm the existence of additional properties that the relation schemas, taken together should possess; • The Lossless join, • The dependency preservation property, which ensures that each functional dependency is represented in some individual relations resulting after decomposition.

How to identify Normal Forms • Unnormalized– There are multivalued attributes or repeating groups, No Key attributes • 1NF • 2NF • 3NF • BCNF Concept of FD’s ( Functional Dependency) required • What are the Key Attributes ? • What are the Non-Key Attributes ? • Find out Functional Dependencies ?

Second Normal Form 2NF :- A Relation is said to be in 2NF; • 1. IF it is in 1NF and • 2.Non –key attribute are functionally-dependent on the key – attributes. further ,if the key has more than one attribute ,then no non key attributes should be functionally dependent upon a part of the key attributes.

2 NF Cont… Key attribute B C A D Non-key attribute E Entity F G Partial Dependency is not allowed Why ? What is the effect of allowing partial dependency ? Is Second Normal Form capable to give good Database implementation ?

First Normal Form • A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria. • These criteria are basically concerned with ensuring that the table is a faithful representation of a relationand that it is free of repeating groups. • Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of atomicity. • Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." • First Normal Form Scheme should be such that in a given table there should be • No Composite Attributes • No Repeating groups and • All the attributes can be uniqueky identified by Key Attributes

Normal Forms & FDs: Review • Unnormalized– There are multivalued attributes or repeating groups • 1 NF – No composite attributes or repeating groups, • 2 NF – 1 NF plus no partial dependencies • 3 NF – 2 NF plus no transitive dependencies

Normal Form Test for TEACHES • Let us consider Teaches Relation for testing whether it fulfills any Normal Form ? • What are the FDs in this scheme ? • TEACHES contains attributes Professor, Course, Room, Room_Cap, Enrol_Lmt(Enrolment Limit). • The relation scheme for the relation TEACHES is (Prof, Course, Room, Room_Cap,Enrol_lmt) • The domain the attribute Prof is all the faulty members of the university. • The domain of the attribute Course is the courses offered by the university. • The domain of Room is the rooms in the buildings of the university. • The domain of Room_Cap is an integer value indicating the seating capacity of the room.

Design - Analysis • Go for Normal Form Tests ? The Teaches relation is in First Normal Form or not ? • Identify the FDs in Scheme. • Should FDs be identified only by observing the tuples/data ? • Analyse the FDs and see how will it work ? • Go for higher Normal Form Tests ? • What is the remedy of the problem ?

Teaches Relation First Normal Form What are the Key Attributes ? What are the Non-Key Attributes ? Find out Functional Dependencies ? Are Non –key attribute functionally-dependent on the key – attributes.?

FDs in Scheme • The domain of the Enrlo_Lmt is also integer value and should be less than or equal to the corresponding value for Room_Cap. • The TEACHES relation is in first normal form since it’s attributes contain only atomic values and there are key attributes uniquely identifying record. • FDs ? • The course is scheduled in a given room and each Course uniquely identifies Room Course  Room • since the room has the given maximum number of available seats, there is a functional dependency Room Room_CapAnd hence from transitivity Course  Room  Room_Cap Thus the functional dependencies in this relation are • {Course  (Prof, Room, Room_Cap, Enrol_Lmt), • Room Room_Cap}

Relational Database Design