Database Systems

Database Systems

Major Content & Grade • Introduction * • The Relational Model *** • SQL **** • Transaction Management *** • Database Design (E-R) *** • Database Design (Normalization) ***

Part2 normalization • Introduction • Functional Dependencies • Normal Forms • Lossless Decompositions • Additional Design Considerations

Introduction • Normalization (规范化) is another approach to logical design of a relational database. • E-R approach and normalization approach reinforce each other. • Normalization starts with a real-world situation to be modeled and lists the data items that are candidates to become column names in relational tables, together with a list of rules about the relatedness of these data items. • The aim is to represent all these data items as attributes of tables that obey restrictive conditions associated with what we call normal forms (范式). • 1NF --> 2NF --> 3NF --> BCNF --> 4NF --> 5NF

Design of the Bank Database branch = (branch_name, branch_city, assets) customer = (customer_id, customer_name, customer_street, customer_city) loan = (loan_number, amount) account = (account_number, balance) employee = (employee_id. employee_name, telephone_number, start_date) dependent_name = (employee_id, dname) account_branch = (account_number, branch_name) loan_branch = (loan_number, branch_name) borrower = (customer_id, loan_number) depositor = (customer_id, account_number) cust_banker = (customer_id, employee_id, type) works_for = (worker_employee_id, manager_employee_id) payment = (loan_number, payment_number, payment_date, payment_amount) savings_account = (account_number, interest_rate) checking_account = (account_number, overdraft_amount)

Design of the Bank Database • Suppose we combine borrow and loan to get • Schema: bor_loan = (customer_id, loan_number, amount ) • Instance: • Result is possible repetition of information • For borrower is M:N relationship

Design of the Bank Database • Consider combining loan_branch and loan • Schema: loan_amt_br = (loan_number, amount, branch_name) • Instance: • No repetition • For loan_branch is 1:N relationship

Design of the Bank Database • Example (decompose, 分解) : • we cannot reconstruct the original employee relation

Design of the Bank Database • combining loan_branch and loan into • Schema: loan_amt_br = (loan_number, amount, branch_name) • Is a “good” relation schema • combine borrow and loan to get • Schema: bor_loan = (customer_id, loan_number, amount ) • Is NOT a “good” relation schema • Decide whether a particular relation R is in “good” or NOT? • Suppose we had started with bor_loan. How would we know to split up (decompose,分解) it into borrower and loan? • Normalization theory is the tools used to solve those questions.

A Running Example • Employee Information: From one up to a large number of skills useful to the company

A Running Example • Employee Information:

Anomalies of a Bad Database Design • Update Anomaly (修改异常) • A table T is subject to an update anomaly when changing a single attribute value for an entity instance or relationship instance represented in the table may require that several rows of T be updated.

Anomalies of a Bad Database Design • Delete Anomaly • A table T is subject to a delete anomaly when deleting some row of the table to reflect the disappearance of some instance of an entity or relationship can cause us to lose information about some instance of a different entity or relationship that we do not wish to forget.

Anomalies of a Bad Database Design • Insert Anomaly • We cannot represent information about some entity or instance without including information about some other instance of an entity or relationship that does not exist.

Anomalies of a Bad Database Design • Redundant Data （数据冗余） • An entity instance or relationship instance represented in a table T may account for several rows of T.

Anomalies of a Bad Database Design • Normalize the relation… decompose

Functional Dependencies • Functional Dependencies (函数依赖) The functional dependency   holds onR if and only if for any legal relations r(R), whenever any two tuples t1and t2 of r agree on the attributes  (在上取值相同), they also agree on the attributes . That is, t1[] = t2 []  t1[ ] = t2 [ ] • 另一定义：一但的值确定了，就一定能唯一确定的值(即使不知道的确切值，但可以明确只能取一个值)

emp_id → emp_name emp_id → emp_phone? emp_id → dep_name ?, and emp_phone emp_id Functional Dependencies • In the emp_info table, we get

T1: A → B B A T3: A → B B A Functional Dependencies • Analyze the following tables (suppose they are valid) T2: A → B B → A

Logical implications among functional dependencies • Inclusion Rule (包含规则) • Given a table T with a specified heading Head(T). If X and Y are sets of attributes contained in Head(T), and Y  X, then X→Y. • Proof. By def, need only demonstrate that if two rows u and v agree on X they must agree on Y. But Y is a subset of X, so seems obvious. • Trivial Dependency (平凡的函数依赖) • A Trivial Dependency is a FD of the form X →Y, in a table T where X ∪ Y  Head(T). That will hold for any possible content of the table T. • (更确切地说明什么是trivial dependency) • Given a trivial dependency X→Y in T, it must be the case that Y  X. e.g. A →A, AB →A

Armstrong’s Axioms • Armstrong’s Axiom (阿姆斯特朗公理 1974) • A1: Inclusion rule(包含规则): if Y  X, then X→Y 证明：对于关系模式R<U,F>的任一关系r中的任意两个元组t和s，若t[X]=s[X]，由于Y X，则t[Y]=s[Y]，故X→Y。 • Example: • customer_name, loan_number customer_name • customer_name customer_name

Armstrong’s Axioms • Armstrong’s Axiom (阿姆斯特朗公理 1974) • A2: Transitivity rule(传递规则): if X → Y and Y → Z , then X → Z 证明：对于关系模式R<U，F>的任一关系 r中的任意两个元组 t和s，若t[X]=s[X]，由于X→Y，有 t[Y]=s[Y]；再由Y→Z，有t[Z]=s[Z]，所以F ╞X→Z。 • Example: For relation: S( sno, sname, sdept, dept_manager ) • sno  sdept,sdeptdept_manager • THEN: sno  dept_manager

Armstrong’s Axioms • Armstrong’s Axiom (阿姆斯特朗公理 1974) • A3: Augmentation rule(增广规则): if X → Y, then XZ → YZ 证明：对于关系模式R<U,F>的任一关系r中的任意两个元组t和s，若t[XZ]=s[XZ]；则有t[X]=s[X]和t[Z]=s[Z]；由X→Y，则有t[Y]=s[Y]；所以t[YZ]=s[YZ]，故F╞XZ→YZ。 • Example: For relation: S( sno, sname, sdept, dept_manager ) • sno  sdept • THEN: (sno , sname) dept_manager , sname

Armstrong’s Axioms • Some implications of Armstrong’s Axiom • [1] Union rule(合并规则): if X → Y and X →Z, then X → YZ 证明： (1) X→Y 已知(P规则) (2) X→XY A2，(1) (3) X→Z 已知 (4) XY→YZ A2，(3) (5) X→YZ A3，(2)，(4)  {X→Y，X→Z } ╞X→YZ • Example: S( sno, sname, sdept, dept_manager ) • snosname ,sno  sdept • THEN: sno  sname ,sdept

Armstrong’s Axioms • Some implications of Armstrong’s Axiom • [2] Decomposition rule(分解规则): if X → YZ, then X → Y and X → Z • Example: S( sno, sname, sdept, dept_manager ) • sno  sname ,sdept • THEN: snosname ,sno  sdept • [3] Pseudotransitivity rule(伪传递规则): if X → Y and WY →Z, then XW → Z • [4] Set accumulation rule(集合累积规则): if X →YZ and Z →W, then X→ YZW (以上证明方法同上，略)

Closure (闭包) • The set of all functional dependencies logically implied by F is the closure of F, denoted by F +. • We can find all ofF+by applying Armstrong’s Axioms: • if   , then   (reflexivity) • if  , then    (augmentation) • if  , and   , then   (transitivity) • Armstrong’s Axiom are often referred to as being valid(sound, 有效性) and complete(完备性).

Closure • Given R<U, F>, U＝{A, B, C}, F={A→B, B→C}, • The closure of F : F＋= { Ф→Ф, A→Ф, A→A, …, AB→A, … //A1 A→B,A→AB,AB→B,…,ABC→BC,… //A2 B→C, AB→AC, …… //A2 A →C } //A3 • note：there are 43non-duplicate FDs. • The closure of functional dependency sets includes all dependencies among attributes of a relation. • drawback：its too hard to be managed.

Closure • Algorithm To compute the closure of a set of functional dependencies F: begin F+ = Frepeat for each functional dependency f in F+ apply inclusion and augmentation rules on fadd the resulting functional dependencies to F+for each pair of functional dependencies f1 and f2 in F+ if f1 and f2 can be combined using transitivity then add the resulting functional dependency to F+until F+ does not change any further End

Closure • Given a set of attributes a, define the closureof aunderF (denoted by aF+) as the set of attributes that are functionally determined by a under F. • Algorithm to compute aF+, the closure of a under F . • Closure of attributes (属性集的闭包) result := a; while (changes to result) do for each in F do begin if  result then result := result end

Closure • Example1:Given R<U, F>, R = (A, B, C, G, H, I) F = {A  B, A  C, CG  H, CG  I, B  H} • (AG)+ = ? 1) result = AG 2) result = AGBC (A B and A  C) 3) result = AGBCH (CG H and CG  AGBC) 4) result = AGBCHI (CG I and CG  AGBCH) • Example2:Given R<U, F>, R = (A, B, C, D, E)F={B→CD, AD→E, B→A} • (BC)+ = ? • Closure of attributes (属性集的闭包)

Closure • There are several uses of the attribute closure algorithm: 1) Testing for superkey: • To test if  is a superkey, we compute +, and check if + contains all attributes of R. • Example: for relation R<U, F>, U = {A, B, C, D, E},F = {AB→C, B→D, C→E, EC→B, AC→B }IS AB a superkey or not? • (AB)F+＝{ABCDE} = U • So AB is a superkey • Closure of attributes (属性集的闭包)

Closure • There are several uses of the attribute closure algorithm: • Closure of attributes (属性集的闭包) 2) Testing functional dependencies • To check if a functional dependency    holds (or, in other words, is in F+), just check if   +. • Example: for relation R<U, F>, U = {A, B, C, D, E},F = {AB→C, B→D, C→E, EC→B, AC→B } • IS BE→CD implied by F? For (BE)F+＝{BED} , not include CD, so not implied. • IS AB→E implied by F? • (Theorem) :

Closure • There are several uses of the attribute closure algorithm: • Closure of attributes (属性集的闭包) 3) Computing closure of F • For each   R, we find the closure +, and for each S  +, we output a functional dependency   S. • (Theorem) :

Cover • FD Set Cover(函数依赖集的覆盖): • A set F of FDs on a table T is said to cover another set G of FDs on T, if the set G of FDs can be derived by implication rules from the set F, or in other words. If GF+. • If F covers G and G covers F, then the two sets of FDs are said to be equivalent, and we write F≡ G. • If two FDs are equivalent, the have the same implication of FDs. • Example: Consider the two sets of FDs on relaton R(ABCDE) : F={B→CD, AD→E, B→A} and G={B→CDE, B→ABC, AD→E} Is F≡ G or NOT?

Database Systems

Cover • Sets of functional dependencies may have redundant dependencies that can be inferred from the others. • For example: A  C is redundant in: {AB, BC} • Parts of a functional dependency may be redundant • E.g.: on RHS: {AB, BC, ACD} can be simplified to {A B, BC, AD} • E.g.: on LHS: {A B, BC, ACD} can be simplified to {A B, BC, AD} • we need a cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies.

Minamal Cover • Minimal Cover(最小覆盖，最小函数依赖集) • Step 1. Decomposition Right Hand Side of FDs Create an equivalent set H of FDs, with only single attributes on the right side.(等价变换) • Step 2. Erase extraneous attributes on LHS For  in FAttribute A is extraneous in  if A   and F logically implies (F– {})  {(–A) }. Then replace   with (–A)  • Step 3. Delete redundant FD For  in F, if(F– {}) logically implies , then delete  from F.

Minamal Cover • Example: for relation R<U, F>, U = {A, B, C, D, E}, F={A→BC, BCD→E, B→D, A→D, E→A}compute the minimal cover of F. 1) 右部化为单一属性 F1={A→B, A→C, BCD→E, B→D, A→D, E→A} 2) 去掉左部冗余属性 for (BC)F+=BCDEA, include E, so D in LHS of BCD→Eis extraneous. F2＝{A→B, A→C, BC→E, B→D, A→D, E→A} 3) 去掉多余函数依赖 for A→D because of (A)+ F2-(A→D )=ABCED, is redundancy Fmin = {A→B, A→C, BC→E, B→D, E→A}

Canonical Cover(正则覆盖) • A canonical coverfor F is a set of dependencies Fc such that • F logically implies all dependencies in Fc, and • Fclogically implies all dependencies in F, and • No functional dependency in Fccontains an extraneous attribute, and • Each left side of functional dependency in Fcis unique. • gather all FDs in Minimal Cover with equal left-hand sides and use the union rule to create an equivalent set of FDs where all left-hand sides are unique, we’ll get canonical cover .

Canonical Cover(正则覆盖) • Example: for relation R<U, F>, U = {A, B, C}, F={A BC, B C, A B, ABC}compute the canonical cover of F. 1) 右部分为单属性 F1 = {A B, A C, B C, A B, ABC} 2) 去掉左部冗余属性对ABC，(A)F1+= ABC, 包含C，故用B  C替换之。 F1 = {A B, A C, B C} 3) 去掉多余函数依赖对AC， (A)F-(AC)+= ABC, 多余，去掉。 4) 合并函数依赖 Fc = {A B, B C}

作业 • R = (A, B, C, D, E, F) F = {A BC, E CF, B E, CD EF} • (AB)+ = ? • (AD) + = ? Is AD F implied by F? • Page 307 • 7.6 • 7.7

KEY • K is a superkey for relation schema R if and only if K R • K is a candidate key for R if and only if • K R, and • for no   K,  R • Prime attribute: an attribute that appeared in some candidate key • non-prime attribute: an attribute that DO NOT appeared in any candidate key

5.3 数据依赖的公理系统 • 候选码求解算法 • 分析：对于给定的关系模式R<U, F>，依照函数依赖集F将U中的属性分为以下四类： L类属性:在F中只出现在函数依赖的左部的属性； R类属性:在F中只出现在函数依赖的右部的属性； LR类属性:分别出现在F中的函数依赖左部和右部的属性； N类属性:不在F中的函数依赖中出现的属性。 • 结论： ① L类属性和N类属性必包含于任何候选码中； ②R类属性必不包含于任何候选码中； ③LR类属性不能确定是否在候选码中。

算法： 　　对于给定的关系模式R<U, F>，其中U为属性集合，F为函数依赖集。 (1) 依照函数依赖集F将R中的所有属性分为L类、R类、LR类和N类属性，令X为L、N类属性的集合，Y为LR类属性集合； 5.3 数据依赖的公理系统例：设R<U, F>，其中： U={A,B,C,D,E}，F={A→BC, CD→E, B→D, E→A}，求R的所有候选码。解: (1) R中无L、N类属性，A, B,C,D,E均为LR类属性，故X= Ф，Y＝{A,B,C,D,E}; (2) XF＋= Ф ≠ U； (2) 若XF＋=U，则X为R的唯一候选码 (?)，结束；否则，转(3)；

算法： 　　对于给定的关系模式R<U, F>，其中U为属性集合，F为函数依赖集。 (1) …… (2) …… 5.3 数据依赖的公理系统例：设R<U, F>，其中： U={A,B,C,D,E}，F={A→BC, CD→E, B→D, E→A}，求R的所有候选码。解: (1) …… X= Ф，Y＝{A,B,C,D,E}; (2) …… (3)取A，则AF+=ABCDE=U，A为候选码；取B、C、D，其闭包均不等于全属性集U；　　取E，则EF+=ABCDE=U，E为候选码； Y= {B,C,D} (3)逐一取Y中的单一属性A，若(XA)F+=U，则XA为候选码，令Y＝Y－{A} ，转(4)；

算法： 　　对于给定的关系模式R<U, F>，其中U为属性集合，F为函数依赖集。 (1) …… (2) …… (3) …… 5.3 数据依赖的公理系统例：设R<U, F>，其中： U={A,B,C,D,E}，F={A→BC, CD→E, B→D, E→A}，求R的所有候选码。解:(3) ……，A为候选码； ……，E为候选码； Y= {B,C,D} (4)在Y中任取两个属性判定 (BC)F+=BCDEA=U，BC为候选码； (BD)F+不等于全属性集U，BC不是候选码； (CD)F+=CDEAB=U，CD为候选码； (4) 依次取Y中的任意两个、三个……属性组成属性组XZ，若XZ不包含已求得的候选码，求其关于F的闭包(XZ)F+，若(XZ)F+ =U，则XZ为候选码。直到取完Y中的所有属性为止，算法结束。

5.3 数据依赖的公理系统 例：设R<U, F>，其中： U={A,B,C,D,E}，F={A→BC, CD→E, B→D, E→A}，求R的所有候选码。解:(3) ……，A为候选码； ……，E为候选码； Y= {B,C,D} (4)在Y中任取两个属性判定 ……， BC为候选码； ……， CD为候选码； (5) BCD包含已求得的候选码BC，BCD不是码，结束。故关系R的候选码为：A，E，BC，CD。

Normal Forms -- 1NF • A relational schema R is in first normal form if the domains of all attributes of R are atomic. • NO composite attributes, such as:customer( customer-id, name(first-name, middle-initial, last-name), date-of-birth ) • Each attribute as an unit, even they have several part that have individual information. • Example: Strings would normally be considered indivisible. For student number ‘130711***’, ’13’ is department number, but you cannot use. For doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Normal Forms -- 1NF • A schema R not in 1NF, then it’s NOT a relational schema. • A relation R is in 1NF is not ‘good’ enough. • For relation:Employee( emp_id, emp_name, emp_phone, dept_name, dept_phone, dept_mgrname, skill_id, skill_name, skill_date, skill_lvl ) • Is in 1NF • Has Insert Anomaly, Delete Anomaly, Update Anomaly and DataRedundancy .

Normal Forms -- 2NF • Second normal form (2NF):A relation schema R with FD set F is said to be in 2NF, if for any functional dependency X→A implied by F that lies in R, where A is a single attribute that is not in X and is non-prime(非主属性,不在任何候选码中出现的属性), X is not a proper subset(真子集) of any key K of R. • Or there are NO non-prime attributes dependent on Candidate Key partially in 2NF.(不存在非主属性对码的部分依赖) • Example：R(A, B,C,D), F = {AB C, AC  BD} • Candidate Key : AB, AC • AB D, ACD is FULLdependency • R∈2NF

Database Systems