1 / 76

Week 11 November 7

Week 11 November 7. Data Normalization and ERD Conceptual, Logical and Physical Database Design. Data Normalization. The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise. Achieve a design that is highly flexible

issac
Télécharger la présentation

Week 11 November 7

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Week 11November 7 Data Normalization and ERD Conceptual, Logical and Physical Database Design

  2. Data Normalization • The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise. • Achieve a design that is highly flexible • Reduce redundancy • Ensure that the design is free of certain update, insertion and deletion anomaliesCatherine Richardo, 1990

  3. Normalization 1NF Flat file 2NF Partial dependencies removed 3NF Transitive dependencies removed BCNF Every determinant is a candidate key Non-tivial multi-valued dependencies removed 4NF

  4. 0000 000 0000 0 John Smith 10001 Order No. Stereos To Go Invoice 6 15 99 Date: / / Go, Hogs 0000-000-0000-0 Account No. John Smith Customer: 2036-26 Street 1/05 Address: Sacramento CA 95819 City State Zip Code 6 18 99 Date Shipped: / / Product Item Product Description/Manufacturer Price Qty Code Number SAGX730 Pioneer Remote A/V Receiver 1 1 1 1 56995 35995 39995 2 AT10 Cervwin Vega Loudspeakers CDPC725 Sony Disc-Jockey CD Changer 3 4 5 132985 10000 10306 153291 Subtotal Shipping & Handling Sales Tax Total

  5. Unnormalized Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item1 Item1_descrip Item1_qty Item1_price, Item2 Item2_descrip Item2_qty Item2_price, . . . , Item7 Item7_descrip Item7_qty Item7_price) How would a program process the data to recreate the invoice?

  6. Unnormalized to 1NF (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price) Repeating groups A flat file places all the data of a transaction into a single record. This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.

  7. Unnormalized to 1NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price) Nominated group of attributes to serve as the key (form a unique combination) • Eliminate the repeating groups. • Each row retains data for one item. • If a person bought 5 items, we would have five tuples

  8. 10001 123456 John Smith ••• SAGX730 Pioneer Remote A/V Rec 1 569.95 10001 123456 John Smith ••• AT10 Cerwin Vega Loudspeakers 1 359.95 10001 123456 John Smith ••• CDPC725 Sony Disc Jockey CD 1 399.95 10001 123456 John Smith ••• S/H Shipping 1 100.00 10001 123456 John Smith ••• Tax Sales Tax 1 103.06 1NF Flat File Invoice number Account number Customer name Item Quantity Item Price Item Description

  9. From 1NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price) Functional dependencies and determinants Example: item_descrip is functionally dependent on item, such that item is the determinant of item_descript.

  10. From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Item, Item_descrip, Item_qty, Item_price) Is this unique by itself? What happens if the item is purchased more than once?

  11. From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) Partial dependency (Invoice_number,Item, Item_descrip, Item_qty, Item_price) Composite key (forms a unique combination)

  12. From 1NF to 2NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Invoice_number,Item, Item_qty, Item_price) (Item, Item_descrip)

  13. From 2NF to 3NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Invoice_number,Item, Item_qty, Item_price) (Item, Item_descrip) Which attributes are dependent on others? Is there a problem?

  14. Transitive Dependencies and Anomalies • Insertion anomalies • To add a new row, all customer (name, address, city, state, zip code, phone) and products (description) must be consistent with previous entries • Deletion anomalies • By deleting a row, a customer or product may cease to exist • Modification anomalies • To modify a customer’s or product’s data in one row, all modifications must be carried out to all others

  15. Insertion and Modification AnomaliesFor example… Insert a new Panasonic product Product_code Manufacturer_name DVD-A110 Panasonic PV-4210 Panasonic PV-4250 Panasonic CT-32S35 PAN Inconsistency DVD-A110 Panasonic PV-4210 PanaSonic PV-4250 Pana Sonic CT-32S35 PAN Change all Panasonic products’ manufacturer name to “Panasonic USA”

  16. Deletion AnomalyFor Example… 4377182 John Smith  Sacramento CA 95831 4398711 Arnold S  Davis CA 95691 4578461 Gray Davis  Sacramento CA 95831 4873179 Lisa Carr  Reno NV 89557 By deleting customer Arnold S, we would also be deleting Davis, California.

  17. Invoice_number Invoice_date Date_delivered Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code Item Item_descrip Invoice_number+Item Item_qty Item_price Transitive Dependencies • A condition where A, B, C are attributes of a relation such that if A  B and B  C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

  18. Why Should City and State Be Separated from Customer Relation? • City and state are dependent on zip code for their values and not the customer’s identifier (i.e., key). Zip_code  City, State • Otherwise, Cust_account  Cust_addr, Zip_code  City, StateIn which case, you have transitive dependency.

  19. 3NF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price) Items Relation (Item, Item_descrip)

  20. 3NF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price) Items Relation (Item, Item_descrip) Manufacturers Relation (Manuf_code, Manuf_name) Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be created

  21. First to Third Normal Form(1NF - 3NF) • 1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-value attributes and create a flat file) • 2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies) • 3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies)

  22. Putting It Together ERD of the Normalized Data Model

  23. 3NF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price) Items Relation (Item, Item_descrip, Manuf_code) Manufacturers Relation (Manuf_code, Manuf_name)

  24. ERD Invoices Customers Zip_Codes Cust_account Cust_name Cust_addr Zip_code Zip_code City State Invoice_number Invoice_date Date_delivered Cust_account Invoice_items Manufacturers Items Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name

  25. ERD Invoices Customers Zip_Codes Order Locate Invoice_number Invoice_date Date_delivered Cust_account Cust_account Cust_name Cust_addr Zip_code Zip_code City State (0..*) (1..1) (0..*) (1..1) (1..1)  Have (1..*) Invoice_items Manufacturers Items Appear on Produce Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name (0..*) (1..1) (0..*) (1..1)

  26. ERD Invoices Customers Zip_Codes Order Locate Invoice_number Invoice_date Date_delivered Cust_account Cust_account Cust_name Cust_addr Zip_code Zip_code City State (0..*) (1..1) (0..*) (1..1) Partial (1..1) Zip codes locate Customers. () A zip code can be related to a minimum of zero and a maximum of many customers. () A customer can be related to a minimum and maximum of one zip code.  Have (1..*) Invoice_items Manufacturers Items Appear on Produce Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name (0..*) (1..1) (0..*) (1..1)

  27. ERD Invoices Customers Zip_Codes Order Locate Invoice_number Invoice_date Date_delivered Cust_account Cust_account Cust_name Cust_addr Zip_code Zip_code City State (0..*) (1..1) (0..*) (1..1) Partial (1..1) Customers order (items) on invoices. () A customer can be related to a minimum of zero and a maximum of many invoices. () An invoice can be related to a minimum and maximum of one customer.  Have (1..*) Invoice_items Manufacturers Items Appear on Produce Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name (0..*) (1..1) (0..*) (1..1)

  28. ERD Invoices Customers Zip_Codes Order Locate Invoice_number Invoice_date Date_delivered Cust_account Cust_account Cust_name Cust_addr Zip_code Zip_code City State (0..*) (1..1) (0..*) (1..1) Invoices possess invoice items () An invoice can be related to a minimum of one and a maximum of many invoice items. ()An invoice item can be related to a minimum and maximum of one invoice. (1..1) Mandatory  Have (1..*) Invoice_items Manufacturers Items Appear on Produce Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name (0..*) (1..1) (0..*) (1..1)

  29. ERD Invoices Customers Zip_Codes Items are sold on invoice items. () An item can be related to a minimum of zero and a maximum of many invoice items. () An invoice item can be related to a minimum and maximum of one item. Order Locate Invoice_number Invoice_date Date_delivered Cust_account Cust_account Cust_name Cust_addr Zip_code Zip_code City State (0..*) (1..1) (0..*) (1..1) (1..1)  Have Partial (1..*) Invoice_items Manufacturers Items Appear on Produce Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name (0..*) (1..1) (0..*) (1..1)

  30. ERD Manufacturers produce items. () A manufacturer can be related to a minimum of zero and a maximum of many items. () An item can be related to a minimum and maximum of one manufacturer. Invoices Customers Zip_Codes Order Locate Invoice_number Invoice_date Date_delivered Cust_account Cust_account Cust_name Cust_addr Zip_code Zip_code City State (0..*) (1..1) (0..*) (1..1) (1..1)  Have Partial (1..*) Invoice_items Manufacturers Items Appear on Produce Item Item_descrip Manuf_code Invoice_number Item Item_qty Item_price Manuf_code Manuf_name (0..*) (1..1) (0..*) (1..1)

  31. Higher Forms of Data Normalization Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4NF) Fifth Normal Form (5NF) Domain Key Normal Form (DKNF)

  32. Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key • For a relation with only one candidate key, 3NF and BCNF are equivalent. • Usually occurs when keys of different relations overlap A determines B Attribute A Attribute B Determinant (B is functionally dependent on A)

  33. BCNF Example User (UserID, Dept, Name, ComputerID, EmpClassification) ComputerID  Dept (a department issues a computer) UserID, Dept  ComputerID, Name, EmpCassification (Employees may have the same name and UserIDs are unique within the department only) UserID, ComputerID  Dept, Name, EmpClassification BCNF UserComputer (ComputerID, Dept) User (UserID, ComputerID, Name, EmpClassification)

  34. From 3NF to BCNF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Candidate keys? Invoice_items Relation (Invoice_number, Item, Item_qty, Item_price) Items Relation (Item, Item_descrip) Manufacturers Relation (Manuf_code, Manuf_name)

  35. Fourth Normal Form (4NF) • A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no nontrivial dependencies. • Identify all determinants and make sure they are candidate keys

  36. 100 Finance F177-99 100 Marketing F177-99 100 Finance F288-00 102 Finance F288-00 102 Marketing F177-99 102 Finance F177-99 4NF Example Employee (EmployeeID, Dept, Project) Matrix management Multivalued dependencies 4NF Employee (EmployeeID, Dept) Projects (EmployeeID, Project)

  37. 100 Finance F177-99 100 Marketing F177-99 100 Finance F288-00 102 Finance F288-00 102 Marketing F177-99 102 Finance F177-99 4NF Example Matrix management Multivalued dependencies 4NF 100 Finance 100 Marketing 102 Finance 102 Marketing 100 F177-99 100 F288-00 102 F288-00 102 F177-99

  38. Fifth Normal Form (5NF)aka Project-Join NF • A relation is in fifth normal form if no remaining nonloss projections (i.e., all projects preserve all information contained in the original relation)are possible, except the trivial one in which the key appears in each project. • The join of all projects will result in the original relation • No systematic method exists for obtaining 5NF or for ensuring that a set of relations is indeed 5NFRicardo, 1990

  39. Domain-Key Normal Form (DKNF) • A relation is in domain-key normal form if every constraint is a logical consequence of domain constraints or key constraints (i.e., all possible values are a result of an imposed constraint) • There is no proven method of converting a design to DKNF, so it remains an ideal rather than a state that can readily be achievedRicardo, 1990

  40. DKNF For example: Emp_ID, Emp_name, Classification, Position, Salary • Domain for Position • Strategic Planner • CIO • Vice President • Domain for Classification: • Executive • Manager • Staff • Domain for Position • Programmer/Analyst I • Programmer/Analyst II • Database/Analyst I

  41. Database Design Methodology Conceptual database design • Build conceptual representation of the database Logical database design • Translate conceptual representation to logical structure of the database Physical database design • Operatioanlize logical structure in a physical implementation

  42. Conceptual Database Design • The process of constructing a model of the data used in an enterprise, independent of all physical considerations • What’s involved… • Identify entity types, relationship types • Identify and associate attributes with entity or relationship types • Determine attribute domains • Determine candidate, primary and alternate key attributes • Consider use of enhanced modeling concepts • Check model for redundancies • Validate conceptual model against user transactions • Review conceptual data model with the users

  43. Logical Database Design • The process of constructing a model of the data used in an enterprise based on a specific data model, but independent of a particular DBMS and other physical considerations • What’s involved… • Derive relations for logical data model • Validate relations using data normalization • Validate relations against user transactions • Check integrity constraints • Review logical data model (ERD) with the users • Merge logical data models into global data model • Check for future growth

  44. Gather Information • Meet with the users to get gather information • Interviews • Documents

  45. Derive Relations  Invoices have invoice items  One-to-many relationship • Strong and weak entity types • Relationship types (cardinality) • Participation (mandatory vs. partial) Invoice Invoice number (pk) Invoice date Delivery date Sales type Customer account Invoice Items Invoice number (pk) Product code (pk) Manufacture code Quantity Sales Price Have 1..1 1..*  Mandatory (all invoices must have at least one invoice item  Weak entity type (Invoice number is part of key)  Strong entity type

  46. Validate Relations • Normalize relations • Validate against transactions - Can a transaction be recreated given the data retained in the relations?) • Check integrity constraints • Required data (not null) • Domain constraints (in, references) • Multiplicity • Entity integrity (primary key) • Referential integrity (foreign key) • General constraints (business rules)

  47. Review Data Model with the Users • Be pleasant and professional, not arrogant, challenging or condescending • Not everyone is receptive to change • Your role is to facilitate change • The user is always “right” – It’s his/her data • Document all change requests (CYA) • Listen, listen, listen… (Even if you don’t agree)

  48. Logical Global Data Model Invoice Records Transactions Inventory Counts and retail prices Local Data Models Cust Accounts Global Data Model Customer credit accounts Cust Billing Customer credit sales Local data models are merged to create a (near) normalized global data model Vendor History Vendor performance Product Sales Sales history

  49. Physical Database Design • The process of producing a description of the implementation of the database on secondary storage • It describes the base relations, fle organizations and indexes used to achieve efficient access to the data and nay associated integrity constraints and security measures • What’s involved… • Translate logical data model for target DBMS: Design base relations, representation of derived data and general constraints • Design file organizations and indexes: Analyze transactions, choose file organizations, choose indexes, estimate disk space requirements • Design user views and security mechanisms • Consider the introduction of controlled redundancy • Monitor and tune the operational system Dictated by the DB product

More Related