CHAPTER 4 Data Management
CHAPTER OUTLINE • Difficulties of Managing Data • The Database Approach • Database Organization • Data Warehouses and Data Marts
Difficulties of Managing Data Data come from many sources Credit card swipes RFID tags Digital video surveillance Blogs E-mails Radiology scans
Chapter Closing Case Document management!
Checking Account Program Checking Account File Transactions Savings Account Program Savings Account File Transactions Loan Account Program Loan Account File Transactions Pre-database Era: File Processing A File Processing Example
Data Redundancy in File Processing Checking Account File Savings Account File Checking Account Number Checking Account Balance Customer Name Customer Address Customer City Customer State Customer Zip Customer Phone Saving Account Number Saving Account Balance Customer Name Customer Address Customer City Customer State Customer Zip Customer Phone Loan Account File Loan Account Number Loan Account Balance Customer Name Customer Address Customer City Customer State Customer Zip Customer Phone
File Processing Summary • Advantages • Easy to develop and use; processing is fast • Disadvantages • Data Redundancy • Data inconsistency • Data Isolation
Checking Account Program Transactions Data Base Management System Savings Account Program Data Base Transactions Loan Account Program Transactions The Database Approach DBMS: software used to create, store, organize, and retrieve data
Checking Account File Customer File Checking Account Number * Checking Account Balance Customer Number ** Customer Number * Customer Name Customer Address Customer City Customer State Customer Zip Customer Phone Savings Account File Saving Account Number * Saving Account Balance Customer Number ** * Primary Key Loan Account File ** Loan Account Number * Loan Account Balance Customer Number ** Foreign Key -- a non-primary key In one file and is linked to the primary key of another file A Sample Database for the Bank Example
Database Approach Summary • Advantages • Reduces data redundancy • Improves data quality • Makes data more accessible • Disadvantages • Requires a willingness to share • Requires much higher initial investment • Increased security risks
Database Organization (Metadata) • Database – a collection of related tables (files) • Table – a list of data records about an entity • Record (row) – an single instance of the entity • Field (column) – an attribute of the entity • Relationship (one-to-one, one-to-many, & many-to-many)
Database Organization (cont.) • Primary Key – used to uniquely identify a record • e.g. Student ID number, Social Security Number • Combination Primary Key – combines two or more fields • e.g. identifying a student by First Name and Last Name • Foreign Key – used to link a child table to a parent table
Data Modeling and Representation Entity–Relationship Diagram
Normalization of Data Non-normalized data – high data redundancy
Referential Integrity • Enforced between the primary key and the foreign key to maintain consistency between related tables • If a field in a child table is declared a foreign key, its values must match values from the parent table's primary key • Without referential integrity, orphan records could result
Database Administration • Data Dictionary – documentation that defines every piece of information in the database • e.g. data type and format, update authority, validation rules, etc. • Database Administrator (DBA) – responsible for database design, implementation, control and, security
Database Queries • Query languages – allows a user to query a database. • e.g., SQL – an industry standard query language. • Query-By-Example (QBE) –a graphical approach to query languages.
Construct a new table by combining two existing tables based on columns having a common definition (common fields) The Inner- Join operation = Common Fields Query More Than One Table at a Time X • A row in the new table is formed by joining together two rows from the input tables. • The two rows from the input tables must have matching values under the common fields for them to be joined.
Table 2 -- Order O# AMT C-Num 101 100 C1 102 50 C3 103 300 C3 104 240 C1 105 180 C4 Joined Table (3) -- Cust-Order C# C-ADDR O# AMT The Anatomy of an Inner-join Operation Table 1 -- Customer C# C-ADDR C1 A1 C2 A2 C3 A3 C4 A4 C1 A1 101 100 C1 A1 104 240 C3 A3 102 50 C3 A3 103 300 C4 A4 105 180
Data Warehouses and Data Marts (cont.) • A repository of historical data for decision support • Data are organized by business dimension or subject (multidimensional) • Managers can conduct extensive analysis with data quickly, without interfering with production databases • A data mart is a lower-cost, scaled-down data warehouse