1 / 42

DATA RESOURCE MANAGEMENT

DATA RESOURCE MANAGEMENT. Data Hierarchy in a Computer System. Entitities and Attributes. Problems with the Traditional File Environment. Data redundancy Program-Data dependence Lack of flexibility Poor security Lack of data-sharing and availability. Figure 7-3.

jacob
Télécharger la présentation

DATA RESOURCE MANAGEMENT

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DATA RESOURCE MANAGEMENT

  2. Data Hierarchy in a Computer System

  3. Entitities and Attributes

  4. Problems with the Traditional File Environment • Data redundancy • Program-Data dependence • Lack of flexibility • Poor security • Lack of data-sharing and availability

  5. Figure 7-3 Traditional File Processing

  6. Database Management System (DBMS) • Creates and maintains databases • Eliminates requirement for data definition statements • Acts as interface between application programs and physical data files • Separates logical and physical views of data

  7. The Contemporary Database Environment

  8. Components of DBMS • Data definition language: Specifies content and structure of database and defines each data element • Data manipulation language: • Manipulates data in a database • Data dictionary:Stores definitions of data elements, and data characteristics

  9. Sample Data Dictionary Report

  10. Figure 7-6 Relational Data Model

  11. Three Basic Operations in a Relational Database • Select:Creates subset of rows that meet specific criteria • Join:Combines relational tables to provide users with information • Project:Enables users to create new tables containing only relevant information

  12. Figure 7-7 Three Basic Operations in a Relational Database

  13. FLAT FILE – NOT NORMALIZED

  14. A Normalized Relation of ORDER

  15. Ensuring Database Integrity • Database integrity involves the maintenance of the logical and business rules of the database. • There are two kinds of “DB Integrity” that must be addressed: • Entity Integrity • Referential Integrity

  16. Entity Integrity • Entity integrity deals with within-entity rules. • These rules deal with ranges and the permission of null values in attributes or possibly between records

  17. Examples of Entity Integrity • Data Type Integrity: very common and most basic. Checks only for “data type” compatibility with DB Schema, such as: numeric, character, logical, date format, etc. • Commonly referred to in GIS manuals as: • Range and List domains • Ranges - acceptable Numeric ranges for input • List - acceptable text entries or drop-down lists.

  18. Enforcing Integrity • Not a trivial task! • Not all database management systems or GIS software enable users to “enforce data integrity” during attribute entry or edit sessions. • Therefore, the programmer or the Database Administrator must enforce and/or check for “Integrity.”

  19. Referential Integrity • Referential integrity concerns two or more tables that are related. • Example: IF table A contains a foreign key that matches the primary key of table B THEN values of this foreign key either match the value of the primary key for a row in table B or must be null. • Necessary to avoid: Update anomaly, Delete anomaly.

  20. Querying Databases: Elements of SQL Basic SQL Commands • SELECT: Specifies columns • FROM: Identifies tables or views • WHERE: Specifies conditions

  21. Using SQL- Structured Query Language • SQL is a standard database protocol, adopted by most ‘relational’ databases • Provides syntax for data: • Definition • Retrieval • Functions (COUNT, SUM, MIN, MAX, etc) • Updates and Deletes

  22. SQL Examples • CREATE TABLE SALESREP • Item definition expression(s) • {item, type, (width)} • DELETE table • WHERE expression

  23. Data Retrieval • SELECT list FROM table WHERE condition • list - a list of items or * for all items • WHERE - a logical expression limiting the number of records selected • can be combined with Boolean logic: AND, OR, NOT • ORDER may be used to format results

  24. UPDATE tables • SET item = expression • WHERE expression • INSERT INTO table • VALUES …..

  25. Database Normalization • Normalization: The process of structuring data to minimize duplication and inconsistencies. • The process usually involves breaking down a single Table into two or more tables and defining relationships between those tables. • Normalization is usually done in stages, with each stage applying more rigorous rules to the types of information which can be stored in a table.

  26. Normalization • Normalization: a process for analyzing the design of a relational database • Database Design - Arrangement of attributes into entities • It permits the identification of potential problems in your database design • Concepts related to Normalization: • KEYS and FUNCTIONAL DEPENDENCE

  27. Ex: Database Normalization (1) • Sample Student Activities DB Table • Poorly Designed • Non-unique records • John Smith • Test the Design by developing sample reports and queries

  28. Ex: Database Normalization (2) • Created a unique “ID” for each Record in the Activities Table • Required the creation of an “ID” look-up table for reporting (Students Table) • Converted the “Flat-File into a Relational Database

  29. Ex: Database Normalization (3) • Wasted Space • Redundant data entry • What about taking a 3rd Activity? • Query Difficulties - trying to find all swimmers • Data Inconsistencies - conflicting prices

  30. Ex: Database Normalization (4) • Students table is fine • Elimination of two columns and an Activities Table restructuring, Simplifies the Table • BUT, we still have Redundant data (activity fees) and data insertion anomalies. Problem: If student #219 transfers we lose all references to Golf and its price.

  31. Ex: Database Normalization (5) • Modify the Design to ensure that “every non-key field is dependent on the whole key” • Creation of the Participants Table, corrects our problems and forms a union between 2 tables. This is a Better Design!

  32. The Normal Forms • A series of logical steps to take to normalize data tables • First Normal Form • Second • Third • Boyce Codd • There’s more, but beyond scope of this

  33. First Normal Form (1NF) • All columns (fields) must be atomic • Means : no repeating items in columns Solution: make a separate table for each set of attributes with a primary key (parser, append query) Customers CustomerID Name Orders OrderID Item CustomerID OrderDate

  34. Second Normal Form (2NF) • In 1NF and every non-key column is fully dependent on the (entire) primary key • Means : Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date? Clue: repeating fields Solution: Remove to a separate table (Make Table) Orders OrderID CustomerID OrderDate OrderDetails OrderID Item

  35. Third Normal Form (3NF) • In 2NF and every non-key column is mutually independent • means : Calculations • Solution: Put calculations in queries and forms OrderDetails OrderID Item Quantity Price Put expression in text control or in query: =Quantity * Price

  36. Data Warehousing and Datamining Data warehouse • Supports reporting and query tools • Stores current and historical data • Consolidates data for management analysis and decision making

  37. What is a Data Warehouse? "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Bill Inmon (1990) "A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated.…” Anonymous

  38. Components of a Data Warehouse

  39. Data Mining • ON-LINE ANALYTICAL PROCESSING (OLAP):ability to manipulate, analyze large volumes of data from multiple perspectives • MINING: Seeking relationships that are not known in advance. A function of the software and data organization.

  40. DW Characteristics • Subject Oriented:Data that gives information about a particular subject instead of about a company's ongoing operations. • Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. • Time Variant: All data in the data warehouse is identified with a particular time period.

  41. Data Acquisition • The process of moving company data from the source systems into the warehouse. • Often the most time-consuming and costly effort. • Performed with software products known as ETL (Extract/Transform/Load) tools. • Over 50 ETL tools on market.

  42. Data Cleansing • Typically performed in conjunction with data acquisition. • A complicated process that validates and, if necessary, corrects the data before it is inserted. • AKA "data scrubbing" or "data quality assurance".

More Related