DATABASES AND INFORMATION MANAGEMENT

1. DATABASES AND INFORMATION MANAGEMENT

2. Organising Data in a Traditional File Environment. File organization concepts: Database: group of related files. File: group of records of the same type. Record: group of related fields. Field: group of characters forming a word, words, or a number. A record describes an entity (a person, place, or thing on which we store information). Attribute: each characteristic, or quality, describing an entity; e.g., the attributes Date and Grade belong to the entity COURSE.

3. The data hierarchy. A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
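
The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the field values, the record layout, and the names course_file and database are made up, and ASCII is assumed so that one character maps to one byte.

```python
# Field: a group of characters. Encoded as ASCII, each character takes one byte.
field = "Ana"
as_bytes = field.encode("ascii")

# Byte: a group of eight bits, shown here as strings of 0s and 1s.
as_bits = [format(b, "08b") for b in as_bytes]

# Record: a group of related fields describing one entity (hypothetical layout).
record = {"student": "Ana", "course": "IS 101", "grade": "A"}

# File: a group of records of the same type.
course_file = [record, {"student": "Ben", "course": "IS 101", "grade": "B"}]

# Database: a group of related files.
database = {"course": course_file, "student": [{"student": "Ana"}, {"student": "Ben"}]}

print(len(as_bytes), as_bits)
```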

4. The data hierarchy

5. Problems with the traditional file environment. Data redundancy: presence of duplicate data in multiple files. Data inconsistency: the same attribute has different values. Program-data dependence: changes in a program require changes to the data accessed by the program. Lack of flexibility. Poor security. Lack of data sharing and availability.

6. Pros and Cons of Conventional Files (© 2008 Prentice-Hall, Inc.). Pros: Easy to design because of their single-application focus. Excellent performance due to organization optimized for a single application. Files generally outperform databases, though computers are now cheaper and more powerful and database technologies more efficient. Cons: Redesign. Leads to redundancy.

7. Files and Tables. File: the set of all occurrences of a given record structure. Table: the relational database equivalent of a file. Types of conventional files and tables: Master files: records relatively permanent, though values may change. Transaction files: records describe business events. Document files: historical data for review without the overhead of regenerating the document. Archival files: master and transaction records that have been deleted. Table lookup files: relatively static data that can be shared to maintain consistency. Audit files: special records of updates to other files.

8. Data organisation in a traditional file environment

9. The Database Approach to Data Management. Database: serves many applications by centralizing data and controlling redundant data. Database management system (DBMS): interfaces between applications and physical data files; separates logical and physical views of data; solves problems of the traditional file environment: controls redundancy, eliminates inconsistency, uncouples programs and data, and enables the organization to centrally manage data and data security.

10. HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS

11. The Database Approach to Data Management. Relational DBMS: represents data as two-dimensional tables called relations or files. Each table contains data on an entity and its attributes. Table: grid of columns and rows. Rows (tuples): records for different entities. Fields (columns): represent attributes of the entity. Key field: field used to uniquely identify each record. Primary key: field in a table used for key fields. Foreign key: primary key used in a second table as a look-up field to identify records from the original table.

12. RELATIONAL DATABASE TABLES

13. RELATIONAL DATABASE TABLES (cont.)

14. Operations of a Relational DBMS. Three basic operations are used to develop useful sets of data. SELECT: creates a subset of data consisting of all records that meet stated criteria. JOIN: combines relational tables to provide the user with more information than is available in individual tables. PROJECT: creates a subset of columns in a table, creating tables with only the information specified.
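
The three operations can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the supplier and part tables, their columns, and all rows are hypothetical and exist only for illustration. In SQL, all three operations are written with the SELECT statement, so the comments mark which relational operation each query performs.

```python
import sqlite3

# Hypothetical tables and rows, made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name TEXT
    );
    CREATE TABLE part (
        part_number INTEGER PRIMARY KEY,
        part_name TEXT,
        supplier_number INTEGER REFERENCES supplier(supplier_number)
    );
    INSERT INTO supplier VALUES (1, 'Acme Parts'), (2, 'Bolt Supply');
    INSERT INTO part VALUES
        (137, 'Door latch', 1),
        (150, 'Door seal', 2),
        (152, 'Compressor', 2);
""")

# SELECT: keep only the rows (records) that meet the stated criteria.
selected = conn.execute(
    "SELECT * FROM part WHERE part_number IN (137, 150) ORDER BY part_number"
).fetchall()

# JOIN: combine both tables through the shared supplier_number field
# (primary key in supplier, foreign key in part).
joined = conn.execute(
    """SELECT part.part_number, part.part_name, supplier.supplier_name
       FROM part JOIN supplier
         ON part.supplier_number = supplier.supplier_number
       ORDER BY part.part_number"""
).fetchall()

# PROJECT: keep only the columns of interest.
projected = conn.execute(
    "SELECT part_name FROM part ORDER BY part_number"
).fetchall()

print(selected, joined, projected, sep="\n")
```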

15. THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS

16. Database Types. Relational databases: organize data in tables; link tables to each other through their primary keys. Object-Oriented DBMS (OODBMS): stores data and procedures as objects; objects can be graphics, multimedia, or Java applets; relatively slow compared with relational DBMS for processing large numbers of transactions. Hybrid object-relational DBMS: provides capabilities of both OODBMS and relational DBMS. Multidimensional databases: store data in multiple dimensions; can easily be customized; process data much faster.

17. Database Management Systems (DBMS). Application software designed to capture and analyze data. The four main operations of a DBMS are: creating databases and entering data; viewing and sorting data; extracting data; outputting data.

18. Capabilities of Database Management Systems. Data definition capability: specifies the structure of database content; used to create tables and define characteristics of fields. Data dictionary: automated or manual file storing definitions of data elements and their characteristics. Data manipulation language: used to add, change, delete, and retrieve data from the database, e.g., Structured Query Language (SQL); Microsoft Access provides user tools for generating SQL. Many DBMS have report generation capabilities for creating polished reports (e.g., Crystal Reports).
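
As a rough illustration of these capabilities, the sketch below uses Python's built-in sqlite3 module. The employee table and its rows are hypothetical. SQLite's PRAGMA table_info stands in for a data dictionary lookup here; a full DBMS data dictionary holds far more than field names and types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the table structure and the characteristics of fields.
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT
    )
""")

# Data-dictionary-style lookup: list each field's name and declared type.
fields = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

# Data manipulation: add, change, delete, and retrieve data.
conn.execute("INSERT INTO employee VALUES (1, 'Ana', 'Sales')")
conn.execute("INSERT INTO employee VALUES (2, 'Ben', 'Finance')")
conn.execute("UPDATE employee SET department = 'Marketing' WHERE employee_id = 1")
conn.execute("DELETE FROM employee WHERE employee_id = 2")
rows = conn.execute(
    "SELECT employee_id, name, department FROM employee"
).fetchall()

print(fields)
print(rows)
```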

19. Data architecture. Data architecture: a definition of how files and databases are to be developed and used to store data, the file and/or database technology to be used, and the administrative structure set up to manage the data resource. Data is stored in some combination of: Conventional files. Operational databases: databases that support day-to-day operations and transactions for an information system; also called transactional databases. Data warehouses: databases that store data extracted from operational databases, to support data mining. Personal databases. Work group databases.

20. Typical DBMS architecture

21. Data warehouse. A data warehouse is a large-scale electronic repository of data that contains and organizes in one place all the data related to an organization. Individual databases contain a wealth of information, but each database's information usually pertains to one topic. Data warehouses consolidate information from disparate sources to present an enterprise-wide view of business operations. Data in the data warehouse is organized by subject. Most databases focus on one specific operational aspect of business operations.

23. Populating data warehouses. Source data for data warehouses can come from three places: internal sources (such as company databases); external sources (suppliers, vendors, and so on); and customers or visitors to the company Web site. Companies can use software on their Web sites to capture information about each click that users make as they navigate through the site. This information is referred to as clickstream data.

24. Advantages of data warehouses. Maintain data history, even if the source transaction systems do not. Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger. Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data. Present the organization's information consistently. Provide a single common data model for all data of interest regardless of the data's source. Restructure the data so that it makes sense to the business users. Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems. Add value to operational business applications, notably customer relationship management (CRM) systems.

25. Data staging. No two source databases are the same. Therefore, although two databases might contain similar information (such as customer names and addresses), the format of the data is most likely different in each database. Therefore, source data must be "staged" before entering the data warehouse.

26. Data staging. Data staging consists of three steps: (1) extraction of the data from source databases; (2) transformation (reformatting) of the data; (3) storage of the data in the warehouse. Many different software programs and procedures may have to be created to extract the data from varied sources and to reformat it for storage in the data warehouse.
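
The three staging steps can be sketched as follows. Everything here is an assumption made for illustration: the two source systems (a CRM list and a billing list), their field layouts, and the common warehouse layout. The warehouse is just an in-memory list.

```python
# Two hypothetical source systems holding similar data in different formats.
crm_rows = [{"CustName": "ana lopez", "Zip": "02139"}]
billing_rows = [("LEE, BEN", 94105)]


def transform_crm(row):
    """Reformat a CRM record into the warehouse's common layout."""
    return {"name": row["CustName"].title(), "postal_code": row["Zip"]}


def transform_billing(row):
    """Billing stores 'LAST, FIRST' and a numeric postal code."""
    last, first = [part.strip() for part in row[0].split(",")]
    return {"name": f"{first} {last}".title(), "postal_code": f"{row[1]:05d}"}


def stage(crm, billing):
    warehouse = []
    # Step 1, extraction: read each record from its source system.
    for row in crm:
        # Step 2, transformation; step 3, storage in the warehouse.
        warehouse.append(transform_crm(row))
    for row in billing:
        warehouse.append(transform_billing(row))
    return warehouse


warehouse = stage(crm_rows, billing_rows)
print(warehouse)
```

Each source gets its own transformation function, which reflects the point above that separate procedures are usually needed per source.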

27. Data Marts. Looking for the data you need in a data warehouse can be daunting when there are terabytes of data. Therefore, small slices of the data warehouse, called data marts, are often created. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.

28. Data Warehouse

29. Business Intelligence. Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions. E.g., Harrah's Entertainment analyzes customers to develop gambling profiles and identify its most profitable customers. Principal tools include: software for database query and reporting; online analytical processing (OLAP); data mining.

30. Online analytical processing (OLAP). Supports multidimensional data analysis: viewing data using multiple dimensions. Each aspect of information (product, pricing, cost, region, time period) is a different dimension. E.g., how many washers were sold in the East in June compared with other regions? OLAP enables rapid, online answers to ad hoc queries.
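
The washer question above can be answered with a small aggregation over three dimensions. This is a sketch only: the sales figures are invented, and a real OLAP tool would precompute such totals rather than scan a list.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units sold).
sales = [
    ("washer", "East", "June", 120),
    ("washer", "West", "June", 95),
    ("washer", "East", "July", 80),
    ("dryer", "East", "June", 60),
    ("washer", "South", "June", 70),
]


def units_by_region(facts, product, month):
    """Fix the product and time dimensions, then total units along the region dimension."""
    totals = defaultdict(int)
    for fact_product, region, fact_month, units in facts:
        if fact_product == product and fact_month == month:
            totals[region] += units
    return dict(totals)


june_washers = units_by_region(sales, "washer", "June")
print(june_washers)
```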

31. MULTIDIMENSIONAL DATA MODEL

32. Data mining. The process by which great amounts of data are analyzed and investigated. The objective is to spot significant patterns or trends within the data that would otherwise not be obvious.

33. Data Mining. More discovery driven than OLAP. Finds hidden patterns and relationships in large databases and infers rules to predict future behavior. E.g., finding patterns in customer data for one-to-one marketing campaigns or to identify profitable customers. Types of information obtainable from data mining: associations, sequences, classification, clustering, forecasting.

34. Predictive analysis. Uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events. E.g., the probability that a customer will respond to an offer. Text mining: extracts key elements from large unstructured data sets (e.g., stored e-mails).

35. Data mining methods. Data mining enables managers to sift through data in a number of ways. Each method produces different information that managers can then base their decisions on. The following are five things managers do to make their data meaningful: classification; estimation; affinity grouping or association rules; clustering; description and visualisation.

36. Classification. Before mining, managers define data classes that they think will be helpful in spotting trends. They then apply these class definitions to all unclassified data to prepare it for analysis.

37. Estimation. When managers classify data, the record either fits the classification criteria or it doesn't. Estimation enables managers to assign a value, based on some criterion, to data. For example, assume a bank wants to send out credit card offers to people who are likely to be granted a credit card. The bank may run the customers' data through a program that assigns them a score based on where they live, their household income, and their average bank balance. This provides managers with an estimate of the most likely credit card prospects so that they can include them in the mailing.
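
A scoring program like the bank's might look like the sketch below. The weights, the "low"/"high" area rating, the cutoff of 70, and the customer data are all invented for illustration; a real bank would derive its scoring rules from historical data.

```python
def credit_score(area_risk, household_income, average_balance):
    """Assign a score from where the customer lives, income, and average balance.

    All weights are hypothetical.
    """
    score = 30 if area_risk == "low" else 10
    score += min(household_income // 10_000, 10) * 4  # up to 40 points
    score += min(average_balance // 1_000, 10) * 3    # up to 30 points
    return score


# Hypothetical customers: (name, area risk, household income, average balance).
customers = [
    ("Ana", "low", 80_000, 5_000),
    ("Ben", "high", 30_000, 500),
    ("Cy", "low", 120_000, 12_000),
]

scores = {name: credit_score(area, income, balance)
          for name, area, income, balance in customers}

# Mail the offer only to customers at or above an assumed cutoff, best first.
prospects = sorted((n for n, s in scores.items() if s >= 70),
                   key=lambda n: scores[n], reverse=True)
print(scores, prospects)
```

Unlike classification, which only says whether a record fits a class, the score here gives each customer a value that can be ranked.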

38. Affinity grouping or association rules. When mining data, managers can also determine which data goes together. In other words, they can apply affinity grouping or association rules to the data. For example, suppose analysis of a sales database indicates that two items are bought together 70 percent of the time. Based on this data, managers might decide that these items should be pictured on the same page in the next mail-order catalog they send out.
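
The "bought together" figure can be computed by counting how often each pair of items appears in the same order. This is a simple sketch with five invented orders, measuring the share of all orders that contain both items; full association-rule mining also uses measures such as confidence and lift, which are not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders from a sales database.
orders = [
    {"tent", "sleeping bag"},
    {"tent", "sleeping bag", "lantern"},
    {"tent", "lantern"},
    {"sleeping bag"},
    {"tent", "sleeping bag"},
]


def bought_together(orders):
    """Share of all orders that contain both items of each pair."""
    counts = Counter()
    for order in orders:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(order), 2):
            counts[pair] += 1
    return {pair: n / len(orders) for pair, n in counts.items()}


rates = bought_together(orders)
best_pair = max(rates, key=rates.get)
print(best_pair, rates[best_pair])
```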

39. Clustering. Clustering involves organizing data into similar subgroups, or clusters. It is different from classification in that there are no predefined classes. The data-mining software makes the decision about what to group together, and it is up to managers to determine whether the clusters are meaningful. For example, the data-mining software may identify clusters of customers with similar buying patterns. Further analysis of the clusters may reveal that certain socioeconomic groups have similar buying patterns.
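
The slide does not name a clustering algorithm. As one possible illustration, the sketch below applies k-means, a common choice, to a single made-up measure (annual spend per customer). The data, the number of clusters, and the starting centers are all assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Group values around the nearest center, then move each center to its group's mean."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical annual spend per customer; no classes are defined in advance.
annual_spend = [20, 25, 30, 200, 210, 220]
centers, clusters = k_means_1d(annual_spend, centers=[20, 220])
print(centers, clusters)
```

The software produces the two groups on its own; deciding whether "low spenders" and "high spenders" is a meaningful split is left to the manager.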

40. Description and visualisation. Often, the purpose of data mining is merely to describe data so managers can visualize it. Sometimes having a clear picture of what is going on with the data helps people to interpret it in new and different ways.

41. Modern data architecture

42. Web mining. Discovery and analysis of useful patterns and information from the WWW, e.g., to understand customer behavior or evaluate the effectiveness of a Web site. Web content mining: knowledge extracted from the content of Web pages. Web structure mining: e.g., links to and from a Web page. Web usage mining: user interaction data recorded by the Web server.

43. Databases and the Web. Many companies use the Web to make some internal databases available to customers or partners. A typical configuration includes: Web server; application server/middleware/CGI scripts; database server (hosting the DBMS). Advantages of using the Web for database access: ease of use of browser software; the Web interface requires few or no changes to the database; inexpensive to add a Web interface to a system.

44. Linking internal databases to the web

45. Managing Data resources. Establishing an information policy: the firm's rules, procedures, and roles for sharing, managing, and standardizing data. Data administration: firm function responsible for specific policies and procedures to manage data. Data governance: policies and processes for managing availability, usability, integrity, and security of enterprise data, especially as it relates to government regulations. Database administration: defining, organizing, implementing, and maintaining the database; performed by the database design and management group.

46. Administrators. Data administrator: a database specialist responsible for data planning, definition, architecture, and management. Database administrator: a specialist responsible for database technology, database design, construction, security, backup and recovery, and performance tuning. A database administrator will administer one or more databases.

47. Managing data resources. Ensuring data quality: More than 25% of critical data in Fortune 1000 company databases are inaccurate or incomplete. Most data quality problems stem from faulty input. Before a new database is in place, firms need to identify and correct faulty data and establish better routines for editing data once the database is in operation.

48. Managing data resources. Data quality audit: structured survey of the accuracy and level of completeness of the data in an information system; survey samples from data files, or survey end users for perceptions of quality. Data cleansing: software to detect and correct data that are incorrect, incomplete, improperly formatted, or redundant; enforces consistency among different sets of data from separate information systems.
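
A very small data cleansing routine might look like the sketch below. The record layout, the sample records, and the rules (a missing field is "incomplete", an e-mail address without "@" is "incorrect", a repeated e-mail address is "redundant") are assumptions made for illustration; commercial cleansing software applies far richer rules.

```python
def cleanse(records):
    """Return the cleaned records plus a list of (problem, original record) pairs."""
    cleaned, problems, seen_emails = [], [], set()
    for record in records:
        # Fix improper formatting: collapse extra spaces and normalize case.
        name = " ".join(record.get("name", "").split()).title()
        email = record.get("email", "").strip().lower()
        if not name or not email:
            problems.append(("incomplete", record))
        elif "@" not in email:
            problems.append(("incorrect", record))
        elif email in seen_emails:
            problems.append(("redundant", record))
        else:
            seen_emails.add(email)
            cleaned.append({"name": name, "email": email})
    return cleaned, problems


# Hypothetical customer records merged from separate systems.
records = [
    {"name": "  ana   lopez ", "email": "ANA@EXAMPLE.COM"},
    {"name": "Ana Lopez", "email": "ana@example.com "},
    {"name": "Ben Lee", "email": ""},
    {"name": "Cy Roe", "email": "cy.example.com"},
]

cleaned, problems = cleanse(records)
print(cleaned)
print([kind for kind, _ in problems])
```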
