1 / 47

CHAPTER SIX

CHAPTER SIX. Databases and Data Warehouses. Information Granularity. Refers to the level of detail of information Detailed (POS transaction) Course (Global sales totals). Transactional vs. Analytical Information. Transactional information comes from a business process A bank deposit

selene
Télécharger la présentation

CHAPTER SIX

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CHAPTER SIX Databases and Data Warehouses

  2. Information Granularity • Refers to the level of detail of information • Detailed (POS transaction) • Course (Global sales totals)

  3. Transactional vs. Analytical Information • Transactional information comes from a business process • A bank deposit • A credit card charge • Analytical information uses transactional data for the purposes of decision making • Account balance trends • Using credit card history to detect fraud

  4. Transactional vs. Analytical Information

  5. Information Dimensions • Information timeliness • Obsolete information is useless • Today’s information needs to be provided in real time or near real time • Information quality • Wrong information is useless • Redundant information can be the cause of errors • Information must be complete • Data inconsistency and data integrity

  6. Database Management • Characteristics • Complex • Databases often spread across multiple servers • Databases often spread across multiple physical disks • Fault tolerance is critical • Databases may be distributed

  7. Database Vendors • The industry has consolidated • IBM • DB2 Universal • Oracle • Microsoft • SQL Server • Access • Sun (MySQL) • Is now Oracle

  8. Database Performance • Transaction Processing Performance Council provides standard benchmarks • TPC-C – Online transaction processing • TPC-E – Online brokerage transactions • TPC-H – Ad-hoc decision support • TPC-W – Web / E-commerce

  9. Database Performance (TPC-C) • Multiple transaction types • Independent of software and hardware • Scalable • Basis is online transaction processing (OLTP)

  10. 1960s Data Management • These are legacy systems • Batch processing • Characterized by traditional file processing • Data processing was sequential • Not possible to directly locate a particular file record • Data dependent on the programs that used the data • Program data dependence

  11. 1970s Data Management • Batch processing gives way to on line transaction processing • Files stored on disk rather than tape • Any record can be located in the same amount of time • Technologies • Indexed Sequential Access Method (ISAM) • Virtual Sequential Access Method (VSAM) • Direct Access files • Use a hashing function to derive record keys

  12. 1980s Data Management • Databases are becoming commonplace • Personal computer databases are evolving • DBase • R-Base

  13. 1990s Data Management • Huge data stores and transaction processing capabilities • Distributed databases • Object-oriented databases • 6 Million+ transactions per second

  14. Realities of a DBMS • Data centric rather than application centric • Can be a repository for all an organization’s data • Databases tend to be centralized • Queries get data from a DBMS • SQL is the standard query language • Report generators create printed and Web-based reports • Applications interface with DBMS

  15. Types of Databases • Database models include: • Hierarchical database model – A tree-based structure • Network database model – Mathematically, a directed graph • Relational database model – stores information in the form of logically related two-dimensional tables • Object-oriented databases

  16. Elements of a Database • Logical view and physical view • Users see and work with the logical view • Physical view is controlled by the database management system itself

  17. Entities and Attributes • Relational databases store information in tables (entities) • Customer / order / product • Tables contain fields (attributes) • Customer name, address

  18. Keys • Each table has a primary key that uniquely identifies each record • Natural keys have some meaning (stock symbol) • Artificial keys have no intrinsic meaning (your R number) • Foreign keys are used to link tables in one-to-many relationships

  19. Database Interaction

  20. Advantages of an RDMS (Scalability) • Database can scale to the terabyte or petabyte range • NSA maintains 1.9 trillion telephone call records • Large databases can span several servers and storage devices

  21. Advantages of an RDBMS (Redundancy) • Databases can be configured to write duplicate (redundant) information • Citibank • Journaling and checkpointing are supported

  22. Advantages of an RDBMS (Integrity) • Relational integrity constraints are rules that apply to the relationships between tables • Business integrity constraints enforce business rules • Not really a part of the DBMS itself

  23. Advantages of an RDBMS (Information Security) • A DBMS supports advanced access rights • By table and fields • By time of day • By location • By row information

  24. Data-driven Web Sites • Nearly all transactional Web sites rely on a database • Amazon • Your bank • Any shopping cart application • Ebay or Craig’s List • Facebook and You Tube

  25. Database Integration • Databases often need to be integrated • Because of mergers and acquisitions • Because of organizational changes • We are referring to connections to multiple databases

  26. Data Warehouses (Introduction) • Central source for clean data • May contain internal or external data • Use to spot hidden patterns in data • May be integrated with operational database • Parts of a data warehouse are called datamarts • Data warehouses contain an analytical component

  27. Cleansing Data • Data is often obtained from a myriad of sources • External lists • Internal databases • Other databases • This data must be cleansed and sanitized to remove • Redundancy / errors / etc…

  28. Data Warehouses (Illustration)

  29. Multidimensional Analysis • Data are often analyzed as 3-dimensional cubes • Cubes are then ‘sliced and diced’ to look at various layers

  30. Multidimensional Analysis (Illustration)

  31. The cost of Perfect Information

  32. Database Design (Introduction) • In the systems process, we design before we implement • Requirements specification • Conceptual design • Logical design • Physical Design

  33. Database Design Tools • Unified Modeling Language (UML) • Visio • Rational Rose • Entity relationship diagrams describes relationships between data • Normalization eliminates redundant data

  34. Database Management HR • Database administrators • Data managers • Programmers and systems analysts • Data security

  35. BUSINESS INTELIGENCE / DATA MINING

  36. Business Intelligence (Introduction) • Simply put, it’s internal and external data used to support better decision making • It’s challenging to sift through the mountains of data • It requires cross-functional collaboration between systems • More in the next chapter but we use ERP systems to improve business intelligence

  37. Business Intelligence (Industries) • BI applies to all industries • Retail and sales • Understanding procurement and distribution (SCM) / customers (CRM) • Banking • Understand credit worthiness / fraud behavior • Insurance • Forecast claim risk and understand at – risk customers

  38. Business Intelligence (Industries) • Airlines • Routing planes / minimize turnaround time (Southwest) • Marketing • Demographics • Sell based on known customer behavior (Harrah’s) • Amazon

  39. Business Intelligence (Levels) • Operational • Day-to-day operations (building a Dell) • Tactical • Short term (Dell ordering supplies) • Strategic • Long term organizational goals • The systems that provide BI typically do so at all levels

  40. BI Levels (Illustration)

  41. BI and Latency • From the time of acquisition, how long does it take to analyze (analysis latency) • Time to make a decision based on the analysis • E-transactions significantly reduce latency

  42. Data Mining (Introduction) • Data gets mined (analyzed) from data contained in a data warehouse or data mart • Specialized tools are used to analyze data for ‘interesting nuggets’ • Ways to mine • Drill down (general to specific) • Drill up (specific to general)

  43. Data Mining (Clustering) • Cluster analysis groups data by trait or traits • Examples • Don’t drink the water in Fallon • Segment customers by zip codes

  44. Data Mining (Association) • Answers the question “What traits are associated with other traits” • When I stay at Harrah’s, • I gamble • I eat at the Sage room • When I stay in Vegas, • I gamble more

  45. Data Mining (Statistical Analysis) • It’s basic statistics • Analysis of variance • Correlation coefficients • Etc…

  46. BI Benefits • We can understand what’s happening inside and outside a department • Sales knows about product inventory levels and production schedules • Production knows about sales and sales forecasts • Finance knows about the sales forecasts too • This information is provided in near real time

  47. Quantifying BI • Some benefits can be clearly quantified • Costs went down • Productivity increased • Inventory levels were optimized 10% • Some are indirectly quantified • Some benefits are intangible • Sometimes, we get unexpected results

More Related