1 / 49

Decision Support Technology

Decision Support Technology. DSS Reference Architecture. Language System. Problem Processing System. Knowledge System. Presentation System. Outline. Knowledge system technology data management data warehousing Data marts online analytic processing (OLAP) ROLAP, MOLAP, WOLAP.

lucas
Télécharger la présentation

Decision Support Technology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Decision Support Technology

  2. DSS Reference Architecture Language System Problem Processing System Knowledge System Presentation System

  3. Outline • Knowledge system technology • data management • data warehousing • Data marts • online analytic processing (OLAP) • ROLAP, MOLAP, WOLAP

  4. Evolution of DSS • Transaction Processing Systems (TPS) • Operational data stores and OLTP • Batch reports, hard to find and analyze information, inflexible and expensive, reprogram every new request (circa 60’s) • MIS • Management reporting from transactions in TPS • Still inflexible, not integrated with desktop tools (circa 70’s) • DSS • Combine data with analytic models or expert rules • Integration with desktop tools (80’s)

  5. Evolution of DSS • Data Warehousing • Data integrated after (cleaning and scrubbing) from multiple sources (both internal and external to the organization) • OLAP is the technology used to study the data in terms of operations on a multi-dimensional data set • Data warehousing also supports processing of data by analytic methods and permits data mining (90’s)

  6. Applications • Retail - inventory management, promotions • Manufacturing - order shipment • Insurance – policy and claims tracking • Telecommunications - call analysis • Financial – account tracking • CRM/eCRM – customer profiling, clickstream analysis • Healthcare – disease management, patient and physician profiling

  7. Disease Management Programs • Address health quality of population by identifying at-risk members and applying prevention programs • Challenge: identifying at-risk members • Data warehouses can help with this effort • Aetna U.S. Healthcare • Members with certain ailments are flagged using an algorithm that examines member’s diagnoses, procedures, laboratories, and pharmaceuticals • Data gathered from medical and pharmaceutical claims, member and provider profiles (nearly 3 terabytes) • Results: reduction in frequency of acute asthmatic episodes, improvements in vaccination rates

  8. Data Model • A representation scheme with which to describe data: data relationships, data semantics, and consistency constraints. • Examples • ER (entity relationship model) • Relational model • Object-oriented model • References: • http://www.smartdraw.com/resources/centers/software/erd.htm (ER model) • http://www.fgcu.edu/support/office2000/index.html (MS Office product tutorials) • http://mis.bus.sfu.ca/tutorials/MSAccess/tutorials.html (MS Access tutorial)

  9. evaluates physician patient E-R schema: A simple example m n -Modeling the structure of data, not the processing of it

  10. ER Schema: a detailed example

  11. Relational model • Primary model used today for data-processing applications • Database systems like Oracle, Sybase, Informix, MS SQL server etc. support this model • Based on a well understood theoretical model

  12. An Example Example of a relational schema: Physician(doc_id, d_name, specialty) Patient(p_id, p_name) Evaluates(doc_id, p_id, date, diagnosis)

  13. Essential features of the relational model • A relational model schema consists of relations or tables • Each table has a set of fields (columns) that are related to one another • One or more fields whose values determine the value of other fields are called keys • Tables are normalized in order to remove redundancies

  14. New Data Types • Text, images, video, audio, time series, spatial, …. • Other, more exotic data types like fingerprints (an IBM Extender) and face recognition (an Informix DataBlade). • Offered by IBM, Informix, Oracle

  15. Object-oriented Databases • Objects belong to object classes determined by the structure (variables) and behavior (methods) of the object - not easy to represent in a relational database • Methods that impact the state of the variables of the object are encapsulated within the object and facilitate interaction between objects • eg. Object “bank account” modifies “balance” by “amount” • Potentially reduced development and maintenance time

  16. From Databases to Transaction Processing Source: Jim Gray,MSFT The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.

  17. Examples • Point of sale systems • credit card transactions • ATM machines • all withdrawals and deposits • E-commerce web sites • Health Care • medical records, billing

  18. Databases for Decision Support • Transaction Processing systems are optimized for performance • Data they capture are too detailed to be of use for decision support purposes • Online Analytical Processing (OLAP) imposes very different demands on databases than does Online Transaction Processing (OLTP)

  19. Data Analysis: An Example % Inc. 1997 1998 52.4 Mill 15.41 45.4 Mill Reimbursements 62.50 3.2 Mill 5.2 Mill Net Profits

  20. Data Analysis: A Hospital Example % Inc. 1997 1998 52.4 Mill 15.41 45.4 Mill Reimbursements 62.50 3.2 Mill 5.2 Mill Net Profits Reimb. by region North 4.17 7.5 Mill 7.2 Mill South 18.4 Mill 13.4 Mill 37.31 East 17.4 Mill 18.4 Mill -5.43 9.1 Mill 6.4 Mill 42.18 West

  21. Reimb. - South by specialty Specialty A 1.90 2.70 Mill 2.65 Mill Specialty B 7.10 Mill 6.45 Mill 10.08 Specialty C 4.0 Mill 3.1 Mill 29.03 Specialty D 4.60 Mill 1.2 Mill 283.33 Data Analysis: An Example Reimb. By region North 4.17 7.5 Mill 7.2 Mill South 18.4 Mill 13.4 Mill 37.31 East 17.4 Mill 18.4 Mill -5.43 9.1 Mill 6.4 Mill 42.18 West

  22. Data Analysis: An Example % Inc. 1997 1998 52.4 Mill 15.41 45.4 Mill Reimbursements 62.50 3.2 Mill 5.2 Mill Net Profits Profits by region North -43.18 0.50 Mill 0.88 Mill South 2.60 Mill 1.12 Mill 132.1 East 0.13 Mill 1.1 Mill -88.18 1.97 Mill 0.1 Mill 1870 West

  23. Profits-South by specialty Specialty A 0.10 Mill 0.20 Mill Specialty B 1.20 Mill 0.60 Mill Specialty C 0.40 Mill 0.22 Mill Specialty D 0.90 Mill 0.10 Mill Data Analysis: An Example % Inc. 1997 1998 52.4 Mill 15.41 45.4 Mill Reimbursements 62.50 3.2 Mill 5.2 Mill Net Profits -50.00 100.0 81.82 800.0

  24. Integration System World Wide Web Heterogeneous Database Integration Personal Databases Digital Libraries Scientific Databases • Collects and combines information from disparate sources • Provides integrated view, and a uniform user interface • Supports sharing of data between entities

  25. Why look at data in this way? • What would be the demand for services (forecasting)? • Who are our key customers/patients, and • What are the margins/outcomes? (profitable customers/satisfied patients) • How do we market to them/treat them? • What pricing/treatment strategy is desirable? • What are their preferences? • What type of customer/patient services are required? • What services when packaged result in higher/better sales/revenues/margins/outcomes, efficient workflow? • Which promotion/patient education/counseling works or does not work and why? • What is the inventory/patient turnover? • Which channel/technology is more effective/profitable? • Why do margins/outcomes differ from one place to another or one patient to another?

  26. Customer Relationship Management ANALYZE DISCOVER Trends in market Selling opportunities Opp. For improvements Customer base Profitability Buying pattern Support pattern Productivity Policies & Procedures Marketing policies Support procedures TAKE ACTION Source: Pilot Software

  27. Business Intelligence Software applications, technologies, and analytical methodologies that perform data analysis Often used as a broad term that includes OLAP, data mining, query, reporting tools and technologies

  28. Business Intelligence Loop Business Strategist OLAP Data Mining Reports Decision Support Data Storage Data Warehouse Extraction, Transformation, & Cleansing CRM Clinical IS Pharmacy Lab

  29. Data Warehouse • A decision support database that is maintained separately from the organization’s operational databases. • A data warehouse is a • subject-oriented, • integrated, • time-varying, • non-volatile collection of data that is used primarily in organizational decision making (W.H. Inmon)

  30. Why Separate Data Warehouse? • Performance • Operational databases are designed & tuned for known transactions & workloads. • Complex OLAP queries would degrade performance for transactions. • Special data organization, access & implementation methods needed for multidimensional views & queries • Function • decision support requires historical data (up to 5 to 10 years of data) • consolidation of data from many operational systems and external sources • data quality considerations (semantic and measurement issues are resolved)

  31. Local vs Central Data Repository Local Central Other Data Sources 2 3 Data Warehouse Repository Knowledge Generation Forward Data 4 1 Knowledge Generation Local Rules Base Central Data HIS Data Local Data Feedback Option

  32. Warehouse Database Schema • ER design techniques not appropriate • Design should reflect multidimensional view • Star Schema • Snowflake Schema • Fact Constellation Schema

  33. Multidimensional Data Model • Database is a set of facts (points) in a multidimensional space • A fact has a measure dimension • quantity that is analyzed, e.g., sale, budget • A set of dimensions on which data is analyzed • e.g. , store, product, date associated with a sale amount • Dimensions form a sparsely populated coordinate system • Each dimension has a set of attributes • e.g., owner, city and county of store • Attributes of a dimension may be related by partial order • Hierarchy: e.g., street > county >city • Lattice: e.g., date> month>year, date>week>year

  34. Example: Patient profiling • A healthcare organization needed a longitudinal view of patients, including trends of services to patients • Model • Facts include Healthcare (e.g., diagnosis, procedure), Financial (e.g., amount billed, number of claims), Resources (e.g., number of bed-days, inpatient and outpatient visits) • Dimensions include Time, Provider, Claim Type, Demographics, Encounter Type, Diagnosis and Procedure, Person, Organization • Questions answered by system: • Which individuals are eligible for services but not obtaining them? • Which individuals are registered for services, but not receiving preventive healthcare?

  35. Example of a Star Schema Order Product Order No Order Date ProductNO ProdName ProdDescr Category CategoryDescription UnitPrice Fact Table Customer OrderNO SalespersonID CustomerNO ProdNo DateKey CityName Quantity Total Price Customer No Customer Name Customer Address City Date DateKey Date Salesperson City SalespersonID SalespersonName City Quota CityName State Country

  36. Star Schema • A single fact table and a single table for each dimension • Every fact points to one tuple in each of the dimensions and has additional attributes • Does not capture hierarchies directly • Straightforward means of capturing a multiple dimension data model using relations

  37. Example of a Snowflake Schema Order Product Category Order No Order Date ProductNO ProdName ProdDescr Category Category UnitPrice CategoryName CategoryDescr Fact Table Customer OrderNO SalespersonID CustomerNO ProdNo DateKey CityName Quantity Total Price Customer No Customer Name Customer Address City Date Month DateKey Date Month Month Year Salesperson Year Year SalespersonID SalespersonName City Quota City State CityName State Country StateName Country

  38. Snowflake Schema • Represent dimensional hierarchy directly by normalizing the dimension tables • Easy to maintain • Saves storage, but may reduce effectiveness of browsing (Kimball)

  39. Fact Constellation Sales Fact Table Shipping Fact Table Product Dimension Store Dimension

  40. Fact Constellation • Multiple fact tables share dimension tables. • This schema is viewed as collection of stars hence called galaxy schema or fact constellation. • Sophisticated applications require such schema.

  41. Data Warehouse vs. Data Marts • Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization. • Requires extensive business modeling • May take years to design and build • Data Marts: departmental subsets that focus on selected subjects: Marketing data mart: customer, products, sales. • Faster roll out, but complex integration in the long run.

  42. Virtual Warehouse • Views over operational databases • Materialize some summary views for efficient query processing, easier to build, requisite excess capacity on operational database servers

  43. OLAP servers Data Warehouse Architecture Clinical System Payroll System (e.g., AS400) Meta-Data Data Mining Tools Oracle Financials on HP 9000 Data Warehouse Billing System Transformation Integration Access, Files (Industry Reports) Other Internal Data External Data Excel Web Other

  44. Extraction, Transformation, & Load (ETL) • ETL is a set of tools and techniques used to populate a data warehouse • Extraction • Extract data from sources (e.g., operational DBMSs, file systems, Web pages) • Transformation • Clean data • Convert from legacy/host format to warehouse format (e.g., convert “surname” to “last name”)

  45. Extraction, Transformation, & Load (ETL) • Load • Sort, summarize, consolidate, compute views, check integrity, build indexes, partition • Huge volumes of data to be loaded, yet small time window (usually at night) when the warehouse can be taken off-line • Techniques: batch, sequential load often too slow; incremental, parallel loading techniques may be used • Refresh • Propagate updates from sources to the warehouse • When to refresh - on every update, periodically (e.g., every 24 hours), or after “significant” events • How to refresh – full extract from base tables vs. incremental techniques

  46. Metadata • Types of Metadata • Administrative metadata • source databases and their contents • gateway descriptions • warehouse schema, view & derived data definitions • dimensions, hierarchies • pre-defined queries and reports • data mart locations and contents • data partitions • data extraction, cleansing, transformation rules, defaults • data refresh and purging rules • user profiles, user groups • security: user authorization, access control

  47. Metadata … • Types (continued) • Business data • business terms and definitions • ownership of data • charging policies • Operational metadata • data lineage: history of migrated data and sequence of transformations applied • currency of data: active, archived, purged • monitoring information: warehouse usage statistics, error reports, audit trails • Tools: • Platinum Repository (Computer Associates) • Meta Directory (Information Builders)…

  48. The Complete Decision Support System (Source: Franconi) Information Sources Data Warehouse Server (Tier 1) OLAP Servers (Tier 2) Clients (Tier 3) e.g., MOLAP Analysis Semistructured Sources Data Warehouse serve extract transform load refresh etc. Query/Reporting serve e.g., ROLAP Operational DB’s serve Data Mining Data Marts

  49. Three-Tier Architecture • Warehouse database server • Almost always a relational DBMS; rarely flat files • OLAP servers • Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations. • Multidimensional OLAP (MOLAP): special purpose server that directly implements multidimensional data and operations. • Clients • Query and reporting tools. • Analysis tools (excel) • Data mining tools (e.g., trend analysis, prediction)

More Related