200 likes | 320 Vues
Data Warehousing - 3. ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension tables. Location Dimension LocationCode State City. CustomerRating Dimension Rating Description. FactTable LocationCode
E N D
Data Warehousing - 3 ISYS 650
Snowflake Schemaone or more dimension tables do not join directly to the fact table but must join through other dimension tables. Location Dimension LocationCode State City CustomerRating Dimension Rating Description FactTable LocationCode PeriodCode Rating PID Qty Amount Can group by State, City Period Dimension PeriodCode Year Quarter Product Category CategoryID Description Product Dimension PID Pname CategoryID
Multiple Fact Tables • A star schema can contain multiple fact tables. • Case 1: Unrelated facts: • Multiple fact tables exist because they contain unrelated facts; for example, invoices and sales. • Case 2: Multiple fact tables are often used to hold various levels of aggregated (summary) data, particularly when the amount of aggregation is large; for example, daily sales, monthly sales, and yearly sales. • Improve performance
Surrogate Key • A surrogate key is the primary key for a dimension table and is independent of any keys provided by source data systems. • Surrogate keys are created and maintained in the data warehouse and should not encode any information about the contents of records; automatically increasing integers make good surrogate keys. • The original key for each record is carried in the dimension table but is not used as the primary key. • Surrogate keys provide the means to maintain data warehouse information when dimensions change.
Slowly Changing Dimensions • A characteristic of dimensions is that dimension data is relatively stable. • Of particular concern is the potential effect of a change to a dimension attribute on how historical data is tracked and summarized. • "Slowly changing dimensions" is the customary term used for discussions of issues associated with the impact of changes to dimension attributes.
Three Ways of Dealing with Slowly Changing Dimension • Type 1: Overwrite the dimension record. • Type 2: Add a new dimension record. • Type 3: Create new fields in the dimension record. • Not recommended
Type 1 • Type 1 changes cause history to be rewritten, which may affect analysis results if an attribute is changed that is used to group data for summarization. • Changes to a dimension attribute that is never used for analysis can be managed by simply changing the data to the new value.
Type 2 • Type 2 changes cause history to be partitioned at the event that triggered the change. Data prior to the event continues to be summarized and analyzed as before; new data is summarized and analyzed in accordance with the new value of the data. • Surrogate keys on the dimension table are required for Type 2 solutions.
Extraction in Data Warehouseshttp://download.oracle.com/docs/cd/B10501_01/server.920/a96520/extract.htm • Full Extraction • The data is extracted completely from the source system. • Incremental Extraction • At a specific point in time, only the data that has changed since a well-defined event back in history will be extracted.
Change Data Capture • Timestamps • The tables in some operational systems have timestamp columns. • Partitioning • Some source systems might use Oracle range partitioning, such that the source tables are partitioned along a date key, which allows for easy identification of new data. • Triggers • Triggers can be created in operational systems to keep track of recently updated records.
Operational Data Store, ODS • The ODS is designed to support tactical decision-making. • It has elements of both data warehouse and a transaction system. • Like a data warehouse, the ODS typically contains data consolidated from multiple systems and grouped by subject area. • Like a transaction system, the ODS may be updated by business users, and contains relatively little historical data. • ODS: 30 to 60 days of information • Data warehouse typically contains years of data. • ODS is designed to quickly perform relatively simply queries on smaller volumes of data such as finding orders of a customer or looking for available items in the retails store. While data warehouse queries typically are complex and on high volumes of data.
A business case for an operational data store • Customer Call Center • Call center operators have little need for broad analytical queries that reveal trends in customer behavior. • Need up-to-date information about all transactions involving the complaining customer. • Data may be a replicated mirror of the transaction system, or data warehouse star schema.
Data Warehouse Appliance • A data warehouse appliance is a combination hardware and software product that is designed specifically for analytical processing. An appliance allows the purchaser to deploy a high-performance data warehouse right out of the box. • It comes with its own operating system, storage, database management system and software. It uses massively parallel processing and distributes data across integrated disk storage, allowing independent processors to query data in parallel. • Data warehouse appliances use Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), and OLE DB interfaces to integrate with other extract-transform-load (ETL) tools and business intelligence applications.
SaaS BI/DW • Growing demand for SaaS BI/DW tools: • Especially among small and medium-sized businesses that lack the IT infrastructure to support an on-premise BI installment. • SaaS BI is also useful for quick, one-off BI projects • As workspaces for power users that want to experiment with more advanced analytics. • Drawbacks: • Integrating large data volumes from the cloud creates latency problems. • It makes it more difficult to control the data and maintain "a single version of the truth."
Bitmap Index 1. Rows - possible values of the attribute 2. Columns - table rows 3. Bit indicates whether the attribute of a row has the values The bitmap index is used where the values of a field repeats very frequently(not many distinct values), it is not used for primary key index.
Bitmap Index for Data Warehouse’s Fact Table • The number of rows in a dimension tables is much smaller than the number of rows in the fact table. • The dimension keys have a relatively small number of distinct values compared to the number of rows in the fact table.