The Data Warehouse and Technology • The issue of volumes of data is so important that it pervades all other aspects of data warehousing.
Technological Requirements for the Data Warehouse • Managing large amounts of data • Managing multiple media • Indexing/monitoring data • Interfaces to many technologies • Programmer/designer control of data placement • Parallel storage/management of data • Language interface • Efficient loading of data • Efficient index utilization • Compaction of data • Compound keys • Variable-length data • Lock management • Index-only processing • Fast restore
DBMS Types and the Data Warehouse • Data warehouses manage massive amounts of data because they contain the following: • Granular, atomic detail • Historical information • Summary as well as detail
Multidimensional DBMS and the Data Warehouse • Consider the differences between the multidimensional DBMS (m-DBMS) and the data warehouse (D/W): • The D/W holds massive amounts of data; the m-DBMS holds at least an order of magnitude less data • The D/W is geared for a limited amount of flexible access; the m-DBMS is geared for very heavy and unpredictable access and analysis of data • The D/W contains data with a very lengthy time horizon, from 5 to 10 years; the m-DBMS holds a much shorter time horizon of data • The D/W allows analysts to access its data in a constrained fashion; the m-DBMS allows unfettered access • Instead of the D/W being housed in an m-DBMS, the m-DBMS and the D/W enjoy a complementary relationship
Multidimensional DBMSs come in several flavors • The relational foundation for multidimensional DBMS data marts: • Strengths: • Can support a lot of data • Can support dynamic joining of data • Has proven technology • Is capable of supporting general-purpose update processing • If there is no known pattern of usage of data, then the relational structure is as good as any other • Weaknesses: • Has performance that is less than optimal • Cannot be purely optimized for access processing
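The "dynamic joining of data" strength can be illustrated with a small sketch. It uses Python's standard sqlite3 module as a stand-in for a relational data mart; the table names, columns, and sample rows are hypothetical and chosen only for illustration. The join path is decided at query time and is not built into the stored structure, which is why no usage pattern needs to be known in advance.

```python
import sqlite3


def ad_hoc_join():
    """Join two hypothetical tables at query time and total sales by region."""
    conn = sqlite3.connect(":memory:")
    # Illustrative schema and data; none of these names come from the slides.
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE sale (sale_id INTEGER PRIMARY KEY,
                           customer_id INTEGER, amount REAL);
        INSERT INTO customer VALUES (1, 'east'), (2, 'west');
        INSERT INTO sale VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
        """
    )
    # The join is expressed in the query, so any other join path could be
    # requested just as easily, at the cost of doing the work at run time.
    rows = conn.execute(
        """
        SELECT c.region, SUM(s.amount)
        FROM sale AS s
        JOIN customer AS c ON c.customer_id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, total in ad_hoc_join():
        print(region, total)
```

The same flexibility is the source of the listed weakness: because the join is resolved when the query runs, performance cannot be tuned for one access path alone.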
Multidimensional DBMSs come in several flavors • The cube foundation for multidimensional DBMS data marts: • Strengths: • Performance that is optimal for DSS processing • Can be optimized for very fast access of data • If the pattern of access of data is known, then the structure of data can be optimized • Can easily be sliced and diced • Can be examined in many ways • Weaknesses: • Cannot handle nearly as much data as a standard relational format • Does not support general-purpose update processing • May take a long time to load • If access is desired on a path not supported by the design of the data, the structure is not flexible • Questionable support for dynamic joins of data
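A minimal sketch of "slicing and dicing" follows, assuming a toy cube held in a Python dictionary keyed by (product, region, month). The dimensions, fact rows, and function names are hypothetical; a real m-DBMS stores and indexes its cells very differently. The sketch is meant only to show that the cube is aggregated once at load time, along dimensions fixed by its design, and then read along those dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, month, units_sold).
FACTS = [
    ("widget", "east", "2024-01", 10),
    ("widget", "west", "2024-01", 7),
    ("gadget", "east", "2024-01", 4),
    ("widget", "east", "2024-02", 12),
    ("gadget", "west", "2024-02", 9),
]

# The access paths are fixed when the cube is designed.
DIMENSIONS = ("product", "region", "month")


def build_cube(facts):
    """Load step: aggregate units into one cell per (product, region, month)."""
    cube = defaultdict(int)
    for product, region, month, units in facts:
        cube[(product, region, month)] += units
    return dict(cube)


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    axis = DIMENSIONS.index(dimension)
    return {cell: units for cell, units in cube.items() if cell[axis] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall inside the allowed value sets."""
    def keep(cell):
        return all(
            cell[DIMENSIONS.index(dim)] in values
            for dim, values in allowed.items()
        )
    return {cell: units for cell, units in cube.items() if keep(cell)}


def roll_up(cube, dimension):
    """Total the units for each value of one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for cell, units in cube.items():
        totals[cell[axis]] += units
    return dict(totals)


if __name__ == "__main__":
    cube = build_cube(FACTS)
    print(slice_cube(cube, "region", "east"))
    print(dice_cube(cube, product={"widget"}, month={"2024-01"}))
    print(roll_up(cube, "product"))
```

Asking this cube a question along a dimension it was not built with (for example, by salesperson) would require rebuilding it from the facts, which mirrors the inflexibility listed among the weaknesses.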
Context and Content • Three types of contextual information must be managed: • Simple contextual information • Complex contextual information • External contextual information • Simple contextual information relates to the basic structure of the data itself, and includes such things as these: • The structure of data • The encoding of data • The naming conventions used for data • The metrics describing the data, such as: • How much data there is • How fast the data is growing • What sectors of the data are growing • How the data is being used • Simple contextual information has been managed in the past by dictionaries, directories, system monitors, and so forth
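As a rough illustration of simple contextual information, the sketch below keeps the structure, the encoding, and a growth metric for one table in a small Python record. The class name, field names, and sample values are all hypothetical; a real dictionary or system monitor would track far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Hypothetical record of simple contextual information for one table."""
    name: str
    structure: dict   # column name -> declared type
    encodings: dict   # column name -> {stored code: meaning}
    row_counts: list = field(default_factory=list)  # (period, row count)

    def record_count(self, period, rows):
        """Store how much data there was at the end of a period."""
        self.row_counts.append((period, rows))

    def growth(self):
        """Return (period, rows added since the previous period) pairs."""
        pairs = zip(self.row_counts, self.row_counts[1:])
        return [(period, rows - prev) for (_, prev), (period, rows) in pairs]


if __name__ == "__main__":
    ctx = TableContext(
        name="customer",
        structure={"customer_id": "INTEGER", "gender": "CHAR(1)"},
        encodings={"gender": {"M": "male", "F": "female"}},
    )
    ctx.record_count("2024-01", 1000)
    ctx.record_count("2024-02", 1500)
    ctx.record_count("2024-03", 1800)
    print(ctx.growth())
```

Keeping one such record per time period lets an analyst interpret older data using the structure and encoding that applied when it was written.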
Context and Content • Complex contextual information describes the same data as simple contextual information, but from a different perspective. This type of information addresses such aspects of data as these: • Product definition • Marketing territories • Pricing • Packaging • Organization structure • Distribution • Complex contextual information is some of the most useful and, at the same time, some of the most elusive information there is to capture.
Context and Content • External contextual information is information outside the corporation that nevertheless plays an important role in understanding information over time. Some examples of external contextual information include the following: • Economic forecasts: • Inflation • Financial trends • Taxation • Economic growth • Political information • Competitive information • Technological advancements • Consumer demographic movements • External contextual information says nothing directly about a company but says everything about the universe in which the company operates.