Data Warehouse Yong Shi CSE DEPARTMENT
Strategic delivery of information • The current situation: the never-ending quest to access any information, anywhere, anytime. • The problem: data is scattered across many types of incompatible structures.
Analytical processing requirements • Four levels of analytical processing: 1. Simple queries and reports 2. The ability to do “what if” processing 3. Step back and analyze what has previously occurred to bring about the current state of the data 4. Analyze what has happened in the past and what needs to be done in the future to bring about a specific change
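The first two levels can be illustrated with a small sketch. It assumes a hypothetical `sales` table with `region`, `units`, and `unit_price` columns (none of these names come from the slides) and uses Python's built-in `sqlite3` module. Level 1 is a plain aggregate report; level 2 re-runs the same report under a hypothetical price change.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
ROWS = [
    ("East", 10, 5.0),
    ("East", 4, 5.0),
    ("West", 8, 5.0),
]


def build_connection():
    """Create an in-memory database holding the sample sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, units INTEGER, unit_price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
    return conn


def revenue_by_region(conn, price_factor=1.0):
    """Level 1 report when price_factor is 1.0; a "what if" run otherwise."""
    cur = conn.execute(
        "SELECT region, SUM(units * unit_price * ?) FROM sales "
        "GROUP BY region ORDER BY region",
        (price_factor,),
    )
    return dict(cur.fetchall())


conn = build_connection()
report = revenue_by_region(conn)         # level 1: simple query and report
what_if = revenue_by_region(conn, 1.5)   # level 2: what if prices rose by 50%?
```

Levels 3 and 4 would need historical snapshots of the data, which this sketch does not model.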
Information data superstore (IDSS) • Definition: The architecture needed to support the far-ranging requirements of the four levels of analysis. • Also called super data warehouse • Data warehouses are not an end in themselves but merely a step on the path to the information data superstore
Why the need for a separate environment • The use of operational systems vs. the data warehouse • The data’s characteristics • The type of access
A strategy for building a data warehouse • Need indicators • Action steps • Three-stage data warehousing process: model (understand), build (establish), deploy (implement)
Organizational and cultural issues • Cultural imperatives • Success criteria • Satisfy users’ requirements • Make a significant contribution to the success of the business • The users accept and actively use it • The benefits are not exceeded by the costs • An adequate budget must be in place
Organizational and cultural issues • Success criteria (continued) • The implementation of the data warehouse must not cause other problems that overshadow the benefits • A reasonable schedule must be established
Organizational and cultural issues • End user (client) • Strategic architecture • User liaison • End-user support • Data analyst • Security office • Data administration
Organizational and cultural issues • Database administration • Choosing the initial data and department • Establishing an infrastructure • Training users • Change in the power structure
End Users • A crucial part of the project • Gathering requirements and managing expectations • Cost justification process • Design reviews • User perspective • User training
A technical architecture for DW • Components: Design Component, Data Acquisition Component, Data Manager Component, Management Component, Information Directory Component, Data Access Component, Middleware Component, Data Delivery Component • Data stores: Source Data, Warehouse Data, External Data
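As a rough illustration of how three of these components might hand data to one another, here is a minimal sketch. The slide gives only component names, so the responsibilities assigned below, the record fields, and every function and class name are assumptions made for the example.

```python
# Hypothetical stand-ins for three of the components named on the slide.

def acquire(source_rows):
    """Data acquisition (assumed role): clean source data before loading."""
    return [
        {
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
        }
        for row in source_rows
    ]


class DataManager:
    """Data manager (assumed role): owns the warehouse data, a list here."""

    def __init__(self):
        self.warehouse = []

    def load(self, rows):
        self.warehouse.extend(rows)


def access_total(manager, customer):
    """Data access (assumed role): answer a user question from warehouse data."""
    return sum(
        row["amount"]
        for row in manager.warehouse
        if row["customer"] == customer
    )


# Source data with inconsistent spelling and string-typed amounts.
source = [
    {"customer": " alice ", "amount": "10.5"},
    {"customer": "ALICE", "amount": "4.5"},
    {"customer": "bob", "amount": "3"},
]

manager = DataManager()
manager.load(acquire(source))
alice_total = access_total(manager, "Alice")
```

The point of the separation is that end users query only the cleaned warehouse data and never touch the source systems directly.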
Data Quality • Why is data quality important? Data quality is a critical issue. Poor data quality limits the ability of end users to make informed decisions. It has a profound effect on the image of the enterprise. It also makes it difficult to make major changes in an organization.
Data Quality • What is data quality? • The data is accurate • The data is stored according to data type • The data has integrity • The data is consistent • The databases are well designed
Data Quality • The data is not redundant • The data follows business rules • The data corresponds to established domains • The data is timely • The data is well understood
Data Quality • The data satisfies the needs of the business • The user is satisfied with the quality of the data and the information derived from that data • There are no duplicate records • Data anomalies
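Several of the criteria above (data type, established domains, business rules, no duplicate records) can be checked mechanically. The following is a minimal sketch under stated assumptions: the record fields, the set of valid states, and the rule that an account cannot close before it was opened are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical customer records; field names and rules are illustrative.
RECORDS = [
    {"id": 1, "state": "NY", "signup": date(2020, 1, 5), "closed": None},
    {"id": 2, "state": "ZZ", "signup": date(2021, 3, 1), "closed": None},
    {"id": 2, "state": "ZZ", "signup": date(2021, 3, 1), "closed": None},
    {"id": 3, "state": "CA", "signup": date(2022, 6, 1),
     "closed": date(2022, 1, 1)},
    {"id": "4", "state": "CA", "signup": date(2023, 2, 2), "closed": None},
]
VALID_STATES = {"NY", "CA"}  # assumed domain for the "state" field


def check_quality(records):
    """Return (position, problem) pairs for every failed check."""
    issues = []
    seen = set()
    for pos, rec in enumerate(records):
        # Stored according to data type: ids are expected to be integers.
        if not isinstance(rec["id"], int):
            issues.append((pos, "wrong data type"))
        # Corresponds to established domains.
        if rec["state"] not in VALID_STATES:
            issues.append((pos, "outside domain"))
        # Follows business rules: cannot close before signing up.
        if rec["closed"] is not None and rec["closed"] < rec["signup"]:
            issues.append((pos, "business rule violated"))
        # No duplicate records: compare against everything seen so far.
        key = (rec["id"], rec["state"], rec["signup"], rec["closed"])
        if key in seen:
            issues.append((pos, "duplicate record"))
        seen.add(key)
    return issues


issues = check_quality(RECORDS)
```

Criteria such as "the data is well understood" or "the user is satisfied" cannot be tested this way and still require talking to the users.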
Data Quality • Assessment of existing data quality • Programs that abnormally terminate with data exceptions • Clients who experience errors/anomalies • Clients who do not know or are confused about what the data actually means • Data that cannot be shared due to lack of integration
Data Quality • What data should be improved? Effort should be spent on data where a quality improvement will bring an important benefit to the business. Unimportant data and obsolete data can be ignored. Another criterion: improve data that can be fixed and then kept clean.
Data Quality • Purification process • Determine the importance of data quality to the organization • Identify the enterprise’s most important data and evaluate the quality. • Determine users’ and owners’ perception of data quality. • Prioritize which data to purify. • Assemble and train a team to clean the data. • Select tools to aid in the purification process, etc.
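The "prioritize which data to purify" step can be sketched as a simple ranking. The scoring rule below (business importance multiplied by measured defect rate, with obsolete data excluded) is an assumption for illustration, not a method given in the slides, and the data elements and their numbers are hypothetical.

```python
# Hypothetical inventory of data elements: business importance (1 to 5) and
# measured defect rate (defective records per 100 sampled).
ELEMENTS = [
    {"name": "customer_address", "importance": 5,
     "defects_per_100": 12, "obsolete": False},
    {"name": "order_total", "importance": 4,
     "defects_per_100": 3, "obsolete": False},
    {"name": "fax_number", "importance": 1,
     "defects_per_100": 40, "obsolete": True},
    {"name": "product_code", "importance": 3,
     "defects_per_100": 10, "obsolete": False},
]


def purification_priority(elements):
    """Rank non-obsolete elements by importance times defect rate."""
    candidates = [e for e in elements if not e["obsolete"]]
    return sorted(
        candidates,
        key=lambda e: e["importance"] * e["defects_per_100"],
        reverse=True,
    )


ranked = [e["name"] for e in purification_priority(ELEMENTS)]
```

Here `fax_number` has the worst defect rate but is dropped as obsolete, which matches the advice to spend effort only where cleaner data benefits the business.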
Data Quality • Data quality case • Lesson 1: If those entering the data have a stake in the data being incorrect, the data will be incorrect. • Lesson 2: Reports may show the desired results and still be highly inaccurate.
Directory/Catalog • The challenge: providing short-term benefit without disabling broader long-term information handling solutions. Getting data into a warehouse is only half of the process.
Security in the data warehouse • Basic security concepts • Physical security • Stand-alone or shared security • Remote access