1 / 17

DATA WAREHOUSING AND DATA MINING

DATA WAREHOUSING AND DATA MINING. MARK VROOMAN AND AMINAT ONI PROJECT REPORT KENNESAW STATE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CS8630- DATABASE ADMINISTRATION.

cwen
Télécharger la présentation

DATA WAREHOUSING AND DATA MINING

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DATA WAREHOUSING AND DATA MINING MARK VROOMAN AND AMINAT ONI PROJECT REPORT KENNESAW STATE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CS8630- DATABASE ADMINISTRATION

  2. Beazerhomes is a builder that has properties all over the United States. They have been building homes and apartments since 1995 and they feel the need to have a data warehouse to help record and store what the selling and renting trends and patterns for their buildings are and have been. This will help top management make decisions such as where to build more properties, which properties to sell and which to rent out, what incentives make their buildings high sellers, and what prices to put on the buildings to maximize profit and at the same time, encourage buyers. The data warehouse will have top management access that will include all departments, middle management access that will cover areas they manage, and staff access for their individual sections of work. It will also allow for simple queries on past and present data. Purpose of data warehouse

  3. All the data from all the OLTP systems can be integrated and made consistent for easier query. Data present in the data warehouse cannot be modified, but can only be updated and deleted. This ensures that the data gotten straight from the OLTP systems are the same ones used in business analysis. Data stored in a data warehouse contains more detailed information on a more historical basis. This makes for a better analysis of home-buying and selling trends and patterns. Benefits of Data warehouse

  4. Data warehouse design - Dimensional modeling is a technique used in describing the database component of a data warehouse. It organizes data in a form that corresponds to the way users query data. A dimensional model includes one more fact tables and a series of smaller tables called dimension tables. Different schemas can be used for dimensional modeling. These are star schema, snowflake schema, and starflake schema. In a star schema, the fact table is surrounded by dimension tables containing reference data that can be denormalized. It can also be described as each dimension table being linked to the primary key fact table. - The data warehouse design chosen by Beazerhomes uses the star schema for its dimensional modeling. The fact tables; Lease, PropertySale, PropertyMaintenance, PropertyViewing, and Advert contain the primary keys for the dimension tables

  5. Dimension and fact tables

  6. OLTP System - The OLTP (Online Transaction processing) system is the daily processing database that contains data used daily, on a short term basis. The data from the OLTP is what is stored in the data warehouse. The “DreamHome” contains tables from the OLTP for BeazerHomes and shows information on how each property is sold or rented. The diagram below contains the fact and dimension tables for the OLTP system, “DreamHome”:

  7. Data warehouse • The data warehouse contains historical and long-term information from the OLTP system. • The data will be loaded via an operational datastore. The data will be extracted from OLTP database by the operational datastore after the end of business hours on Friday (assuming a standard business week). The data will then be cleansed and its integrity will be verified. The data will then be loaded into the data warehouse by the operational datastore via the warehouse manager. Prior to the loading of the data, an incremental backup of the data warehouse will be performed. This will ensure a good rollback point if issues occur with the data loading. By carrying these activities out during non-business hours, there will be increased availability of resources and will not slow down operations during normal hours. This also allows time for a recovery if necessary.

  8. E-R Diagram

  9. E-R Description • The above diagram is the entity-relationship diagram for the data warehouse, that shows how each dimension or fact table interacts with the other tables. These are described as follows: • Promotion uses Advert displayed in the Newspaper, which recommends PropertyForRent and promotes PropertyForSale. • - Advert describes PropertyForSale. • - ClientRenter holds Lease and seeks PropertyViewing. • - Branch registers ClientRenter and offers PropertyForRent, sells PropertyForSale, has Staff, and contacts ClientBuyer. • - ClientBuyer requests PropertyViewing and agrees to PropertySale. • - PropertySale is for PropertyForSale. • - PropertyForRent is placed in Advert and takes PropertyViewing. It is also leased via Lease and requires PropertyMaintenance. • - Staff oversees PropertyForRent, manages PropertyForSale, and attends to PropertyMaintenance.

  10. - Owner owns provides (rent to) PropertyForRent or owns PropertyForSale  - PropertyViewing views PropertyForSale.

  11. SAMPLE QUERIES SELECT COUNT (timeID) FROM PropertySale Where timeID = (SELECT (timeID) FROM Time WHERE year BETWEEN ‘2001’ AND ‘2003’); This query will provide a count of the number of properties that were sold between the beginning of 2001 and the end of 2003. The BETWEEN command includes the endpoints of the range.

  12. SAMPLE QUERIES, CONTD. SELECT SUM (sellingPrice) AS TotalRevenue FROM PropertySale Where timeID = (SELECT (timeID) FROM Time WHERE year BETWEEN ‘2001’ AND ‘2003’); This query will give the total of sales (TotalRevenue) for the dates from the beginning of 2001 to the end of 2003.

  13. Data Dictionary The data dictionary is a catalog of data files in the data warehouse. It also contains the characteristics of each file and is only used for the management of the files. An excerpt from Beazerhomes’ data warehouse can be seen below. Although a data dictionary does not contain data, without one, a management system will not be able to access data information from the warehouse.

  14. A data warehouse is a beneficial database storage system for organizations that want to have a competitive edge in their industry through using past and present data to view positive and negative trends and patterns in order to improve their technology, marketing, financial, and sales. To further make the benefits of a data warehouse more advantageous, data mining tools should be used to uncover unknown patterns and trends that may have been lost in large databases. Using data mining to reveal more complex queries, reports, and patterns will give an even more beneficial advantage to organizations.

More Related