1 / 33

Data Warehouse [ Example ]

Data Warehouse [ Example ]. J. Han and M. Kamber , Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1558604898 . OLTP. it design for optimal transaction. OLAP. It design to give overview analysis of what happened! It is uses to built report answer the following :

waite
Télécharger la présentation

Data Warehouse [ Example ]

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse [ Example ] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001, ISBN 1558604898 Data Mining: Concepts and Techniques

  2. OLTP it design for optimal transaction Data Mining: Concepts and Techniques

  3. OLAP • It design to give overview analysis of what happened! • It is uses to built report answer the following : • Q1: who the supervisor that gave most discount? • Q2:in which Zip code did product a sell the most? To answer the questions, OLAP Cube are created. Data Mining: Concepts and Techniques

  4. Example Assume we have made a record of the weather conditions during a two-week period, along with the decisions of a tennis player whether or not to play tennis on each particular day. We have values of four independent variables (outlook, temperature, humidity, windy) and one dependent variable (play) Consider our data stored in a relational table as follows: Data Mining: Concepts and Techniques

  5. Example (cont.) Data Mining: Concepts and Techniques

  6. Example (cont.) • By querying a DBMS containing the above table we may answer questions like: • What was the temperature in the sunny days? {85, 80, 72, 69, 75} • Which days the humidity was less than 75? {6, 7, 9, 11} • Which days the temperature was greater than 70? {1, 2, 3, 8, 10, 11, 12, 13, 14} • Which days the temperature was greater than 70 and the humidity was less than 75? The intersection of the above two: {11} Data Mining: Concepts and Techniques

  7. Example (cont.) OLAP: Using OLAP we can create a Multidimensional Model of our data (Data Cube). For example using the dimensions: time, outlook and playwe can create the following model. Data Mining: Concepts and Techniques

  8. Example (cont.) Obviously here timerepresents the days grouped in weeks (week 1 - days 1, 2, 3, 4, 5, 6, 7; week 2 - days 8, 9, 10, 11, 12, 13, 14) over the vertical axis. The outlookis shown along the horizontal axis and the third dimension playis shown in each individual cell as a pair of values corresponding to the two values along this dimension - yes / no. Thus in the upper left corner of the cube we have the total over all weeks and all outlook values. Data Mining: Concepts and Techniques

  9. Example (cont.) By apply "Drill-down" to our data cube over the time dimension. This assumes the existence of a concept hierarchy for this attribute. We can show this as a horizontal tree as follows: Data Mining: Concepts and Techniques

  10. Example (cont.) Data Mining: Concepts and Techniques

  11. Example (cont.) The drill-down operation is based on climbing down the concept hierarchy, so that we get the following data cube: Data Mining: Concepts and Techniques

  12. Multidimensional data model By using same example and change some values: • play has just two values - yes and no, it can replace them by 1 and 0 • This will allows us to add up values and thus get the total number of days when tennis was played and at the same time the number of days tennis was not played • Rename the day attribute into time, which is more general and will allow us to use other time units (e.g. weeks). Thus we get the following relational table: Data Mining: Concepts and Techniques

  13. Multidimensional data model (cont.) Data Mining: Concepts and Techniques

  14. Concept hierarchies 1- attributes day, temperature and humidity we can group values in subsets and name these subsets as following : Day: all        ______|_________       |                 |    week 1             week 2  _____|_____     _______|_______ | | | | | | |  | | |  |  |  |  | 1 2 3 4 5 6 7  8 9 10 11 12 13 14 Data Mining: Concepts and Techniques

  15. Concept hierarchies (cont.) Temperature:                all      ____________|_____________      |          |               |     hot          mild         cool  _ |___     __|____      ___|____ |  |  |  | |  |  |  |    |  |  |  | 80 81 83 85  70 71 72 75 64 65 68 69 Data Mining: Concepts and Techniques

  16. Concept hierarchies (cont.) Humidity:               all          ___|___________    |                 |       high             normal  ______|_______     ___|____ |  |  |  |  |  |   |  |  |  | 85 86 90 91 95 96   65 70 75 80 Data Mining: Concepts and Techniques

  17. Concept hierarchies (cont.) We may also extend the sets of numbers or replace them with intervals, which will make the hierarchy complete (covering all possible values). For example, humiditymay look like this: all          ____|____      |         |    high      normal     |          |     [85,96]   [65,84] Data Mining: Concepts and Techniques

  18. Concept hierarchies (cont.) 2- For the nominal (non numeric) attributes outlook and windy we define one-level hierarchies, as their values cannot be ordered or grouped. outlook:         all   _______|________ |       |         | sunny  rainy  overcast Data Mining: Concepts and Techniques

  19. Concept hierarchies (cont.) windy:        all        ___|____      |         |    true     false Data Mining: Concepts and Techniques

  20. Data cube • The number of dimensions define the total number of data cubes that can be created. number of elements is 2N elements; N is an number attributes Data Mining: Concepts and Techniques

  21. Data cube (cont.) To create a data cube we have to: 1- Select dimensions, that is select a subset of attributes. • For example, select time and temperature. Thus we will create a two-dimensional data cube. 2- Select levels in the concept hierarchies. • For example, let us select weeks for time and degrees for temperature. 3- Select a measure to populate the cube. This is the attribute whose values will be aggregated across the dimensions (obviously it has to be numeric). • For example, Let us select play. Data Mining: Concepts and Techniques

  22. Data cube (cont.) • The numbers in the internal cells are obtained by adding up the values of the play attribute, where the time and the temperature attribute are equal to the values in the corresponding row and column • For example the value 2 (row 2, column 8) means that tennis was played two days during week 2 when the temperature was 75. By placing the time values in the rows and the temperature values in the columns we get the following cube: Data Mining: Concepts and Techniques

  23. OLAP operations Rollup: assume we want to change the level that we selected for the temperature hierarchy to the intermediate level (hot, mild, cool). Roll up produces the following cube: Data Mining: Concepts and Techniques

  24. OLAP operations (cont.) Drill-down the drill down of the pervious data cube over the time dimension produces the following: Data Mining: Concepts and Techniques

  25. OLAP operations (cont.) Data Mining: Concepts and Techniques

  26. Lattice of cubes, slice and dice operations Lattice : there are five dimension: Time, outlook, temperature, humidity, windy. Data Mining: Concepts and Techniques

  27. Lattice of cubes, slice and dice operations (cont.) • 0-D (apex) cuboids : { all} • 1-D cuboids:{ Time}, {Outlook}, {Temperature}, {Humidity}, { Windy} • 2-D cuboids: { {Time, Outlook}, {Time, Temperature}, { Time, Humidity}, {Time, Windy}, {Outlook, Temperature}, {Outlook, Humidity}, {Outlook, Windy}, {Temperature, Humidity}, { Temperature, Windy}, {Humidity, Windy} } Data Mining: Concepts and Techniques

  28. Lattice of cubes, slice and dice operations (cont.) • 3- D Cuboids :{ { Time, Outlook, Temperature}, {Time, Outlook, Humidity}, {Time, Outlook, Windy}, {Time, Temperature, Humidity}, {Time, Temperature, Windy}, {Time, Humidity, Windy} {Outlook, Temperature, Humidity}, { Outlook, Temperature, Windy}, {Outlook, Humidity, Windy} {Temperature, Humidity, Windy} } Data Mining: Concepts and Techniques

  29. Lattice of cubes, slice and dice operations (cont.) • 4-D cuboids: { { Time, Outlook, Temperature, Humidity}, {Time, Outlook, Temperature, Windy}, {Time, Outlook, Humidity, Windy}, {Time, Temperature, Humidity, Windy} {Outlook, Temperature, Humidity, Windy} } • 5- D cuboids { Time, Outlook, Temperature, Humidity, Windy} Data Mining: Concepts and Techniques

  30. Lattice of cubes, slice and dice operations (cont.) • There are two other OLAP operations that are related to the selection of a cube - slice and dice. Slice : performs a selection on one dimension of the given cube, thus resulting in a subcube. • For example, if we make the selection (temperature=cool) we will reduce the dimensions of the cube from two to one, resulting in just a single column from the pervious tables. So, the result will be as following: Data Mining: Concepts and Techniques

  31. Lattice of cubes, slice and dice operations (cont.) Data Mining: Concepts and Techniques

  32. Lattice of cubes, slice and dice operations (cont.) • The dice operation works similarly and performs a selection on two or more dimensions. • For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube we get the following subcube (still two-dimensional): Data Mining: Concepts and Techniques

  33. The End Data Mining: Concepts and Techniques

More Related