1 / 40

Introduction to Dimensional Analysis Session 2

M. D. Metadata Solutions. Introduction to Dimensional Analysis Session 2. 5/20/2005. Dan McCreary President Dan McCreary & Associates dan@danmccreary.com (952) 931-9198. Agenda. General introduction to Data Dictionaries that drive Business Intelligence (BI) concepts and terminology

carsten
Télécharger la présentation

Introduction to Dimensional Analysis Session 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. M D Metadata Solutions Introduction to Dimensional AnalysisSession 2 5/20/2005 Dan McCreary President Dan McCreary & Associates dan@danmccreary.com (952) 931-9198

  2. Agenda • General introduction to Data Dictionaries that drive Business Intelligence (BI) concepts and terminology • Understand why Data Dictionaries are so critical in accurate BI • Understand how BI looks at the world in different ways • Understand how data warehouse concepts and data dictionary impact analysis and research

  3. What is a Data Warehouse? • Fast Retrieval • Internally Consistent • Slice and Dice Capability • Easy to “Browse” • Complete and Reliable • Data Quality Controls • GI-GO (Garbage-In, Garbage-Out) Source: Ralph Kimball

  4. Factors Driving Business Intelligence • Computer process and store twice as much data per dollar every 18 months (Moore’s Law) • People can make better decisions if they have tools to quickly see only the data they are interested in seeing • People frequently want to analyze data in new ways that was unanticipated by people creating "canned reports" • Tools can be designed to allow non-technical (non-SQL programmers) to generate their own reports • People have an incredible ability to categorize things base on their properties and attributes but if they don't have consistent definitions of these properties they will not be generating consistent results

  5. The BI Iterative Process Access Data Warehouse • The BI process in an on-going iterative process where the structure of the data warehouse changes based on what data is critical to an organizations business objectives. Analysis BI ProjectManagement Publishing, Change, DataGap Analysis, New DataGathered Insights, Conclusions andFindings

  6. BI Evolution • Shorten the time-to-report interval • Allow users to "browse" data sets interactively • Remove programmers with "backlogs" of reports • Users frequently waited days, weeks for months to get a custom report created Increasing Responsiveness Monthly Green Bar Reports BrowseableGraphical Interface

  7. Dimensions of BI Technical SophisticationRequired Low (analysts) Highly Responsive to "What If" Scenarios Degree ofEnd User Control High (programmers) Few Dimensions few parameters, few filters Many Dimensions many variables

  8. Overlapping Terminology Data Mining Statistical Analysis Data Dictionaries Data Modeling Semantics PatternDiscovery Business Intelligence Data Warehousing Transaction Processing (OLTP) Indexing Dimensional Analysis Aggregates Data Storage (RDBMS)

  9. Key Terms Covered in This Class • Properties • Dimension • Aggregation and Levels • Enumerations of Categorical Data • Labeling Categories • Giving precise definitions to Labels • Dimension Hierarchies and Levels • Cubes • Measures • Filters • Data Warehouse Presentation

  10. Things Have Many "Properties" People are very good at recognizing and sorting things by their properties.

  11. Sorting by an Object's Property • Sort objects by their color

  12. Sorting by A Property • Sort objects by their shape

  13. Sorting by Color AND Shape Shape “Dimension” Color “Dimension”

  14. Dimensional Analysis • The science of figuring out intuitive ways that people want to categorize information using independent variables to graphically filter and browse their data

  15. Dimension • List of categories used to partition the information based on a property of the objects • Dimension Names: Color, Shape

  16. Labels • A name given to a non-overlapping category within a dimensions Labels "red" "blue" "green"

  17. Enumeration • Whenever we decide to break the continuous observable world into a predefined list of categories when each category has a label we call this an "enumerated value domain". These will then become the "dimensions" of our cube. "green" "red" "blue" Note: NO OVERLAP! Statisticians call this type of "categorical data" and it requires the categories to be non-overlapping.

  18. The Challenge of Semantic Classification • People are good at sorting based on a property they see • People are good at assigning names to a property type • People usually come up with different names for properties • Some dimensions people easily agree on • Some are very difficult to classifyand even more difficult go get peopleto agree on a non-overlapping classification system "Red Circle" "Polygon" "Square" "Blue" "Green" "Blue-Green" What happens with a small percentage of data does not quite fit into a discrete category?

  19. Level • A layer of "aggregation" within a single dimension – categorization of properties All Shapes Shapes With Curves Shapes Without Curves Levels Circle Heart Moon Square Trapezoid Star Diamond

  20. Measures (example weight) 5.7 7.4 3.5 6.6 10 1.1 8.2 6.1 5.5 3.8 9.1 9.3 2.6 8.4 A measure is any property that you can perform math on (sums, averages).

  21. Measures • Something that you can do math on. - X + % / sum average

  22. Sample Object "Fact Table" Measures tend to have data types of integers and floating point numbers. Note that categorical data can not beadded together. But we can count thefrequencies of items with a category!

  23. Shape Dimension Note that there is no reference to "Has Curves" in the prior table. "HasCurves" is a property of the shape value domain because it can be "inferred"from the shape of the object. Some categorical definitions use "exclusionary" language.Note that "Has Curves" also must have a precise definition in the data dictionary.

  24. Facts and Dimension Shape Facts Color Dim Color_FK Color Name Shape_FK Shape Dim Weight Has Curves Shape Name Note that "Has curves" does not need to be in the central fact table.It is a property of the shape!

  25. Adding Dimensions 5.7 7.4 3.5 6.6 10 1.1 8.2 6.1 5.5 3.8 9.1 9.3 2.6 8.4 We have now added a 3rd dimension – "Dash Style"

  26. Each New Property is Another Dimension Shape Facts Color Dim Color_FK Shape Dim Color Code Shape_FK Has Curves DashStyle_FK Shape Code Weight DashStype Dim Shape

  27. Filters 5.7 7.4 3.5 6.6 10 1.1 8.2 6.1 5.5 3.8 9.1 9.3 2.6 8.4 A filter will exclude all objects with a specified property. For example we can exclude all shapes with a property of "Circle"

  28. Example: Discarding Invalid Scores This example filter removes all scores EXCEPT the valid scores.

  29. Selecting Only Scale Scores This filter removes all scores EXCEPT the assessments Scale Scoreusing the Test Score Type dimension.

  30. Dim1 Dim2 Dim4 Dim3 Dim5 PK PK PK PK PK Cat1 Cat1 Cat1 Cat1 Cat1 Facts Primary Key Cat2 Cat2 Cat2 Cat2 Cat2 Foreign Key Cat3 Cat3 Cat3 Cat3 Cat3 Foreign Key Foreign Key Foreign Key Foreign Key Measure1 Measure1 The Star Schema

  31. Adding Measures Shape Facts Color Dim Color_FK ColorCode Shape_FK Shape Dim DashStyle_FK ShapeCode WeightValue DashStype Dim Measures are Integers or floats that you can perform math on. HeightValue ShapeCode PriceAmount DensityValue Measures can be easily be added to the fact table without changing any of the dimensions.

  32. 5.7 7.4 3.5 6.6 10 1.1 8.2 6.1 5.5 3.8 9.1 9.3 2.6 8.4 Cube • A Cube is a pre-built structure that has facts and many dimensions (not necessarily just three) • Designed to have averages and sums for most levels "pre-calculated" to make analysis fast Color Dimension Dash-Style Dimension Shape Dimension

  33. 5.7 7.4 3.5 6.6 10 1.1 8.2 6.1 5.5 3.8 9.1 9.3 2.6 8.4 Build a Mental Model (aka "Page Fields") Filter Funnel Measure = count Vertical Dimension (rows) Presentation Horizontal Dimension (columns)

  34. Using Cubes in Excel Filter Dropped Here Measures Dropped Here Row and Column Dropped Here

  35. Count of Year vs. Assessment Name The measure is the count of records in the cube. The Column Dimension is the "Fiscal Year" There are around 25 million test results The Row Dimension is the "Test Name".

  36. Conformed Dimensions • When building many cubes, there is a large benefit to "reusing" dimensions • Commonly reused dimensions • Time (Fiscal Year, Quarter) • Organization (School, District) • Expense Category • Student

  37. Each bar represents the sum of all the expendituresin the category (Expenditures On girls athletics for the fiscalyear 1991)

  38. Sample of National Conformed Dimensions School Incident Data School Technology Assessment Organization Financial Student Teacher Claims Process Date Student Attendance Student Assessment District Financial Reporting School and District Status Teacher Licensing School Food and Nutrition Student Disciplinary Reporting Student Safety Reporting District Technology Planning

  39. Role of Data Architecture • Facilitate how business users want to identify and categorize data • Assist in the creation and documentation of categorical value domains and measures • Creation of machine-readable data dictionaries for use in building data warehouse structures

  40. Summary • We found a way for non-SQL programmers to analyze complex data by looking at one dimension at a time • Users don't have to memorize "codes" • Users do need to understand how continuous data is mapped into categories and what the labels on these categories mean

More Related