1 / 63

Polaris Query, Analysis, and Visualization of Large Hierarchical Relational Databases

Polaris Query, Analysis, and Visualization of Large Hierarchical Relational Databases. Chris Stolte Computer Science Department Stanford University. Motivation. Large relational databases have become very common Corporate data warehouses Amazon, Walmart,… Scientific projects:

nixonc
Télécharger la présentation

Polaris Query, Analysis, and Visualization of Large Hierarchical Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. PolarisQuery, Analysis, and Visualization of Large Hierarchical Relational Databases Chris Stolte Computer Science Department Stanford University

  2. Motivation • Large relational databases have become very common • Corporate data warehouses • Amazon, Walmart,… • Scientific projects: • Human Genome Project • Sloan Digital Sky Survey • Need tools to extract meaning from these databases • Programmatic data mining/statistical analysis • Visual exploration and analysis

  3. Related Work • Formalisms for graphics • Bertin’s “Semiology of Graphics” • Mackinlay’s APT • Roth et al.’s Sage and SageBrush • Wilkinson’s “Grammar of Graphics” • Visual exploration of databases • DeVise • DataSplash/Tioga-2 • Visualization and data mining • SGI’s MineSet • IBM’s Diamond

  4. Outline • Review of Data Warehouses & Data Cubes • A Visualization Formalism • Polaris: Visual Data Mining • Multiscale Visualization

  5. Review of Data Warehouses and Data Cubes

  6. Review: Data Warehouses Data warehouse stores data for analysis • measures (facts) categorized by dimensions: Fact table State Month Product Name Profit Sales Payroll Marketing Inventory Margin ... Nominal / Ordinal fields (categorical dimensions) Coffee chain (courtesy Visual Insights) Quantitative fields (measures)

  7. Hierarchies Data warehouses are very large—need to summarize • Add hierarchical structure to warehouse Dimension tables Time Year Quarter Month Fact table Location Market State State Month Product Name Profit Sales Payroll Marketing Inventory Margin ... Products Product Type Product Name

  8. Each dimension table describes a tree • Each level in tree describes a level-of-detail • Meaningful basis for aggregation • Derived from semantics or programmatic analysis • Context for data Hierarchical Dimensions Time Year Quarter Month

  9. Data Cube For each level-of-detail, summarize relations as cubes • More efficient, powerful model for analysis Each cell aggregatesall measures for those dimensions Each cube axis corresponds to a dimension in the relation at a level-of-detail

  10. Hierarchies & Data Cubes Hierarchies define a lattice of cubes: Least detailed Each cube is defined by a level-of-detail in each dimension. Data abstraction Most detailed

  11. Projecting Data Cubes Can further abstract a cube by “projection”: Data abstraction

  12. Data Warehouse Summary • Industry standard for storing analytic data • Not operational or transactional data • Structured as a lattice of data cubes (aggregations) • Provide summaries of data at meaningful levels of detail • To perform data abstraction: • Choose a cube in the lattice of cubes • Project to relevant dimensions • Where a lot of important data is stored

  13. A Visualization Formalism

  14. A Visualization Formalism • Typical approach: • Monolithic objects defining a single “visual metaphor” • Formalism: • Defines a space of visualization and unifies tables, different graphs as a class of visual representation • Succinct specification of sophisticated visualizations • Can be compiled into necessary drawing operations and database queries • Exposes structure and pattern of effective visual metaphors • Powerful tool for describing, comparing, and building visualizations

  15. Polaris Formalism • Visualizations described using visual specifications that define: • Table configuration for visualization (algebra) • Type of graphic in each pane • Encoding of data as visual properties (color, size, shape, …) of marks • Data transformations and queries • Interpreter compiles a specification into drawing commands and database queries

  16. Polaris Algebra: Operands • Ordinal fields: interpret domain as a set that partitions table into rows and columns: • Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}  • Quantitative fields: treat domain as single element set and encode spatially as axes: • Profit = {(Profit)} 

  17. Concatenation (+) Operator • Ordered union of two sets • Quarter + ProductType • = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}+{(Coffee),(Espresso)} • = {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)} • Profit + Sales • = {(Profit),(Sales)}

  18. Cross () Operator • Direct-product of two sets • Quarter  ProductType = • {(Qtr1,Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), • (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4,Tea)} • ProductType  Profit =

  19. Categorical Hierarchies • Quarter  Month • Direct product of two sets • Would create twelve entries for each quarter, i.e. (Qtr1, December) • Quarter / Month • Based on tuples in fact table not semantics • Would only create three entries per quarter • Can be expensive to compute • Quarter . Month • Based on tuples in dimension tables

  20. Encoding System <color encoding><measure name=‘Profit’></color encoding>

  21. SQL Dataflow • Notes • Aggregation operators applied after sort • Only one layer is shown; additional z-sort Sort Query Results Tuples in Panes Marks in Panes

  22. Polaris: Visual Data Mining

  23. The Pivot Table Interface • Common interface to statistical packages/Excel • Cross-tabulations • Simple interface based on drag-and-drop

  24. Extending the Pivot Table Interface • Extend the interface by • Generating rich table-based graphical displays rather than tables of text • Providing a single conceptual model for both graphs and tables • Preserving the ability to rapidly construct displays

  25. Polaris Design Goals Design guided by two primary goals: • Interactive analysis and exploration versus static visualization • Simple, consistent interface

  26. Analysis & Exploration Challenges • Designing a user interface for analysis and exploration places several requirements on the user interface: • Data dense displays:display both many tuples & many dimensions • Multiple display types:different displays suited to different tasks • Exploratory interfaces:rapidly change data transformations and views

  27. Simple, Consistent Interface • Excel Pivot tables provide a simple interface for building text-based tables • Graphs require multiple steps: different interfaces and conceptual models • Want to unify tables, graphs, and database queries in one interface

  28. Polaris Demo!

  29. Data Mining and Visualization • Polaris not solely for visual analysis • Precursor to algorithmic analysis • Validate results and establish trust • Incorporate decision trees and classification algorithms into data warehouses as hierarchies

  30. Multiscale Visualization

  31. Multiscale Visualization • Directly support analysis process • “Overview first, zoom and filter, then details-on-demand” • Visual representation changes as user pans and zooms • Overview, lots of data  data highly abstracted • Zoom, data density decreases  more detailed information shown • Visual and data abstraction • Visual abstraction different representation/same data • Data abstractiontransformations to reduce data set size

  32. Existing Multiscale Visualizations • Cartography • Multiscale information visualization • Pad++: alternate desktops • DataSplash • XmdvTool • ADVIZOR • Main limitations: • One zoom path • Primarily visual abstraction

  33. Contributions • Multiscale visualization with both visual and data abstraction using generalized mechanisms: • Data Abstraction  Data Cubes • Visual Abstraction  Polaris • “Design Patterns”

  34. Path of Exploration • Can think of an analysis as path of specifications

  35. Path of Exploration Visual abstraction

  36. Path of Exploration This is a multiscale visualization! Dataabstraction

  37. Graphical Notation

  38. Graphical Notation: Templates Instance Template

  39. Specifying Multiscale Visualizations • Specify multiscale visualization using a graph of Polaris specifications • “zoom graphs” • Infovis 2002 paper* describes how to implement Polaris Specification Zooming Possible zoom

  40. Specifying Multiscale Visualizations • Can specify a zooming pattern by using templates

  41. Specifying Multiscale Visualizations • Independent zooming on different dimensions is described as a graph y-axis zoom x-axis zoom

  42. Design Patterns

  43. Design Patterns • Zoom graphs simplify specifying and implementing multiscale visualizations • Design is still very hard • “Design patterns” (a la Gamma et al.) • Capture zoom structures that have been used effectively & reuse in new designs • We present four such patterns • Formal way to discuss multiscale visualization

  44. Thematic Maps

  45. Thematic Maps

More Related