Envisioning Information Lecture 3 – Multivariate Data Exploration


Presentation Transcript


  1. Envisioning Information Lecture 3 – Multivariate Data Exploration: Scatter plots and parallel coordinates. Ken Brodlie, ENV 2006

  2. Data Tables: Multivariate datasets can be expressed as a data table. Each entry (row) in the table is an observation, and an observation consists of values of a set of variables, or variates. [Figure: data table, labelled "variables" and "observations"] Exercise: Create a data table from the MSc class…
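A minimal sketch of a data table in Python, assuming one row per observation and one column per variable. The variable names follow the iris example used later in the lecture; the three rows are illustrative values in the style of that dataset, and the helper `column` is hypothetical.

```python
# Hypothetical illustration: a data table as a list of observations.
# Each observation (row) holds one value per variable (column).
VARIABLES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

TABLE = [
    # Illustrative values only, in the style of the iris data set.
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]


def column(table, variables, name):
    """Return every observation's value for one variable."""
    j = variables.index(name)
    return [row[j] for row in table]


if __name__ == "__main__":
    print(len(TABLE), "observations of", len(VARIABLES), "variables")
    print(column(TABLE, VARIABLES, "petal_length"))  # [1.4, 4.7, 6.0]
```

Reading down a column gives one variable across all observations; reading along a row gives one observation across all variables.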

  3. Scatter Plot: For two variates, we have already met the scatter plot technique. It is useful for showing what happens to one variable as another changes…

  4. Scatter Plot: Visicube from Datamology is a useful free charting tool. Here is an example scatter plot visualizing the speed of the (receding) galaxy NGC7531 relative to the Earth, with speed measured at different points on the galaxy. Circles represent measurements at 133° to the horizon; pluses, at 43°. What can you observe? http://www.datamology.com/sample-S2.shtml

  5. 3D Scatter Plot: Visicube has a tool specifically for 3D scatter plots. The third variate is expressed as a vertical axis, and a widget lets you take slices at different heights. Here we have the same dataset, but X and Y are positions and the Z axis is velocity, i.e. the data are layered by velocity; shown here is the 3rd layer (1482–1519 km/sec). Observations below 1500 km/sec are highlighted in yellow (almost allowing 4D). Conclusion? http://www.datamology.com/sample-S3.shtml
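The slicing widget and the yellow highlighting can be sketched as two filters over (x, y, z) observations. This is an assumption about how such a tool might behave, not Visicube's implementation; the function names and the sample points are hypothetical (they are not the NGC7531 measurements).

```python
from typing import List, Tuple

# (x position, y position, velocity in km/sec); hypothetical layout
Point = Tuple[float, float, float]


def take_slice(points: List[Point], z_low: float, z_high: float) -> List[Point]:
    """Keep observations whose third variate lies in [z_low, z_high]."""
    return [p for p in points if z_low <= p[2] <= z_high]


def highlight(points: List[Point], threshold: float) -> List[Tuple[Point, bool]]:
    """Pair each observation with a flag: True if its velocity is below threshold."""
    return [(p, p[2] < threshold) for p in points]


if __name__ == "__main__":
    # Hypothetical sample points
    points = [(0.0, 1.0, 1470.0), (0.5, 0.2, 1490.0),
              (1.0, -0.3, 1510.0), (1.5, 0.8, 1530.0)]
    layer = take_slice(points, 1482.0, 1519.0)  # the 1482-1519 km/sec layer
    for p, is_low in highlight(layer, 1500.0):
        print(p, "yellow" if is_low else "default")
```

The slice handles the third variate and the highlight flag adds a binary fourth dimension, which is why the slide describes this as "almost allowing 4D".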

  6. 3D Scatter Plots: Here is an alternative approach, using 3D plotting (XRT/3d)… does this work? http://www.ist.co.uk/XRT/xrt3d.html

  7. Extending to Higher Numbers of Variables: Additional variables can be visualized by colour and shape coding. IRIS Explorer (a scientific visualization system!) was used to visualize data from BMW. Five variables are displayed: three by spatial arrangement, and the other two by colour and object type. Notice the clusters… but there are clearly limits to how far this will scale. Kraus & Ertl, U Stuttgart, http://wscg.zcu.cz/wscg2001/Papers_2001/R54.pdf
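A minimal sketch of this encoding, assuming each observation has five variables: three go to spatial position, one continuous variable to a colour ramp, and one categorical variable to a glyph shape. The field names `v1`…`v5`, the linear blue-to-red ramp and the shape table are hypothetical choices, not those of the Kraus & Ertl study.

```python
from typing import Dict, Tuple

SHAPES = ["sphere", "cube", "cone"]  # hypothetical glyph types, one per category


def colour_ramp(value: float, lo: float, hi: float) -> Tuple[float, float, float]:
    """Map value in [lo, hi] linearly from blue (low) to red (high), as RGB in [0, 1]."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values
    return (t, 0.0, 1.0 - t)


def encode(obs: Dict[str, float], lo: float, hi: float) -> dict:
    """Encode one five-variable observation as a glyph description."""
    return {
        "position": (obs["v1"], obs["v2"], obs["v3"]),  # three variables -> space
        "colour": colour_ramp(obs["v4"], lo, hi),       # fourth -> colour
        "shape": SHAPES[int(obs["v5"]) % len(SHAPES)],  # fifth (categorical) -> object type
    }


if __name__ == "__main__":
    print(encode({"v1": 1.0, "v2": 2.0, "v3": 3.0, "v4": 5.0, "v5": 1.0}, 0.0, 10.0))
```

Each extra channel (colour, shape) adds only one or two variables and quickly becomes hard to read, which is the scaling limit the slide points out.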

  8. Multivariate Visualization Techniques: techniques designed for any number of variables • Scatter plot matrices • Parallel co-ordinates • Glyph techniques. Software: XmdvTool (Matthew Ward), http://davis.wpi.edu/~xmdv. Acknowledgement: many of the images in the following slides are taken from Ward’s work.

  9. What are these?

  10. Multivariate Visualization: Example of the iris data set, with 150 observations of 4 variables (length and width of petal and sepal); check Wikipedia for explanations of petals and sepals. Techniques aim to display relationships between variables, which is the analytical task. The challenge in visualization is to design the visualization to match the analytical task.

  11. Scatter Plot Matrices

  12. Scatter Plot Matrices [Figure: scatter plot matrix of variables A, B and C]. For table data of M variables, we can look at pairs of variables in 2D scatter plots, and the pairs can be juxtaposed in a matrix. With luck, you may spot correlations between pairs as linear structures… or you may observe clusters.
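For M variables a scatter plot matrix has M × M panels, but only the M(M-1)/2 pairs above the diagonal are distinct; the lower triangle mirrors them with the axes swapped. The sketch below enumerates those pairs and computes Pearson's correlation coefficient for each, one way to rank which panels are most likely to show linear structure. It is a hedged illustration in plain Python; the function names and the sample columns are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant column: correlation undefined
    return sxy / math.sqrt(sxx * syy)


def panel_correlations(columns: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """One entry per distinct scatter-plot-matrix panel (upper triangle)."""
    return [(a, b, pearson(columns[a], columns[b]))
            for a, b in combinations(columns, 2)]


if __name__ == "__main__":
    cols = {  # hypothetical columns A, B, C
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.0],  # B = 2A: perfect positive correlation
        "C": [4.0, 3.0, 2.0, 1.0],  # C falls as A rises
    }
    for a, b, r in panel_correlations(cols):
        print(f"{a}-{b}: r = {r:+.2f}")
```

A coefficient near +1 or -1 corresponds to the "linear structures" a reader looks for in a panel; clusters, by contrast, are not captured by a single correlation value and still need the visual display.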

  13. Scatter Plot Matrix – Iris Data Set

  14. Scatter Plot Matrix – Car Data Set: The data represent 7 aspects of cars. What relationships can we notice? For example, what correlates with high MPG?

  15. Parallel Coordinates [Figure: six parallel vertical axes labelled A to F]. To construct the plot: create M equidistant vertical axes, each corresponding to a variable; scale each axis to the [min, max] range of its variable; draw each observation as a polyline through the point on each axis that corresponds to the observation's value for that variable.
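The three construction rules translate directly into code. This sketch is an illustration, not XmdvTool's implementation: it places axis j at horizontal position j (so the axes are equidistant), normalizes each variable to [0, 1] using its own min and max, and returns each observation's polyline as a list of (x, y) vertices for any 2D drawing API. The function name and the sample rows are hypothetical.

```python
from typing import List, Tuple


def parallel_coordinates(table: List[List[float]]) -> List[List[Tuple[float, float]]]:
    """Return one polyline (list of (axis_x, scaled_y) vertices) per observation.

    Axis j sits at x = j, so the M axes are equidistant; each axis is scaled
    so the variable's minimum maps to y = 0 and its maximum to y = 1.
    """
    m = len(table[0])
    lows = [min(row[j] for row in table) for j in range(m)]
    highs = [max(row[j] for row in table) for j in range(m)]

    def scale(v: float, j: int) -> float:
        span = highs[j] - lows[j]
        return 0.5 if span == 0 else (v - lows[j]) / span  # centre constant variables

    return [[(float(j), scale(row[j], j)) for j in range(m)] for row in table]


if __name__ == "__main__":
    rows = [  # hypothetical observations of variables A, B, C
        [1.0, 10.0, 5.0],
        [3.0, 0.0, 5.0],
        [2.0, 5.0, 5.0],
    ]
    for line in parallel_coordinates(rows):
        print(line)
```

Scaling each axis independently is what lets variables with very different units share one chart; the cost is that absolute magnitudes can no longer be compared across axes.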

  16. Parallel Coordinates [Figure: observations plotted on parallel axes A to F]. Correlations may start to appear as the observations are plotted on the chart; here, for example, there appears to be a negative correlation between the values of A and B. The technique has been used for applications with thousands of data items.
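Why does a negative correlation stand out? Between two adjacent axes, the segments of two observations cross exactly when their order on the left axis is the opposite of their order on the right axis. Strong positive correlation therefore gives nearly parallel segments, while strong negative correlation gives many crossings. The hypothetical helper below counts those crossings for one pair of adjacent axes; it assumes the values are already scaled and does not count ties.

```python
from itertools import combinations
from typing import List


def crossings(left: List[float], right: List[float]) -> int:
    """Count pairs of observations whose segments cross between two adjacent axes.

    Segments i and k cross strictly between the axes when their vertical
    order flips from the left axis to the right axis; ties are not counted.
    """
    return sum(
        1
        for i, k in combinations(range(len(left)), 2)
        if (left[i] - left[k]) * (right[i] - right[k]) < 0
    )


if __name__ == "__main__":
    a = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(crossings(a, a))                   # positive correlation: 0 crossings
    print(crossings(a, [1 - v for v in a]))  # negative correlation: all 10 pairs cross
```

This is the same "discordant pair" count that underlies Kendall's tau, so the density of crossings between two axes is a direct visual proxy for rank correlation.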

  17. Parallel Coordinates – Iris Data

  18. Parallel Coordinates Example: Detroit homicide data, 7 variables, 13 observations (1961–1973).

  19. Parallel Coordinates: The concept is due to Alfred Inselberg, who conceived the idea as a research student in 1959… and gradually refined it over the next 40 years. http://www.math.tau.ac.il/~aiisreal/

  20. Parallel Coordinates: Parallel coordinates is a clever mechanism for transforming geometry from one space to another. To get a handle on the idea, consider two variables X and Y. In parallel coordinates, a point (X, Y) becomes… what? A line becomes… what? Why is the ordering of the axes important? Use this space to sketch the answers…
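A worked version of the exercise, assuming the X axis is drawn at horizontal position 0 and the Y axis at position d, with both axes sharing one vertical scale. A point (x, y) becomes the line segment joining height x on the X axis to height y on the Y axis. If the points all lie on a line y = mx + b with m ≠ 1, the segment's height at horizontal position u is x + (y - x)u/d = x(1 + (m - 1)u/d) + bu/d, which no longer depends on x when u = d/(1 - m). Every segment therefore passes through the single point (d/(1 - m), b/(1 - m)): a line becomes a point. This is the standard point-line duality of parallel coordinates; the function names below are hypothetical.

```python
from typing import Tuple

Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def point_to_segment(x: float, y: float, d: float = 1.0) -> Segment:
    """A 2D point (x, y) becomes the segment from (0, x) on the X axis to (d, y) on the Y axis."""
    return ((0.0, x), (d, y))


def line_to_point(m: float, b: float, d: float = 1.0) -> Tuple[float, float]:
    """The line y = m*x + b becomes the point where all its segments meet (requires m != 1)."""
    if m == 1:
        raise ValueError("slope 1 gives parallel segments: the dual point is at infinity")
    return (d / (1 - m), b / (1 - m))


def height_at(segment: Segment, u: float) -> float:
    """Height of the (extended) segment at horizontal position u."""
    (u0, v0), (u1, v1) = segment
    return v0 + (v1 - v0) * (u - u0) / (u1 - u0)


if __name__ == "__main__":
    m, b = -1.0, 2.0                 # line y = -x + 2 (negative correlation)
    u, v = line_to_point(m, b)       # (0.5, 1.0): between the axes
    for x in [0.0, 0.5, 1.0, 2.0]:
        seg = point_to_segment(x, m * x + b)
        print(x, height_at(seg, u))  # every segment has height 1.0 at u = 0.5
```

When m < 0, d/(1 - m) lies between 0 and d, so the meeting point sits between the axes; this is why the negative correlation between A and B on slide 16 shows up as lines crossing in one region. Axis ordering matters because this relationship is only visible for variables on adjacent axes.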

  21. The Screen Space Problem: All techniques, sooner or later, run out of screen space. Parallel co-ordinates are usable for up to 150 variates and unworkable beyond 250 variates. Example from remote sensing: 5 variates, 16,384 observations.

  22. Brushing as a Solution: Brushing selects a restricted range of one or more variables; the selection is then highlighted.
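Brushing can be sketched as a predicate built from per-variable ranges: an observation is selected when every brushed variable falls inside its range, and the selected rows are then drawn highlighted. This is a minimal illustration assuming observations stored as dictionaries; the `brush` function, the field names and the sample cars are hypothetical, not XmdvTool's API or the lecture's car data.

```python
from typing import Dict, List, Tuple

Observation = Dict[str, float]


def brush(data: List[Observation], ranges: Dict[str, Tuple[float, float]]) -> List[bool]:
    """Return a highlight flag per observation: True if every brushed
    variable lies within its inclusive [low, high] range."""
    return [
        all(low <= obs[var] <= high for var, (low, high) in ranges.items())
        for obs in data
    ]


if __name__ == "__main__":
    cars = [  # hypothetical observations
        {"mpg": 35.0, "cylinders": 4.0, "weight": 2100.0},
        {"mpg": 15.0, "cylinders": 8.0, "weight": 4200.0},
        {"mpg": 31.0, "cylinders": 6.0, "weight": 2900.0},
    ]
    print(brush(cars, {"mpg": (30.0, 50.0)}))
    print(brush(cars, {"mpg": (30.0, 50.0), "weight": (0.0, 2500.0)}))
```

Because the same flag list can drive every view, one brush highlights the same observations in a scatter plot matrix and in parallel coordinates alike.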

  23. Scatter Plot: Use of a ‘brushing’ tool can highlight subsets of the data… now we can see what correlates with high MPG.

  24. Parallel Coordinates: Brushing picks out the high-MPG data. Can you observe the same relations as with scatter plots? Is it more or less easy?

  25. Parallel Coordinates: Here we highlight cars with high MPG and not 4 cylinders.
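Combining brushes with logical operators gives compound selections like this one. A hedged sketch: two hypothetical predicates, "high MPG" (the threshold of 30 is an assumption; the slide does not give one) and "has 4 cylinders", combined as high MPG AND NOT 4 cylinders. The field names and sample cars are illustrative only.

```python
from typing import Dict, List

Observation = Dict[str, float]


def high_mpg(obs: Observation, threshold: float = 30.0) -> bool:
    """Hypothetical 'high MPG' brush; the threshold is an assumption."""
    return obs["mpg"] >= threshold


def four_cylinders(obs: Observation) -> bool:
    return obs["cylinders"] == 4


def highlight(data: List[Observation]) -> List[int]:
    """Indices of observations with high MPG and NOT 4 cylinders."""
    return [i for i, obs in enumerate(data) if high_mpg(obs) and not four_cylinders(obs)]


if __name__ == "__main__":
    cars = [  # hypothetical observations
        {"mpg": 35.0, "cylinders": 4.0},
        {"mpg": 32.0, "cylinders": 6.0},
        {"mpg": 15.0, "cylinders": 8.0},
        {"mpg": 33.0, "cylinders": 3.0},
    ]
    print(highlight(cars))  # [1, 3]
```

Keeping each brush as a separate predicate makes it easy to recombine them (AND, OR, NOT) without re-specifying the ranges.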
