New Directions in Analysis and Visualization

Presentation Transcript


  1. [Visual Analytics] New Directions in Analysis and Visualization. Dr Jeremy Walton, NAG Ltd, Oxford, jeremy.walton@nag.co.uk. Research Methods Festival, St Catherine's College, Oxford

  2. Overview • Introduction • NAG, HECToR • Visualization • distribution, collaboration, steering • Data mining • classification, exploratory analysis • The ADVISE project • large data, interactive analysis

  4. NAG profile • Products • Mathematical, statistical, data analysis components • 3D visualization, compilers & tools • HPC software engineering services • HECToR support • Users • Academic researchers • Professional developers • Analysts / modelers • Founded 1976 • Not-for-profit company

  5. High-End Computing Terascale Resource • Latest high-end computing service for UK • funded by EPSRC, NERC & BBSRC • will run from 2007-2013 • Partners: • Hardware: Cray Inc • Service Provision: University of Edinburgh HPCx Ltd • hardware hosting, user services, help desk • CSE Support: NAG Ltd • technical assessment of project application • porting / tuning / optimisation of user codes • training courses (inc. visualization) • best practice guides, documentation, FAQs

  6. Overview • Introduction • NAG, HECToR • Visualization • distribution, collaboration, steering • Data mining • classification, exploratory analysis • The ADVISE project • large data, interactive analysis

  7. Visualization toolkits • Help construct visualization applications • no wheel-reinvention, stone canoes, chocolate teapots • Proprietary supported commercial systems • e.g. Excel, IRIS Explorer, Spotfire • Open source, freely available software • e.g. OpenDX, InfoVis

  8. NAG’s IRIS Explorer… • General purpose toolkit for data visualization • Reusable building blocks (modules) • Connect modules to build application • Point-and-click development • Visual programming approach • Build, execute, reshape • Add new modules, if required

  9. …in action • Application in map editor • Modules in module librarian • Reads data • Colormaps it • Makes ribbon • Displays it

  10. Make the connections

  11. Add more modules... • Adds axes

  12. ...and even more • Adds caption

  13. Some examples

  14. Trendalyzer (Gapminder)

  15. Worldmapper: area

  16. Worldmapper: deaths by disease

  17. Many eyes: shared visualization

  18. Overview • Introduction • NAG, HECToR • Visualization • distribution, collaboration, steering • Data mining • classification, exploratory analysis • The ADVISE project • large data, interactive analysis

  19. NAG Data Mining Tools • Data Cleaning • Data imputation - filling in missing values • Outlier detection - finding suspect data records • Data Transformation • Scaling Data - before distance computation • Principal Component Analysis - reducing # of variables • Model fitting • Cluster analysis - finding interesting groups • Classification techniques - # of groups is known • Regression - no groups, outcome is continuous • Linear / Non-linear / Time series
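
The "scaling data before distance computation" item can be illustrated with a minimal sketch. It assumes z-score standardization, which is only one possible scaling and is not confirmed as the method used by the NAG tools; the function name `standardize` and the sample values are hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit sample standard deviation."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd == 0, 1.0, sd)  # leave constant columns unscaled
    return (X - mu) / sd

# Hypothetical records: column 0 in metres, column 1 in pounds.
X = np.array([[1.60, 20000.0],
              [1.80, 21000.0],
              [1.65, 50000.0]])
Z = standardize(X)

# Without scaling, the second column dominates any Euclidean distance.
raw_dist = np.linalg.norm(X[0] - X[1])
scaled_dist = np.linalg.norm(Z[0] - Z[1])
print(raw_dist, scaled_dist)
```

After scaling, both variables contribute on a comparable footing, so the distance is no longer driven by the choice of units.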

  20. Example: exploratory data analysis • How many species of water vole (Arvicola) in UK? • Measurement data • Presence / absence of 13 skull characteristics • 300 observations, each in one of 14 regions • 3 groups: • A. terrestris / A. sapidus / unclassified UK cases • Treatment • Average data within each region • Gives 14 data points in 13 dimensions • How to display dataset?
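
The treatment step on the slide above (averaging within each region to get 14 points in 13 dimensions) might be sketched as follows. The data here are random stand-ins, not the water vole measurements, and the names `region_means`, `X` and `region` are hypothetical.

```python
import numpy as np

def region_means(X, region):
    """Average the rows of X within each distinct region label."""
    labels = np.unique(region)
    means = np.vstack([X[region == r].mean(axis=0) for r in labels])
    return labels, means

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 observations of 13 presence/absence characteristics.
X = rng.integers(0, 2, size=(300, 13)).astype(float)
# Assign observations to 14 regions so that every region is represented.
region = np.arange(300) % 14

labels, means = region_means(X, region)
print(means.shape)  # (14, 13)
```

Each averaged value is the proportion of observations in that region showing the characteristic, so it lies between 0 and 1.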

  21. 2D scatterplots

  22. Analysis • 2D scatterplots? • Structure is unclear • (13 x 12) / 2 = 78 plots needed • Principal components analysis? • 2 PCs explain 49% of the variance • 3 PCs explain 65% of the variance • Should be > 85% for confident representation • Fisher’s iris dataset (4 variables) is 95% • Alternative technique • Metric scaling
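
The two calculations on the slide above, the number of pairwise scatterplots and the cumulative variance explained by the leading principal components, can be sketched as below. This is an illustration on random stand-in data, so it will not reproduce the 49% and 65% figures; the function names are hypothetical and the 0.85 threshold is the slide's rule of thumb.

```python
import numpy as np
from math import comb

def cumulative_explained_variance(X):
    """Cumulative fraction of variance explained by successive PCs."""
    Xc = X - X.mean(axis=0)                    # centre each variable
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values of centred data
    var = s ** 2                               # proportional to PC variances
    return np.cumsum(var) / var.sum()

def components_needed(cum, threshold=0.85):
    """Smallest number of PCs whose cumulative fraction reaches threshold."""
    return int(np.searchsorted(cum, threshold) + 1)

n_plots = comb(13, 2)  # pairs of 13 variables: (13 x 12) / 2

rng = np.random.default_rng(1)
X = rng.random((14, 13))   # synthetic stand-in for the 14 regional averages
cum = cumulative_explained_variance(X)
k = components_needed(cum)
print(n_plots, k)
```

If `k` is larger than 3, a 3D plot of the leading components would fall short of the threshold, which is the situation the slide describes.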

  23. Metric scaling • 14 data points – one for each region • Each point has values for 13 variables • Construct 14 by 14 dissimilarity matrix, Δ • Δij = distance between points i & j in 13D space • Δ is symmetric, with zero diagonal elements • Want to find a new matrix, Δ* • set of 14 new data points in 3D space that preserve Δ • Project Δ to Δ* using metric scaling • Display data points in 3D
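
One common form of metric scaling is classical (Torgerson) scaling, also called principal coordinates analysis. The sketch below assumes that variant and Euclidean distances; whether the talk used exactly this formulation is an assumption, and the data are random stand-ins. The names `pairwise_distances` and `classical_mds` are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean dissimilarity matrix: symmetric, zero diagonal."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=3):
    """Embed n points in k dimensions from an n-by-n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared distances
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]             # indices of the k largest
    w_k = np.clip(w[idx], 0.0, None)          # guard against tiny negatives
    return V[:, idx] * np.sqrt(w_k)           # coordinates, one row per point

rng = np.random.default_rng(2)
X = rng.random((14, 13))          # synthetic stand-in: 14 regions, 13 variables
D = pairwise_distances(X)         # the 14 by 14 matrix called Δ on the slide
Y = classical_mds(D, k=3)         # 14 points in 3D
D_star = pairwise_distances(Y)    # distances among the new points (Δ*)
print(Y.shape)
```

Comparing `D_star` with `D` gives a rough sense of how much of the original dissimilarity structure survives the reduction to 3D; when the points genuinely lie in a 3D subspace, the two matrices agree up to rounding.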

  24. Exploratory data analysis conclusions • 2D scatterplots don’t indicate group structure • cf. iris dataset • 3D PCA unreliable here • Metric scaling of Δ used to reduce D from 13 to 3 • 3D visualization reveals group structure • Distinct A. sapidus group • UK sample represents only A. terrestris

  25. Overview • Introduction • NAG, HECToR • Visualization • distribution, collaboration, steering • Data mining • classification, exploratory analysis • The ADVISE project • large data, interactive analysis

  26. The ADVISE project • DTI-funded research project, started March 2007 • NAG / VSN / University of Leeds • Merge visualization & statistics (visual analytics) • use statistics to identify key characteristics of dataset • understand the characteristics through visualization • User community • pharmaceuticals • environmental science • engineering • Initial user meeting held September 2007

  27. Large datasets • Size matters (but isn’t everything) • Developer’s view: Too large for our current system • Problems of • performance • robustness • User’s view: Too large for me to understand • Current ADVISE datasets are “only” a few GB • complications (e.g. comparing several) could raise this • HECToR users have TB datasets

  28. ADVISE ideas • Retention of visual programming interface • Re-use of algorithmic base • IRIS Explorer modules • GenStat statistics functionality (from VSN) • Three-layered architecture • User interface • Web service middleware • Visualization components • Distribution, tailored user interface, collaboration

  29. ADVISE progress • Porting IRIS Explorer (IE) modules to standalone environment • some of these use GenStat for statistics • New system used to revisit air quality demo • early (IEEE Viz 96) web-based visualization • new system more efficient • Working with real user data

  30. Conclusions • NAG offers software components for developers • no wheel-reinvention, stone canoes, chocolate teapots • Visualization & data mining crucial for analysis • distribution, steering, classification, exploration • interactivity / interrogation important • integration is an ongoing field of activity • ADVISE project • developing a new system for visual analysis • working with real user problems • improving understanding of data
