Information Driven Healthcare: Data Visualization & Classification
Lecture 2: Visualization
Dr. Gari D. Clifford, University Lecturer & Associate Director, Centre for Doctoral Training in Healthcare Innovation


Presentation Transcript


  1. Information Driven Healthcare: Data Visualization & Classification. Lecture 2: Visualization. Centre for Doctoral Training in Healthcare Innovation. Dr. Gari D. Clifford, University Lecturer & Associate Director, Centre for Doctoral Training in Healthcare Innovation, Institute of Biomedical Engineering, University of Oxford

  2. What is visualization? • To communicate information clearly and effectively (through graphical means) • Generally a projection down to two or three dimensions (if you don’t count colour or time) www.peltarion.com

  3. Why visualize? • To discover relationships between parameters • To test out pre-filtering data transforms SOM of 39 indicators of quality of life collected by the World Bank (1992) – e.g. state of health, nutrition, educational services, etc. Proximity indicates similar levels of many attributes. http://www.cis.hut.fi/research/som-research/worldmap.html

  4. Methods of visualization Unsupervised machine learning: • K-means clustering • Kohonen’s Self Organizing Map (SOM) • Generative Topographic Maps (GTM) • Neuroscale

  5. K-means clustering • Suppose you have some 2D data with an unknown underlying grouping • How do you ‘discover’ the underlying unknown classes? • You need an unsupervised learning technique … • e.g. K-means

  6. K-means clustering • Guess a number (k) of clusters (e.g. k=5) • Randomly guess the k cluster centre locations (obviously bad!)

  7. K-means clustering • Guess a number (k) of clusters (e.g. k=5) • Randomly guess the k cluster centre locations • Associate each datapoint with its closest centre

  8. K-means clustering • Guess a number (k) of clusters (e.g. k=5) • Randomly guess the k cluster centre locations • Associate each datapoint with its closest centre • Each centre finds the points it ‘owns’

  9. K-means clustering • Guess a number (k) of clusters (e.g. k=5) • Randomly guess the k cluster centre locations • Associate each datapoint with its closest centre • Each centre finds the points it ‘owns’ • …and jumps there • Repeat until terminated
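The loop in slides 6–9 can be sketched in a few lines of NumPy. This is a minimal illustration with names of our own choosing; production implementations (e.g. scikit-learn's KMeans) add smarter initialization and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: random initial centres, then alternate
    assignment and centre-update steps until the centres stop moving."""
    rng = np.random.default_rng(seed)
    # Randomly guess the k cluster centre locations (pick k datapoints)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Associate each datapoint with its closest centre (Euclidean)
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Each centre finds the points it 'owns' ... and jumps there
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):   # terminate once converged
            break
        centres = new
    return centres, labels
```

Note that this only finds a local optimum of the within-cluster distances, which is why the initial random guess (and restarting from several guesses) matters.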

  10. So how do we assess membership of a cluster? • Metric: • A measure of distance between two points … e.g. • Euclidean (L2 norm): (Δx² + Δy² + Δz² + …)^(1/2) • City block / Manhattan (L1 norm): sum of absolute differences • Mahalanobis: Euclidean distance after rescaling by the inverse covariance of the data • Cosine: 1 − cos(a); where a is the angle between each point (treated as vectors) • Lp norms: (Σ |differences|^p)^(1/p) – p = 1 gives city block, p = 2 gives Euclidean • Cost function: • Some mathematical operation performed on the metric … e.g. • Square • Norm • Log • …
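Each of these metrics is a one-liner in NumPy. A rough illustration with our own variable names; note that for y = 2x the cosine distance is exactly zero, since the two vectors point the same way.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Euclidean (L2 norm): square root of the sum of squared differences
euclidean = np.sqrt(np.sum((x - y) ** 2))

# City block / Manhattan (L1 norm): sum of absolute differences
city_block = np.sum(np.abs(x - y))

# Cosine distance: 1 - cos(a), a the angle between x and y as vectors
cosine = 1 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# General Lp norm of the difference vector
def lp(x, y, p):
    return np.sum(np.abs(x - y) ** p) ** (1 / p)
```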

  11. What affects the performance? • The metric + cost function • The distribution/ separability of the data • The dimensionality of the data • The number of iterations / stopping criterion (more on this later)

  12. Other methods • …. Not to be used in lab (unless you are really ambitious and quick)!

  13. Self-Organizing Map • Kohonen map or SOM – a self-organizing map, or self-organizing feature map (SOFM) • A type of ANN trained without supervision to produce a low-dimensional (typically 2D), discretised representation of the input space of the training samples, called a map. • SOMs are different from other ANNs because they use a neighborhood function to preserve the topological properties of the input space, rather than target classes. • Training a SOM builds the map from input examples using a technique called ‘vector quantization’.
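A minimal sketch of that training procedure, assuming a rectangular grid, a Gaussian neighborhood, and exponentially decaying learning rate and radius (all choices of our own; names are illustrative):

```python
import numpy as np

def train_som(X, grid=(10, 10), n_iter=2000, seed=0):
    """Minimal SOM: for each sample, find the best-matching unit (BMU)
    and pull it and its grid neighbours toward the sample, with the
    learning rate and neighbourhood radius shrinking over time."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, and random initial weight vectors
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    W = rng.random((rows * cols, X.shape[1]))
    sigma0, lr0 = max(grid) / 2, 0.5
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # Vector-quantization step: the closest unit wins
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))
        sigma = sigma0 * np.exp(-t / n_iter)
        lr = lr0 * np.exp(-t / n_iter)
        # Gaussian neighbourhood measured on the *grid*, not in data
        # space -- this is what preserves the topology of the input
        g = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma**2))
        W += lr * g[:, None] * (x - W)
    return W.reshape(rows, cols, -1)
```

The shrinking radius lets the map first unfold globally and then fine-tune locally.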

  14. Generative topographic map (GTM) • GTM was introduced in 1996 in a paper by Bishop, Svensen, and Williams • GTM is a probabilistic counterpart of the SOM • It is provably convergent and does not require a shrinking neighborhood or a decreasing step size • It is a generative model: the data is assumed to arise by first probabilistically picking a point in a low-dimensional space, mapping the point to the observed high-dimensional input space (via a smooth function), then adding noise in that space. • The parameters of the low-dimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectation-maximization (EM) algorithm. (See later) • GTM explicitly requires a smooth and continuous mapping from the low-dimensional (map) space to the input space – therefore it is topology preserving
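The generative assumption can be illustrated by sampling from such a model. This is a sketch only: the 1D latent space, RBF width, and names are our own choices, and the EM fitting of the parameters is deliberately not shown.

```python
import numpy as np

def gtm_sample(n, W, centres, beta, seed=0):
    """Sample from a GTM-style generative model:
    1. probabilistically pick a point z in a low-dimensional space,
    2. map it smoothly to data space via RBF features: y = phi(z) @ W,
    3. add isotropic Gaussian noise with precision beta."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1, 1, (n, 1))              # 1D latent space
    # Smooth nonlinear map: Gaussian RBF basis (width 0.5, assumed),
    # followed by a linear mapping W into the data space
    phi = np.exp(-(z - centres.T) ** 2 / 0.5)   # shape (n, n_basis)
    y = phi @ W                                 # shape (n, data_dim)
    return y + rng.normal(0, 1 / np.sqrt(beta), y.shape)
```

Because the map z → y is smooth and continuous, nearby latent points produce nearby data points, which is the topology-preservation property above.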

  15. Sammon mapping • Sammon's projection, or Sammon's mapping – an algorithm that maps a high-dimensional space to a space of lower dimensionality • Denote the distance between the ith and jth objects in the original space by d*ij, and the distance between their projections by dij. • Sammon's projection aims to preserve distances in the projected space by minimising Sammon's stress metric: E = (1 / Σi<j d*ij) Σi<j (d*ij − dij)² / d*ij • The minimisation can be performed by gradient descent, or other optimization algorithms – see Wednesday's lecture.
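A gradient-descent sketch of the projection using the stress metric above. Sammon's original paper used a pseudo-Newton update; plain gradient descent with a fixed step size of our own choosing is shown here for simplicity, and all names are illustrative.

```python
import numpy as np

def sammon_stress(Dstar, Y):
    """E = (1/c) * sum_{i<j} (d*_ij - d_ij)^2 / d*_ij,  c = sum_{i<j} d*_ij,
    where Dstar holds original-space distances and Y the projections."""
    D = np.linalg.norm(Y[:, None] - Y[None, :], axis=2)
    iu = np.triu_indices(len(Y), 1)
    ds, d = Dstar[iu], D[iu]
    return np.sum((ds - d) ** 2 / ds) / np.sum(ds)

def sammon(X, n_iter=300, lr=0.05, seed=0):
    """Project X to 2D by gradient descent on Sammon's stress."""
    rng = np.random.default_rng(seed)
    Dstar = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    c = np.sum(np.triu(Dstar, 1))
    Y = rng.normal(size=(len(X), 2))            # random initial layout
    for _ in range(n_iter):
        diff = Y[:, None] - Y[None, :]
        D = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(D, 1.0)                # avoid division by zero
        Ds = Dstar.copy()
        np.fill_diagonal(Ds, 1.0)
        # Gradient of the stress with respect to each projected point
        coef = (Dstar - D) / (Ds * D)
        np.fill_diagonal(coef, 0.0)
        grad = (-2 / c) * np.sum(coef[:, :, None] * diff, axis=1)
        Y -= lr * grad
    return Y
```

The 1/d*ij weighting in the stress is what distinguishes Sammon mapping from plain metric MDS: it emphasises preserving *small* original distances, i.e. local structure.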

  16. Neuroscale • Neuroscale is a topographic projection that uses Sammon’s stress metric • … and Radial Basis Functions (RBFs) – a simple single layer ANN • See Ch 7.4 in Nabney’s Netlab.
