Visual Analytics Review. IAT 355 Lyn Bartram. Overview. Topics ( in no particular order) Data models and analytics Information visualization techniques: Types and components Interaction Perception Cognition Navigation and Scent Presentation and screen space. Overview and definitions.
Visual Analytics Review IAT 355 Lyn Bartram
Overview • Topics (in no particular order) • Data models and analytics • Information visualization techniques: Types and components • Interaction • Perception • Cognition • Navigation and Scent • Presentation and screen space IAT 355 Introduction
Overview and definitions
Information visualization • Visual metaphors for data that is not inherently spatial, such as the exploration of text-based document databases • More abstract • Assigns structure and position to information that has none • Text • Statistics • Finance/Business • Internet • Software
Visual analytics • Analytical reasoning supported by an interactive visual interface • Intersection of visualization with data analysis • Biology • National security
Visual thinking • Visual thinking involves: • Constructing visual queries on displays • Visual search strategies through eye movements and attention to relevant patterns • Visual notification and attention “redirection” to new patterns and events • Well-structured balance of elements and tasks
Data Analytics
Data We Use • Data Models • Types • Metadata • Aggregates • Descriptive Statistics • Distribution • Clusters Show Me the Numbers! : Data
Data models • take raw data and transform it into a form that is more workable • Main idea: build a model • Individual items are called cases or records • Cases have attributes : an attribute is a value of a variable or factor • In vis terms, a dimension
How many dimensions? • Data sets of dimensions 1, 2, 3 are common • Number of variables per class • 1 - Univariate data • 2 - Bivariate data • 3 - Trivariate data • >3 - Hypervariate data • These are the fun and interesting ones! But hard!
Data Types (measurements) • Nominal: categorical (equal or not equal to other values) • Example: gender, Student Number • No concept of relative relation other than inclusion in the set • Ordinal: sequential (obeys the < > relation; ordered set) • Example: size of car, speed settings on road • Example: mild, medium, hot, suicide • Distance is not uniform
Data Types 2 • Interval: relative measurements, no fixed zero point • Data is numerical, not categorical. Rank order among values is explicit, with an equal distance between points in the data set: -2, -1, 0, +1, +2 • Cannot say “twice as much as” • Example: height above sea level, hours in a day • Ratio: interval data with an absolute zero • Can say “twice as much as” • Example: account balance, temperature in kelvins
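To illustrate why ratio statements need an absolute zero, the sketch below compares two temperatures on the Celsius scale (interval) and the Kelvin scale (ratio). The readings and the helper name `celsius_to_kelvin` are illustrative assumptions, not part of the slides.

```python
# Illustrative sketch: ratios are only meaningful on a ratio scale.

def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius (interval scale) to kelvins (ratio scale)."""
    return c + 273.15

cold_c, warm_c = 10.0, 20.0  # hypothetical readings

# Differences are meaningful on both scales and agree.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# The Celsius ratio suggests "twice as hot", but the Celsius zero is arbitrary.
naive_ratio = warm_c / cold_c
# The Kelvin ratio is measured from absolute zero, so it is meaningful.
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(diff_c, round(diff_k, 2), naive_ratio, round(true_ratio, 3))
```

The differences match on both scales, while the ratio changes from 2.0 to roughly 1.035 once an absolute zero is used.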
Dimensions • Data Dimensions are classified as: • Quantitative i.e. numerical • Continuous (e.g. pH of a sample, patient cholesterol levels) • Discrete (e.g. number of bacteria colonies in a culture) • Categorical • Nominal (e.g. gender, blood group) • Ordinal (ranked e.g. mild, moderate or severe illness). Often ordinal variables are re-coded to be quantitative.
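As a small sketch of re-coding an ordinal variable as numbers, the example below maps severity labels to ranks. The patient ratings and the 1, 2, 3 coding are assumptions chosen for illustration; any order-preserving coding would serve.

```python
# Hypothetical severity ratings for five patients (ordinal data).
severity = ["mild", "severe", "moderate", "mild", "moderate"]

# Assumed coding: the order is preserved, but the spacing (1, 2, 3) is arbitrary.
SEVERITY_RANK = {"mild": 1, "moderate": 2, "severe": 3}

coded = [SEVERITY_RANK[s] for s in severity]
print(coded)    # [1, 3, 2, 1, 2]

# Sorting by rank respects the ordinal relation.
ordered = sorted(severity, key=SEVERITY_RANK.get)
print(ordered)  # ['mild', 'mild', 'moderate', 'moderate', 'severe']
```

Because the distance between ordinal levels is not uniform, arithmetic on the coded values (such as a mean) should be read with caution.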
Metadata • Descriptive information about the data • Might be something as simple as the type of a variable, or could be more complex • For times when the table itself just isn’t enough • Example: if variable1 is “l”, then variable3 can only be 3, 7 or 16 • Missing values, uncertainty or importance are all examples of metadata
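One way to picture the constraint example is as a rule stored alongside the table and checked against each record. This is a minimal sketch; the dictionary layout, the function name `satisfies_constraint`, and the sample records are all hypothetical.

```python
# Metadata sketch: a cross-variable constraint kept alongside the table.
# The rule mirrors the slide's example; the layout is an assumption.
ALLOWED_VARIABLE3 = {"l": {3, 7, 16}}  # if variable1 == "l", variable3 must be one of these

def satisfies_constraint(record: dict) -> bool:
    """Return True if the record obeys the variable1 -> variable3 rule."""
    allowed = ALLOWED_VARIABLE3.get(record["variable1"])
    # No rule for this variable1 value means any variable3 is accepted.
    return allowed is None or record["variable3"] in allowed

records = [
    {"variable1": "l", "variable3": 7},  # valid
    {"variable1": "l", "variable3": 5},  # violates the rule
    {"variable1": "m", "variable3": 5},  # no rule applies
]
flags = [satisfies_constraint(r) for r in records]
print(flags)  # [True, False, True]
```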
Primary types of data analysis • Qualitative • Descriptive: used to describe the distribution of a single variable or the relationship between two nominal variables (mean, frequencies, cross-tabulation) • Inferential: used to establish relationships among variables; assumes random sampling and a normal distribution • Nonparametric: used to test relationships in small samples or in data sets that are not normally distributed
Descriptive Statistics: • Range • Min/Max • Average • Median • Mode. Distribution Statistics: • Variance • Error • Standard Deviation • Histograms and Normal Distributions
Range, Min, Max • The Range • Difference between the minimum and maximum values in a data set • A larger range usually (but not always) indicates a larger spread or deviation in the values of the data set • (73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)
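For the sample data set on this slide, the range can be computed directly. The variable names in this short sketch are illustrative.

```python
# The sample data set from the slide.
data = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

minimum = min(data)
maximum = max(data)
value_range = maximum - minimum  # range = max - min

print(minimum, maximum, value_range)  # 45 100 55
```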
Average = measure of centrality • Measures of location indicate where on the number line the data are to be found • Common measures of location are: (i) the Arithmetic Mean, (ii) the Median, and (iii) the Mode
The data may or may not be symmetrical around its average value • The mean is vulnerable to outliers and skewed data [Figure: two number lines from 0 to 10, each with 4.8 marked]
The Median • The middle value in a sorted data set. Half the values are greater and half are less than the median. • Another measure of central location in the data set. • (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Median: 68 • (1, 2, 4, 7, 8, 9, 9) Median: 7
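The two example data sets can be checked with Python's standard `statistics` module. This is a sketch; the variable names are illustrative.

```python
import statistics

even_sample = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
               69, 71, 73, 74, 74, 78, 81, 85, 87, 100]
odd_sample = [1, 2, 4, 7, 8, 9, 9]

# Even count: the median is the mean of the two middle values (67 and 69).
median_even = statistics.median(even_sample)
# Odd count: the median is the single middle value.
median_odd = statistics.median(odd_sample)

print(median_even, median_odd)  # 68.0 7
```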
The Median • May or may not be close to the mean. • The combination of mean and median is used to characterize the skewness of a distribution. [Figure: number line from 0 to 10 with 6.25 marked]
The Mode • The most frequently occurring value. • Another measure of central location in the data set. • (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) • Mode: 74 • Generally not all that meaningful unless a large percentage of the values are the same number
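A quick sketch of finding the mode, together with the share of values it accounts for, using `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

counts = Counter(data)
# most_common(1) returns the (value, count) pair with the highest count.
mode_value, mode_count = counts.most_common(1)[0]
share = mode_count / len(data)

print(mode_value, mode_count, share)  # 74 2 0.1
```

Here the mode covers only 10% of the values, which shows why it says little about this data set.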
When do we use what? • Dependent on how the data are distributed • Note: if mean = median = mode, the data are said to be symmetrical • Rule of thumb: • Use the mean if data are normally distributed and variance is within constraints • Use the median to reduce the effects of outliers
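The rule of thumb can be seen by corrupting a single value. In the sketch below, the "data-entry error" that turns 100 into 1000 is a hypothetical scenario; the data set is the sample used earlier in the deck.

```python
import statistics

data = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

# Hypothetical data-entry error: the largest value is recorded as 1000.
with_outlier = [1000 if x == 100 else x for x in data]

mean_clean = statistics.mean(data)
median_clean = statistics.median(data)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_clean, median_clean)      # 68.6 68.0
print(mean_outlier, median_outlier)  # 113.6 68.0
```

One extreme value moves the mean by 45 points, while the median does not move at all.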
Summary http://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php
Data distribution • Measures of dispersion characterise how spread out the distribution is, i.e., how variable the data are. • Commonly used measures of dispersion include: • Range • Variance & Standard deviation • Coefficient of Variation (or relative standard deviation) • Inter-quartile range
Measures of variance • Variance • One measure of dispersion (deviation from the mean) of a data set: the average squared deviation of each datum from the mean. The larger the variance, the more spread out the data are around the mean • Standard Deviation • The square root of the variance; roughly, the typical deviation from the mean, in the same units as the data • An outlier is a datum which does not appear to belong with the other data
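The definitions can be written out directly and compared against the standard library. This sketch assumes the population forms of the statistics (dividing by n); the sample forms divide by n - 1 and give slightly larger values.

```python
import math
import statistics

data = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

mean = statistics.mean(data)

# Population variance: the mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: the square root of the variance, in the data's units.
std_dev = math.sqrt(variance)
# Coefficient of variation: standard deviation relative to the mean.
cv = std_dev / mean

# The hand-written versions agree with the standard library.
print(math.isclose(variance, statistics.pvariance(data)))
print(math.isclose(std_dev, statistics.pstdev(data)))
print(round(variance, 2), round(std_dev, 2), round(cv, 3))
```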
Inter-quartile range • The Median divides a distribution into two halves. • The first and third quartiles (denoted Q1 and Q3) are defined as follows: • 25% of the data lie below Q1 (and 75% lie above Q1), • 25% of the data lie above Q3 (and 75% lie below Q3) • The inter-quartile range (IQR) is the difference between the third and first quartiles, i.e. IQR = Q3 - Q1
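A sketch of the quartiles and IQR for the deck's sample data, using `statistics.quantiles`. The choice of `method="inclusive"` is an assumption; several quartile conventions exist and they give slightly different values on small data sets.

```python
import statistics

data = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

# n=4 yields the three cut points Q1, Q2 (the median), and Q3.
# method="inclusive" interpolates within the observed range of the data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 61.5 68.0 75.0 13.5
```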
Box-plots • A box-plot is a visual description of the distribution based on • Minimum • Q1 • Median • Q3 • Maximum • If a data point is below the lower limit or above the upper limit (commonly Q1 - 1.5 × IQR and Q3 + 1.5 × IQR), the data point is considered to be an outlier. • Useful for comparing large sets of data
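The numbers behind a box-plot can be sketched as follows. The slide does not define the lower and upper limits, so this example assumes the common 1.5 × IQR rule; the quartile method is likewise an assumption.

```python
import statistics

data = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 74, 65, 74, 50, 85, 45, 63, 100]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed limits: 1.5 * IQR beyond the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_limit or x > upper_limit]
five_number_summary = (min(data), q1, median, q3, max(data))

print(five_number_summary)                 # (45, 61.5, 68.0, 75.0, 100)
print(lower_limit, upper_limit, outliers)  # 41.25 95.25 [100]
```

Under these assumptions the value 100 falls above the upper limit and would be drawn as a separate outlier point.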
Distribution is important for Aggregation • Visualization helps us see relations, or their trends, as visual patterns • A lot of what we visualize are descriptive statistics • Example: mean income vs median income • Need to ensure that the univariate units of visualization are legitimate • Rule: check your core units/variables. If they are descriptive statistics, look at the distribution
Example: job losses in US over time
2D Visualization Classes
Types of Symbolic Displays (Kosslyn 89) • Graphs • Charts • Maps • Diagrams
Types of Symbolic Displays • Graphs • at least two scales required • values associated by a symmetric “paired with” relation • Examples: scatter-plot, bar-chart, layer-graph
Graphs • Encode quantitative information using position and magnitude of geometric objects. • Examples: scatter plots, bar charts.
Types of Symbolic Displays • Charts • discrete relations among discrete entities • structure relates entities to one another • lines and relative position serve as links • Examples: • Family tree • Flow chart • Network diagram
Map • Internal relations determined (in part) by the spatial relations of what is pictured • Grid: geometric metadata • Locations identified by labels • Nominal metadata • Examples: • Map of census data • Topographic maps IAT 355
Choropleth Map • Areas are filled and colored differently to indicate some attribute of that region
Diagrams • Schematic pictures of objects or entities • Parts are symbolic (unlike photographs) • how-to illustrations • figures in a manual From Gleitman, Henry. Psychology. W.W. Norton and Company, Inc. New York, 1995
Graph Components • Framework (spatial substrate) • Measurement types, scale • Geometric Metadata • Content • Marks, lines, points • Data • Labels • Title, axes, ticks • Nominal Metadata
Marks • Things that occur in space • Points • Lines • Areas • Volumes
Graphical Properties • Size, shape, color, orientation...
What goes where • In univariate representations, we often think of the data case as being shown along one dimension, and the value (quantity) along another • [Figure: Y axis is quantitative; graph shows change in Y over a continuous range X] • [Figure: Y axis is quantitative; graph shows the value of Y for 4 cases]
Bivariate Data • Representations • Scatter plot • Each mark is a data case • Want to see the relationship between two variables • What is the pattern? • Note both variables are continuous data [Figure: scatter plot with axes Price and Mileage]
Multivariate: Project data onto other graphical variables • E.g., use an attribute of the blob (mark) for another variable [Figure: two scatter plots with axes Price and Mileage]
Alternative • Represent each variable on its own line • Small multiples
Data projection • Fundamentally, we have 2 display dimensions • For data sets with >2 variables, we must project data down to 2D • Come up with a visual mapping that places each dimension in the 2D plane • Computer graphics 3D->2D projections IAT 355: Multivariate Data
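As a minimal sketch of projecting data down to 2D, the example below maps each case to screen coordinates with two axis vectors, in the spirit of a linear 3D->2D projection. The function name `project`, the sample cases, and the axis vectors are illustrative assumptions, not a method prescribed by the slides.

```python
def project(point, x_axis, y_axis):
    """Map an N-D point to 2D by taking dot products with two axis vectors."""
    x = sum(p * a for p, a in zip(point, x_axis))
    y = sum(p * a for p, a in zip(point, y_axis))
    return (x, y)

# Hypothetical cases with three variables each.
cases = [(1.0, 2.0, 3.0), (4.0, 0.0, 1.0)]

# Axis-aligned projection: keep variables 0 and 1, discard variable 2.
drop_third = [project(c, (1, 0, 0), (0, 1, 0)) for c in cases]

# Oblique projection: variable 2 shifts both screen coordinates by half its value.
oblique = [project(c, (1, 0, 0.5), (0, 1, 0.5)) for c in cases]

print(drop_third)  # [(1.0, 2.0), (4.0, 0.0)]
print(oblique)     # [(2.5, 3.5), (4.5, 0.5)]
```

Any such projection loses information: different cases can land on the same screen position, which is part of why data with more than 3 variables are hard to show.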
What is Multivariate Data? • Each data point has N variables or observations • Each observation can be: • nominal or ordinal • discrete or continuous • scalar, vector, or tensor • May or may not have spatial, temporal, or other connectivity attribute This slide courtesy of Matt Ward, UC Berkeley
Methods for Visualizing Multivariate Data • Dimensional Subsetting • Dimensional Reorganization (dimensional re-ordering) • Dimensional Embedding • Dimensional Reduction This slide courtesy of Matt Ward, UC Berkeley
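Two of these methods are easy to sketch without any plotting. The example below assumes that dimensional subsetting means showing the data two variables at a time (as a scatter-plot matrix does) and that re-ordering means choosing the order in which axes are laid out. The variable names are hypothetical.

```python
from itertools import combinations

# Hypothetical variable names for a 4-variable data set.
variables = ["price", "mileage", "weight", "horsepower"]

# Dimensional subsetting: every pair of variables defines one 2D scatter plot.
pairs = list(combinations(variables, 2))
print(len(pairs))  # 6
print(pairs[0])    # ('price', 'mileage')

# Dimensional re-ordering: pick a different axis order, here alphabetical.
reordered = sorted(variables)
print(reordered)   # ['horsepower', 'mileage', 'price', 'weight']
```

The number of pairs grows as n(n - 1)/2, so subsetting alone scales poorly as the number of variables increases.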