Data Visualization Seminar NCDC, April 27 2011

Presentation Transcript


  1. Data Visualization Seminar, NCDC, April 27 2011. Todd Pierce. Module 5: Types of Graphs

  2. Best Practices: Time Series (sources: Colin Ware and Stephen Kosslyn)

  3. Time Series Graphs • Most graphics show values changing over time – time gives us a context for understanding data • A random sample of 4000 newspaper graphics from 1874-1989 found that 75% of them were time series • Time series are usually shown best by line graphs, but sometimes other graph types work better

  4. Time Series Graphs • Patterns • Trend: overall tendency of values to increase, decrease, or stay stable during a time period; trend lines can show this (but see later caveats) • Variability: average degree of change from one point in time to the next in a time period; but be careful, if the y scale is narrow or does not start at zero, variability may be overstated • Rate of change: percent difference between one value and the next; rates of change may be increasing faster than the raw data values would indicate
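
The three patterns above can be put into numbers. A minimal Python sketch, assuming evenly spaced observations in a plain list; the function names and the monthly figures are hypothetical. It uses a least-squares slope for trend, the mean absolute step for variability, and percent change for rate of change:

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of the values against their position 0, 1, 2, ..."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def variability(values):
    """Average absolute change from one point in time to the next."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


def rates_of_change(values):
    """Percent change from each value to the next (assumes no zero values)."""
    return [(b - a) / a * 100 for a, b in zip(values, values[1:])]


# Hypothetical monthly values growing by 10% each month
monthly = [100, 110, 121, 133.1]
print(trend_slope(monthly))
print(variability(monthly))
print(rates_of_change(monthly))
```

Here the raw month-to-month increases keep growing (10, 11, 12.1) while the rate of change stays at 10%, which is why raw values and rates can tell different stories.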

  5. Time Series Graphs • Patterns • Co-variation: changes in one time series are reflected as changes in another, either immediately or later; the changes can be in the same or opposite directions; if the changes are not immediate, one series is a leading or lagging indicator of the other • Cycles: patterns that repeat at regular intervals, such as every day, week, or year • Exceptions: values that fall far outside the norm
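
One way to look for leading or lagging indicators is to shift one series against the other and see at which shift they line up best. A minimal sketch, assuming two equally long, evenly spaced series; Pearson correlation is used as the measure of co-variation (our choice, the slides do not name a statistic), and the function names and data are hypothetical. Taking the absolute value lets a strong negative correlation, meaning co-variation in opposite directions, count too.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    n = len(leader)
    return pearson(leader[: n - lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Lag (0..max_lag) at which the two series co-vary most strongly."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_correlation(leader, follower, k)))


# Hypothetical data: the second series repeats the first two periods later
orders = [1, 3, 2, 5, 4, 6, 8, 7]
shipments = [0, 0, 1, 3, 2, 5, 4, 6]
print(best_lag(orders, shipments, max_lag=3))
```

Shifting the lagging series by the best lag is the numeric counterpart of shifting the time axis on one graph so the two line up.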

  6. Time Series Graphs

  7. Time Series Graphs • Line Graphs: show how quantitative values have changed over a continuous time period; show the pattern or shape of change over time; show exceptions • Lines make visible the sequential flow of values over time • Lines trace the connection from one value to the next • Lines show the extent and direction of change through their slope • If we want to compare magnitudes of values at a point in time, we should add dots to the lines

  8. Time Series Graphs • Bar Graphs: emphasize individual values and allow comparisons of specific values at points in time • The visual weight of bars and their separation make us focus on individual values rather than the overall pattern • Dot Plots: useful when sampling at irregular intervals • A line connecting sporadic values implies smooth transitions between them • More regular sampling might show a different picture • Use dots instead of lines to avoid false conclusions

  9. Time Series Graphs • Box Plots: show the distribution of values over time by summarizing each period with its median, quartiles, min and max • see Distribution Analysis for more information • Animated Scatterplots: show correlation analysis over time, as in Gapminder • see Correlation Analysis for more information • Great for telling a story, not so good for analysis – it is hard to track individual dots • For analysis, must be combined with trails to show patterns of change over time, and with small multiples (trellis display) to compare patterns of change across multiple items

  10. Time Series Graphs • Best Practices • Aggregating to different time intervals: combine data into different time spans (day, week, month, year) to see different patterns emerge • Viewing time periods in context: extend the time period – trends that look significant over a short time span may not be over longer periods • Grouping related time intervals: add vertical lines or shading on the time axis to show, for example, each quarter or the weekends
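
Aggregating to different intervals amounts to choosing a bucket key for each date and summing within buckets. A minimal sketch using only the standard library; the function names and the one-unit-per-day data are hypothetical, and summing (rather than averaging) is an assumption that suits counts or totals.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate(daily, key):
    """Sum (date, value) pairs into the time buckets chosen by key(date)."""
    totals = defaultdict(float)
    for day, value in daily:
        totals[key(day)] += value
    return dict(sorted(totals.items()))


def by_year(day):
    return day.year


def by_month(day):
    return (day.year, day.month)


def by_week(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)


def by_weekday(day):
    return day.weekday()  # 0 = Monday ... 6 = Sunday, pooled across all weeks


# Hypothetical data: one unit per day from 1 January to 1 March 2011
daily = [(date(2011, 1, 1) + timedelta(days=i), 1.0) for i in range(60)]
print(aggregate(daily, by_month))
print(aggregate(daily, by_weekday))
```

Swapping the key function is all it takes to view the same data by year, month, week, or day of the week.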

  11. Time Series Graphs • Best Practices • Using running averages to enhance perception of high level patterns: trend lines can mislead if they don’t take into account values just outside the time period; better to look at running averages of current value and a few previous values – this smoothing can reduce variability that throws off trend lines • Omitting missing values from a display: rather than have the line dip to zero, either skip the value (show a broken line) or show the line lighter or dashed; do not confuse a valid zero value with a missing value
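
Both practices on this slide can be sketched together: a trailing running average that treats missing observations as missing rather than as zero. A minimal sketch; representing a missing value as `None` is our assumption, and `running_average` is a hypothetical name.

```python
def running_average(values, window=3):
    """Trailing mean of the current value and up to window - 1 previous values.

    Missing observations are None, never 0. A missing point stays None so the
    smoothed line is broken there too; otherwise the average uses only the
    values actually present in the window (early points use a shorter window).
    """
    smoothed = []
    for i, value in enumerate(values):
        if value is None:
            smoothed.append(None)
            continue
        recent = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical series with one missing observation
print(running_average([10, 20, None, 40, 50]))
print(running_average([10, 20, 0, 40, 50]))  # same data with the gap wrongly stored as 0
```

Storing the gap as 0 drags the smoothed line down for three points in a row (10, 20, 30 instead of a gap, 30, 45), which is the false dip the slide warns about.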

  12. Time Series Graphs • Best Practices • Optimizing a graph’s aspect ratio: change the aspect ratio to get a lumpy profile instead of a flat or spiky profile, to allow for optimal comparison of slopes • Using log scales and percentages to compare rates of change: variations in numerical magnitudes may hide true rates of change – use log scales, or percent change from previous value or from a baseline value, to see true rates of change • Overlapping time scales to compare cyclical patterns: instead of showing for example all three years in one line, show each year as a different line over the 12 months, to allow comparisons from year to year for a given month
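
A widely used way to pick the aspect ratio is William Cleveland's "banking to 45°": choose the height-to-width ratio at which the median absolute slope of the line segments is 45°. The slides do not name this method, so treat it as one option. A minimal sketch; `bank_to_45` is a hypothetical name, and skipping flat segments (to avoid dividing by zero) is our own choice.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Height/width ratio that puts the median absolute segment slope at 45 degrees.

    On screen, a segment's slope is (dy / y_range * height) / (dx / x_range * width),
    so setting the median of these to 1 gives the ratio returned here.
    """
    slopes = [abs((y2 - y1) / (x2 - x1))
              for x1, x2, y1, y2 in zip(xs, xs[1:], ys, ys[1:])
              if y2 != y1]  # flat segments skipped (assumption)
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    return (y_range / x_range) / median(slopes)


# Hypothetical zigzag series: every segment spans one x step and the full y range
print(bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))
```

A result of 0.25 means a plot four times wider than it is tall; a spiky profile calls for a wider plot, a flat one for a taller plot.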

  13. Time Series Graphs • Best Practices • Using cycle plots to examine trends and cycles together: compare cycles and see trends across multiple cycles • Shifting time to compare leading and lagging indicators: shift the time axis on one graph so it aligns with the other and see patterns • Stacking line graphs to compare multiple values: if multiple time series have very different units or scale ranges, put them in stacked line graphs with the same time axis
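
A cycle plot needs the data regrouped by position in the cycle: for each month, that month's values across the years (the trend within that month) and their mean (a reference line that traces the shape of the cycle). A minimal sketch, assuming monthly data stored as hypothetical (year, month, value) records; `cycle_plot_data` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def cycle_plot_data(records):
    """Group (year, month, value) records by month for a cycle plot.

    Returns {month: (values in year order, mean of those values)}. Each
    month's sub-series shows the trend across cycles; the means show the
    shape of the cycle itself.
    """
    by_month = defaultdict(list)
    for year, month, value in sorted(records):
        by_month[month].append(value)
    return {m: (vals, mean(vals)) for m, vals in sorted(by_month.items())}


# Hypothetical data: January rising and July falling over three years
records = [(2009, 1, 10), (2010, 1, 12), (2011, 1, 14),
           (2009, 7, 30), (2010, 7, 28), (2011, 7, 26)]
for month, (values, avg) in cycle_plot_data(records).items():
    print(month, values, avg)
```

Drawing each month's list as a short line segment, side by side in month order, shows the January rise and the July decline that a single long line would bury.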

  14. Time Series Graphs • Best Practices • Expressing time as 0-100% to compare asynchronous processes: if activities have different start dates, set each start to 0% and express later dates as a percentage of the total activity time, so values can be compared at the same relative point in each activity • Maintaining consistency through time: adjust currency values for inflation, and account for changes over time in how information was gathered or how values were defined
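
Both adjustments on this slide are simple rescalings. A minimal sketch: `percent_of_duration` maps a date onto its activity's 0-100% timeline, and `to_constant_currency` applies the usual price-index conversion (nominal × base index ÷ current index). Both function names and all dates and index values are hypothetical.

```python
from datetime import date


def percent_of_duration(start, end, when):
    """Where `when` falls between start (0%) and end (100%) of an activity."""
    return (when - start) / (end - start) * 100


def to_constant_currency(nominal, price_index, base_index):
    """Re-express a nominal amount in base-period currency using a price index."""
    return nominal * base_index / price_index


# Hypothetical projects of different lengths, each checked at its midpoint
print(percent_of_duration(date(2010, 3, 1), date(2010, 3, 11), date(2010, 3, 6)))
print(percent_of_duration(date(2011, 1, 1), date(2011, 1, 21), date(2011, 1, 11)))
# Hypothetical index: prices are 10% higher than in the base period
print(to_constant_currency(110.0, price_index=110.0, base_index=100.0))
```

Both midpoints land at 50% even though one project lasted twice as long, which is what lets the two be plotted on a shared axis.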

  15. Time Series Graphs • Do’s and Don’ts • Change the salience of lines if needed to show relative importance. • Ensure crossing or nearby lines are discriminable. • If using points on lines, make the points at least twice as thick as the lines. • Vary the lengths of dashes in dashed lines by a ratio of at least 2 to 1. • Use different, discriminable symbols for points on different lines.

  16. Time Series Graphs • Do’s and Don’ts • Do not fill in the area between two lines – it’s not an area graph. • In a mixed line and bar display, make the more important one more salient. • Put the labels of all lines in the same part of the graph (otherwise the placement draws attention to certain lines; it also looks less busy). • Put labels at the ends of lines (so labels and lines group with each other). • Label critical data points explicitly rather than labeling all points.

  17. Best Practices: Part-to-Whole and Ranking Analysis (sources: Colin Ware and Stephen Kosslyn)

  18. Part-to-Whole and Ranking • Comparing parts to a whole and ranking them by value – for example the expenses of each department of a company as a % of total expenses, ranked in order
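
The example on this slide (each department's expenses as a percentage of the total, ranked) takes two steps: divide by the total, then sort. A minimal sketch; `shares_ranked` is a hypothetical name and the figures are made up.

```python
def shares_ranked(parts):
    """Each part as a percent of the whole, ranked from largest to smallest."""
    total = sum(parts.values())
    shares = [(name, 100 * value / total) for name, value in parts.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical departmental expenses (thousands of dollars)
expenses = {"IT": 200, "Sales": 500, "HR": 300}
for name, pct in shares_ranked(expenses):
    print(f"{name}: {pct:.1f}%")
```

The ranked list maps directly onto a sorted bar graph, one bar per department.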

  19. Part-to-Whole and Ranking • Patterns • Uniform – all values roughly the same • Uniformly different – each value differs from the next by roughly the same amount • Non-uniformly different – differences from one value to the next vary significantly

  20. Part-to-Whole and Ranking • Patterns • Increasingly different – differences from one value to the next increase • Decreasingly different – differences from one value to the next decrease • Alternating differences – differences from one value to the next begin small then shift to large and finally back to small • Exceptional – one or more values are very different from the rest

  21. Part-to-Whole and Ranking

  22. Part-to-Whole and Ranking

  23. Part-to-Whole and Ranking • Part to whole is usually shown with pie charts – bad idea! • Makes us compare areas or angles, both of which humans do poorly • If pie uses a legend, eye must bounce between chart and legend • You can label pie wedges directly with name and % value – but this is no better than a table – why use a graph if we must resort to printed values to make sense of it?

  24. Part-to-Whole and Ranking Bad Acceptable

  25. Part-to-Whole and Ranking Bad

  26. Part-to-Whole and Ranking Bad

  27. Part-to-Whole and Ranking Acceptable?

  28. Part-to-Whole and Ranking • Instead, use a bar graph • One exception: if values cluster close together, the differences between bars are small and hard to see • So narrow the scale (zoom in) to make the differences bigger • But then use a dot plot – dots or lines instead of bars – because bars on a scale that does not start at zero make us misjudge their lengths

  29. Part-to-Whole and Ranking • Use a Pareto chart to show the cumulative contribution of each part to a whole • bars for the parts, sorted from largest to smallest, plus a line for their running total show how the parts sum to 100% • Pareto charts summarize and display the relative importance of the differences between groups of data • They distinguish the "vital few" from the "useful many."
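
The data behind a Pareto chart is the parts sorted in descending order plus their cumulative percentage. A minimal sketch; `pareto_table` is a hypothetical name and the complaint counts are made up for illustration.

```python
def pareto_table(parts):
    """Rows of (name, value, cumulative percent), largest part first.

    The values are drawn as bars; the cumulative percents form the line,
    which ends at 100%.
    """
    total = sum(parts.values())
    rows, running = [], 0
    for name, value in sorted(parts.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((name, value, 100 * running / total))
    return rows


# Hypothetical counts of complaints by cause
complaints = {"late delivery": 50, "wrong item": 30, "damaged": 15, "other": 5}
for row in pareto_table(complaints):
    print(row)
```

Reading the line where it crosses a threshold (80%, say) shows how few of the parts account for most of the whole.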

  30. Part-to-Whole and Ranking • Vilfredo Pareto, a turn-of-the-century Italian economist, studied the distribution of wealth and found that about 20% of people controlled about 80% of a society's wealth. • The same distribution has been observed in other areas and has been termed the Pareto Principle or 80/20 rule.

  31. Part-to-Whole and Ranking

  32. Part-to-Whole and Ranking

  33. Part-to-Whole and Ranking • Best Practices • Grouping categorical values in an ad hoc manner: group very small categories into one called ‘other’, or regroup similar categories into one master category for better analysis • Using Pareto charts with percentile scales: group values into percentile intervals (top 10%, next 10%, etc.) and add a Pareto line – this can lead to new insights • Using line graphs to view ranking changes through time: use line graphs to show changes in ranking (such as salespeople’s sales rankings) over time – the lines show relative rank but not the actual values – inspired by bump charts from racing
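
A bump-chart style line graph plots ranks, not values, so the first step is to convert each period's values into ranks. A minimal sketch, assuming data shaped as a hypothetical {period: {item: value}} dictionary; `ranks_by_period` is a hypothetical name, and breaking ties by listing order is our own choice.

```python
def ranks_by_period(values_by_period):
    """Turn {period: {item: value}} into {period: {item: rank}}, rank 1 = largest.

    Ties keep whichever item was listed first (Python's sort is stable).
    """
    ranks = {}
    for period, values in values_by_period.items():
        ordered = sorted(values, key=values.get, reverse=True)
        ranks[period] = {item: pos for pos, item in enumerate(ordered, start=1)}
    return ranks


# Hypothetical quarterly sales for three salespeople
sales = {"Q1": {"Ann": 120, "Bob": 90, "Cy": 100},
         "Q2": {"Ann": 80, "Bob": 110, "Cy": 95}}
print(ranks_by_period(sales))
```

Plotting each person's rank per quarter (with rank 1 at the top of the axis) gives one line per salesperson; crossings show changes in ranking.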

  34. Part-to-Whole and Ranking • Best Practices • Re-expressing values to solve quantitative scaling problems: sometimes the small values on a bar chart are hard to see relative to the large values – so re-express the values using a square root or a logarithm, if this reduces the range from highest to lowest; can also use an inverse scale (divide each value by the largest value or some other value such as a million)
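
How much a square root or logarithm compresses the range can be checked by comparing the ratio of largest to smallest value before and after. A minimal sketch; the values are hypothetical, `range_ratio` is a hypothetical name, and the log version assumes every value is greater than 1 (smaller values would give zero or negative bars).

```python
import math


def range_ratio(values):
    """Ratio of the largest to the smallest value (all values must be positive)."""
    return max(values) / min(values)


# Hypothetical bar values spanning several orders of magnitude
values = [2, 50, 800, 12000]
root = [math.sqrt(v) for v in values]
logged = [math.log10(v) for v in values]  # assumes every value is greater than 1

print(range_ratio(values))
print(range_ratio(root))
print(range_ratio(logged))
```

The re-expressed axis should be labeled as such (for example, "log10 of sales"), since bar lengths no longer compare directly with the raw values.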

  35. Part-to-Whole and Ranking • Do’s and Don’ts: Bar Charts • Do not insist on minimizing ink. • Mark corresponding bars in the same color or symbol for multiple parameters. • Arrange corresponding bars in the same order for multiple parameters. • Ensure overlapping bars do not look like stacked bars – offset the bars. • Leave space between bar clusters for multiple parameters. • Do not extend bars beyond the end of the scale.

  36. Part-to-Whole and Ranking • Do’s and Don’ts: Pie Charts • Draw radii from the center of the circle. • Explode at most 25% of the wedges. • Arrange wedges in a simple increasing progression. • Place labels in wedges provided they can be easily read. • If any label cannot fit inside its wedge, place all labels next to their wedges (otherwise the reader will think the ones outside are more important).

  37. Best Practices: Deviation Analysis (sources: Colin Ware and Stephen Kosslyn)

  38. Deviation Analysis • Examining how a set of values deviates from a reference point (a budget, an average, or a prior point in time) • Often shown as a bar graph with two bars per entity – the actual and the expected, such as for a budget • However, this makes the viewer subtract the values mentally • Better to make the graph’s 0 line the expected reference, with bars showing the amount over or under (the deviation)

  39. Deviation Analysis • Comparisons • Current target, future target • Same point in time in past • Immediately prior period • Standard or norm • Other items in same category or same market

  40. Deviation Analysis

  41. Deviation Analysis

  42. Deviation Analysis • Best shown as bar or line graphs with reference line at 0 or 100% • If at 0, values expressed as positive and negative deviations in dollars or percents • If at 100%, values expressed as percentages of the reference value • Best to use a line graph when doing comparisons over time, from one period to the next; if comparing entities such as areas or companies, use a bar graph
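
The two ways of expressing deviation on this slide (positive and negative deviations around a 0 line, or percentages of the reference around a 100% line) are straightforward to compute. A minimal sketch; `deviations` is a hypothetical name and the regional figures are made up.

```python
def deviations(actual, reference):
    """Per item: (difference, percent deviation, percent of reference).

    The first two suit a graph whose reference line is 0; the third suits
    a graph whose reference line is 100%.
    """
    result = {}
    for item, value in actual.items():
        ref = reference[item]
        result[item] = (value - ref, (value - ref) / ref * 100, value / ref * 100)
    return result


# Hypothetical regional spending against budget (thousands of dollars)
actual = {"East": 120, "West": 90}
budget = {"East": 100, "West": 100}
print(deviations(actual, budget))
```

Plotting only the first or second element per region gives bars above and below a single zero line, so the viewer no longer has to subtract actual from budget mentally.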

  43. Deviation Analysis • Best Practices • Expressing deviations as percentages: helps normalize multiple data sets to the same units for better comparison – works best if values are mostly <= 100% and none exceeds 500% • Comparing deviations to other points of reference: besides the reference line, show other lines such as acceptable deviations from the norm, or standard deviations from the mean
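
Standard-deviation reference lines correspond to z-scores: how many standard deviations each value lies from the mean. A minimal sketch using the sample standard deviation (our choice; a population standard deviation is also common); `z_scores`, `outside_band`, and the readings are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """How many (sample) standard deviations each value lies from the mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def outside_band(values, k=2.0):
    """Indices of values more than k standard deviations from the mean."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > k]


# Hypothetical weekly readings with one unusual week
readings = [10, 10, 10, 10, 20]
print(outside_band(readings, k=1.0))
print(outside_band(readings))
```

On a graph, the lines at mean ± k standard deviations mark the band; the flagged indices are the points that fall outside it.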

  44. Best Practices: Distribution Analysis (sources: Colin Ware and Stephen Kosslyn)

  45. Distribution Analysis • Seeing how numerical values are distributed from low to high, and compare how multiple values sets are distributed • “The median isn’t the message” (Stephen Jay Gould) • knowing the average or median value hides the full range of values • even knowing the max and min values hides the number of values at each numerical value in a range of data

  46. Distribution Analysis • Characteristics of distributions of values • Spread: the difference between the max and min values – the full range of values • Center: an estimate of the middle of a set of values – the mean (average) or median • Shape: where values are located within the spread – skewed to one side? Evenly distributed? • Distribution summaries: • 3 value: low, median, high • 5 value: low, 25th %ile, median, 75th %ile, high
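
The 3-value and 5-value summaries can be computed with Python's standard `statistics` module. A minimal sketch; the function names and data are hypothetical, and note that tools differ in how they compute quartiles (`statistics.quantiles` defaults to its "exclusive" method), so the 25th and 75th percentiles may not match other software exactly.

```python
from statistics import median, quantiles


def three_value_summary(values):
    """Low, median, high."""
    return min(values), median(values), max(values)


def five_value_summary(values):
    """Low, 25th percentile, median, 75th percentile, high."""
    q1, q2, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return min(values), q1, q2, q3, max(values)


data = list(range(1, 10))  # hypothetical values 1..9
print(three_value_summary(data))
print(five_value_summary(data))
```

The five-value summary is exactly what a box plot draws: whiskers at low and high, a box from the 25th to the 75th percentile, and a line at the median.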

  47. Distribution Analysis • Patterns - Shape: • Curved or flat? • If curved, curved upward (bell curve) or downward (opposite of bell curve)? • If curved upward, one peak, two peaks (bi-modal), or more? • If single peaked, symmetrical or skewed left or right? • Concentrations? Noticeably high peaks, that may not be the absolute peak • Gaps? Areas of low or no values
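
Peaks and gaps can be located from a histogram's bin counts. A minimal sketch; the bin edges are an assumption you choose, and different bin widths can make the same data look different, so it is worth trying more than one. Here a bin counts as a peak only if it is strictly higher than its neighbors (plateaus are not flagged). The function names and data are hypothetical.

```python
def histogram(values, low, width, bins):
    """Count values in equal-width bins [low, low + width), [low + width, ...), ..."""
    counts = [0] * bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < bins:  # values outside the chosen range are ignored
            counts[i] += 1
    return counts


def peaks_and_gaps(counts):
    """Indices of bins that are strict local maxima, and of empty bins."""
    last = len(counts) - 1
    peaks = [i for i, c in enumerate(counts)
             if c > 0
             and (i == 0 or c > counts[i - 1])
             and (i == last or c > counts[i + 1])]
    gaps = [i for i, c in enumerate(counts) if c == 0]
    return peaks, gaps


# Hypothetical values clustered at both ends of the range (two peaks)
counts = histogram([1, 1, 2, 8, 9, 9], low=0, width=2, bins=5)
print(counts)
print(peaks_and_gaps(counts))
```

Two peaks separated by empty bins is the bimodal-with-gap shape described above.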

  48. Distribution Analysis Gaussian distribution

  49. Distribution Analysis

  50. Distribution Analysis Bimodal distribution for graduating lawyer salaries
