
Lecture 15: Evaluation



Presentation Transcript


  1. Lecture 15: Evaluation November 18, 2010 COMP 150-12 Topics in Visual Analytics

  2. Lecture Outline • Utility-Oriented • Chart Junk vs. Tufte • Nested Model for Design and Validation • Current Practices • Mechanical Turk • Insight-Based Evaluation • MILC – Multi-dimensional In-depth Long-term Case Studies • Learning-Based Evaluation

  3. Motivation

  4. Epistemic Action • Different from “pragmatic actions” • Defined as interactions that move a person and their analysis closer to the desired destination. • Epistemic actions • Enable humans to make use of environmental structures or to create structures in the environment that link with internal structures. • The purpose of some actions is not the effect they have on the environment but the effect they have on the human.

  5. Tufte • Data-Ink Ratio
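
For reference, Tufte's data-ink ratio can be written out as a formula (a paraphrase of the definition in The Visual Display of Quantitative Information, not text from the slides):

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of a graphic's ink that can be erased without loss of data-information}
\]
```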

  6. Example

  7. Examples

  8. Examples

  9. Discussion • What do you think of the Data-Ink Ratio? • Consider ways to maximize the ratio… • What will you get?

  10. Chart Junk vs. Memory Bateman et al. “Useful Junk? The Effects of Visual Embellishment on Comprehension and Memorability of Charts”, CHI 2010

  11. Chart Junk vs. Memory

  12. Eye Gaze

  13. Results

  14. Results

  15. Questions?

  16. Nested Model

  17. Example

  18. Threats to Validity • Domain Threats • When a designer falsely assumes that the target audience requires visualization tool support. • Validated mostly through qualitative study designs, including ethnographic field studies and semi-structured interviews. • Abstraction Threats • When the chosen operations and data types do not solve the characterized problems of the target audience. • Need to observe and document how the target audience uses the deployed system in their real-world workflow, typically in a long-term field study.

  19. Threats to Validity • Encoding and Interaction Threats • When the visual encoding and interaction techniques are not effective at communicating the desired abstraction to the user. • Can be tested using formal laboratory studies or against known perceptual criteria. • Algorithm Threats • When the employed algorithm is suboptimal in efficiency. • Validated by analyzing computational complexity.
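
A minimal sketch of how an algorithm-level threat could be checked empirically alongside the complexity analysis; the layout function and input sizes below are hypothetical stand-ins, not part of any particular system:

```python
import random
import time

def layout(points):
    """Stand-in for the visualization algorithm under test (hypothetical)."""
    return sorted(points)  # placeholder O(n log n) step

# Time the algorithm at increasing input sizes to check empirical scaling
for n in [1_000, 10_000, 100_000]:
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    layout(data)
    elapsed = time.perf_counter() - start
    print(f"n={n:>7}: {elapsed * 1000:.2f} ms")
```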

  20. Evaluation Mismatches • A common problem in evaluating visualization systems • The contribution of a new visual encoding cannot be tested with a quantitative timing of the algorithm. • Similarly, a mischaracterized task cannot be addressed in a formal lab study.

  21. Similarly

  22. Questions?

  23. Mechanical Turk • Amazon’s Mechanical Turk system • Crowd-sourcing • The name refers to the Mechanical Turk, a fake chess-playing machine built in the 1770s.

  24. Using the Turk • Goal: • Crowd-source the evaluation of visualizations • Benefits: • Hundreds of evaluations • In a fraction of the time • Caveats: • Can we trust netizens?

  25. Discussion • How would you test this?

  26. Does it Work? • Paper by Heer and Bostock at CHI 2010 • Reproduced seminal studies such as those by Cleveland and McGill • Demonstrated that the results from the Turk are very similar to the results of Cleveland and McGill

  27. Cleveland & McGill • Cleveland & McGill presented studies on statistical visualizations in the 1980s (including a Science paper in 1985) • Goal: • How do visual encodings affect a user’s understanding of statistical graphs?

  28. Cleveland & McGill • For example: • Proportionality estimates across spatial encodings (Position vs. Length vs. Angle)
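
A sketch of the accuracy measure commonly reported for such proportion-judgment trials, the Cleveland–McGill log absolute error; the per-encoding responses below are invented for illustration:

```python
import math
import statistics

def cm_error(judged_percent, true_percent):
    """Cleveland & McGill log absolute error: log2(|judged - true| + 1/8)."""
    return math.log2(abs(judged_percent - true_percent) + 1 / 8)

# Hypothetical (judged, true) percentages for each spatial encoding
responses = {
    "position": [(52, 50), (27, 25), (79, 78)],
    "length":   [(58, 50), (31, 25), (70, 78)],
    "angle":    [(63, 50), (35, 25), (68, 78)],
}

for encoding, trials in responses.items():
    errors = [cm_error(judged, true) for judged, true in trials]
    print(f"{encoding:>8}: mean log error = {statistics.mean(errors):.2f}")
```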

  29. Results

  30. Discussion • Clearly there is a difference between formal laboratory studies and Turk studies • What are the caveats?

  31. Questions?

  32. Discussion • How would you evaluate the efficacy of a visualization system?

  33. Insight-Based Evaluation • Proposed by Chris North et al. in 2005 • Determine the utility of a visualization system by the number of insights generated by the user

  34. Method • No benchmark tasks • Participants train on the data and visualization for 15 minutes • They list analysis questions that they would like to pursue • They are asked to examine the data for as long as necessary, until no new insights can be gained • During the analysis, the users are asked to comment on their observations, inferences, and conclusions

  35. Evaluating the Results • The number of insights was tallied • Insights are defined as distinct observations about the data by each participant • All insights generated by all participants are collected as a baseline • Various quantitative statistics are collected on insight generation (time spent, time to first insight, etc.)
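
A minimal sketch of how these statistics might be tallied from coded think-aloud sessions; the participants, timestamps, and observations below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Insight:
    participant: str
    minute: float  # time into the session when the insight was voiced
    text: str      # the coded observation

# Hypothetical coded insights from think-aloud transcripts
insights = [
    Insight("P1", 4.5, "cluster A is consistently up-regulated"),
    Insight("P1", 12.0, "two samples appear mislabeled"),
    Insight("P2", 9.0, "expression drops after treatment"),
]

for pid in sorted({i.participant for i in insights}):
    mine = [i for i in insights if i.participant == pid]
    print(f"{pid}: {len(mine)} insights, first at {min(i.minute for i in mine):.1f} min")
```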

  36. Nested Model • What layer does an insight-based evaluation address?

  37. Problem • North’s definition of insight: an individual observation about the data by the participant, a unit of discovery. It is straightforward to recognize insight occurrences in a think-aloud protocol, as any data observation that the user mentions is counted as an insight.

  38. DIKW • DIKW = Data, Information, Knowledge, Wisdom • DIKW+I? = Data, Information, Knowledge, Wisdom, Insight? • Are these the same things?

  39. Example 1 • “our tool allows the biologists to interactively visualize and explore the whole set of trees, providing insight into the overall distribution and possible conflicting hypothesis” • Here, insight = knowledge about the overall distribution; the two terms are used interchangeably

  40. Example 2 • “the analyst determined the answers to these questions, but also came up with further insights that she shared with people from other administrative units. She used the discovered information to advise other administrators of certain previously unknown relationships in their data” • Here, insight = information about previously unknown relationships

  41. Cognitive Science Definition • Something measurable in the frontal lobes and the temporal lobes (superior temporal gyrus). • Spontaneous Insight vs. Model-building Insight

  42. Example • Information Visualization • Knowledge-building Insight: • discovering insight, gaining insight, and providing insight. • This implies that insight is a kind of substance, and is similar to the way knowledge and information are discussed. • Cognitive Science • Spontaneous Insight: • experiencing insight, having an insight, or a moment of insight. • In this context, insight is an event.

  43. Terminology • Can we measure spontaneous insight? • Can we measure knowledge-building insight? • Are they related?

  44. Questions?

  45. MILCs • Multi-dimensional In-depth Long-term Case studies by Shneiderman and Plaisant (2006) • Hypothesis: • Efficacy of tools can be assessed by documenting: • Usage (observations, interviews, surveys, logging, etc.) • How successful the users are in achieving their professional goals

  46. Nested Model • What layer does MILCs address?

  47. Discussion • When discussing a potential visualization system with a client, how do you find out what the visualization should be?

  48. Definition • Multi-dimensional • Using observations, interviews, surveys, and loggers • In-Depth • Intense engagement of the researchers with the expert users, to the point of becoming a partner or assistant • Long-term • Longitudinal studies that begin with training in the use of a specific tool and continue through proficient usage that leads to strategy changes for the expert users • Case studies • Detailed reporting about a small number of individuals working on their own problems, in their own environment
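
Since logging is one of the dimensions, here is a minimal sketch of the kind of lightweight usage logger a deployed tool could carry through a long-term study; the event names and log path are placeholders:

```python
import json
import time

LOG_PATH = "usage_log.jsonl"  # placeholder path for the deployed tool's log

def log_event(action, **details):
    """Append one timestamped interaction event as a JSON line."""
    record = {"time": time.time(), "action": action, **details}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example calls placed in the tool's interaction handlers
log_event("open_dataset", name="trees.csv")
log_event("apply_filter", field="species", value="oak")
log_event("export_view", format="png")
```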

  49. Motivation • MILCs have been embraced by a small community of researchers interested in studying creativity support tools. • Challenges: • Cannot control for the users. • Cannot control for the tasks. • Toy-like problems in laboratories are not indicative of real-world problems and environments.

  50. MILCs are Difficult to Execute • The length of time is always a problem • The number of participants has to be small • Gaining familiarity is difficult: • Understand organization policies and work culture • Gain access and permission to observe or interview • Observe users in their workplace, and collect subjective and objective, quantitative and qualitative data • Compile data of all types across all dimensions • Interpret the results • Isolate factors • Need to repeat the process
