
User Interface Evaluation in the Real World: A Comparison of Four Techniques

Presentation Transcript


  1. User Interface Evaluation in the Real World: A Comparison of Four Techniques By Ashley Karr, M.S., AUXP www.ashleykarr.com (c) 2013 Ashley Karr

  2. The Four Techniques Robin Jeffries, James R. Miller, Cathleen Wharton, Kathy M. Uyeda Hewlett-Packard Laboratories http://www.miramontes.com/writing/uievaluation/ January 1991 (Proceedings of CHI '91, New Orleans, April 28 - May 3, 1991.) (c) 2013 Ashley Karr

  3. Abstract • Software product user interface (UI) evaluated prior to release by 4 groups, each using 1 of 4 interface evaluation techniques • Heuristic evaluation • Software guidelines • Cognitive walkthroughs • Usability testing • Result: • Heuristic evaluation by several UI specialists found the most serious problems with the least amount of effort • However, the specialists also reported many low-priority problems • Advantages/disadvantages of each technique discussed • Suggestions to improve the techniques (c) 2013 Ashley Karr

  4. Interface Evaluation Techniques Requiring UI Expertise • Heuristic Evaluation • UI specialists study interface in depth • Look for properties that they know, from experience, will lead to usability problems • Usability Testing • Interface studied under real-world or controlled conditions • Evaluators gather data on problems that arise during use • Shows how well the interface supports users' work environment (c) 2013 Ashley Karr

  5. Interface Evaluation Techniques Requiring UI Expertise • Heuristic & Usability Testing Limiting Factors • People with adequate UI experience scarce • Techniques difficult to apply before an interface exists • Recommendations come at a late stage in development, often too late for substantive changes • May not be aware of design’s technical limitations or why certain decisions were made • Technical and organizational gulfs can arise between the development team and the UI specialists • Usability testing is generally expensive and time-consuming (c) 2013 Ashley Karr

  6. Alternative Means of Evaluating Interfaces • Guidelines • Provide evaluators with specific recommendations about interface design • Cognitive walkthrough • Combines software walkthroughs with a cognitive model of learning by exploration • Interface developers walk through the interface in context of core tasks users must accomplish • Interface actions and feedback are compared to user goals and knowledge • Discrepancies between user expectations & steps required in interface noted (c) 2013 Ashley Karr

  7. Alternative Means of Evaluating Interfaces • Benefits • Interface developers can evaluate the interface • Potentially increase number of people who can do evaluations • Avoid limitations mentioned earlier • Limitations • Little is known about • How well they work, especially in comparison to one another • The types of interface problems they are best suited to detect • Whether non-UI specialists can actually use them • How they compare in cost/benefit terms (c) 2013 Ashley Karr

  8. The Experiment • Studied the effects of interface evaluation type on the number, severity, benefit/cost ratio, & content of interface problems found • Between-subjects design • 1 IV: Evaluation type • 4 levels • Heuristic Evaluation • Used researchers in HP Labs who perform heuristic evaluations • Usability Test • Used HP’s usability tests for this product • Software Guidelines • Cognitive Walkthrough (c) 2013 Ashley Karr

  9. Results – Data Refinement • DV: Reported problems found in the interface • Reported on a common form • Numbers and kinds of problems detected by the different groups noted • Results later compared by raters • 268 problem report forms • 4 categories of problems • Underlying system • Problems caused by conventions or requirements of one of the systems HP-VUE is built on: UNIX, X Windows, and Motif • Evaluator errors • Misunderstandings on the part of the evaluator • Non-repeatable/system-dependent • Problems that could not be reproduced or were due to aspects of a particular hardware configuration • Other • Reports that did not refer to usability defects (c) 2013 Ashley Karr
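
A minimal sketch of this categorization step, assuming each report is a simple record. The four set-aside categories come from the slide; the field names and example reports are hypothetical.

```python
# Illustrative only: reports falling into these four categories (from the slide)
# were set aside; everything else counts as a core usability problem.
SET_ASIDE = {
    "underlying system",
    "evaluator error",
    "non-repeatable/system-dependent",
    "other",
}

# Hypothetical problem reports; "category" is None when the report
# describes a genuine HP-VUE usability issue.
reports = [
    {"id": 1, "technique": "heuristic evaluation", "category": None},
    {"id": 2, "technique": "guidelines", "category": "evaluator error"},
    {"id": 3, "technique": "usability test", "category": None},
]

core = [r for r in reports if r["category"] not in SET_ASIDE]
print(f"{len(core)} core problems kept out of {len(reports)} reports")
```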

  10. The Interface • Beta-test of HP-VUE • Visual interface to the UNIX operating system • Provides graphical tools for • Manipulating files • Starting and stopping applications • Requesting and browsing help • Controlling the appearance of the screen, etc. (c) 2013 Ashley Karr

  11. Total Problems Found by Problem Type & Evaluation Technique (c) 2013 Ashley Karr

  12. Results – Problem Identification • More than 50% of total problems found by heuristic evaluators • All problems found by heuristic evaluators were found by applying the technique itself, not as a side effect • Few problems found by side effect during cognitive walkthrough and usability testing • Problems found by guidelines evaluators fell equally into all 3 categories of how problems were found • May indicate that the guidelines-based approach is valuable • Forces careful examination of the interface • Large number of problems found by guidelines evaluators may be because 2 of the 3 evaluators had worked with HP-VUE prior to the evaluation (c) 2013 Ashley Karr

  13. Core Problems Found by Technique & How Problem Found (c) 2013 Ashley Karr

  14. Results – Severity Analysis • 7 raters • 4 UI specialists & 3 people with moderate HCI experience • 206 core problems • Scale: 1 (trivial) to 9 (critical) • Raters considered impact of the problem • Frequency with which it would be encountered • Relative number of users affected • Mean ratings varied significantly • F(3,18) = 5.86, p < .01 • Problems ordered by mean rated severity & split into thirds • Most severe: 3.86 or more • Least severe: 2.86 or less • Note: 1/3 of the most severe problems can be credited to the heuristic evaluators, but so can 2/3 of the least severe (c) 2013 Ashley Karr
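
A minimal sketch of this severity split, assuming each core problem carries its 7 per-rater scores. Only the 1–9 scale and the 3.86 / 2.86 cutoffs come from the slide; the problem names and ratings below are made up.

```python
from statistics import mean

# Illustrative only: each core problem has severity ratings from 7 raters
# (1 = trivial, 9 = critical); these problems and scores are hypothetical.
problems = {
    "icon labels truncated in file manager": [5, 6, 4, 5, 7, 5, 6],
    "help window hides the task window": [2, 3, 2, 1, 3, 2, 2],
}

def severity_band(ratings, high=3.86, low=2.86):
    """Bucket a problem into thirds by mean rated severity,
    using the cutoffs reported on the slide."""
    m = mean(ratings)
    if m >= high:
        return "most severe third"
    if m <= low:
        return "least severe third"
    return "middle third"

for name, ratings in problems.items():
    print(f"{name}: mean = {mean(ratings):.2f}, band = {severity_band(ratings)}")
```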

  15. Number of Problems Found By Technique & Severity (c) 2013 Ashley Karr

  16. Results – Benefit/Cost Analysis • Value = summed severity scores of the core problems from each evaluation • Cost = # of person-hours spent by evaluators for each technique • 3 parts • Time spent on the analysis itself • Time spent learning the technique • Time spent becoming familiar with HP-VUE • Computations • 1st set of ratios = summed severity / sum of all time noted • Heuristic evaluation has a 4-to-1 advantage • 2nd set of ratios = same, but with HP-VUE familiarization time excluded from the cost for cognitive walkthrough & guidelines • Heuristic evaluation still has a 2-to-1 advantage (c) 2013 Ashley Karr
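
A rough illustration of the two ratio computations described on this slide. The severity and hour figures below are placeholders, not the study's numbers; only the formulas mirror the slide.

```python
# Hypothetical figures for one technique; only the formulas mirror the slide.
summed_severity = 95.0        # value: sum of severity scores over the technique's core problems
analysis_hours = 20.0         # cost part 1: time spent on the analysis itself
learning_hours = 5.0          # cost part 2: time spent learning the technique
familiarization_hours = 10.0  # cost part 3: time spent becoming familiar with HP-VUE

# 1st set of ratios: benefit per person-hour, counting all time.
ratio_all_time = summed_severity / (analysis_hours + learning_hours + familiarization_hours)

# 2nd set of ratios: for cognitive walkthrough and guidelines,
# HP-VUE familiarization time is excluded from the cost.
ratio_excl_familiarization = summed_severity / (analysis_hours + learning_hours)

print(f"ratio (all time counted): {ratio_all_time:.2f}")
print(f"ratio (familiarization excluded): {ratio_excl_familiarization:.2f}")
```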

  17. Benefit/Cost Ratios (c) 2013 Ashley Karr

  18. Results – Content Analysis • 3 analyses carried out to understand content of problem reports • Consistency: Did the problem claim that an aspect of HP-VUE was in conflict with some other portion of the system? • 25% of the problems raised consistency issues • 6% of the problems identified by usability testing were consistency problems • Recurring: Is this problem one that only interferes with the interaction the first time it is encountered, or is it always a problem? • 70% found by guidelines and usability testing were recurring • 50% found by heuristic evaluation and cognitive walkthroughs were recurring • General: Did this problem point out a general flaw that affects several parts of the interface, or was the problem specific to a single part? • 40% overall were general • 60% found by guidelines were general • Usability testing found equal numbers of both types • Heuristic evaluation and cognitive walkthroughs found a greater number of specific problems than general (c) 2013 Ashley Karr
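
A minimal sketch of how such proportions could be tallied, assuming each core problem is tagged with the technique that found it and the three yes/no content-analysis attributes; the records below are hypothetical.

```python
# Illustrative only: tally the share of problems flagged for each content-analysis attribute.
reports = [
    {"technique": "heuristic evaluation", "consistency": True,  "recurring": False, "general": False},
    {"technique": "usability testing",    "consistency": False, "recurring": True,  "general": True},
    {"technique": "guidelines",           "consistency": True,  "recurring": True,  "general": True},
]

def share(records, attribute):
    """Fraction of problem reports for which the given attribute is true."""
    return sum(r[attribute] for r in records) / len(records)

for attr in ("consistency", "recurring", "general"):
    print(f"{attr}: {share(reports, attr):.0%} overall")
```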

  19. Discussion • Heuristic evaluation technique produced overall best results • +Found the most problems • +Found the most serious problems • +Had the lowest cost • -UI specialists are scarce & more than one is needed to replicate the results in this study • -No individual heuristic evaluator found more than 42 core problems • -Large number of specific, one-time, and low-priority problems found and reported • Usability testing • +Found serious problems • +Good at finding recurring and general problems • +Avoided low-priority problems • -Most expensive • -Still missed a number of serious problems • Guidelines evaluation • +Best at finding recurring and general problems • +Well-designed set of guidelines beneficial • +Provides focus for the evaluation • +Evaluators take a broad look at the interface • +Developers can follow them too • -Missed many severe problems • Cognitive walkthrough • Roughly comparable in performance to guidelines • Problems found were less general & less recurring than those found by other techniques (c) 2013 Ashley Karr

  20. Discussion • Guidelines & cognitive walkthroughs • Used by software engineers to ID usability problems when UI specialists not available • Heuristic evaluation & usability testing • Great advantages over other techniques • Draw much of their strength from the skilled UI professionals who use them • Find most severe problems • Can run opposite risk of finding too many problems • Irrelevant “problems” • To decide which technique to use, consider • Goals of evaluation • Kinds of insights sought • Resources available (c) 2013 Ashley Karr

  21. Summary of Findings (c) 2013 Ashley Karr

  22. Discussion Questions? (c) 2013 Ashley Karr

  23. The Techniques & The Evaluators • Heuristic Evaluation • 4 heuristic evaluators • Members of HCI research group • Backgrounds in behavioral science and experience in providing usability feedback to product groups • Technique • 2-week period for evaluation (had other job-related tasks) • Spent whatever amount of time they chose within that period • Reported the time spent at the conclusion of their evaluation • Usability Tests • Conducted by a human factors professional • Product usability testing is a regular part of their job • 6 subjects took part in the test • Regular PC users not familiar with UNIX • Spent about 3 hours learning HP-VUE • 2 hours doing a set of 10 user tasks defined by the usability testing team (c) 2013 Ashley Karr

  24. The Techniques & The Evaluators • Guidelines and Cognitive Walkthroughs • Could not use the actual developers of HP-VUE • Used teams of 3 software engineers • Researchers at HP Laboratories with product experience • Substantial familiarity with UNIX and X Windows (the computational platform for HP-VUE) • Had designed and implemented at least one graphical UI • All of the evaluators spent time familiarizing themselves with HP-VUE before doing the evaluation (c) 2013 Ashley Karr

  25. The Techniques & The Evaluators • Guidelines • Used a set of 62 internal HP-developed guidelines • Based on established human factors principles and sources • Can be applied across a wide range of computer and instrument systems • Meant to be used by software developers and evaluators. (c) 2013 Ashley Karr

  26. The Techniques & The Evaluators • Cognitive Walkthrough • Task-based method • The experimenters selected the walkthrough tasks and provided them to the evaluators • Pilot cognitive walkthrough experiment • Refined procedure and tasks prior to the actual experiment (c) 2013 Ashley Karr

  27. The Problem Report Form • Special problem report form • Standardized reporting of UI problems across techniques • Each evaluator/team reported every usability problem on the form • Usability test evaluator and each heuristic evaluator submitted separate forms • Guidelines and cognitive walkthrough groups submitted team reports • Defined a usability problem as "anything that impacts ease of use" • Evaluators were asked to briefly describe the problem • Encouraged to report problems found even if the technique being used did not lead to the problem • And to note how they found the problem (c) 2013 Ashley Karr

  28. Results – Data Refinement • 3 raters categorized the problem reports • Worked independently • Reconciled differences as a group • Overall, 45 (17%) of the problems were eliminated • Eliminations spread approximately equally across groups • 223 core problems remained • These addressed usability issues in HP-VUE • The 3 raters then looked for duplicate problems within evaluation groups • Conflicts were resolved in conference • 17 sets of within-group duplicates were found and merged • Analyses are based on the 206 problems that remained after categorization, reconciliation, and duplicate merging (c) 2013 Ashley Karr
