Usability Testing

Interactive Media Design Week 3 Usability Testing

Usability Testing • Evaluate a product by testing it on users • Gives direct input on how real users use the system • Contrast with usability inspection methods, which evaluate a user interface without involving users • Usability testing focuses on measuring a human-made product's capacity to meet its intended purpose • Web sites or web applications • Computer interfaces • Documents • Devices • Usability testing measures the usability, or ease of use, of a specific object or set of objects • General human-computer interaction studies, by contrast, attempt to formulate universal principles.

Goals of Usability Testing • Usability testing is: • black-box testing • a way to observe people using the product to discover errors and areas for improvement • generally a matter of measuring how well test subjects respond in four areas • Efficiency, Accuracy, Recall, Emotional response. • The results of the first test can be treated as a baseline or control measurement • All subsequent tests can then be compared to the baseline to indicate improvement • Efficiency • How much time, and how many steps, are required for people to complete basic tasks? • E.g. find something to buy, create a new account, and order the item. • Accuracy • How many mistakes did people make? (And were they fatal or recoverable?) • Recall • How much does the person remember afterwards or after periods of non-use? • Emotional response • How does the person feel about the tasks completed? Is the person confident or stressed?
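The baseline comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the `Session` record layout, the function names, and all the numbers are assumptions made up for the example, and it covers only the efficiency and accuracy measures (recall and emotional response usually come from questionnaires).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One participant's attempt at one task (hypothetical record layout)."""
    seconds: float     # time taken to complete the task
    steps: int         # number of steps (clicks, page loads) used
    errors: int        # mistakes made along the way
    fatal_errors: int  # mistakes the participant could not recover from


def summarize(sessions):
    """Average each measure over one round of test sessions."""
    return {
        "mean_seconds": mean(s.seconds for s in sessions),
        "mean_steps": mean(s.steps for s in sessions),
        "mean_errors": mean(s.errors for s in sessions),
        # share of sessions that hit at least one unrecoverable mistake
        "fatal_rate": sum(1 for s in sessions if s.fatal_errors > 0) / len(sessions),
    }


def compare_to_baseline(baseline, current):
    """Return current minus baseline per measure; negative means improvement."""
    return {key: current[key] - baseline[key] for key in baseline}


# Made-up numbers for illustration only.
baseline = summarize([Session(120, 14, 3, 1), Session(100, 12, 1, 0)])
second_round = summarize([Session(80, 10, 1, 0), Session(90, 10, 1, 0)])
change = compare_to_baseline(baseline, second_round)
print(change)
```

Every measure here is "lower is better", so a negative difference indicates that the redesign improved on the baseline.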

What Usability Testing is Not • Simply gathering opinions on an object or document is market research • Usability testing usually involves systematic observation under controlled conditions to determine how well people can use the product • Rather than showing users a rough draft and asking, "Do you understand this?", usability testing involves watching people trying to use something for its intended purpose.

Methods • Setting up a usability test involves • creating a scenario, or realistic situation, wherein the person performs a list of tasks using the product being tested while observers watch and take notes. • Several other test instruments such as scripted instructions, paper prototypes, and pre- and post-test questionnaires are also used to gather feedback on the product being tested. • For example, to test the attachment function of an e-mail program, a scenario would describe a situation where a person needs to send an e-mail attachment, and ask him or her to undertake this task. The aim is to observe how people function in a realistic manner, so that developers can see problem areas, and what people like. • Techniques popularly used to gather data during a usability test include think aloud protocol and eye tracking.

Think-aloud Protocol • The think-aloud protocol (TAP) is a method used to gather data in usability testing in product design and development, in psychology and a range of social sciences (e.g., reading, writing and translation process research). • Think-aloud protocols involve participants thinking aloud as they are performing a set of specified tasks. • Users are asked to say whatever they are looking at, thinking, doing, and feeling, as they go about their task. • This enables observers to see first-hand the process of task completion (rather than only its final product). • Observers at such a test are asked to objectively take notes of everything that users say, without attempting to interpret their actions and words. Test sessions are often audio- and video-recorded so that developers can go back and refer to what participants did, and how they reacted. • The purpose of this method is to make explicit what is implicitly present in subjects who are able to perform a specific task.

Remote Testing • Remote Testing or Asynchronous Usability Testing • involves the use of a specially modified online survey • allowing the quantification of user testing studies by providing the ability to generate large sample sizes • Additionally, this style of user testing also provides an opportunity to segment feedback by demographic, attitudinal and behavioural type. • The tests are carried out in the user’s own environment (rather than labs) helping further simulate real-life scenario testing. • This approach also provides a vehicle to easily solicit feedback from users in remote areas.

Hallway Testing • Rather than using an in-house, trained group of testers, just five to six random people, indicative of a cross-section of end users, are brought in to test the software • The name of the technique refers to the fact that the testers should be random people who pass by in the hallway. • The theory, as adopted from Jakob Nielsen's research, is that most usability problems (about 85% with five users, using Nielsen's typical figure of 31% found per user) can be discovered using this technique.

How many users to test? • Jakob Nielsen popularized the concept of using numerous small usability tests, typically with only five test subjects each, at various stages of the development process. • His argument is that, once it is found that two or three people are totally confused by the home page, little is gained by watching more people suffer through the same flawed design. • "Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford." • The claim of "Five users is enough" was later described by a mathematical model which states that the proportion of uncovered (discovered) problems U is $U = 1 - (1 - p)^n$ • where p is the probability of one subject identifying a specific problem and n the number of subjects (or test sessions)
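The model above can also be read in reverse: given p, how many subjects are needed to reach a target proportion of problems? The sketch below is one way to compute that. The function names are hypothetical, and the value p = 0.31 is borrowed from the Nielsen quote later in these slides purely as an example.

```python
def proportion_found(p, n):
    """U = 1 - (1 - p)^n: share of problems found by n subjects."""
    return 1 - (1 - p) ** n


def users_needed(p, target):
    """Smallest n for which the model predicts at least `target` coverage."""
    if not 0 < p < 1 or not 0 < target < 1:
        raise ValueError("p and target must lie strictly between 0 and 1")
    n = 1
    while proportion_found(p, n) < target:
        n += 1
    return n


# Example values only; p = 0.31 is an assumption, not a universal constant.
print(users_needed(0.31, 0.80))
print(users_needed(0.31, 0.95))
```

Under this model each added subject contributes less than the one before, so raising the target from 80% to 95% takes noticeably more than a proportional increase in subjects.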

Challenges to Hallway Testing • Two key challenges to the "five users is enough" assertion are: • Since usability is related to the specific set of users, such a small sample is unlikely to be representative of the total population, so the data are more likely to reflect the sample group than the population it is meant to represent • Not every usability problem is equally easy to detect. Hard-to-detect problems slow the overall discovery process, so in practice the proportion of problems found grows more slowly with each added user than the Nielsen/Landauer formula predicts
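The second challenge can be illustrated numerically. The sketch below compares the single-p formula with the expected coverage when problems differ in how easy they are to detect. The mix of detection probabilities is invented for the example (half easy, half hard, averaging 0.31), and the function names are hypothetical.

```python
def predicted_by_mean(probabilities, n):
    """Single-p formula 1 - (1 - p)^n, using the average detection probability."""
    p_mean = sum(probabilities) / len(probabilities)
    return 1 - (1 - p_mean) ** n


def expected_with_variation(probabilities, n):
    """Average of 1 - (1 - p_i)^n over problems with differing p_i."""
    return sum(1 - (1 - p) ** n for p in probabilities) / len(probabilities)


# Hypothetical design: half the problems are easy to spot, half are hard.
probabilities = [0.57, 0.05] * 5  # mean detection probability is 0.31

for n in (1, 5, 10):
    print(n,
          round(predicted_by_mean(probabilities, n), 3),
          round(expected_with_variation(probabilities, n), 3))
```

With these made-up values, five users are predicted to find about 84% of problems by the single-p formula, but only about 61% once the variation is taken into account, because the hard problems are rarely seen.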

[Figure: the number of problems found rises asymptotically towards the number of real existing problems]

Why You Only Need to Test with 5 Users • http://www.useit.com/alertbox/20000319.html • Some people think that usability is very costly and complex and that user tests should be reserved for the rare web design project with a huge budget and a lavish time schedule. Not true. Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford. • In earlier research, Tom Landauer and I showed that the number of usability problems found in a usability test with n users is: • $N(1 - (1 - L)^n)$ • where N is the total number of usability problems in the design and L is the proportion of usability problems discovered while testing a single user. The typical value of L is 31%, averaged across a large number of projects we studied.

Plotting the curve for L=31% gives the following result:
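The plot itself is not reproduced in this text. As a rough stand-in, the values behind it can be tabulated from the formula with L = 0.31. The sketch below reports the share of problems found, that is, the formula divided by N. The function name and the range of 0 to 15 users are choices made for this example.

```python
def share_found(L, n):
    """Fraction of all problems found with n users: 1 - (1 - L)^n."""
    return 1 - (1 - L) ** n


L = 0.31  # typical single-user discovery rate quoted above
curve = [(n, share_found(L, n)) for n in range(0, 16)]

for n, share in curve:
    print(f"{n:2d} users: {share:6.1%}")
```

The table starts at 0% for zero users, reaches 31% with one user and about 84% with five, and passes 99% by fifteen users, which is the flattening shape the following slide discusses.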

Insights • The most striking truth of the curve is that zero users give zero insights. • As soon as you collect data from a single test user, your insights shoot up and you have already learned almost a third of all there is to know about the usability of the design. The difference between zero and even a little bit of data is astounding. • When you test the second user, you will discover that this person does some of the same things as the first user, so there is some overlap in what you learn. People are definitely different, so there will also be something new that the second user does that you did not observe with the first user. So the second user adds some amount of new insight, but not nearly as much as the first user did. • The third user will do many things that you already observed with the first user or with the second user and even some things that you have already seen twice. Plus, of course, the third user will generate a small amount of new data, even if not as much as the first and the second user did. • As you add more and more users, you learn less and less because you will keep seeing the same things again and again. There is no real need to keep observing the same thing multiple times, and you will be very motivated to go back to the drawing board and redesign the site to eliminate the usability problems. • After the fifth user, you are wasting your time by observing the same findings repeatedly but not learning much new.
