Successful Instrumentation

Tracking Attitudes and Behaviors to Improve Games. Ramon Romero, Game Developers Conference 2008

User Research: What is it? http://www.mgsuserresearch.com The website is filled with useful information we have presented in previous talks on the subject of User Research.

Creating a Feedback Loop • For Game Designers • Lend audience insights • Detect problems • Create opportunities to fix those problems • Prior to release • User Advocate / Data Champion • Research expert. We have a lot of experience working with game developers and conforming our approaches to the challenges of the development schedule. In our setup we have a person who is entirely devoted to the problem of representing and working on user data.

Using Formal Research Methods • From a Variety of Research Disciplines • Industry Researchers: Usability Engineer, Human Factors Engineer • Academic Researchers: Cultural Anthropologist, Ethnographer, Experimental Psychologists (Cognitive, Social, Developmental, Behavioral). There are multiple research types that can provide value to game developers at different points in the development cycle.

Instrumentation: What is it? Some people will talk about ‘logging’ or automation to refer to the same thing. But what do we mean…

Tracking Real-time User Experience (TRUE) This refers to logging the things that matter most to the play experience: where did people die, what killed them, what were their opponents carrying… Note how our report viewer is active: the arrow is pointing towards that report, and the viewer is actively trying to obtain information. We will re-present the key points from this diagram through the remainder of this presentation.

TRUE System • Critical events • Surrounding context On the TRUE System slides we will be rebuilding that diagram. Critical events and surrounding context are the things we measure. Critical events are relatively easy to understand: think of major progress (beating a boss) and major setbacks (dying, losing all of your cash) as critical. Surrounding context refers to related information (what were they holding, what level were they on).
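The split between critical events and surrounding context can be sketched as a simple record type. This is only an illustration, not the actual TRUE implementation: the class names, field names, and sample values (`GameEvent`, `EventLog`, `weapon_held`, encounter `"e3"`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GameEvent:
    """One critical event plus the context captured alongside it."""
    participant: str        # hypothetical participant id, e.g. "P1"
    event_type: str         # the critical event, e.g. "death"
    timestamp_s: float      # seconds since the session started
    context: Dict[str, Any] = field(default_factory=dict)  # surrounding context


class EventLog:
    """In-memory stand-in for whatever storage a real system would use."""

    def __init__(self) -> None:
        self.events: List[GameEvent] = []

    def record(self, participant: str, event_type: str,
               timestamp_s: float, **context: Any) -> GameEvent:
        # Context is free-form so each event type can carry what matters to it.
        event = GameEvent(participant, event_type, timestamp_s, dict(context))
        self.events.append(event)
        return event

    def of_type(self, event_type: str) -> List[GameEvent]:
        return [e for e in self.events if e.event_type == event_type]


log = EventLog()
# A major setback, with context: where it happened, what killed the player.
log.record("P1", "death", 512.0, mission=6, encounter="e3",
           killer="Flood Human", weapon_held="shotgun")
# Major progress.
log.record("P1", "boss_defeated", 900.5, mission=6)
deaths = log.of_type("death")
```

Keeping the context as an open dictionary is one possible design choice; it lets a death carry a killer while a boss fight carries something else, at the cost of less validation.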

Our first example. Imagine it is the Summer of 2004 and that we just asked a number of real consumers to play through all of Halo 2 over the course of a weekend of testing. That test has just completed and now we are going to look over the results of that test, just like the ‘active viewer’ from the diagram. Time is pressing, the game will be out this Fall and we need to turn feedback around fast…

…average deaths per mission… …for all missions in the game… Mission 6 ≈ 8+ hours in… Although in the course of regular analysis we would examine everything, our eyes are drawn to the spike for Mission 6. And so we click on the bar in the chart…

And now we see the details of death counts for individual encounters across the entire mission. An encounter is the smallest meaningful chunk of a Mission to a designer at Bungie. It can be a conversation or a cutscene but most often it will be a firefight. Results from firefights are below… Mission 6 ≈ 8+ hours in…
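The drill-down described above (average deaths per mission, then death counts per encounter inside the mission that stands out) could be computed roughly as follows. The records, participant ids, and encounter labels are invented for illustration; the real reports were driven by logged playtest data.

```python
from collections import defaultdict

# Hypothetical death records: (participant, mission, encounter).
deaths = [
    ("P1", 5, "e1"),
    ("P1", 6, "e1"), ("P1", 6, "e3"), ("P1", 6, "e3"),
    ("P2", 6, "e3"), ("P2", 6, "e2"),
    ("P2", 5, "e1"),
]
participants = {"P1", "P2"}


def average_deaths_per_mission(deaths, participants):
    """Total deaths in each mission divided by the number of participants."""
    counts = defaultdict(int)
    for _, mission, _ in deaths:
        counts[mission] += 1
    return {m: n / len(participants) for m, n in sorted(counts.items())}


def deaths_per_encounter(deaths, mission):
    """Raw death counts for each encounter within one mission."""
    counts = defaultdict(int)
    for _, m, encounter in deaths:
        if m == mission:
            counts[encounter] += 1
    return dict(sorted(counts.items()))


per_mission = average_deaths_per_mission(deaths, participants)
# "Clicking the tallest bar": pick the mission with the highest average.
worst = max(per_mission, key=per_mission.get)
drill_down = deaths_per_encounter(deaths, worst)
```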

In regular analysis we would examine all 5 spikes… but for this talk we will focus in on one example with “interesting” results. Now let’s say you want to get even more information so you click on this bar. “interesting” results ≈ 8+ hours in…

Cause of Player Death And now we see who is causing the various deaths. The Flood Humans are the greatest source of problems, so perhaps we could retune them in some fashion. But our “interesting” result was the high number of Unknowns. Something occurred here which we did not anticipate, and hence we had no label for it. 16% was very high; the highest we saw in other parts of the game was 1-2%. And so, wanting to learn more, we click on the Unknown portion of the chart…
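A minimal sketch of the cause-of-death breakdown, assuming deaths with no labelled cause are logged as `None` and reported as "Unknown". The counts are made up so that the Unknown share comes out at 16%; the 2% threshold is taken from the "1-2%" seen elsewhere in the game, and the function names are hypothetical.

```python
from collections import Counter


def cause_shares(causes):
    """Share of deaths per cause; unlabelled deaths are grouped as 'Unknown'."""
    counts = Counter(c if c is not None else "Unknown" for c in causes)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


def unknown_is_suspicious(shares, threshold=0.02):
    """Flag an encounter whose Unknown share exceeds the usual level."""
    return shares.get("Unknown", 0.0) > threshold


# Hypothetical encounter: 25 deaths, 4 of them with no recognised cause.
causes = ["Flood Human"] * 14 + ["Other enemy"] * 7 + [None] * 4
shares = cause_shares(causes)
flagged = unknown_is_suspicious(shares)
```

A flag like this only says where to look; in the example above it was the video that explained what the Unknown deaths actually were.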

Video plays… multiple players are drawn to their own doom… to a pit that is too attractive and looks too much like the correct way to go… nearly everyone is fooled by the pit once… they fall straight in…

And this is how we fixed that pit

TRUE System • Critical events (deaths, both averages and raw counts) • Surrounding context (what killed them) • Supporting information • Video. In this example video acts as the back-up plan. Something occurred that we could not anticipate. Rather than plowing through hours and hours of video to discover the source of the problem, we let the TRUE system point us right to it, so we can use those hours to think about solutions.

TRUE Principles • Instrument to answer ‘why’ When building a TRUE system it is important to be able to use the data to tell a story once you have collected it. Otherwise the data tends to sit unused, or the findings remain mysterious and unhelpful. Another example…

In Forza 2 one of the modes of play is a Time Trial. You beat the time on a given track and you earn a car. We tested every one of them and will focus in on the results from the Tsukuba Short Circuit…

Time Trial Summary …where things did not go well On first appearance things look really bad. People are averaging runs that are nearly 40 seconds off. But that is a little misleading…

Arcade | Time Trial | Tsukuba Short You see we actually had people run the race 10 times and so it is a better data presentation to break out results per run. Averages are not always your friend. Click ahead to look through individual results…
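To see why a single average misleads here, the sketch below compares one overall average with a per-run breakdown and each participant's best run. The lap times are invented for illustration, and `None` marks a run lost to data loss.

```python
from statistics import mean

# Hypothetical lap times in seconds, in run order; None marks a lost run.
runs = {
    "P1": [95.0, 80.0, 70.0, 62.0, 58.0],
    "P2": [120.0, 60.0, None, 55.0],
}


def overall_average(runs):
    """One number for everything: hides improvement across runs."""
    return mean(t for times in runs.values() for t in times if t is not None)


def per_run_average(runs):
    """Average for run 1, run 2, ... across participants with data for that run."""
    longest = max(len(times) for times in runs.values())
    averages = []
    for i in range(longest):
        values = [t[i] for t in runs.values() if i < len(t) and t[i] is not None]
        averages.append(mean(values) if values else None)
    return averages


def best_runs(runs):
    """Each participant's fastest recorded run."""
    return {p: min(t for t in times if t is not None) for p, times in runs.items()}


overall = overall_average(runs)
by_run = per_run_average(runs)
best = best_runs(runs)
```

With these made-up numbers the overall average is dominated by slow early runs, while the per-run view shows steady improvement.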

Tsukuba – P1 This participant improved over time, nearly beating it but not quite

Tsukuba – P2 She nearly beat it on 4 separate occasions

Tsukuba – P3 Wow – dramatic changes… and was able to put together one decent lap

Tsukuba – P4 Lots of good runs but even that last one was not good enough

Tsukuba – P5 Another case of dramatic change. Data loss can occur. Sometimes our logging system fails, or the game crashes or a participant needs to use the restroom. All of this means that sometimes we do not learn the full story for all participants

Tsukuba – P6 Several close runs but no cigars

Time Trial Summary So 84.9 was the wrong number to pay attention to. Instead we focused on each individual’s best run and how they progressed over time. And then we made a suggestion… UR suggested 50.2 seconds. It has been suggested that in the face of all this data a development team will lose control of their vision. This is not the case. The Game Development team makes all the final decisions about how to adjust their game. Design decided 48.8 seconds. The Game Designers looked over the same data, but they knew that the cars were not done being tuned and decided a different number would work well. The next few slides show you how well their number worked…

Adjusted Target Time

Tsukuba – P1, Adjusted

TRUE System • Critical events • Surrounding context • Supporting information • Video • Data visualization: needs iteration. LOTS!! Data visualizations are a key aspect of the TRUE system. You have seen a bunch already, and they are not always so straightforward to create.

TRUE Principles • Instrument to answer ‘why’ • Make the key findings pop • Intent • Designers must declare intent This is the goal of those visualizations. No theoretical discussions. No debates. Clear, clear, clear findings. If not, then the data visualizations actually work against you, no matter how clear they are to you. Luckily there are a few Game Designers around who can help you out with this. Once they declare intent, then working together you can create those visualizations. Examples of what we mean by intent…

Time Trial Summary We already saw the intent statement from the Forza 2 designers… but there was more to it. Beatable within 10 tries. This is an excellent statement. It not only helps determine the nature of the visualization, it also helped us determine precisely how to test it… i.e., let’s give them 10 laps and see what happens. It’s also a really easy example of design intent, as was our Halo 2 example earlier… people need to die at the “intended” rate… But there are much harder cases…
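An intent statement like "beatable within 10 tries" translates almost directly into a check. The sketch below is one way to express it; the lap times are hypothetical and only the 48.8-second target comes from the talk.

```python
def first_success(lap_times, target_s, max_tries=10):
    """Return the 1-based attempt that first beat the target, or None."""
    for attempt, lap in enumerate(lap_times[:max_tries], start=1):
        if lap is not None and lap < target_s:
            return attempt
    return None


def success_rate(runs, target_s, max_tries=10):
    """Fraction of participants who beat the target within max_tries."""
    hits = sum(
        1 for times in runs.values()
        if first_success(times, target_s, max_tries) is not None
    )
    return hits / len(runs)


# Hypothetical lap times in seconds, in the order they were driven.
runs = {
    "P1": [60.1, 55.0, 51.3, 49.9, 48.2],
    "P2": [70.0, 52.0, 48.5],
    "P3": [90.0, 75.0, 66.0],
}
attempts = {p: first_success(t, 48.8) for p, t in runs.items()}
rate = success_rate(runs, 48.8)
```

Reporting the attempt number, not just pass or fail, keeps the progression over time visible.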

Crackdown is a successful title that was released in 2007… It is a non-linear game… This creates difficulties when attempting to map out intent. If people die too much then they are supposed to find another way around. At any one time players can be doing anything…

The many ways to play Crackdown…

Users must find their own fun… The experience players had with Agency Nodes, also referred to as Supply Points, is an interesting example… In the game these points are places where you go to re-supply your weapons… they also double as re-spawn points. The intent statement is very broad… and so we found that certain aspects of the game’s intent were not really declared but were understood only in the context of the play experience…

Video plays… opening play sequence in Crackdown (starting after opening cutscene completes)… run around a little… find car… drive out of Agency starting point… takes a minute or two…

Video plays… jump ahead… we found some fighting… run around… eventually die… and respawn… back at the Agency… where we started… now we have to go through the same tired sequence of finding a car and driving out of there before we can get back to the action…

People were not finding the orange supply points which, again, are respawn points that players could use to get back to the action sooner. So this meant death was more punishing than the Designers intended. Using TRUE we started tracking how long it took players to find the orange beacons…
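Measuring how long players took to find a supply point could look something like this. The event names (`"death"`, `"supply_point_found"`), timestamps, and participants are assumptions for the sketch, not the actual instrumentation hooks used in Crackdown.

```python
def time_to_first(events, participants, event_type):
    """Seconds until each participant's first event of this type (None if never)."""
    first = {p: None for p in participants}
    for participant, timestamp_s, kind in sorted(events, key=lambda e: e[1]):
        if kind == event_type and first[participant] is None:
            first[participant] = timestamp_s
    return first


def deaths_before_discovery(events, first_found):
    """Deaths each participant suffered before finding a supply point."""
    counts = {p: 0 for p in first_found}
    for participant, timestamp_s, kind in events:
        cutoff = first_found[participant]
        if kind == "death" and (cutoff is None or timestamp_s < cutoff):
            counts[participant] += 1
    return counts


# Hypothetical events: (participant, seconds into session, event type).
events = [
    ("P1", 30, "death"),
    ("P1", 200, "death"),
    ("P1", 420, "supply_point_found"),
    ("P1", 500, "death"),
    ("P2", 90, "death"),
]
found = time_to_first(events, ["P1", "P2"], "supply_point_found")
punishing_deaths = deaths_before_discovery(events, found)
```

Participants who never found a supply point show up as `None`, which is itself a finding worth surfacing.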

Users must find their own fun… …but first they should find… How many times do you think people died in 31 minutes of play? Quite a few, it turns out: more than 50 times for one poor individual. So we made a few adjustments to make sure that players would notice these beacons, and things improved over time. And returning to our key point… the intent statement needed to evolve, and it did so iteratively.

TRUE Principles • Instrument to answer ‘why’ • Make the key findings pop • Intent • Designers must declare intent • UR must find a way to measure it Once the intent is declared (or discovered) the act of measuring it and analyzing it can be really straightforward as in all examples so far. But sometimes the measurements can be misleading or unclear...

Valhalla is a multiplayer map in Halo 3. It was available as early as the Alpha test period. In the distance is one of the towers on this map. Towers are places where players will spawn into the game; vehicles are usually nearby, and there is a pair of transport devices that will also shoot players out into the environment. Players are expected to fight for control over them. On the next slide we will look at another picture of the same tower. This time it will be a small red blob on the right side of the picture.

H3 Alpha Everywhere you see red is a spot where relatively MORE deaths occurred than in the black and grey areas. We call this a heat map. The deeper the red, the hotter the spot, the more violence people are committing. Anyway, the neat thing is looking at the huge problem we can see here. You see it, right? It’s here… and easiest to understand when we look at the beta results for comparison. Do not feel bad if you missed it. The User Researcher working on the product almost missed it too. Here it is… No, not this… Not the other tower… And not here either…

H3 Alpha Users must use the entire map… People were not using this part of the map… But let’s not gloss over the key point here. The User Researcher working by him or herself might have missed this, because not all aspects of design intent will be clearly declared every time. So an assumption in the design intent was found and declared, and then the adjustment was relatively straightforward. They changed the direction that the transport devices would shoot players so that they could experience all parts of the map. And the beta results showed that the adjustment worked out as hoped. H3 Beta
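A heat map of this kind can be approximated by binning death positions into a grid and then looking for cells that stay empty. This is a rough sketch with made-up coordinates and a hypothetical cell size; it says nothing about how the Halo 3 maps were actually rendered, and it assumes all positions fall inside the map bounds.

```python
import math


def death_heat_map(positions, width, height, cell_size):
    """Count deaths per grid cell; grid[row][col], with row 0 at y = 0."""
    cols = math.ceil(width / cell_size)
    rows = math.ceil(height / cell_size)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Clamp so a death exactly on the far edge lands in the last cell.
        col = min(int(x // cell_size), cols - 1)
        row = min(int(y // cell_size), rows - 1)
        grid[row][col] += 1
    return grid


def empty_cells(grid):
    """Cells with no deaths: candidate areas that players are not using."""
    return [(r, c) for r, row in enumerate(grid)
            for c, n in enumerate(row) if n == 0]


# Hypothetical death positions on a 20 x 20 map with 10-unit cells.
positions = [(5, 5), (7, 3), (15, 5), (5, 15)]
grid = death_heat_map(positions, width=20, height=20, cell_size=10)
unused = empty_cells(grid)
```

An empty cell is only a problem when the design intent says players should be there, which is why the intent had to be declared before the cold area stood out.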

TRUE Principles • Instrument to answer ‘why’ • Make the key findings pop • Intent • Designers must declare intent • UR must find a way to measure it • Design and UR analyze together All the more reason to concentrate on the visualizations and ensure the findings are instantly understandable. Designers are experts at the experience they are trying to create, so naturally they should help with the analysis. In the example you just saw, the Designers at Bungie saw the problem instantly where the User Researcher nearly missed it.
