Evaluation

Presentation Transcript


  1. Evaluation

  2. Evaluation
     • There are many times throughout the software development lifecycle when a designer needs answers to questions that check whether his or her ideas match those of the user(s). Such evaluation is known as formative evaluation because it (hopefully) helps shape the product.
     • User-centred design places a premium on formative evaluation methods.
     • Summative evaluation, in contrast, takes place after the product has been developed.

  3. Context of Formative Evaluation
     • Evaluation is concerned with gathering data about the usability of a design or product by a specific group of users, for a particular activity, within a definite environment or work context.
     • Regardless of the type of evaluation, it is important to consider:
        • the characteristics of the users
        • the types of activities they will carry out
        • the environment of the study (controlled laboratory? field study?)
        • the nature of the artefact or system being evaluated (sketches? prototype? full system?)

  4. Reasons for Evaluation
     • Understanding the real world
        • particularly important during requirements gathering
     • Comparing designs
        • there are rarely design decisions without alternatives
        • valuable throughout the development process
     • Engineering towards a target
        • often expressed in the form of a metric (see the sketch below)
     • Checking conformance to a standard
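
  Since "engineering towards a target" means checking a design against an explicit metric, a small worked example may help. The following is a minimal Python sketch; the task times, target values, and variable names are illustrative assumptions, not figures from the slides:

      # Minimal sketch: checking measured usability data against
      # quantitative target metrics. All figures here are invented.
      task_times = [48.2, 55.0, 41.7, 62.3, 50.9]   # seconds per task, one per participant
      completed  = [True, True, False, True, True]  # finished without assistance?

      TARGET_MEAN_TIME = 60.0   # target: mean completion time below 60 s
      TARGET_SUCCESS   = 0.80   # target: at least 80% unassisted completion

      mean_time = sum(task_times) / len(task_times)
      success_rate = sum(completed) / len(completed)

      print(f"mean time    {mean_time:5.1f}s  target < {TARGET_MEAN_TIME}s: "
            f"{'met' if mean_time < TARGET_MEAN_TIME else 'not met'}")
      print(f"success rate {success_rate:5.0%}  target >= {TARGET_SUCCESS:.0%}: "
            f"{'met' if success_rate >= TARGET_SUCCESS else 'not met'}")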

  5. Classification of Evaluation Methods
     • Observation and Monitoring
        • data collection by note-taking, keyboard logging, video capture
     • Experimentation and Benchmarking
        • statement of hypothesis, control of variables
     • Collecting users' opinions
        • surveys, questionnaires, interviews
     • Interpreting situated events
     • Predicting usability

  6. Observation and Monitoring - Direct Observation Protocol
     • usually informal in field studies, more formal in controlled laboratories
     • data collection by direct observation and note-taking
     • users in "natural" surroundings
     • "objectivity" may be compromised by the point of view of the observer
     • users may behave differently while being watched (the Hawthorne effect)
     • an ethnographic, participatory approach is an alternative

  7. Observation and Monitoring - Indirect Observation Protocol
     • data collection by remote note-taking, keyboard logging, video capture (a minimal logging sketch follows this list)
     • users need to be briefed fully; a policy must be agreed in advance about what to do if they get "stuck"; tasks must be justified and prioritised (easiest first)
     • video capture permits post-event "debriefing" and avoids the Hawthorne effect (however, users may still behave differently in an unnatural environment)
     • data-logging collects vast amounts of low-level data, which is difficult and expensive to analyse
     • the interaction of several variables may be more relevant than any single one (logs lack context)
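
  To make "keyboard logging" concrete, here is a minimal Python sketch of the kind of low-level event log such data-logging produces. The file name, schema, and log_event helper are hypothetical; a real study would instrument the application or hook the window system rather than call the logger by hand:

      import csv
      import time

      LOG_PATH = "session_log.csv"   # hypothetical output file

      def log_event(writer, user_id, event_type, detail):
          # One timestamped row per low-level interaction event.
          writer.writerow([f"{time.time():.3f}", user_id, event_type, detail])

      with open(LOG_PATH, "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["timestamp", "user", "event", "detail"])
          # Simulated events; an instrumented UI would generate these.
          log_event(writer, "P01", "keypress", "s")
          log_event(writer, "P01", "click", "button:save")
          log_event(writer, "P01", "error", "dialog:unsaved-changes")

  Even this three-row log hints at the analysis problem the slide raises: the meaning of events (why did the error dialog appear?) must be reconstructed from thousands of context-free rows.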

  8. Experimentation and Benchmarking
     • a "scientific" and "engineering" approach
     • utilises standard scientific investigation techniques (see the sketch below)
     • selection of benchmarking criteria is critical… and sometimes difficult (e.g., for OODBMS)
     • control of variables, especially user groups, may lead to "artificial" experimental bases
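
  As an illustration of the hypothesis-testing workflow, the sketch below compares task completion times for one benchmark task under two designs using a two-sample t-test. The data are invented for the example, and SciPy is assumed to be available:

      from statistics import mean
      from scipy.stats import ttest_ind

      # Completion times (seconds) for the same benchmark task under two
      # designs; everything else (task, user experience, environment) is
      # held constant. The numbers are illustrative, not real results.
      design_a = [51.2, 48.7, 55.1, 60.3, 49.8, 53.0]
      design_b = [44.9, 47.2, 41.8, 50.1, 43.5, 46.0]

      # Null hypothesis: the two designs have equal mean completion time.
      t_stat, p_value = ttest_ind(design_a, design_b)
      print(f"mean A = {mean(design_a):.1f}s, mean B = {mean(design_b):.1f}s")
      print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # reject the null if p < 0.05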

  9. Collecting Users' Opinions
     • Surveys
        • critical mass and breadth of the survey are critical for statistical reliability (a sample-size sketch follows this list)
        • sampling techniques need to be well-grounded in theory and practice
        • questions must be consistently formulated, clear, and must not "lead" respondents to specific answers
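
  The statistical-reliability point can be quantified. A common rule for simple random sampling, when estimating a proportion, is n = z²·p(1-p)/e²; the sketch below applies it with assumed values (95% confidence, worst-case p = 0.5):

      import math

      def sample_size(z=1.96, p=0.5, margin=0.05):
          # z = 1.96 for 95% confidence; p = 0.5 is the worst case,
          # i.e. it yields the largest required sample.
          return math.ceil(z**2 * p * (1 - p) / margin**2)

      print(sample_size())              # 385 respondents for a ±5% margin
      print(sample_size(margin=0.03))   # 1068 respondents for a ±3% margin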

  10. Collecting Users' Opinions - Verbal Protocol
     • (Individual) Interviews
        • can take place during or after user interaction
        • during: immediate impressions are recorded
        • during: may be distracting during complex tasks
        • after: no distraction from the task at hand
        • after: may yield misleading results (short-term memory loss, "history rewritten", etc.)
        • can be "structured" or unstructured
        • a structured interview is like a personal questionnaire, with prepared questions

  11. Collecting Users' Opinions
     • Questionnaires
        • "open" (free-form reply) or "closed" (answers are "yes/no" or chosen from a wider range of possible answers)
        • the latter is better for quantitative analysis (a tallying sketch follows this list)
        • important to use clear, comprehensive and unambiguous terminology, quantified where possible
           • e.g., "daily", "weekly", "monthly" rather than "seldom" or "often", and there should always be a "never" option
        • needs to allow for "negative" feedback
        • all form fill-in guidelines apply!
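
  Closed questions pay off at analysis time because responses can be tallied mechanically. A minimal sketch, with invented responses and the frequency options recommended above (including the explicit "never" option):

      from collections import Counter

      OPTIONS = ["daily", "weekly", "monthly", "never"]   # fixed, unambiguous scale
      responses = ["daily", "weekly", "daily", "never", "monthly",
                   "daily", "weekly", "weekly", "never", "daily"]

      counts = Counter(responses)
      total = len(responses)
      for option in OPTIONS:          # report in scale order, including zeros
          n = counts.get(option, 0)
          print(f"{option:>8}: {n:2d}  ({n / total:.0%})")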

  12. Relationship between Types of Evaluation and Reasons for Evaluation

                                      Observing &  Users'    Experiments  Predictive  Interpretive
                                      Monitoring   Opinions  etc.
      Understanding the real world         Y          Y           Y           Y            Y
      Comparing designs                    Y          Y           Y           Y            Y
      Engineering to a target              Y          Y           Y           Y            Y
      Standards conformance                Y          Y           Y           Y            Y
