
Evaluation Methods


Presentation Transcript


  1. Evaluation Methods Research & Project Methods SECC504 Professor Julian Newman 15/10/08

  2. Why should we accept your proposal? Bologna Process and CPHC/QAA Benchmarking Standards An MSc Project/Dissertation should demonstrate: • a sound justification for the approach adopted • self-critical evaluation of effectiveness • sense of vision about the direction of developments in aspects of the discipline Masters Ethos requires you to: • be acquainted with the newest theories, methods and techniques in your specialised field • have sufficient competence in techniques of independent research and be able to interpret the results at an advanced level • demonstrate the ability to apply the principles and practices of the discipline in making an original contribution by tackling a significant technical problem Summarised from CPHC (2008) Benchmarking Standards for Masters Degrees

  3. Why should we support you? (1) • Crane: Invisible Colleges • Influential scientists & engineers determine • What is worth pursuing • What was successful • “Published papers are not for information” • An exaggerated view – with a grain of truth • Working scientists use pre-print servers etc • Whitley: Results are valued if other researchers can build upon them.

  4. Why should we support you? (2) • Latour (School of Mining, Paris): there are “Internal” and “External” scientists • Internal Scientists do the lab work • External Scientists • promote importance of the research area • mobilise resources • “An isolated scientist cannot even create a controversy” • To count as “research” your work must relate to a recognisable “research programme” • Foucault: Power resides in Discourse

  5. Why should we pay attention to your results? – relevance and reliability • Parnas: There are different research Paradigms followed in Science and in Engineering. • In Science, the problems are taken from the literature (i.e. from other researchers) • In Engineering, the problems are taken from practitioners (i.e. real-world problems) • Software Engineers too often take problems from the Literature, like Scientists, ignoring real-world problems. • Hence practising software developers, in turn, ignore the research literature. • Literature search should not be the only source of problems – look also at problems of existing technology.

  6. Evaluation in Scientific and Technological Discourse • Hume’s problem of Induction: we cannot logically reason from fact (specific) to theory (general). • So how do we persuade scientific or professional community of the importance of our results? • This is not just a philosophical issue, it is one of practical importance. • Evaluation is essential in mobilising resources and in “selling” your results.

  7. Jamie Fleck’s “Credibility Cycle” (adapted from Bruno Latour) • Discourse: generation of ideas • Resources: needed to test out the ideas • Effectiveness: show that the ideas work

  8. Example from the History of Artificial Intelligence in the 1970s • AI pioneers made big claims about what could be achieved, and on what timescale. • “Computers will soon have an IQ of 120 and then we shall have to give them the vote.” • UK Government commissioned Sir James Lighthill (a Physicist) to report on whether it was worth continuing to fund AI. • Lighthill said only Robotics was worthwhile.

  9. Lighthill blocks AI’s Credibility, thus denying resources • Discourse: Lighthill: “Most AI is not worthwhile, except robotics” blocks the flow of ideas and motivation (the X in the cycle) • Resources: Govt allocates inadequate resources to AI • Effectiveness: without resources, it is hard to prove effectiveness of ideas

  10. McCarthy’s Review of Lighthill • AI community criticised Lighthill report as based on misunderstanding of what AI is about. • John McCarthy, US ‘father of AI’ also criticised Lighthill, but said that the AI community itself was partly to blame. • Too much published AI research suffered from the “Look Ma, No hands!” disease. • Described what the program did, and pointed out no program had done it before. • But did not elucidate any general lessons, principles or insights.

  11. The Credibility Cycle of an MSc Assessment • Discourse: Your Proposal; Literature Search & Technology Review; Your Dissertation • Resources: Lab Access, Supervision etc • Effectiveness: Evaluation of Results

  12. Timing of Evaluations • Designers often distinguish “Early” and “Late” Evaluations. • Early evaluation is often “Predictive Evaluation”, i.e. based on a model or theory plus either • Previously published results • Simulation • Late evaluation is likely to be in a Usability Lab. • Accountants distinguish “Ex Ante” and “Ex Post” Evaluations (before and after investment). • Scriven introduced the distinction between • “Formative” Evaluation (during development, helping to improve the artefact or system) and • “Summative” Evaluation (of the completed end-product).

  13. Evaluation of a Project • Beforehand: Proposal Evaluation (e.g. for funding or registration) • Evaluation of a Product or Prototype • ‘Early evaluation’ of proposed design • Usability labs • Stakeholder evaluation (a wider group than “users”) • Investigators’ evaluation of the answers to Research Issues (possibly Hypotheses) • Critical self-analysis of the work done • Statistical tests • Design rationale • Evaluation of Project • Evaluation of Outputs (e.g. by journal referees) • Evaluation of Project as a Whole (e.g. MSc assessment, Research Council IGR, EC Project Review etc – similar reviews of industrial projects take place within companies)

  14. Evaluation of a Masters’ Proposal (1) A Proposal will be evaluated with respect to the following classes of issue: • Logistical and Practical Issues • Is it possible? • Methodological and General Issues • Does it make sense? • Is it worthwhile?

  15. Evaluation of a Masters’ Proposal (2) Logistical and Practical Issues • Is the proposed work realistic for a 4 person-month project? • Is the Plan based on a Work Breakdown that relates Activities to the Aim and Objectives? • Is the Plan sufficiently detailed for progress monitoring? • Does the Plan allow enough time for initial Literature Search and for final Writing-up of the Dissertation? • Has the student provided a clear assessment of the major Risks in the project and a strategy for managing these? • Has access been assured for all Resources required? • Have Ethical issues been correctly addressed? (see form EC5)

  16. Evaluation of a Masters’ Proposal (3) Methodological and General Issues • Is the Topic suitable for the Programme on which the student is enrolled? • Does the proposal clearly state the Aim and Objectives? • Is the Scope of the work clearly defined? • Is the proposed work sufficiently novel and rigorous for MSc/MA? • Does the proposal justify the importance of the problem? • Is the intended Method clearly described? • Does the Method specify an appropriate practical element? • Is the Method logically adequate to the stated Aim and Objectives?

  17. “Is the Method Logically Adequate to the Stated Aim?” • We previously • Noted that the Hypothetico-Deductive method was ONE approach to analysing RESEARCH DISCOURSE; • Identified some alternatives to Hypothesis-testing: • Comparison of techniques through toy implementation • Development to meet real-world requirements • Experiments to establish parameters • Observation for understanding in Usability Labs or in Field • Case Study of a particular situation • Detailed Ethnographic study of work practices • All these pose a problem: how to justify generalisations? • The answer to this may lie in the idea of “Design Science” or “Sciences of the Artificial”.

  18. Design research and the “reflective practitioner”. • ‘Design Studies’ aims to establish Design as separate from the Humanities & Sciences • What is the form of generalisation that constitutes design knowledge? • When is ‘design’ ‘design research’? • Schon: Reflective practitioner • Simon: Sciences of the Artificial • Maclean: Design space analysis • Sutcliffe, Carroll: Claims Analysis

  19. Reflective Practitioner • Donald Schon advanced the notion that education for practical professions should be based on “reflection”. • The research component consists of systematically analysing and reflecting upon design decisions and implicit design principles. • Keeping a Design Journal can be an important aid to being a reflective practitioner.

  20. Reflection and the MSc Dissertation • Systematic reflection, especially keeping a Design Journal, will assist in evaluative assessment of your own work within the Dissertation. • Systematic reflection supported by a Design Journal will also link to Personal Development Planning, and provide material that can be used in improving CV, job applications, etc. • However, the Dissertation itself is not a piece of personal autobiography: thus evaluation within the Dissertation needs to be framed in an academic and professional register.

  21. Sciences of the Artificial (H A Simon) • “Artificial” systems have a given form and behaviour only because they are: • ADAPTED TO ENVIRONMENT IN REFERENCE TO GOALS • Analogy between: • EVOLVED SYSTEMS • DESIGNED SYSTEMS

  22. Sciences of the Artificial (H A Simon) • Principles: • Stable intermediate forms speed the development of complex systems • Human decision-making is conditioned by limited capacity for handling information

  23. Sciences of the Artificial (H A Simon) • MAIN CHARACTERISTICS OF DESIGN (H A Simon, as developed by William Newman, John Long, Allan Maclean …) • Inner environment: i.e. the Technology that the Designer selects as a means of bringing about Change • Outer environment: in which change is to be produced • Inner environment acts on outer environment across an Interface

  24. Sciences of the Artificial (H A Simon) • MAIN CHARACTERISTICS OF DESIGN (CONTD) • Interface protects outer environment from internal complexities of inner environment • Designer needs to model both Inner and Outer environment • Designer needs to subdivide problem into manageable parts • Simulation can show how a complex system is likely to behave

  25. Sciences of the Artificial (H A Simon) • POSSIBLE IMPLICATIONS • Is your Dissertation aiming to model the Inner or Outer environment? • A Development project might focus on the Inner Environment (Technology) • An Experimental or Case Study dissertation might focus on the Outer environment (Application Context) • A Usability study might focus on the Interface • Technology Interfaces can also be studied • Is the proposed Dissertation a ‘manageable part’ of a larger problem?

  26. Evaluation Within a Project • BASELINE is an Initial Evaluation of • State of the Art (Technology Assessment) • Current theory and practice • Evaluation of Design Alternatives • Important to identify the reasoning that underlies your design decisions • Avoid the “Look Ma, no hands!” disease • Evaluation of Results • E.g. Usability Lab, Hypothesis Testing, etc.

  27. Evaluation of Lessons Learned • Showing generality of results • Design as search in a problem space • Decision space • Evaluation space • Problems of evaluation in real world • Large Complex (usually Distributed) Systems • Organisational factors affecting real-world evaluation

  28. In an Experimental project, how do we show generality of results? Experimental report (body) • General • Specific • General

  29–31. In an Experimental project, how do we show generality of results? Experimental report (body), built up step by step: • Introduction • Hypothesis (proposed generality) • Method (specifics) • Results • Statistical test (assessed generality) • Conclusions & Discussion
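To make the “Statistical test (assessed generality)” step concrete, here is a minimal sketch in Python, assuming a hypothetical experiment comparing task-completion times under two interface designs (the data and names below are invented for illustration, not taken from the lecture):

    # Hypothetical data: task-completion times in seconds for two interface designs.
    from scipy import stats

    design_a = [41.2, 38.5, 45.0, 39.9, 42.7, 44.1, 40.3, 43.6]
    design_b = [35.8, 33.1, 37.4, 34.9, 36.2, 32.7, 38.0, 35.5]

    # Welch's t-test: compares the two group means without assuming equal variances.
    t_stat, p_value = stats.ttest_ind(design_a, design_b, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

A small p-value (conventionally below 0.05) supports the “proposed generality” of the Hypothesis; but, as Hume’s problem of Induction (slide 6) reminds us, the Conclusions & Discussion must still argue the step from this sample to the general claim.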

  32. In a Developmental project, how do we show generality of results? Report (body) • Introduction • Problem • Method/Design Principles • Results • What goes here? • Conclusions & Discussion

  33. Design Rationale can take the place of a formal hypothesis test Report (body) • Introduction • Problem • Method • Results • Design Rationale (relate design decisions to general principles) • Conclusions & Discussion
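To illustrate what a Design Rationale entry might look like, here is a sketch in the Question–Options–Criteria style of Maclean’s Design Space Analysis (slide 18); the content is invented for illustration:

    Question: How should the prototype notify users of background errors?
    Options:  (a) modal dialog  (b) status-bar message  (c) email digest
    Criteria: visibility of system status; cost of interrupting the primary task
    Decision: (b), because it makes errors visible without interrupting the task.

The final line is what distinguishes a Design Rationale from a mere record of choices: it relates a specific design decision to general principles, which is what allows a Developmental report to claim generality without a formal hypothesis test.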
