
MRS Connections, Barbican London, 22-24 March 2006



Presentation Transcript


  1. MRS Connections, Barbican London, 22-24 March 2006 Connecting with eLizabeth Using artificial intelligence as a data collection aid Craig Kolb, Ask Afrika & Dr Peter Millican, Hertford College, Oxford University

  2. ELIZA in action Weizenbaum (1966) gave a classic example of a rather comical dialogue between ELIZA and a young woman, from which this is an edited excerpt. (ELIZA’s responses are in capitals)

  3. ELIZA in action • Men are all alike. • IN WHAT WAY? • They're always bugging us about something or other. • CAN YOU THINK OF A SPECIFIC EXAMPLE? • Well, my boyfriend made me come here. • YOUR BOYFRIEND MADE YOU COME HERE? • I need some help, that much seems certain. • WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP? • Perhaps I could learn to get along with my mother • TELL ME MORE ABOUT YOUR FAMILY.

  4. ELIZA in action • My mother takes care of me. • WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU? • My father. … • WHAT ELSE COMES TO YOUR MIND WHEN YOU THINK ABOUT YOUR FATHER? • Bullies. • DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE?

  5. ELIZA Two key components underlie ELIZA’s workings: • The process used by the program itself. • The assumptions made by the human.

  6. ELIZA: process • Identify keywords (e.g. ‘alike’, ‘mother’) and use canned responses. • Exploit context (e.g. remember ‘my’ phrases to use later). • Apply transformation rules (e.g. ‘I’ to ‘you’). • Where possible, echo back parts of the input. • Default responses when no keyword identified.
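The mechanisms on this slide can be sketched in a few lines of Python. This is an illustrative reconstruction, not Weizenbaum’s actual script: the keyword table, reflections, and default responses here are assumptions chosen to echo the sample dialogue on the earlier slides.

```python
import random
import re

# Illustrative ELIZA-style tables (hypothetical, not Weizenbaum's originals)
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}
KEYWORDS = {
    "alike": "IN WHAT WAY?",
    "mother": "TELL ME MORE ABOUT YOUR FAMILY.",
}
DEFAULTS = ["PLEASE GO ON.", "WHAT DOES THAT SUGGEST TO YOU?"]

memory = []  # remembered 'my ...' phrases, exploited as later context

def reflect(phrase):
    # Apply transformation rules, e.g. 'I' -> 'you', 'my' -> 'your'
    return " ".join(REFLECTIONS.get(w, w) for w in phrase.lower().split())

def respond(user_input):
    text = user_input.lower().strip(".?! ")
    # 1. Identify keywords and use a canned response
    for keyword, reply in KEYWORDS.items():
        if keyword in text.split():
            return reply
    # 2. Echo back part of the input, transformed, and remember it
    m = re.search(r"\bmy (.+)", text)
    if m:
        memory.append(m.group(1))
        return f"YOUR {reflect(m.group(1)).upper()}?"
    # 3. Reuse remembered context, else fall back on a default response
    if memory:
        return f"EARLIER YOU MENTIONED YOUR {reflect(memory.pop(0)).upper()}."
    return random.choice(DEFAULTS)
```

With these tables, `respond("My boyfriend made me come here.")` returns "YOUR BOYFRIEND MADE YOU COME HERE?", mirroring the exchange on slide 3.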

  7. ELIZA: human assumptions What do we expect from a psychotherapist? • Enigmatic. • Never answers a question for you. • Inspires you to solve your own problems.

  8. ELIZA: human assumptions ‘This mode of conversation was chosen because the psychiatric interview is one of the few examples of … communication in which one of the participating pair is free to assume the pose of knowing almost nothing of the real world.’ Weizenbaum

  9. Such assumptions are not confined to chatbot dialogues • For example: • Deliberate obscurantism. • Lack of power parity. • Irrational or abstract statements to ‘save face’. • Marketing research interviews.

  10. In general conversation, chatbots quickly betray themselves through lack of commonsense knowledge or ability to respond flexibly ‘off topic’. But in an interview setting this may not be a problem and can even be an advantage.

  11. This suggests that a chatbot might be a useful tool in marketing research

  12. Introducing eLizabeth • User-friendly chatbot system developed initially to teach artificial intelligence (A.I.). • Rigorously specified and comprehensively documented, with online help etc. • Highly flexible and easily programmed. • Can save/retrieve information in memory or on disk files, and respond accordingly. • All chatbot states and operations can be viewed in real time (e.g. while testing).

  13. Application: Open-ended questions • Typically generate superficial responses unless probed. • Respondents are more forthcoming when answers are suggested. • But this gives risk of ‘leading’ or bias.

  14. Real answers to an unprobed open-ended question. Three mobile networks were involved in this study, referred to as A, B, and C.

  15. A solution • We suggest that an adaptation of standard chatbot techniques provides an economical means of probing open-ends. • By creating a sense of conversation (rather than a linear questionnaire), the quality of response is improved over self-completion. • This is ideal in situations such as CAWI where an interviewer is not necessarily present.

  16. Interview 1 • WHY DID YOU CHOOSE YOUR CURRENT CELL PHONE NETWORK OPERATOR? • To save money. • LIST ALL OF THE THINGS THAT WORRY YOU ABOUT YOUR NETWORK PLEASE. • Network coverage and 3g. • WHAT ELSE CAN YOU SAY ABOUT CELL PHONE FEATURES? • I like my new handset, which is very important to me. I also like the extra ‘Ahas’ – such as emergency services. • WAS IT ONLY ABOUT THE CELL PHONE HANDSET? • No, but they did deliver it to me for free. I didn't have to go anywhere. • LIST ALL OF THE THINGS YOU SEE AS GOOD ABOUT YOUR NETWORK PLEASE. • Cheap; additional services such as free delivery; nice shows and stuff. • IS SAVING MONEY MORE IMPORTANT THAN OTHER THINGS? • Value for money is important, I don't mind paying A BIT more, but then I want to get more.

  17. Interview 1 • WHAT DID FAMILY AND FRIENDS HAVE TO SAY ABOUT YOUR CHOICE OF NETWORK? • They don't really care, but I also have my mom on network A now. • HOW DO YOU FEEL ABOUT FAMILY INVOLVEMENT? • I love my family, but we don't really care about each other’s cell phone habits. • HOW MUCH OF THE FINAL DECISION ON A PARTICULAR NETWORK WAS REALLY YOURS? • 90%, 10% of it was because of the fact that I work on network A projects. • WOULD YOU USE THIS NETWORK AGAIN IN FUTURE, IF YOU HAD TO CHOOSE AGAIN? • Yes. • YES? PLEASE GO INTO A BIT MORE DETAIL. • I like the additional services; and I like the network A image. • WHY DO YOU LIKE ‘THE ADDITIONAL SERVICES’? • They deliver straight to me!

  18. Context: how did we introduce the chatbot? Possibilities: • Pretend it is human (high expectations, but temporarily shielded by the assumption it is something it is not)? • Say it is an ‘artificial intelligence’ (high expectations, with possible hostility)? • Describe it in a way that creates lower expectations? e.g. ‘Automatic interviewer’

  19. Programming eLizabeth • eLizabeth was easy to program, using only simple additions to the default behaviour. • We defined a ‘welcome’ message, responses for null inputs, keyword sets with associated ‘once-only’ probes, and a list of ‘set questions’ for when no keyword was found. • The resulting operation of the main cycle is represented in the following slide …
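The main cycle described on this slide can be sketched as follows. The real eLizabeth system is driven by its own script language; this Python sketch, with hypothetical probe and question wording, only illustrates the control flow: welcome message, null-input responses, once-only keyword probes, and fallback ‘set questions’.

```python
# Hypothetical interview content for illustration only
WELCOME = "WHY DID YOU CHOOSE YOUR CURRENT CELL PHONE NETWORK OPERATOR?"
NULL_RESPONSES = ["PLEASE TYPE AN ANSWER.", "EVEN A SHORT ANSWER HELPS."]

# Keyword sets, each with an associated once-only probe
PROBES = [
    ({"cheap", "money", "cost"}, "IS SAVING MONEY MORE IMPORTANT THAN OTHER THINGS?"),
    ({"handset", "phone", "features"}, "WHAT ELSE CAN YOU SAY ABOUT CELL PHONE FEATURES?"),
]
used_probes = set()

# 'Set questions', asked in order when no keyword is found
SET_QUESTIONS = [
    "LIST ALL OF THE THINGS THAT WORRY YOU ABOUT YOUR NETWORK PLEASE.",
    "LIST ALL OF THE THINGS YOU SEE AS GOOD ABOUT YOUR NETWORK PLEASE.",
]
null_count = 0
question_index = 0

def next_question(answer):
    """One turn of the main cycle: probe on keywords, else ask the next set question."""
    global null_count, question_index
    words = set(answer.lower().strip(".?! ").split())
    if not words:  # null input: cycle through the canned prompts
        reply = NULL_RESPONSES[null_count % len(NULL_RESPONSES)]
        null_count += 1
        return reply
    for i, (keywords, probe) in enumerate(PROBES):
        if i not in used_probes and words & keywords:
            used_probes.add(i)  # each probe fires once only
            return probe
    if question_index < len(SET_QUESTIONS):
        q = SET_QUESTIONS[question_index]
        question_index += 1
        return q
    return "THANK YOU, THAT IS THE END OF THE INTERVIEW."
```

Because probes fire only once and set questions advance in order, the dialogue keeps a fixed-initiative shape while still reacting to what the respondent says, much like the Interview 1 transcript above.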

  20. Performance criteria • Alan Turing (1950) proposed a simple test. If a computer in online conversation can fool a human into believing that it is human too, then we should deem it to be intelligent. • Hence the focus over the years on developing chatbots that can ‘appear’ as human as possible. • Also a drive in robotics toward making robots look human (e.g. Actroid in picture).

  21. Performance criteria • Turing’s Test isn’t relevant to interviewing, as a different set of societal norms apply: • Interviewer normally controls the dialogue to prevent deviations from the topic (i.e. ‘fixed initiative’). • Interviewer is allowed to assume the pose of ignorance.

  22. So then how do we evaluate a chatbot or A.I. interviewer?

  23. Interviewing-specific performance criteria • 1) Relevance of interviewer questions (i.e. questions specified by the researcher that were not previously answered). • 2) Avoidance of ‘suggestion’ (i.e. hints or ‘leading’). • 3) Relevance of respondents’ answers (i.e. do they answer the question in part or in full?). • 4) Maximisation of the volume of attributes elicited (i.e. mobile network operator attributes).

  24. eLizabeth’s Performance

  25. What variables might determine chatbot interviewing success? • Respondent characteristics (gender, age, education, etc.). • Interview length. • Context presented to the respondent, expectations created (e.g. conversation vs. interview, A.I. vs. ‘automatic interviewer’). • Chatbot’s own behaviour.

  26. Hypotheses • Perceptual hypotheses • H1: Subjects with strong vigilance and analytical traits are more likely to detect irrelevant questions. • H2a: The longer the exchange on a topic, the more likely the chatbot is to be seen as asking irrelevant questions. • H2b: The longer the dialogue overall, the more likely the chatbot is to be seen as asking irrelevant questions.

  27. Hypotheses • H3: Subject knowledge of whether they are communicating with an A.I. or chatbot (and how it is described) is likely to moderate the statistical relationships described in H1, H2a, and H2b.

  28. Hypotheses • Response hypotheses • H4a: Perceptions of irrelevance will increase the likelihood of hostility, dependent on how the context is defined. • H4b: Increased hostility will decrease the relevance of the respondent’s answers. • H5: Introducing a chatbot using terms that create lower expectations (such as ‘automatic interviewer’) will moderate the irrelevance/hostility relationship.

  29. Hypotheses • H6: Limited mimicry (i.e. reflecting back a subject’s statements, with appropriate grammatical alterations) will result in increased volume and improved answer relevance. This hypothesis is based on the chameleon effect, as reported by Poulsen (2005). • H7: Whether the human party in a dialogue has prior knowledge that the other is an A.I. or chatbot will moderate the relationships in H4a and H6.

  30. Chatbot strengths Compared with human interviewers, chatbots have some clear advantages: • Low cost. • Unaffected by pressure. • Better memory. • Give researcher greater control over interviews. • Can ensure consistency and rapid deployment in a changing context.

  31. Chatbot weaknesses The disadvantages of chatbots compared with humans stem from their lack of genuine understanding: • Minimal control over response quality (though better than self-completion). • No ability to assess whether unasked questions have already been implicitly answered earlier in the dialogue (relevance).

  32. Future applications This suggests the following in terms of assisting or replacing human interviewers: Assisting human interviewers • Suggesting useful probes and follow-on questions, ensuring greater control. • Reducing stress of memory, phrasing, context adjustment (e.g. from structured questioning to open-end probing). • Recording and collating results. • Semi-automated learning during survey.

  33. Future applications Replacing human interviewers • Practical in situations where CAWI has replaced CATI but probing is still desired on open ends. • Techniques may be applicable to sentence completion, personification, and thematic apperception tests. • More sophisticated structured techniques, such as Kelly’s Repertory Grid.

  34. Applying chatbot technology to your own research To explore chatbots, visit www.eliz.millican.org, which takes you to eLizabeth’s home page (and also links to other chatbot sites). The system is free to download for non-commercial use, and comes with teaching materials and documentation. For advice on any project, email: eliz@millican.org
