
Detecting Deceptive Speech: Humans vs. Machines

This research explores the effectiveness of humans and machines in detecting deceptive speech. It examines various cues and factors that contribute to the detection of deception, including acoustic, prosodic, and lexical cues. The goal is to develop improved deception detection techniques and examine individual differences in deceptive behavior.


Presentation Transcript


  1. Detecting Deceptive Speech: Humans vs. Machines Julia Hirschberg Columbia University ICASSP 2018

  2. Co-Authors and Collaborators • Sarah Ita Levitan • Michelle Levine • Guozhen An • Andrew Rosenberg • Gideon Mendels • Rivka Levitan • Kara Schechtman • Mandi Wang • Nishmar Cestero • Kai-Zhan Lee • Yocheved Levitan • Angel Maredia • Jessica Xiang • Bingyan Hu • Zoe Baker-Peng • Ivy Chen • Meredith Cox • Leighanne Hsu • Yvonne Missry • Gauri Narayan • Molly Scott • Jennifer Senior • James Shin • Grace Ulinski

  3. The Columbia Speech Lab

  4. Deceptive Speech • Deliberate choice to mislead • Without prior notification • To gain some advantage or to avoid some penalty • Not: • Self-deception, delusion, pathological behavior • Theater • Falsehoods due to ignorance/error • Everyday (White) Lies very hard to detect • But Serious Lies may be easier to detect

  5. Why would Serious Lies be easier to identify? • Hypotheses in research and among practitioners: • Our cognitive load is increased when lying because we … • Must keep story straight • Must remember what we have said and what we have not said • Our fear of detection is increased if… • We believe our target is difficult to fool • Stakes are high: serious rewards and/or punishments • Makes it hard for us to control potential indicators of deception

  6. But Humans are very poor at Recognizing these Cues: Aamodt & Mitchell 2004 Meta-Study

  7. Current Approaches to Deception Detection • ‘Automatic’ methods (polygraph, commercial products) no better than chance • Train humans: John Reid & Associates • Behavioral Analysis: Interview/Interrogation no empirical support • Truth: I didn’t take the money vs. Lie: I did not take the money (but non-native speakers rarely use contractions so….) • Laboratory studies: Production and perception (facial expression, body posture/gesture, statement analysis, brain activation, odor,…)

  8. Our Approach • Conduct objective, experimentally verified studies of spoken cues to deception on large corpora which predict better than humans or polygraphs • Our method: • Collect speech data and extract acoustic, prosodic, and lexical cues automatically • Take gender, ethnicity, and personality factors into account as features in classification • Use Machine Learning techniques to train models to classify deceptive vs. non-deceptive speech

  9. Questions We Hope to Answer • Can we improve human deception detection • By providing new knowledge and training materials • By providing classifiers to aid in deception detection • Can we identify trust in humans – and mistrust • Can we control trust in machines: an ethical question… • When robots and avatars should be trusted • When they should not…

  10. Deception Detection from Text and Speech Corpus collection Classification Annotation, feature extraction Analysis

  11. Outline • Corpus collection • Classification of deception from text and speech • Individual differences in deceptive behavior • New goal: Acoustic-prosodic indicators of trust

  12. Columbia/SRI/Colorado Deception Corpus, 2003-5 • 7h of speech from 32 SAE-speaking subjects performing tasks and asked to lie about half • Lexical and acoustic-prosodic features identified from psycholinguistic literature for classification • Results • Classification accuracy (~70%) significantly better than human performance on our corpus • Considerable individual differences between speakers: Judges with certain personality traits performed better than our classifiers

  13. Columbia X-cultural Deception Corpus (2011–) • New questions to ask: • Can personality factors help in predicting individual differences in deception? • Can people who detect lies better also lie more successfully? • Do differences in gender and native language influence deceptive behavior? Judgment of deception? • New study: Pair native speakers of Standard American English with Mandarin Chinese speakers, speaking English, interviewing each other

  14. Our CXD Experiment NEO-FFI Survey Lying game Background survey Biographical Questionnaire Baseline sample


  17. Biographical Questionnaire


  20. The Big Five NEO-FFI (Costa & McCrae, 1992) • Openness to Experience: “I have a lot of intellectual curiosity.” • Conscientiousness: “I strive for excellence in everything I do.” • Extraversion: “I like to have a lot of people around me.” • Neuroticism: “I often feel inferior to others.” • Agreeableness: “I would rather cooperate with others than compete with them.”
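Each NEO-FFI trait score aggregates a set of Likert-scale items, some of which are reverse-keyed. A minimal illustrative scorer follows; the 1-5 scale and which items are reverse-keyed are assumptions for the example, not the published inventory keys:

```python
def score_trait(responses, reverse_keyed=(), scale_max=5):
    """Score one Big Five trait as the mean of 1-5 Likert responses,
    flipping reverse-keyed items (item keys here are illustrative,
    not the actual NEO-FFI scoring key)."""
    total = 0
    for i, r in enumerate(responses):
        total += (scale_max + 1 - r) if i in reverse_keyed else r
    return total / len(responses)
```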


  22. Our CXD Experiment


  24. Motivation and Scoring • Monetary motivation • Success for interviewer: • Add $1 for every correct judgment, truth or lie • Lose $1 for every incorrect judgment • Success for interviewee: • Add $1 for every lie the interviewer thinks is true • Lose $1 for every lie the interviewer thinks is a lie • Good liars tell the truth as much as possible when lying, so how do we know what’s true or false for follow-up questions? • Interviewees press T/F keys after every phrase
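The dollar scoring above maps directly to code. A sketch, where the function names and the boolean encoding (True meaning "judged true" / "statement was true") are mine:

```python
def interviewer_payoff(judgments, truths):
    """+$1 for each correct truth/lie judgment, -$1 for each miss."""
    return sum(1 if j == t else -1 for j, t in zip(judgments, truths))

def interviewee_payoff(judgments, truths):
    """Interviewees score only on their lies: +$1 for each lie the
    interviewer believed, -$1 for each lie the interviewer caught."""
    total = 0
    for judged_true, was_true in zip(judgments, truths):
        if not was_true:  # this statement was a lie
            total += 1 if judged_true else -1
    return total
```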

  25. Columbia X-Cultural Deception (CXD) Corpus • 340 subjects, balanced by gender and native language (American English, Mandarin Chinese): 122 hours of speech • Crowdsourced transcription, speech alignment • T/F keypress alignment • Segmented into • Inter-pausal units (IPUs) • Speaker turns • Question/answer sequences (Q/A and Q/A + follow-up)
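Given time-aligned words from the transcripts, IPU segmentation reduces to grouping consecutive words separated by silence shorter than a pause threshold. A sketch; the 50 ms threshold below is an assumed value, not necessarily the one used for CXD:

```python
def segment_ipus(words, pause_threshold=0.05):
    """Group time-aligned words into inter-pausal units (IPUs):
    maximal runs of words separated by silence >= pause_threshold
    seconds. Each word is a (token, start_sec, end_sec) tuple."""
    ipus, current = [], []
    for word in words:
        # Start a new IPU if the gap since the last word's end is long enough.
        if current and word[1] - current[-1][2] >= pause_threshold:
            ipus.append(current)
            current = []
        current.append(word)
    if current:
        ipus.append(current)
    return ipus
```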

  26. “Where were you born?” True or False?

  27. “Where were you born?”

  28. “What is the most you have ever spent on a pair of shoes?” True or False?

  29. “What is the most you have ever spent on a pair of shoes?”

  30. Outline • Corpus collection • Classification of deception from text and speech • Individual differences in deceptive behavior • Acoustic-prosodic indicators of trust

  31. Features We Extract • Currently: • Text-based: n-grams, psycholinguistic, Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al.), word embeddings (GloVe trained on 2B tweets) • Speech-based: openSMILE IS09 (386 features) • Gender, native language • Five personality dimensions (NEO-FFI) • Next: Syntactic features (complexity) and all combined

  32. Corpus Segmentation • Inter-pausal unit (IPU) • Turn • Question-level (first answer, first + follow-up answers)

  33. Machine Learning Experiments • What are the best classification models? • What are the optimal segmentation units? • Which feature sets are most useful?

  34. Machine Learning Experiments • What are the best classification models? • Statistical machine learning – Logistic Regression, SVM, Random Forest • Neural networks – DNN, LSTM, hybrid system • What are the optimal segmentation units? • Which feature sets are most useful?
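As a stand-in for the off-the-shelf learners named above (logistic regression, SVM, random forest), here is a minimal from-scratch binary logistic-regression trainer over dense feature vectors; the learning rate and epoch count are arbitrary, and a real experiment would use a library implementation:

```python
import math

def train_logistic(X, y, lr=0.5, epochs=200):
    """Per-sample gradient descent for binary logistic regression.
    X: list of feature vectors; y: 0/1 labels. Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - yi                     # gradient of log-loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """1 = deceptive, 0 = truthful (sign of the decision function)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
```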

  35. Machine Learning Experiments • What are the best classification models? • What are the optimal segmentation units? • Inter-Pausal Unit (IPU), speaker turns, Q/A, Q/A+follow-up As • Which feature sets are most useful?

  36. Machine Learning Experiments • What are the best classification models? • What are the optimal segmentation units? • Which feature types are most useful? • Text-based: n-grams, psycholinguistic-based, LIWC, GloVe word embeddings trained on 2B tweets • Individual difference features: gender, native language, personality • Speech-based: openSMILE IS09 acoustic-prosodic features (e.g. f0, intensity, speaking rate, VQ)
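For the statistical classifiers, combining these feature groups amounts to early fusion into a single vector. A sketch; the key-namespacing scheme is illustrative, not the project's actual pipeline:

```python
def build_feature_vector(lexical, acoustic, individual):
    """Early fusion: merge lexical, acoustic-prosodic, and
    individual-difference features into one flat dict, with
    namespaced keys so feature groups cannot collide."""
    merged = {}
    for prefix, group in (("lex", lexical), ("ac", acoustic), ("id", individual)):
        for name, value in group.items():
            merged[f"{prefix}:{name}"] = value
    return merged
```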

  37. Segmentation – Longer is Better: IPU, Turn, Question. Mendels, Levitan et al. 2017, “Hybrid acoustic lexical deep learning approach for deception detection”

  38. Deep Learning Approaches • BLSTM-lexical • DNN-openSMILE • Hybrid: BLSTM-lexical + DNN-openSMILE. Mendels, Levitan et al. 2017, “Hybrid acoustic lexical deep learning approach for deception detection”

  39. IPU Classification: Hybrid is Better. Mendels, Levitan et al. 2017, “Hybrid acoustic lexical deep learning approach for deception detection”

  40. Question-Level Random Forest; Context is Better

  41. Analysis – Acoustic Features

  42. Analysis – Lexical Features

  43. Classification with Gender and Native Language: 1st Answer and Chunks: Personal Info Helps with Less Context

  44. Results for Classification • Deception classification experiments • > 0.72 F1 score when using question segmentation to predict lies • Note that human interviewers’ F1 is 0.43 • Deep learning approaches for IPU segmentation probably the most promising route in future • Many more features to examine together
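The F1 scores quoted here follow the standard definition: the harmonic mean of precision and recall on the positive (deceptive) class.

```python
def f1_score(predicted, gold, positive=True):
    """Standard F1: harmonic mean of precision and recall
    for the positive (here, deceptive) class."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == positive and g == positive)
    fp = sum(1 for p, g in zip(predicted, gold) if p == positive and g != positive)
    fn = sum(1 for p, g in zip(predicted, gold) if p != positive and g == positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```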

  45. Outline • Corpus collection • Classification of deception from text and speech • Individual differences in deceptive behavior and detection of deception • Acoustic-prosodic indicators of trust

  46. Some Individual Differences • Extroversion is correlated with success at deception, for English male speakers • Native English speakers perform better at deception when paired with a native Chinese interviewer

  47. Individual Differences in Deceptionand Truth-telling Levitan et al. 2018, “Linguistic indicators of deception and perceived deception in spoken dialogue”

  48. Differences in Deceptive Ability: How well did interviewees lie?

  49. Differences in Deception Detection: How well did interviewers judge deception?

  50. Results • There are gender and cultural/native language differences in deceptive behavior • There are differences in ability to deceive and in ability to detect deception • Understanding these differences can improve deception classification by machines and perhaps by humans…
