Testing Language-Based Indicators of Deception On a Corpus of Legal Narratives

Eileen Fitzpatrick, Montclair State University, Montclair, NJ
Joan Bachenko, Deception Discovery Technologies, Minneapolis, MN

Deception • A deliberate attempt to mislead (excludes acting) • An attempt to mislead over a serious matter (excludes social lies) • For our work: the subject is not in control of the situation (e.g. interviews, court testimony)

The Significant Literatures on Deception
Two areas with a significant literature on deception:
• Psychology: experimental; no high stakes. Asks general questions about differences in behavior when lying and telling the truth.
• Law enforcement: high stakes; no quantitative evaluation. Asks whether criminals can be caught through their deceptive behavior.

The Basic Assumption of the Literature When people lie, they exhibit behavioral differences that they cannot control. • All liars “leak” cues to the deceptive state of their message. • Linguistic cues cited in the literature include hedges, negative statements, verb tense changes, etc. Sociopathic liars are excluded from the above.

The Question • Can we empirically verify the effectiveness of linguistic cues to identify deception in real world data? • To test this we need: • A linguistically motivated model • A corpus of real world testimony • A method of verifying deception

Talk Outline • The Model • The Corpus • Ground Truth Verification • Test Results

The Model includes: • Discrete, formal indicators of deception that we can implement in an automatic system • An algorithm that identifies clusters of deception indicators

Formal Indicators of Deception • 12 indicators that fall into three classes: • lack of commitment to a proposition • preference for negative expressions in word choice and sentence structure • inconsistencies with respect to verb and noun forms

Lack of Commitment: Hedges • Scott Peterson: I assumed she was at her Mom's • Tobacco trial: I didn't do work specifically on teenage smoking. I was aware that at one time Philip Morris did interview people under the age of 18. • Car Break-in: I said somebody probably breaking into the damn car.

Lack of Commitment: Qualified Assertion Leaves open whether an act was performed • Chappaquiddick: I attempted to open the door and the window of the car • Guilty Nurse: I believe the first fire was between 0930-1000. I know I did need to retrieve my inhaler from my car. • Deer Hunter: I tried to give her artificial respiration.

Deception Indicators: Negative Expressions
• Simple negatives (police interrogation): But I never pulled the knife out.
• Negative emotions (Susan Smith): I was very emotionally distraught
• Overzealous expressions (Tobacco trial): I have absolutely no recollection

Deception Indicators: Verb Tense Change • Susan Smith: Initial Police Interview SMITH: I just feel hopeless. I can't do enough. My children wanted me. They needed me. And now I can't help them. I just feel like such a failure.

Deception Indicators: Changes in Referring Expressions
Jeffrey MacDonald, initial police interview:
MacDonald: I went back to check my wife. I then went to check the kids . . . .
MacDonald: So I told him that I needed a doctor and an ambulance and that some people had been stabbed.
Scott Peterson, initial police interview:
Brocchini: You drive straight home?
Peterson: To the warehouse, ___ dropped off the boat.
(Peterson’s replies drop the first person reference when describing events surrounding the murder.)

Deception Indicator Clustering
Deception indicators occur in everyone’s language, whether their statements are truthful, deceptive, or neutral.
The hypothesis: when people are being deceptive, the number of indicators will rise.
The algorithm assigns a moving average to each word depending on its proximity to an indicator. Summed over a proposition, these values give the likelihood of deception for the proposition.

Clustering Ex.: T is blue, F is red On July 18th, 1969, at approximately 11:15 PM in Chappaquiddick, Martha’s Vineyard, Massachusetts, I was driving my car on Main Street on my way to get the ferry back to Edgartown. I was unfamiliar with the road and turned right onto Dike Road, instead of bearing hard left on Main Street. After proceeding for approximately one-half mile on Dike road I descended a hill and came upon a narrow bridge. The car went off the side of the bridge. There was one passenger with me, a Miss _____, a former secretary of my brother Sen. Robert Kennedy. The car turned over and sank into the water and landed with the roof resting on the bottom. I attempted to open the door and the window of the car but have no recollection of how I got out of the car. I came to the surface and then repeatedly dove down to the car in an attempt to see if the passenger was still in the car. I was unsuccessful in the attempt. I recall walking back to where my friends were eating. There was a car parked in front of the cottage and I climbed into the backseat. I then asked for someone to bring me back to Edgartown. I remember walking around for a period then going back to my hotel room.

The Corpus
• 30,425 words from 13 speakers
• Police interviews, legal depositions, Congressional testimony, criminal statements
• Annotated for deception indicators

Ground Truth Verification
Two sources:
• Externally verified: police reports, court reports, videotapes, financial records
• Internally verified through consistency of the narrative: "All right, man, I did it!"

Testing the Hypothesis • Testing Subcorpus: Statements from 3 criminal and 2 civil cases • 277 propositions • annotated for deception indicators • annotated for ground truth 165 False; 112 True

Testing the Cues (no clustering)
CART classification matrix based on 10-fold cross-validation*
                Predicted F   Predicted T   Total
Actual F        110           55            165 (59.5%)
Actual T        24            88            112 (40.5%)
Result: 71.5% correct; 67% of F’s are found
*We used QUEST (Loh and Shih, 1997) for the modeling.

Testing the Added Value of Cue Clustering
Classification matrix based on 10-fold cross-validation
                Predicted F   Predicted T   Total
Actual F        154           11            165
Actual T        73            39            112
Result: 69.7% correct; 93% of F’s are found
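The figures on this slide and the preceding one can be recomputed from the four cells of each matrix. The sketch below is only an illustration of that arithmetic; the function name `summarize` and the dictionary keys are hypothetical, and the majority-class baseline is assumed to be what the talk calls the system baseline.

```python
def summarize(f_as_f, f_as_t, t_as_f, t_as_t):
    """Summarize a 2x2 matrix whose rows are the actual classes F and T."""
    actual_f = f_as_f + f_as_t
    actual_t = t_as_f + t_as_t
    total = actual_f + actual_t
    return {
        # share of all propositions classified correctly
        "accuracy": (f_as_f + t_as_t) / total,
        # share of actual False propositions that were found
        "recall_false": f_as_f / actual_f,
        # share of actual True propositions that were found
        "recall_true": t_as_t / actual_t,
        # accuracy of always guessing the larger class (assumed baseline)
        "baseline": max(actual_f, actual_t) / total,
    }


# Cell counts copied from the two classification matrices in the talk.
cues_only = summarize(110, 55, 24, 88)
clustered = summarize(154, 11, 73, 39)

for name, result in (("cues only", cues_only), ("with clustering", clustered)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Computed this way, clustering raises the share of F’s found while the share of T’s found drops, which may be relevant to the deception bias mentioned in the conclusion.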

Conclusion and Future Research
• Language-based deception cues predict deception above the system baseline of 59.5%
• Examination of deception bias is underway
• Cue clustering contributes measurably to predicting deception
• Hedges and negative elements far outweigh other cues; we need to expand the corpus to test the value of the other cues

References, 1
Adams, S. 2002. Communication under stress: indicators of veracity and deception in written narratives. Ph.D. dissertation, Virginia Polytechnic Institute and State University.
DePaulo, B. M., J.J. Lindsay, B.E. Malone, L. Muhlenbruck, K. Charlton, and H. Cooper. 2003. Cues to deception. Psychological Bulletin, 129(1), 74-118.
Dulaney, E.F. Jr. 1982. Changes in language behavior as a function of veracity. Human Communication Research, 9, 75-82.
Knapp, M.L., Hart, R.P., and Dennis, H.S. 1974. An exploration of deception as a communication construct. Human Communication Research, 1, 15-29.
Miller, G. R. and J. B. Stiff. 1993. Deceptive Communication. Sage Publications, Thousand Oaks, CA.
Newman, M. L., Pennebaker, J. W., Berry, D. S. and J. M. Richards. 2003. Lying words: predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29, 665-675.

References, 2
Porter, S. and J. Yuille. 1996. The language of deceit: An investigation of the verbal clues in the interrogation context. Law & Human Behavior, 20(4), 443-458.
Sapir, A. 1987. Scientific Content Analysis (SCAN). Laboratory of Scientific Interrogation, Phoenix, AZ.
Shuy, R. 1998. The Language of Confession, Interrogation and Deception. Sage Publications, Thousand Oaks, CA.
Smith, N. 2001. Reading between the lines: An evaluation of the scientific content analysis technique (SCAN). Police Research Series. London, UK. www.homeoffice.gov.uk/rds/prgpdfs/prs135.pdf
Vrij, A. 2000. Detecting Lies and Deceit. John Wiley & Sons, Chichester, UK.

Distribution of Cues in Data (1)
Cue Type                No.   No. Correlated with False   No. of Spkrs using cue
Negative form           86    78                          5
Hedge                   65    52                          6
Verb tense change       15    14                          4
Noun Phrase change      12    11                          5
Pronoun change          12    9                           3
Thematic role change    10    5                           3

Distribution of Cues in Data (2)
Cue Type                No.   No. Correlated with False   No. of Spkrs using cue
Memory loss             9     9                           3
Overzealous expr        8     7                           3
Time loss               6     5                           4
Questionable action     11    9                           6
Qualification           5     5                           3
Negative emotion        0     0                           0
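The two distribution slides can be read more easily as proportions. The sketch below simply restates the counts from both slides and computes, for each cue, the share of occurrences that fall in False propositions; the names `CUES` and `false_share` are hypothetical, and returning `None` for an unused cue is a choice made here, not something the talk specifies.

```python
# (occurrences, occurrences correlated with False), copied from the two slides.
CUES = {
    "Negative form": (86, 78),
    "Hedge": (65, 52),
    "Verb tense change": (15, 14),
    "Noun Phrase change": (12, 11),
    "Pronoun change": (12, 9),
    "Thematic role change": (10, 5),
    "Memory loss": (9, 9),
    "Overzealous expr": (8, 7),
    "Time loss": (6, 5),
    "Questionable action": (11, 9),
    "Qualification": (5, 5),
    "Negative emotion": (0, 0),
}


def false_share(count, false_count):
    """Share of a cue's occurrences found in False propositions (None if unused)."""
    return false_count / count if count else None


total = sum(count for count, _ in CUES.values())
total_false = sum(false_count for _, false_count in CUES.values())
top_two = CUES["Negative form"][0] + CUES["Hedge"][0]

for cue, (count, false_count) in CUES.items():
    share = false_share(count, false_count)
    label = "n/a" if share is None else f"{share:.0%}"
    print(f"{cue:22s} {count:3d} {label}")
print(f"negative forms and hedges: {top_two} of {total} cue occurrences")
```

On these counts, negative forms and hedges together make up 151 of the 239 cue occurrences, which is consistent with the remark that they far outweigh the other cues.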

Proximity Scoring and DI (deception indicator) Density
• Proximity score: how close a word is to the nearest DI, measured in word count
• Moving average: for a user-selected number N, recalculate proximity scores to highlight DI clusters
• In the example, N=8
• If N is even, the moving average window is N/2 words to the left and (N-2)/2 words to the right
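A minimal sketch of the two steps named on this slide is given below. It is not the authors' implementation: the slide does not say how a proximity score is derived from the word distance, how an odd N is handled, or what happens at the edges of a text. Here the score is assumed to be the raw distance in words (0 on a DI, so lower averages mean denser clusters), an odd N is assumed to use a symmetric window, and windows are truncated at the text boundaries. All function names are hypothetical.

```python
from typing import List, Tuple


def proximity_scores(is_di: List[bool]) -> List[int]:
    """Distance in words from each position to the nearest DI (0 on a DI)."""
    positions = [i for i, flag in enumerate(is_di) if flag]
    if not positions:
        raise ValueError("the text contains no deception indicators")
    return [min(abs(i - p) for p in positions) for i in range(len(is_di))]


def window_bounds(n: int) -> Tuple[int, int]:
    """Words to the left and right of the current word for window size n."""
    if n % 2 == 0:
        # Even N, as on the slide: N/2 to the left, (N-2)/2 to the right.
        return n // 2, (n - 2) // 2
    # Odd N is not covered by the slide; a symmetric window is assumed.
    half = (n - 1) // 2
    return half, half


def moving_average(scores: List[int], n: int) -> List[float]:
    """Average each score over its window, truncating at the text edges."""
    left, right = window_bounds(n)
    averaged = []
    for i in range(len(scores)):
        window = scores[max(0, i - left): i + right + 1]
        averaged.append(sum(window) / len(window))
    return averaged


# Toy input: True marks a word tagged as a deception indicator.
flags = [False, False, True, False, False, False, True, False]
scores = proximity_scores(flags)
smoothed = moving_average(scores, 4)
```

With N=8 the even-window rule gives 4 words to the left and 3 to the right, so the window covers 8 words including the current one.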

Proximity Scoring Example

Automating the Analysis
• DI clustering is 100% implemented
• DI tag assignment is 70-80% automated
Goals:
• POS assignment (done)
• Partial parsing: phrase chunks (in progress)
• DI tagging (95%)
• Semi-supervised adaptation (in progress)

DI Autotagging
• Simple substitution: maybe --> {maybe%HDG}
• Local context: about --> {about%HDG} / __ number
• POS: may_MD --> {may%HDG}
• Syntactic structure: needed to retrieve --> [VP needed [VP to retrieve]] --> {needed to retrieve%QA}
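The first three rule types can be approximated with regular expressions, as in the sketch below. This is an illustration under stated assumptions rather than the system described in the talk: "number" is assumed to mean a digit string, the POS rule assumes tokens written as word_TAG, and the function names are hypothetical. The syntactic-structure rule is left out because it depends on a phrase chunker.

```python
import re


def tag_hedges(text: str) -> str:
    """Apply the simple-substitution and local-context rules to raw text."""
    # Simple substitution: every "maybe" is tagged as a hedge.
    text = re.sub(r"\b(maybe)\b", r"{\1%HDG}", text, flags=re.IGNORECASE)
    # Local context: "about" is a hedge only when a number follows
    # (assumed here to be written in digits).
    text = re.sub(r"\b(about)\b(?=\s+\d)", r"{\1%HDG}", text, flags=re.IGNORECASE)
    return text


def tag_pos_hedges(tagged_text: str) -> str:
    """Apply the POS rule to text whose tokens look like word_TAG."""
    # "may" counts as a hedge only when tagged as a modal (MD).
    return re.sub(r"\b(may)_MD\b", r"{\1%HDG}", tagged_text, flags=re.IGNORECASE)


raw = tag_hedges("Maybe it was about 10 minutes, I forget about that")
pos = tag_pos_hedges("I_PRP may_MD go_VB in_IN May_NNP")
print(raw)
print(pos)
```

The lookahead in the local-context rule leaves the second "about" untagged, and the POS rule leaves the month name "May_NNP" untouched.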

Current Work
• Goal is commercial deployment: legal, insurance and HR markets; form of the product; pricing plans
• Goal is technical soundness: research, development, testing
