YALE LAW SCHOOL POLICY SCIENCES CENTER ANNUAL INSTITUTE
Using a New Method of Natural Language Intelligence for Performing Wiretap Analysis
Amy Neustein, Ph.D.
Linguistic Technology Systems
email@example.com
WHY DO WE NEED A NEW NATURAL LANGUAGE INTELLIGENCE METHOD FOR MINING WIRETAP RECORDINGS?
1) The volume of terrorism-related government wiretap recordings far exceeds the capacity of human intelligence agents to mine those recordings; and
2) Most automated audio data mining programs have a low rate of return when searching for “keywords” in wiretap recordings, because terror suspects deliberately avoid key words that can identify names, places, dates, etc.
Sequence Package Analysis (SPA): A New Method of Natural Language Intelligence
HOW DOES SPA WORK?
1) Adds rather than replaces: SPA adds a layer of intelligence to standard dialog systems.
2) Mines audio data: SPA goes beyond a conventional search for words and word strings. It identifies a series of related speaking turns and turn construction units (parts of turns) that are discretely packaged as a sequence of conversational interaction.
WHAT IS THE METHODOLOGICAL BASIS OF SPA?
SPA is a new natural language understanding method that draws mainly from the field of conversation analysis: the study of the orderly properties of interactive dialog that revolve around the turn-taking system and the other sequentially based features that are part of it. SPA has been successfully peer reviewed and cited by other researchers as an important data mining method for captioning text.
Conversation analysis has been called by some a subfield of A.I. because it can detect the detailed structural organization of dialog, which is a necessary precondition for the design of dialog systems that simulate and understand human dialog.
WHAT DOES SPA DO?
1) SPA permits the discovery of “key” words (e.g., the name of a location where a crucial meeting among terrorists will take place) that are not contained in the speech application’s vocabulary.
2) SPA permits rapid and efficient data mining of large volumes of audio text by spotting sequence packages in the dialog.
MINING THE DATA FOR SEQUENCE PACKAGES
A sudden increase in the speakers’ use of pronouns in place of noun referents may indicate that the speakers are going over familiar or well-rehearsed subject matter.
An unexpected increase in the use of adjectival descriptors in place of nouns, serving as a kind of privately shared “shorthand” label for a person or enemy target, can flag terrorist plans and activities. By looking for sequence patterns, SPA can locate these descriptors even when they are outside of the speech application’s vocabulary.
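The pronoun cue described above can be illustrated with a minimal sketch. It is not the SPA implementation: it assumes the recording has already been transcribed into a list of speaking turns, it uses a small hand-picked pronoun list instead of a part-of-speech tagger, and the window size and jump threshold are arbitrary illustrative values. The function names and the sample turns are hypothetical.

```python
import re

# Assumption: a small hand-picked pronoun list stands in for a real POS tagger.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "this", "that", "these", "those"}

def pronoun_ratio(turn: str) -> float:
    """Fraction of the tokens in one speaking turn that are pronouns."""
    tokens = re.findall(r"[a-z']+", turn.lower())
    if not tokens:
        return 0.0
    return sum(t in PRONOUNS for t in tokens) / len(tokens)

def flag_pronoun_spikes(turns, window=3, jump=0.15):
    """Return each index i where the mean pronoun ratio of turns[i:i+window]
    exceeds the mean of the preceding `window` turns by at least `jump`.
    Both parameters are illustrative assumptions, not values from SPA."""
    ratios = [pronoun_ratio(t) for t in turns]
    flagged = []
    for i in range(window, len(ratios) - window + 1):
        before = sum(ratios[i - window:i]) / window
        after = sum(ratios[i:i + window]) / window
        if after - before >= jump:
            flagged.append(i)
    return flagged

# Hypothetical transcript: nouns dominate at first, then pronouns take over.
sample_turns = [
    "Meet the courier at the warehouse on Friday",
    "The courier brings the package to the warehouse",
    "Friday at the warehouse then",
    "He gives it to them",
    "They take it and he leaves",
    "It goes to her after that",
]
print(flag_pronoun_spikes(sample_turns))  # [3]
```

The flagged index marks the turn at which pronoun use jumps; an analyst could then review the turns around it.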
ADVANTAGES OF SPA
• SPA captures the predictable patterns of human dialog, while all other methods depend on spotting isolated key words or phrases, which can vary from speaker to speaker;
• Can be applied to different languages because it works by identifying conversational sequence patterns, which cut to the heart of the social architecture of language, rather than by identifying a preset glossary of words; and
• Has the potential to perform data mining in real time, allowing a human analyst to act on the spot when hearing high-alarm content.
DEMONSTRATION The following example shows how applying an SPA approach to wiretapped dialog can flag important security information that is cleverly disguised by the suspects:
Speaker “A” is trying to educate Speaker “B” about a new meeting place right at the tip of the Brooklyn Bridge. Any confusion or misunderstanding about this meeting place could spoil the plans.
But Speaker “A” is very clever:
First, he stays away from buzz words (such as naming a bridge, a tunnel or a street).
Second, he refrains from making any prefatory remarks or comments to the other speaker about how vital it is to get these instructions right.
Dialog Example
Speaker “A”: Come to the intersection near Juniors? (the question mark shows an upward intonation)
(0.2-0.5 second pause: the speaker then pauses briefly)
Speaker “B”: (1.2 second pause)
Speaker “A”: You know the thoroughfare with the big traffic light?
Speaker “B”: Juniors, yeah.
THE SEQUENCE PACKAGE
Speaker “A”: Come to the intersection near Juniors? (0.2-0.5 second pause)
Speaker “B”: (1.2 seconds of silence)
• A noun referent (“Juniors”) with an upward intonation.
• A brief pause, giving the listener the chance to show recognition or ask for clarification.
• Silence by the listener, which indicates lack of understanding or confusion.
Speaker “A”: You know the thoroughfare with the big traffic light?
Speaker “B”: Juniors, yeah.
• Speaker “A” produces a clarification of the noun referent (“Juniors”): “You know the thoroughfare with...”
• Speaker “B” produces a repeat of the noun referent (“Juniors”), the source of the recognition trouble,
• followed by a recognitional marker (“yeah”), which demonstrates to Speaker “A” that he has corrected the misunderstanding.
• Had Speaker “B” simply produced a recognitional marker (“yeah”) without mentioning the source of the trouble (“Juniors”), there would be no indication to the other speaker that he now recognizes the importance of the meeting place.
Finding the Sequence Package in the Dialog Example
Look for a concatenation of these utterance components:
• noun referent with upward intonation
• brief pause
• silence
• clarification of noun referent
• repeat of noun referent that was initial source of the recognition trouble
• recognitional marker
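The concatenation listed above can be sketched as a simple contiguous pattern match. This is only an illustration under stated assumptions: it presumes that some upstream step (not shown, and not described in these slides) has already labeled each utterance component by detecting intonation, pauses, and repeats. The `Component` class, the label strings, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Component:
    speaker: str      # "A" or "B"
    kind: str         # hypothetical label assigned by an upstream annotator
    text: str = ""    # words spoken, empty for pauses and silences

# The six components of the sequence package, in order (labels are assumptions).
PACKAGE = [
    "noun_referent_rising",   # noun referent with upward intonation
    "brief_pause",            # speaker leaves room for a response
    "silence",                # listener does not respond
    "clarification",          # speaker clarifies the noun referent
    "referent_repeat",        # listener repeats the noun referent
    "recognition_marker",     # listener signals recognition
]

def find_sequence_packages(components, pattern=PACKAGE):
    """Return the start index of every contiguous run of components
    whose labels match `pattern` exactly."""
    kinds = [c.kind for c in components]
    n = len(pattern)
    return [i for i in range(len(kinds) - n + 1) if kinds[i:i + n] == pattern]

# The dialog example, hand-labeled; the first entry is a placeholder for unrelated talk.
dialog = [
    Component("A", "other", "(unrelated talk)"),
    Component("A", "noun_referent_rising", "Juniors"),
    Component("A", "brief_pause"),
    Component("B", "silence"),
    Component("A", "clarification", "You know the thoroughfare with the big traffic light?"),
    Component("B", "referent_repeat", "Juniors"),
    Component("B", "recognition_marker", "yeah"),
]

for start in find_sequence_packages(dialog):
    # The first component of the package carries the candidate "key" word.
    print(start, dialog[start].text)  # 1 Juniors
```

Because the match is on the labeled sequence and not on the words themselves, the referent (“Juniors”) is recovered without appearing in any preset vocabulary. An exact contiguous match is the simplest choice; tolerating intervening components would need a looser matcher.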
CODA
The next step is the validation of SPA as a necessary tool for performing wiretap analysis.
Research Question: Do mining programs achieve a higher rate of accuracy in spotting terrorists when Sequence Package Analysis is added as a new method of natural language intelligence for performing wiretap analysis?