
Amy Neustein, Ph.D. Linguistic Technology Systems lingtec@banet

Second Annual Research Symposium of the Human Language Technology Research Institute Sequence Package Analysis: A New Natural Language Intelligence Method for Speeding Up Wiretap Analysis. Amy Neustein, Ph.D. Linguistic Technology Systems lingtec@banet.net.




Presentation Transcript


  1. Second Annual Research Symposium of the Human Language Technology Research Institute. Sequence Package Analysis: A New Natural Language Intelligence Method for Speeding Up Wiretap Analysis. Amy Neustein, Ph.D., Linguistic Technology Systems, lingtec@banet.net

  2. Sequence Package Analysis: A New Natural Language Intelligence Method to Speed Up Wiretap Analysis. Question: Why do we need a new method? Answer: 1) In the real world, speakers do not always use “key” words that can be spotted in a dialog. 2) In crime- or terror-related dialog, speakers will deliberately avoid key words that can identify names, places, dates, etc.

  3. Sequence Package Analysis: How Does SPA Work? 1) Adds rather than replaces: SPA adds a layer of intelligence to standard dialog systems. 2) Mines audio data: SPA goes beyond a conventional search for words and word strings. 3) Examines a series of related speaking turns that are discretely packaged as a sequence.

  4. What is the Methodological Basis of Sequence Package Analysis? SPA is derived from the Conversation Analytic method of breaking down natural language communication into its primary units of analysis, sequences and turns within sequences, rather than isolated sentences or utterances. Conversation Analysis has been called by some a subfield of A.I. because it can detect the detailed structural organization of dialog, which is critical in designing simulacra that emulate how humans really talk.

  5. What Makes “Sequence Package Analysis” Unique? 1. Sequence Packages go beyond standard grammatical formalisms (e.g., Q/A sequences) to include large, winding episodes of talk. 2. Sequence Packages are dialogically indigenous and thus must be detected by algorithms that are sophisticated and flexible enough to chart out these dialog patterns.

  6. SPA: A Unique Method but with a Conventional Design Approach. SLM (Statistical Language Modeling) can be applied to SPA. Here’s how: the extra layer of intelligence that SPA provides to a dialog system contains “sequence package” formulae that give the best statistical match for sequence package entries, just as word entries are matched against their own statistical approximations.
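
One way to picture the statistical matching described on this slide is a bigram model trained over turn-level labels instead of words. The sketch below is only an illustration under stated assumptions: the label names, the two package types, and the toy training sequences are all hypothetical and do not come from the presentation, and add-one smoothing is a simple stand-in for whatever estimation a real system would use.

```python
import math
from collections import Counter

# Hypothetical turn-level labels (assumed, not from the presentation).
VOCAB = {"referent_rising", "pause", "silence", "clarification",
         "repeat_referent", "marker", "question", "answer"}

# Toy, invented training sequences for two hypothetical package types.
TRAINING = {
    "recognition_trouble": [
        ["referent_rising", "pause", "silence", "clarification",
         "repeat_referent", "marker"],
        ["referent_rising", "pause", "silence", "clarification", "marker"],
    ],
    "plain_qa": [
        ["question", "answer"],
        ["question", "pause", "answer"],
    ],
}

def train_bigram(sequences, vocab):
    """Return an add-one-smoothed bigram probability function over labels."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    size = len(vocab) + 1  # every label plus the end marker "</s>"

    def prob(prev, cur):
        return (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + size)

    return prob

def log_score(seq, prob):
    """Log-probability of a label sequence under one bigram model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(math.log(prob(p, c)) for p, c in zip(padded, padded[1:]))

MODELS = {name: train_bigram(seqs, VOCAB) for name, seqs in TRAINING.items()}

def best_match(seq):
    """Name of the package type whose model scores the sequence highest."""
    return max(MODELS, key=lambda name: log_score(seq, MODELS[name]))

observed = ["referent_rising", "pause", "silence", "clarification",
            "repeat_referent", "marker"]
print(best_match(observed))
```

The design mirrors word-level SLM: each package type gets its own model, and an observed run of turns is assigned to whichever model gives it the highest probability.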

  7. WHAT DOES SPA DO? 1) SPA permits the discovery of “key” words (e.g., the name of a location where a crucial meeting among terrorists will take place) that are not in the preset lexicon. 2) SPA permits rapid and efficient data mining of large volumes of audio text by spotting sequence packages in the dialog.

  8. ADVANTAGES OF SPA. Can be applied to different languages: works by identifying interactional features of dialog (conversational sequence patterns) rather than a preset glossary of words. Can perform data mining in real time: permits a human analyst to be brought in immediately when high-alarm content is being produced in the dialog.
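
The real-time claim can be illustrated with a sliding window over an incoming stream of turn labels, raising an alert as soon as a full pattern has been seen. This is a minimal sketch, assuming an upstream component already emits one label per turn or pause; the function name `alert_positions` and the shortened three-label template are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence

def alert_positions(labels: Iterable[str],
                    template: Sequence[str]) -> Iterator[int]:
    """Yield the stream index of each label that completes the template."""
    window = deque(maxlen=len(template))  # keeps only the most recent labels
    for position, label in enumerate(labels):
        window.append(label)
        if list(window) == list(template):
            yield position  # the point at which an analyst could be notified

# Hypothetical, shortened template and label stream for illustration.
TEMPLATE = ("referent_rising", "listener_silence", "clarification")
stream = ["greeting", "referent_rising", "listener_silence",
          "clarification", "closing"]
alerts = list(alert_positions(stream, TEMPLATE))
print(alerts)
```

Because the generator consumes labels one at a time, the alert fires on the turn that completes the pattern rather than after the whole recording has been processed.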

  9. DEMONSTRATION The following example shows how applying an SPA approach to wiretapped dialog can flag important security information that is cleverly disguised by the suspects:

  10. Speaker “A” is trying to educate Speaker “B” about a new meeting place whose location is very important. Any confusion or misunderstanding about this meeting place could spoil the plans. But Speaker “A” is very clever: First, he stays away from buzz words (such as naming a bridge, a tunnel or a street). Second, he refrains from making any prefatory remarks about the importance of meeting at this new location, and not confusing it with another place.

  11. Dialog Example
      Speaker “A”: Come to the intersection near Juniors? (the question mark shows an upward intonation)
      0.2-0.5 second pause (speaker then pauses briefly)
      Speaker “B”: 1.2 second pause
      Speaker “A”: You know the thoroughfare with the big traffic light?
      Speaker “B”: Juniors, yeah.

  12. THE SEQUENCE PACKAGE
      Speaker “A”: Come to the intersection near Juniors?
      0.2-0.5 second pause
      Speaker “B”: 1.2 seconds of silence
      • A noun referent (“Juniors”) with an upward intonation.
      • A brief pause, giving the listener the chance to show recognition or ask for clarification.
      • Silence by the listener, which indicates lack of understanding or confusion.

  13. Speaker “A”: You know the thoroughfare with the big traffic light?
      Speaker “B”: Juniors, yeah.
      • Clarification of the noun referent (“You know the thoroughfare with...”).
      • Repeat of the noun referent (“Juniors”), the source of the recognition trouble, followed by a recognitional marker (“Yeah”).

  14. Finding the Sequence Package in the Dialog Example. Look for a concatenation of these utterance components:
      • noun referent with upward intonation
      • brief pause
      • silence
      • clarification of noun referent
      • repeat of the noun referent that was the initial source of the recognition trouble
      • recognitional marker
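
The concatenation listed above can be sketched as a search for a contiguous run of component labels. This is an illustrative sketch, not the presenter's implementation: it assumes the hard part (detecting intonation, measuring pauses, recognizing a repeated referent) has already been done upstream and each dialog event arrives as one label. The label strings and the function name `find_sequence_package` are hypothetical.

```python
from typing import List, Optional, Sequence

# One hypothetical label per utterance component, in the order given on the slide.
TEMPLATE = (
    "noun_referent_rising",   # noun referent with upward intonation
    "brief_pause",            # speaker's brief pause (0.2-0.5 s in the example)
    "silence",                # listener's silence (1.2 s in the example)
    "clarification",          # clarification of the noun referent
    "repeat_referent",        # repeat of the troublesome noun referent
    "recognitional_marker",   # e.g. "yeah"
)

def find_sequence_package(labels: List[str],
                          template: Sequence[str] = TEMPLATE) -> Optional[int]:
    """Return the start index of the first contiguous match, or None."""
    n, m = len(labels), len(template)
    for start in range(n - m + 1):
        if tuple(labels[start:start + m]) == tuple(template):
            return start
    return None

# The dialog example, hand-labeled, with an unrelated event on either side.
dialog = ["greeting", "noun_referent_rising", "brief_pause", "silence",
          "clarification", "repeat_referent", "recognitional_marker", "closing"]
start = find_sequence_package(dialog)
print(start)
```

An exact match is the simplest possible criterion; a dialog that skips or reorders a component would be missed, which is where a statistical match would be the more flexible choice.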

  15. ADVANTAGES OF SPA. Can be applied to different languages: works by identifying interactional features of dialog (conversational sequence patterns) rather than a preset glossary of words. Can perform data mining in real time: permits a human analyst to be brought in immediately when high-alarm content is being produced in the dialog.

  16. Private Industry Applications for SPA • Technical Support Centers • Help desks • Call Centers • Broadcast Media • Depositions • Courtroom Testimony • Corporate Communications • Conference Managers

  17. Second Annual Research Symposium of the Human Language Technology Research Institute. Sequence Package Analysis: A New Natural Language Intelligence Method for Speeding Up Wiretap Analysis. Amy Neustein, Ph.D., Linguistic Technology Systems, lingtec@banet.net
