
Recognizing Structure: Dialogue Acts and Segmentation

Recognizing Structure: Dialogue Acts and Segmentation. Julia Hirschberg, CS 6998. Today: recognizing structural information from speech (topic structure, speech/dialogue acts); applications: speech browsing and search of large corpora, Broadcast News (NIST TREC SDR track).



Presentation Transcript


  1. Recognizing Structure: Dialogue Acts and Segmentation Julia Hirschberg CS 6998

  2. Today • Recognizing structural information from speech • Topic structure • Speech/dialogue acts • Applications • Speech browsing and search of large corpora • Broadcast News (NIST TREC SDR track) • Topic Detection and Tracking (NIST/DARPA TDT) • Customer care, focus groups, voicemail • Spoken Dialogue Systems

  3. SCAN

  4. SCANMail Demo: Basic Layout

  5. SCANMail Demo: Number Extraction

  6. Discourse Structure and Topic Structure • Intention-based accounts • Grosz & Sidner ’86 • Conversational moves (games) • Edinburgh map task dialogues • Adjacency pairs • Schegloff, Sacks, Jefferson

  7. Indicators of Topic Structure • Cue phrases: now, well, first • Pronominal reference • Orthography and formatting (in text) • Lexical information (Hearst ’94, Reynar ’98, Beeferman et al. ’99) • In speech?
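The lexical-information indicator can be illustrated with a small sketch: a topic boundary is hypothesized where the vocabulary overlap between adjacent stretches of text is lowest. This is only in the spirit of lexical-cohesion segmenters such as Hearst ’94; it is not that algorithm. The function names, the block size, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def gap_similarities(sentences, block=2):
    """Similarity across each gap, comparing up to `block` sentences on each side."""
    bags = [Counter(s.lower().split()) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        sims.append(cosine(left, right))
    return sims

def weakest_gap(sentences, block=2):
    """Index of the sentence that starts the least cohesive gap (candidate boundary)."""
    sims = gap_similarities(sentences, block)
    return min(range(len(sims)), key=sims.__getitem__) + 1

# Hypothetical toy input: two sentences about banking, then two about a storm.
sentences = [
    "the bank raised interest rates",
    "interest rates affect the bank loans",
    "the storm hit the coast",
    "the coast storm caused flooding",
]
boundary = weakest_gap(sentences)  # candidate boundary before sentences[boundary]
```

On this toy input the lowest similarity falls between the second and third sentences. A real system would also remove stop words and smooth the similarity curve before picking boundaries.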

  8. Prosodic Correlates of Discourse/Topic Structure • Pitch range: Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause: Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

  9. Prosodic Correlates of Discourse/Topic Structure (continued) • Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude: Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour: Brown et al. ’83, Woodbury ’87, Swerts et al. ’92

  10. Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al. ’00 • Prosodic cues perform as well as or better than text-based cues at topic segmentation, and may generalize better • Goal: identify sentence and topic boundaries at ASR-defined word boundaries • CART decision trees provided boundary predictions • An HMM combined these with lexical boundary predictions
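As a rough illustration of what a CART tree does at each node, the sketch below picks the threshold on a single feature that minimizes weighted Gini impurity. It is a minimal one-split example, not the trees or the HMM combination used by Shriberg et al.; the feature (pause duration in seconds), the values, and the labels are invented purely for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1-p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best

# Made-up toy data: pause duration at each candidate boundary, 1 = true boundary.
pauses = [0.02, 0.05, 0.10, 0.40, 0.55, 0.80]
is_boundary = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(pauses, is_boundary)
```

A full CART tree repeats this search over every feature and recurses on each side of the chosen split.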

  11. Features • For each potential boundary location: • Pause at boundary (raw and normalized by speaker) • Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?) • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?) • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity

  12. Features (continued) • Voice quality (halving/doubling estimates as correlates of creak or glottalization) • Speaker change, time from start of turn, number of turns in conversation, and gender • Trained/tested on Switchboard and Broadcast News
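The pause features listed above can be sketched from word-level timings. The example below computes the raw pause at each word boundary, a speaker-normalized version, and a speaker-change flag. The `Word` record and the choice to normalize by dividing by the speaker's mean pause are assumptions; the slides do not say how normalization was done.

```python
from collections import defaultdict
from typing import NamedTuple

class Word(NamedTuple):
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds

def pause_features(words):
    """One feature dict per boundary between consecutive words."""
    raw = []
    for prev, nxt in zip(words, words[1:]):
        pause = max(0.0, nxt.start - prev.end)  # clamp overlaps to zero
        raw.append((prev.speaker, pause, prev.speaker != nxt.speaker))

    # Mean pause per speaker, attributed to the speaker of the preceding word.
    by_speaker = defaultdict(list)
    for spk, pause, _ in raw:
        by_speaker[spk].append(pause)
    mean = {spk: sum(p) / len(p) for spk, p in by_speaker.items()}

    feats = []
    for spk, pause, change in raw:
        norm = pause / mean[spk] if mean[spk] > 0 else 0.0
        feats.append({"pause": pause, "pause_norm": norm, "speaker_change": change})
    return feats

# Hypothetical timings for illustration only.
words = [
    Word("A", "so", 0.0, 0.2),
    Word("A", "anyway", 0.3, 0.7),
    Word("A", "next", 1.5, 1.8),
    Word("B", "okay", 2.0, 2.3),
]
feats = pause_features(words)
```

Here the long silence before "next" stands out both in raw seconds and relative to speaker A's typical pause, which is the kind of contrast a boundary classifier can exploit.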

  13. Sentence segmentation results • Prosodic features • Better than the language model (LM) for Broadcast News (BN) • Worse than the LM on transcription and the same on ASR transcripts for Switchboard (SB) • All better than chance • Useful features for BN • Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration

  14. Useful features for SB • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

  15. Topic segmentation results (BN only) • Useful features • Pause at boundary, F0 range, turn/no turn, gender, time in turn • Prosody alone better than LM • Combined model improves significantly

  16. Speech Act Theory • John Searle • Locutionary acts: semantic meaning • Illocutionary acts: ask, promise, answer, threaten • Perlocutionary acts: effect intended to be produced on the hearer: regret, fear • Dialogue acts • Many tagging schemes (e.g. DAMSL)

  17. Practical Motivations: Spoken Dialogue Systems • Add more information about speaker intentions • Disambiguate ambiguous utterances • Okay • Um • Right

  18. Experimental Evidence: Nickerson & Chu-Carroll ’99 • Can/would/would … willing questions • Can you move the piano? • Would you move the piano? • Would you be willing to move the piano? • À la Sag & Liberman ’75: can intonation disambiguate?

  19. Experiments • Production studies: • Subjects read ambiguous questions in disambiguating contexts • Control for given/new and contrastiveness • Polite/neutral/impolite • Problems: • Cells imbalanced • No pretesting

  20. Experiments: Problems (continued) • No distractors • Same speaker reads both contexts

  21. Results • Indirect requests • If L%, more likely (73%) to be indirect • 46% H%: differences in height of boundary tone? • Politeness: "can" differs in impolite (higher rise) vs. neutral • Variation in speaker strategy

  22. Corpus Studies: Jurafsky et al. ’98 • Lexical, acoustic/prosodic, and syntactic differentiators for yeah, ok, uhuh, mhmm, um… • Continuers: Mhmm (not taking floor) • Assessments: Mhmm (tasty) • Agreements: Mhmm (I agree) • Yes answers: Mhmm (That’s right) • Incipient speakership: Mhmm (taking floor)

  23. Corpus Study • Switchboard telephone conversation corpus • Hand segmented and labeled with DA information (initially from text) • Relabeled for this study • Analyzed for • Lexical realization • F0 and rms features • Syntactic patterns

  24. Results: Lexical Differences • Agreements • yeah (36%), right (11%),... • Continuer • uhuh (45%), yeah (27%),… • Incipient speaker • yeah (59%), uhuh (17%), right (7%),… • Yes-answer • yeah (56%), yes (17%), uhuh (14%),...
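A small worked calculation shows why lexical identity alone leaves these words ambiguous. It reads the slide's percentages as P(word | dialogue act), which is an assumption about what the slide reports, and it applies Bayes' rule under a uniform prior over the four acts, which is also an assumption (the real class frequencies differ). Only figures that appear on the slide are used.

```python
# P(word | dialogue act) as listed on the slide; unlisted words are simply absent.
LIKELIHOOD = {
    "agreement": {"yeah": 0.36, "right": 0.11},
    "continuer": {"uhuh": 0.45, "yeah": 0.27},
    "incipient_speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "yes_answer": {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}

def posterior(word):
    """P(act | word) under a uniform prior, over the acts that list the word."""
    scores = {da: probs[word] for da, probs in LIKELIHOOD.items() if word in probs}
    total = sum(scores.values())
    return {da: s / total for da, s in scores.items()}

yeah = posterior("yeah")
best_guess = max(yeah, key=yeah.get)
```

For "yeah" the most likely act under these assumptions is incipient speakership, but with a posterior of only about one third, so the word by itself does not settle the dialogue act.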

  25. Results: Prosodic and Syntactic Cues • Relabeling from speech changed only 2% of labels overall (114/5757) • 43/987 continuers → agreements • Why? • Shorter duration, lower F0, lower energy, longer preceding pause • Over all DAs, duration is the best differentiator, but… • Highly correlated with length in words • Assessments: That’s X (good, great, fine,…)

  26. Future Work • Speaker differences? • Higher-level prosodic differences among the DAs of ambiguous words?

  27. A Coding Scheme for ‘ok’ • Ritualistic? Closing / You're Welcome / Other / No • 3rd-Turn-Receipt? Yes / No • If Ritualistic==No, code all of these as well:

  28. A Coding Scheme for ‘ok’ (continued) • Task Management: I'm done / I'm not done yet / None • Topic Management: Starting new topic / Finished old topic / Pivot: finishing and starting • Turn Management: Still your turn (=traditional backchannel) / Still my turn (=stalling for time) / I'm done, it is now your turn / None • Belief Management: I accept your proposition / I entertain your proposition / I reject your proposition / Do you accept my proposition? (=y/n question) / None
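The coding scheme can be written down as a small data structure, which makes the rule "if Ritualistic==No, code all of these as well" checkable. This is one possible encoding; the class and field names are hypothetical, and whether the four management dimensions must be left blank for ritualistic uses is not stated on the slides, so the sketch does not enforce it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Ritualistic(Enum):
    CLOSING = "closing"
    YOURE_WELCOME = "you're welcome"
    OTHER = "other"
    NO = "no"

class Task(Enum):
    DONE = "I'm done"
    NOT_DONE = "I'm not done yet"
    NONE = "none"

class Topic(Enum):
    START_NEW = "starting new topic"
    FINISH_OLD = "finished old topic"
    PIVOT = "finishing and starting"

class Turn(Enum):
    STILL_YOURS = "still your turn (backchannel)"
    STILL_MINE = "still my turn (stalling)"
    NOW_YOURS = "I'm done, it is now your turn"
    NONE = "none"

class Belief(Enum):
    ACCEPT = "I accept your proposition"
    ENTERTAIN = "I entertain your proposition"
    REJECT = "I reject your proposition"
    QUESTION = "do you accept my proposition?"
    NONE = "none"

@dataclass
class OkCode:
    ritualistic: Ritualistic
    third_turn_receipt: bool
    task: Optional[Task] = None
    topic: Optional[Topic] = None
    turn: Optional[Turn] = None
    belief: Optional[Belief] = None

    def missing(self) -> List[str]:
        """Dimensions still uncoded when the token is non-ritualistic."""
        if self.ritualistic is not Ritualistic.NO:
            return []
        dims = ("task", "topic", "turn", "belief")
        return [d for d in dims if getattr(self, d) is None]

# A non-ritualistic 'ok' with only the task dimension coded so far.
partial = OkCode(Ritualistic.NO, third_turn_receipt=False, task=Task.DONE)
# A ritualistic closing needs no further coding under this reading.
closing = OkCode(Ritualistic.CLOSING, third_turn_receipt=False)
```

An annotation tool could call `missing()` before accepting a label, so that incomplete codes for non-ritualistic tokens are flagged.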

  29. Next Week • Turn-taking and disfluencies
