Recognizing Structure: Dialogue Acts and Segmentation

Julia Hirschberg, CS 6998

Today • Recognizing structural information from speech • Topic structure • Speech/dialogue acts • Applications • Speech browsing and search of large corpora • Broadcast News (NIST TREC SDR track) • Topic Detection and Tracking (NIST/DARPA TDT) • Customer care, focus groups, voicemail • Spoken Dialogue Systems

SCAN

SCANMail Demo: Basic Layout

SCANMail Demo: Number Extraction

Discourse Structure and Topic Structure • Intention-based accounts • Grosz & Sidner ‘86 • Conversational moves (games) • Edinburgh map task dialogues • Adjacency pairs • Schegloff, Sacks, Jefferson

Indicators of Topic Structure • Cue phrases: now, well, first • Pronominal reference • Orthography and formatting -- in text • Lexical information (Hearst ‘94, Reynar ’98, Beeferman et al ‘99) • In speech?
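To make the "lexical information" cue concrete, here is a minimal sketch of lexical cohesion scoring: compare the vocabulary just before and just after each candidate gap, and treat a dip in similarity as a possible topic boundary. It is only in the spirit of the block comparison idea associated with Hearst ’94, not a reimplementation of any cited system. The function names, the tokenizer, and the block size are illustrative assumptions.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gap_similarities(sentences, block=2):
    """Similarity across each gap between sentence i-1 and sentence i.

    `block` (an assumed parameter) is how many sentences on each side
    of the gap are pooled before comparing vocabularies.
    """
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        sims.append(cosine(left, right))
    return sims


sentences = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell sharply today",
    "stocks and bonds fell again",
]
sims = gap_similarities(sentences)
# The lowest-similarity gap is the best boundary candidate.
boundary_gap = sims.index(min(sims)) + 1
```

In this toy input the two halves share no words, so the gap before the third sentence scores lowest. On ASR output the same idea would have to cope with recognition errors, which is part of what the "In speech?" question is asking.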

Prosodic Correlates of Discourse/Topic Structure • Pitch range: Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause: Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

Prosodic Correlates of Discourse/Topic Structure (cont.) • Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude: Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour: Brown et al ’83, Woodbury ’87, Swerts et al ’92

Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al ’00 • Prosodic cues perform as well as or better than text-based cues at topic segmentation -- and generalize better? • Goal: identify sentence and topic boundaries at ASR-defined word boundaries • CART decision trees provided boundary predictions • HMM combined these with lexical boundary predictions

Features • For each potential boundary location: • Pause at boundary (raw and normalized by speaker) • Pause at word before boundary (is this a new ‘turn’ or part of continuous speech segment?) • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?) • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity

Features (cont.) • Voice quality (halving/doubling estimates as correlates of creak or glottalization) • Speaker change, time from start of turn, # turns in conversation and gender • Trained/tested on Switchboard and Broadcast News
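As a rough illustration of the pause features in the list above, the sketch below derives, for each candidate boundary between two ASR words, the raw pause, a speaker-normalized pause, and the pause at the preceding word boundary. The input layout (token, start, end) and the choice to normalize by dividing by the speaker's mean pause are assumptions made for the example; the slides do not say how normalization was done.

```python
def pause_features(words, speaker_mean_pause):
    """Pause features for every boundary between consecutive words.

    words: list of (token, start_sec, end_sec), sorted by time
           (an assumed layout for time-aligned ASR output).
    speaker_mean_pause: this speaker's average pause in seconds,
           used here as a simple, assumed normalizer.
    """
    feats = []
    for i in range(len(words) - 1):
        # Silence between the end of word i and the start of word i+1.
        pause = max(0.0, words[i + 1][1] - words[i][2])
        # Silence before word i, i.e. at the preceding word boundary.
        prev_pause = max(0.0, words[i][1] - words[i - 1][2]) if i > 0 else 0.0
        feats.append({
            "after_word": words[i][0],
            "pause": pause,
            "pause_norm": pause / speaker_mean_pause if speaker_mean_pause > 0 else 0.0,
            "prev_pause": prev_pause,
        })
    return feats


words = [("so", 0.0, 0.2), ("okay", 0.25, 0.6), ("next", 1.4, 1.8)]
feats = pause_features(words, speaker_mean_pause=0.4)
# feats[1] describes the boundary after "okay", which has the long pause.
```

Feature dictionaries like these could then be fed to any decision-tree learner; duration, F0, and voice-quality features would need a pitch tracker and phone alignments, which are outside this sketch.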

Sentence segmentation results • Prosodic features • Better than LM for BN • For SB: worse than LM on hand transcription, same as LM on ASR transcript • All better than chance • Useful features for BN • Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration

Sentence segmentation results (cont.) • Useful features for SB • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic segmentation results (BN only): • Useful features • Pause at boundary, f0 range, turn/no turn, gender, time in turn • Prosody alone better than LM • Combined model improves significantly

Speech Act Theory • John Searle • Locutionary acts: semantic meaning • Illocutionary acts: ask, promise, answer, threat • Perlocutionary acts: effect intended to be produced on hearer: regret, fear • Dialogue acts • Many tagging schemes (e.g. DAMSL)

Practical Motivations: Spoken Dialogue Systems • Add more information about speaker intentions • Disambiguate ambiguous utterances • Okay • Um • Right

Experimental Evidence: Nickerson & Chu-Carroll ’99 • Can / would / would ... willing questions • Can you move the piano? • Would you move the piano? • Would you be willing to move the piano? • À la Sag & Liberman ’75: can intonation disambiguate?

Experiments • Production studies: • Subjects read ambiguous questions in disambiguating contexts • Control for given/new and contrastiveness • Polite/neutral/impolite • Problems: • Cells imbalanced • No pretesting

Experiments (cont.) • Problems (cont.) • No distractors • Same speaker reads both contexts

Results • Indirect requests • If L%, more likely (73%) to be indirect • 46% H%: differences in height of boundary tone? • Politeness: can differs in impolite (higher rise) vs. neutral • Variation in speaker strategy

Corpus Studies: Jurafsky et al ‘98 • Lexical, acoustic/prosodic/syntactic differentiators for yeah, ok, uhuh, mhmm, um… • Continuers: Mhmm (not taking floor) • Assessments: Mhmm (tasty) • Agreements: Mhmm (I agree) • Yes answers: Mhmm (That’s right) • Incipient speakership: Mhmm (taking floor)

Corpus Study • Switchboard telephone conversation corpus • Hand segmented and labeled with DA information (initially from text) • Relabeled for this study • Analyzed for • Lexical realization • F0 and rms features • Syntactic patterns

Results: Lexical Differences • Agreements • yeah (36%), right (11%),... • Continuer • uhuh (45%), yeah (27%),… • Incipient speaker • yeah (59%), uhuh (17%), right (7%),… • Yes-answer • yeah (56%), yes (17%), uhuh (14%),...
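The percentages above can be read as P(word | dialogue act). A small sketch, assuming equal priors over the four acts (an assumption, not a corpus fact), shows how little the word alone settles. Only the figures listed on the slide are used; words elided with "…" are left out, so an act with no listed figure for a word is simply not scored.

```python
# P(word | dialogue act), copied from the slide. Elided entries are omitted.
LEXICAL = {
    "agreement": {"yeah": 0.36, "right": 0.11},
    "continuer": {"uhuh": 0.45, "yeah": 0.27},
    "incipient speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "yes-answer": {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def rank_acts(word):
    """Rank dialogue acts for a word, assuming uniform priors.

    Returns (act, normalized score) pairs, best first. Scores are
    normalized only over acts for which the slide lists the word.
    """
    scores = {act: probs[word] for act, probs in LEXICAL.items() if word in probs}
    total = sum(scores.values())
    return sorted(
        ((act, p / total) for act, p in scores.items()),
        key=lambda pair: -pair[1],
    )


for act, score in rank_acts("yeah"):
    print(f"{act}: {score:.2f}")
```

For "yeah", incipient speaker and yes-answer come out nearly tied, which is the motivation for looking at prosodic and syntactic cues as well.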

Results: Prosodic and Syntactic Cues • Relabeling from speech produces only 2% changed labels overall (114/5757) • 43/987 continuers → agreements • Why? • Shorter duration, lower F0, lower energy, longer preceding pause • Over all DAs, duration is the best differentiator but… • Highly correlated with length in words • Assessments: That’s X (good, great, fine,…)

Future Work • Speaker differences? • Higher level prosodic differences among ambiguous word DAs?

A Coding Scheme for ‘ok’ • Ritualistic? • Closing • You're Welcome • Other • No • 3rd-Turn-Receipt? • Yes • No • If Ritualistic==No, code all of these as well:

Task Management: • I'm done • I'm not done yet • None • Topic Management: • Starting new topic • Finished old topic • Pivot: finishing and starting • Turn Management: • Still your turn (=traditional backchannel) • Still my turn (=stalling for time) • I'm done, it is now your turn • None • Belief Management: • I accept your proposition • I entertain your proposition • I reject your proposition • Do you accept my proposition? (=y/n question) • None
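The coding scheme above can be written down as a small data structure with one rule: when Ritualistic is "No", all four management dimensions must be coded. The sketch below is one possible encoding; the class name, field names, and shortened label strings are illustrative assumptions. Topic Management has no "None" option on the slide, so none is added here.

```python
from dataclasses import dataclass
from typing import Optional

RITUALISTIC = {"Closing", "You're Welcome", "Other", "No"}

# Allowed values per dimension, taken from the slide (glosses dropped).
DIMENSIONS = {
    "task": {"I'm done", "I'm not done yet", "None"},
    "topic": {"Starting new topic", "Finished old topic",
              "Pivot: finishing and starting"},
    "turn": {"Still your turn", "Still my turn",
             "I'm done, it is now your turn", "None"},
    "belief": {"I accept your proposition", "I entertain your proposition",
               "I reject your proposition", "Do you accept my proposition?",
               "None"},
}


@dataclass
class OkLabel:
    ritualistic: str
    third_turn_receipt: bool
    task: Optional[str] = None
    topic: Optional[str] = None
    turn: Optional[str] = None
    belief: Optional[str] = None

    def validate(self):
        """Return a list of problems; an empty list means the label is valid."""
        errors = []
        if self.ritualistic not in RITUALISTIC:
            errors.append(f"unknown ritualistic value: {self.ritualistic!r}")
        # Only non-ritualistic tokens require the four dimensions.
        if self.ritualistic == "No":
            for name, allowed in DIMENSIONS.items():
                value = getattr(self, name)
                if value not in allowed:
                    errors.append(f"{name}: got {value!r}, expected one of {sorted(allowed)}")
        return errors


label = OkLabel("No", False, task="I'm done", topic="Finished old topic",
                turn="I'm done, it is now your turn",
                belief="I accept your proposition")
problems = label.validate()  # empty list: every dimension is coded
```

The slide does not say whether the four dimensions should be left blank for ritualistic tokens, so the sketch does not enforce anything in that case.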

Next Week • Turn-taking and disfluencies
