Discourse Structure in Generation
Julia Hirschberg
CS 4706
Today
• Models of Discourse Structure
  • Do we have them?
  • Grosz & Sidner ’86
• What identifies discourse structure to Hearers?
  • Textual cues
  • Spoken cues
• How can we produce appropriate discourse structure in TTS systems?
• Can we identify discourse structure automatically, from speech?
Is there structure in this discourse? A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.
Is this a reasonable structure? A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.
What information do we use in segmenting a discourse?
• ‘Topic’ coherence?
• Repeated reference?
• ‘Cue’ phrases?
• ????
Structures of Discourse Structure (Grosz & Sidner ’86)
• A leading theory of discourse structure
• Based on Speaker intentions and on the attentional state of Speaker and Hearer
• Identifies a few general relations that hold among Speaker intentions
• Identifies a model of attentional state
• Three components:
  • Linguistic structure
  • Intentional structure
  • Attentional structure
Linguistic Structure
• What is actually said or written
• How is the linguistic structure represented?
• Assume discourse is segmented into Discourse Segments (DSs)
• What is the basic unit of analysis?
• Do we all segment alike?
• Do we all use the same cues?
Linguistic Structure of Discourse D
S1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute.
S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.
Intentional Structure
• Discourse purpose (DP): the Speaker’s basic purpose in producing the discourse
• Discourse segment purposes (DSPs): the Speaker’s purpose in producing each segment
• Segments are related to one another by their purposes:
  • Satisfaction-precedence: DSP1 satisfaction-precedes DSP2 if DSP1 must be satisfied before DSP2
  • Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1
Linguistic Structure of Discourse D
DSP1: Describe murder of dove by duck.
S1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute.
DSP2: Describe meeting of old friend.
S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.
DSP2: Describe recovery process.
S2:
  DSP3: Describe snack.
  S3: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa.
  DSP4: Describe meeting old friend.
  S4: To my surprise, I ran into a friend from back home.
  DSP5: Describe friend’s reaction.
  S5: When I told her of my recent experience she questioned my sanity.
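The dominance and satisfaction-precedence relations can be made concrete with a small tree of segments. The sketch below is illustrative only: the `Segment` class and both helper functions are hypothetical names, the overall DP label is invented for the example, and treating earlier siblings as satisfaction-preceding later ones is a simplifying assumption, not part of Grosz & Sidner’s definition.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A discourse segment (DS) labeled with its purpose (DSP)."""
    label: str
    purpose: str
    children: list[Segment] = field(default_factory=list)  # embedded DSs


def dominates(a: Segment, b: Segment) -> bool:
    """True if b's purpose is (transitively) part of fulfilling a's purpose."""
    return any(child is b or dominates(child, b) for child in a.children)


def satisfaction_precedes(a: Segment, b: Segment, parent: Segment) -> bool:
    """Simplifying assumption: among the embedded DSs of `parent`, an earlier
    sibling's purpose must be satisfied before a later sibling's."""
    ids = [id(c) for c in parent.children]
    return id(a) in ids and id(b) in ids and ids.index(id(a)) < ids.index(id(b))


# The recovery segment of the duck story, using the S1-S5 labels of the slides.
snack = Segment("S3", "Describe snack")
meeting = Segment("S4", "Describe meeting old friend")
reaction = Segment("S5", "Describe friend's reaction")
recovery = Segment("S2", "Describe recovery process", [snack, meeting, reaction])
murder = Segment("S1", "Describe murder of dove by duck")
story = Segment("D", "Hypothetical overall DP", [murder, recovery])

print(dominates(recovery, meeting))                     # True
print(dominates(story, reaction))                       # True, via S2
print(dominates(snack, meeting))                        # False
print(satisfaction_precedes(snack, meeting, recovery))  # True
```

Dominance falls out of the tree shape (embedding), while satisfaction-precedence needs an ordering among siblings; here that ordering is simply textual order.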
Attentional State: The Focus Stack
• Stack of focus spaces, each containing the objects, properties, and relations salient during a DS, plus that DS’s DSP
• State changes: transition rules control the addition and deletion of focus spaces
• Information at lower levels may or may not be available at higher levels
• Focus spaces are pushed onto the stack when
  • A new DS is begun
  • An embedded DS (i.e. a DS dominated by another DS) is begun
• Focus spaces are popped when their DSs are completed
• The state of the focus stack models felicitous reference and coherence in discourse
Example focus stack (top first):
  S2: DSP2, scene, Speaker, snack_bar, cocoa, friend, home, sanity
  S1: DSP1, duck, dove, Speaker, duck_dove_supply
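A minimal sketch of the push/pop behavior, replaying the example stack for the duck story. `FocusSpace` and `FocusStack` are hypothetical names, and `accessible` makes the simplifying assumption that everything in any space still on the stack is available for reference, whereas the theory allows information at lower levels to be unavailable.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class FocusSpace:
    """Objects salient during one discourse segment, plus that segment's DSP."""
    dsp: str
    entities: set[str] = field(default_factory=set)


class FocusStack:
    """Push when a (possibly embedded) DS begins; pop when it is completed."""

    def __init__(self) -> None:
        self._spaces: list[FocusSpace] = []

    def push(self, dsp: str, entities: Iterable[str] = ()) -> None:
        self._spaces.append(FocusSpace(dsp, set(entities)))

    def pop(self) -> FocusSpace:
        return self._spaces.pop()

    def top(self) -> str:
        return self._spaces[-1].dsp

    def accessible(self, entity: str) -> bool:
        # Assumption: anything in a space still on the stack can be referred to.
        return any(entity in space.entities for space in self._spaces)


stack = FocusStack()
stack.push("DSP1", ["duck", "dove", "Speaker", "duck_dove_supply"])
stack.push("DSP2", ["scene", "Speaker", "snack_bar", "cocoa", "friend", "home", "sanity"])
print(stack.top(), stack.accessible("duck"))  # DSP2 True
done = stack.pop()  # S2 is completed
print(done.dsp, stack.top(), stack.accessible("cocoa"))  # DSP2 DSP1 False
```

Once S2’s space is popped, "cocoa" is no longer salient, which is the kind of constraint on felicitous reference the focus stack is meant to capture.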
Limits of the Theory
• Assumes discourses are task-oriented
• Assumes a single, hierarchical structure shared by S and H
• Questions:
  • Do people really build such structures when they converse?
  • Do they use them in interpreting what others say?
  • How could they do it?
How might people recognize discourse structure?
• Linguistic markers?
  • Tense and aspect
  • Cue phrases
• Inference of Speaker intentions?
• Inference from task structure?
• Intonational information?
Acoustic and Prosodic Cues to Discourse Structure
• Intuition:
  • Speakers vary acoustic and prosodic cues to convey variation in discourse structure
  • Is this variation systematic? In read or spontaneous speech?
• Evidence:
  • Observations from recorded corpora
  • Laboratory experiments
  • Machine learning of discourse structure from acoustic/prosodic features
Prosodic Correlates of Discourse/Topic Structure
• Pitch range: Lehiste ’75, Brown et al. ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al. ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
• Preceding pause: Lehiste ’79, Chafe ’80, Brown et al. ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
• Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
• Amplitude: Brown et al. ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
• Contour: Brown et al. ’83, Woodbury ’87, Swerts et al. ’92
Issues
• Do we find significant and reliable cues to discourse structure in prosodic variation
  • When tested against an independent theory of discourse structure?
  • In spontaneous as well as read speech?
• Are Hearers’ interpretations of discourse structure influenced by intonational variation?
Grosz & Hirschberg ’92
• Small corpus of read AP newswire stories
• Read by a professional speaker
• Labeled for discourse structure from text alone or from text and speech
• Pre-ToBI prosodic labeling
• Acoustic-prosodic features extracted for each intermediate (level 3) phrase:
  • Pitch range and change from prior phrase
  • Intensity (rms) and change in dB from prior phrase
  • Preceding and subsequent pause
  • Speaking rate
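The per-phrase features listed above can be sketched as follows. This is not the study’s extraction code: the `Phrase` record and `phrase_features` function are hypothetical, pitch range is assumed to be maximum minus minimum F0, and speaking rate is assumed to be syllables per second. The change in dB uses the standard amplitude ratio 20·log10(rms / previous rms).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Phrase:
    """Hypothetical measurements for one intermediate phrase."""
    f0_min: float        # Hz
    f0_max: float        # Hz
    rms: float           # RMS amplitude (linear)
    pause_before: float  # seconds
    pause_after: float   # seconds
    syllables: int
    duration: float      # seconds of speech


def phrase_features(phrases: list[Phrase]) -> list[dict[str, float]]:
    """Per-phrase features, including change relative to the prior phrase."""
    feats = []
    prev = None
    for p in phrases:
        pitch_range = p.f0_max - p.f0_min  # assumption: max F0 minus min F0
        f = {
            "pitch_range": pitch_range,
            "rms": p.rms,
            "pause_before": p.pause_before,
            "pause_after": p.pause_after,
            "rate": p.syllables / p.duration,  # syllables per second
            "range_change": 0.0,
            "db_change": 0.0,
        }
        if prev is not None:
            f["range_change"] = pitch_range - (prev.f0_max - prev.f0_min)
            # Amplitude ratio to the prior phrase, expressed in decibels.
            f["db_change"] = 20 * math.log10(p.rms / prev.rms)
        feats.append(f)
        prev = p
    return feats


feats = phrase_features([
    Phrase(100, 180, 0.10, 0.0, 0.3, 10, 2.0),
    Phrase(110, 250, 0.20, 0.8, 0.2, 12, 3.0),
])
print(feats[1]["range_change"], round(feats[1]["db_change"], 2))  # 60 6.02
```

The measurements are made up; doubling the rms amplitude corresponds to an increase of about 6 dB.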
• Analysis of phrases in different segment positions: segment-beginning (SBEG), segment-final (SF), parentheticals, quoted speech
• ANOVAs and t-tests on means
• Results:
  • Direct quotes: larger pitch range
  • Parentheticals: smaller range, negative change from prior phrase, negative change in dB, faster rate
  • SBEG: larger range, louder, longer preceding pause, shorter subsequent pause
  • SF: longer subsequent pause
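As an illustration of a t-test on means, the sketch below compares a feature such as preceding pause between SBEG and other phrases. It uses Welch’s unequal-variance t-test, which may differ from the exact test used in the study, and the pause values are invented toy numbers, not data from the corpus. Only the t statistic and degrees of freedom are computed; the p-value would come from a t distribution table or a statistics library.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    the difference between two sample means (each sample needs >= 2 values)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented preceding-pause values (seconds) for illustration only.
sbeg = [0.9, 1.1, 1.0, 1.2]
other = [0.3, 0.4, 0.2, 0.5]
t, df = welch_t(sbeg, other)
print(round(t, 2), round(df, 2))  # 7.67 6.0
```

A large positive t here would indicate longer preceding pauses before SBEG phrases, the direction reported on the slide.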
• Machine learning experiments identified:
  • SBEG with 91.5% estimated accuracy (cross-validation)
  • SF, 92.5%
  • Attributive tags, 96.9%
  • Direct quotations, 86.4%
  • Indirect quotations, 88.5%
  • Parentheticals, 89.2%
• Conclusion: Acoustic/prosodic information is available to permit Hearers to identify discourse structure…
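To show what a cross-validated accuracy estimate means, here is a k-fold sketch. A one-feature decision stump stands in for whatever learner the study actually used, `fit_stump` and `cross_val_accuracy` are hypothetical names, and the data are a made-up, perfectly separable toy set, so the resulting accuracy says nothing about the figures reported above.

```python
import math
import random


def fit_stump(xs: list[float], ys: list[int]) -> float:
    """Pick the threshold on one feature that maximizes training accuracy,
    predicting label 1 (e.g. SBEG) when x >= threshold."""
    vals = sorted(set(xs))
    # Candidate thresholds: midpoints between adjacent values, plus both extremes.
    candidates = [-math.inf] + [(a + b) / 2 for a, b in zip(vals, vals[1:])] + [math.inf]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_val_accuracy(xs: list[float], ys: list[int], k: int = 5, seed: int = 0) -> float:
    """k-fold estimate: train on k-1 folds, score the held-out fold, pool results."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        t = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        correct += sum((xs[i] >= t) == bool(ys[i]) for i in fold)
    return correct / len(xs)


# Made-up toy data: preceding pause (s) and whether the phrase is SBEG (1) or not (0).
pauses = [0.1, 0.2, 0.15, 0.3, 0.9, 1.0, 0.8, 1.2, 0.25, 0.95]
is_sbeg = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(cross_val_accuracy(pauses, is_sbeg))  # 1.0 on this separable toy set
```

Each phrase is scored exactly once by a model that never saw it in training, which is why cross-validation gives an estimate of accuracy on unseen data rather than training accuracy.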
Next
• The midterm
  • Closed book, no notes or electronic devices
  • Will include material through today