
RST Discourse Corpus



Presentation Transcript


  1. RST Discourse Corpus
     Lynn Carlson, Daniel Marcu, Mary Ellen Okurowski

  2. Theory and Phenomena Annotated
     • Framework for annotation: Rhetorical Structure Theory (Mann and Thompson, 1988)
     • The discourse structure of a text is represented as a tree, defined as follows:
       • Leaves: text fragments (mainly clauses) that represent elementary discourse units (EDUs)
       • Internal nodes: contiguous text spans
       • Each node is characterized by its nuclearity: nucleus = the more essential unit; satellite = the supporting unit
       • Each node is characterized by a rhetorical relation that holds between two or more non-overlapping, adjacent spans
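The tree definition above maps naturally onto a recursive data structure. The sketch below is a hypothetical Python representation; the class name `RSTNode`, its fields, and the "Root" nuclearity label are assumptions for illustration, not the corpus's own file format. Leaves carry an EDU index, and an internal node's span is derived from its children, with checks that sibling spans are adjacent and non-overlapping and that at least one child is a nucleus. The pairing of EDUs 24 and 25 is an illustrative reading of the sample text, not a subtree copied from the annotated tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RSTNode:
    """Hypothetical RST tree node: a leaf is one EDU; an internal node covers its children."""

    nuclearity: str                    # "Nucleus", "Satellite", or "Root" (assumed labels)
    relation: Optional[str] = None     # relation this node takes part in with its sibling(s)
    edu_id: Optional[int] = None       # EDU index; set on leaves only
    text: str = ""                     # EDU text; leaves only
    children: List["RSTNode"] = field(default_factory=list)

    def span(self) -> Tuple[int, int]:
        """Return the (first, last) EDU index covered by this node."""
        if not self.children:
            if self.edu_id is None:
                raise ValueError("leaf has no EDU index")
            return (self.edu_id, self.edu_id)
        # Every relation needs at least one nucleus among the spans it joins.
        if not any(c.nuclearity == "Nucleus" for c in self.children):
            raise ValueError("internal node has no nucleus child")
        spans = [c.span() for c in self.children]
        # Sibling spans must be adjacent and non-overlapping:
        # each one starts right after the previous one ends.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start != prev_end + 1:
                raise ValueError(f"child spans are not adjacent: {spans}")
        return (spans[0][0], spans[-1][1])


# Illustrative pairing of EDUs 24 and 25 from the wsj_1111 sample.
purpose_pair = RSTNode(
    "Root",
    children=[
        RSTNode("Nucleus", "purpose", edu_id=24,
                text="Norfolk is likely to draw down its cash initially"),
        RSTNode("Satellite", "purpose", edu_id=25,
                text="to finance the purchases"),
    ],
)
print(purpose_pair.span())  # (24, 25)
```

Deriving spans from the leaves, rather than storing them, keeps the "contiguous text span" property from drifting out of sync with the tree.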

  3. Sample Text with EDUs: wsj_1111
     [Still, analysts don’t expect the buy-back to significantly affect per-share earnings in the short term.]16
     [“The impact won’t be that great,”]17
     [said Graeme Lidgerwood of First Boston Corp.]18
     [This is in part because of the effect]19
     [of having to average the number of shares outstanding,]20
     [she said.]21
     [In addition,]22
     [Mrs. Lidgerwood said,]23
     [Norfolk is likely to draw down its cash initially]24
     [to finance the purchases]25
     [and thus forfeit some interest income.]26
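The sample marks each EDU as bracketed text followed by its index. As a minimal sketch, assuming only this display notation (the corpus's own file format is not described on the slide), the hypothetical helper `parse_edus` and pattern `EDU_PATTERN` below recover (index, text) pairs with a regular expression.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for the slide's display notation "[EDU text]NN":
# bracketed text immediately followed by the EDU's index.
EDU_PATTERN = re.compile(r"\[([^\]]+)\](\d+)")


def parse_edus(marked_text: str) -> List[Tuple[int, str]]:
    """Return (EDU index, EDU text) pairs in the order they appear."""
    return [(int(num), body.strip()) for body, num in EDU_PATTERN.findall(marked_text)]


sample = (
    "[Still, analysts don’t expect the buy-back to significantly affect "
    "per-share earnings in the short term.]16 "
    "[“The impact won’t be that great,”]17 "
    "[said Graeme Lidgerwood of First Boston Corp.]18"
)

edus = parse_edus(sample)
for idx, text in edus:
    print(idx, text)
```

Because the pattern excludes `]` from the EDU body, it assumes EDU text never contains a closing bracket, which holds for this sample.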

  4. Sample Discourse Tree: wsj_1111
     [Figure: RST tree over EDUs 16-26, with internal spans 17-26, 17-21, 22-26, 17-18, 19-21, 19-20, 22-25, 22-23, and 24-25. Relation labels shown include elaboration-additional, explanation-argumentative, consequence-s, example, attribution, same-unit, and purpose; * marks elaboration-object-attribute-embedded and + marks embedded attribution.]

  5. Nature of Rhetorical Relations
     • Attribution: attribution, attribution-negative
     • Background: background, circumstance
     • Cause: cause, result, consequence
     • Comparison: comparison, preference, analogy, proportion
     • Condition: condition, hypothetical, contingency, otherwise
     • Contrast: contrast, concession, antithesis
     • Elaboration: elaboration-additional, elaboration-general-specific
     • Enablement: purpose, enablement
     • Evaluation: evaluation, interpretation, conclusion, comment
     • Explanation: evidence, explanation-argumentative, reason
     • Joint: list, disjunction
     • Manner-Means: manner, means
     • Topic-Comment: problem-solution, question-answer, topic-comment, ...
     • Summary: summary, restatement
     • Temporal: temporal-before, temporal-after, temporal-same-time, ...
     • Topic Change: topic-shift, topic-drift
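The relation list above is a two-level taxonomy: each fine-grained label belongs to one coarse class. A minimal sketch of that grouping as a lookup table follows; the names `RELATION_CLASSES`, `LABEL_TO_CLASS`, and `relation_class` are hypothetical, and since two of the slide's lists end in "...", the table is partial by construction.

```python
from typing import Dict, List, Optional

# Relation classes and member labels exactly as listed on the slide.
# Lists that the slide leaves open ("...") are therefore incomplete here.
RELATION_CLASSES: Dict[str, List[str]] = {
    "Attribution": ["attribution", "attribution-negative"],
    "Background": ["background", "circumstance"],
    "Cause": ["cause", "result", "consequence"],
    "Comparison": ["comparison", "preference", "analogy", "proportion"],
    "Condition": ["condition", "hypothetical", "contingency", "otherwise"],
    "Contrast": ["contrast", "concession", "antithesis"],
    "Elaboration": ["elaboration-additional", "elaboration-general-specific"],
    "Enablement": ["purpose", "enablement"],
    "Evaluation": ["evaluation", "interpretation", "conclusion", "comment"],
    "Explanation": ["evidence", "explanation-argumentative", "reason"],
    "Joint": ["list", "disjunction"],
    "Manner-Means": ["manner", "means"],
    "Topic-Comment": ["problem-solution", "question-answer", "topic-comment"],
    "Summary": ["summary", "restatement"],
    "Temporal": ["temporal-before", "temporal-after", "temporal-same-time"],
    "Topic Change": ["topic-shift", "topic-drift"],
}

# Invert the table into a fine-grained label -> coarse class lookup.
LABEL_TO_CLASS: Dict[str, str] = {
    label: cls for cls, labels in RELATION_CLASSES.items() for label in labels
}


def relation_class(label: str) -> Optional[str]:
    """Return the coarse class for a fine-grained label, or None if it is not listed."""
    return LABEL_TO_CLASS.get(label.lower())


print(relation_class("purpose"))                    # Enablement
print(relation_class("explanation-argumentative"))  # Explanation
print(relation_class("same-unit"))                  # None
```

Labels that appear in the sample tree but not on this slide, such as same-unit, fall through to None rather than being forced into a class.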

  6. Corpus and Annotation Details
     • 385 WSJ articles from the Penn Treebank, totaling over 176,000 words of text; ~14% were double-tagged
     • Document length: 31 to 2,124 words; average 458.14 words
     • Average number of EDUs per document: 56.59
     • Average number of words per EDU: 8.1
     • Article types: general news, financial, business, cultural reviews, editorials
     • Intended users: developers of automatic text processing systems
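The averages above can be cross-checked against each other with simple arithmetic. The sketch below uses only the figures on the slide; the totals it prints are estimates derived from averages, not numbers reported by the corpus authors, and the double-tagged count inherits the imprecision of "~14%".

```python
# Figures reported on the slide.
num_docs = 385
avg_words_per_doc = 458.14
avg_edus_per_doc = 56.59
double_tagged_share = 0.14  # "~14%", so anything derived from it is approximate

# Derived estimates (not figures reported on the slide).
total_words = num_docs * avg_words_per_doc            # should exceed 176,000
words_per_edu = avg_words_per_doc / avg_edus_per_doc  # should be close to 8.1
total_edus = num_docs * avg_edus_per_doc
double_tagged_docs = round(num_docs * double_tagged_share)

print(f"total words   ≈ {total_words:,.0f}")
print(f"words per EDU ≈ {words_per_edu:.1f}")
print(f"total EDUs    ≈ {total_edus:,.0f}")
print(f"double-tagged ≈ {double_tagged_docs} documents")
```

The ratio of the two per-document averages equals total words divided by total EDUs, because both averages are taken over the same 385 documents; it comes out at about 8.1, matching the reported words per EDU. The implied total of roughly 176,384 words is consistent with "over 176,000".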

  7. Inter-Annotator Agreement
