
Supervised Categorization for Habitual versus Episodic Sentences

Thomas Mathew (tam52@georgetown.edu) and Graham Katz (egk7@georgetown.edu), Department of Linguistics, Georgetown University.



Presentation Transcript


  1. Supervised Categorization for Habitual versus Episodic Sentences • Thomas Mathew (tam52@georgetown.edu) • Graham Katz (egk7@georgetown.edu) • Department of Linguistics, Georgetown University

  2. Introduction • Habitual sentences state general facts • They describe properties of a class: "Bears eat blackberries" • Or they are characteristic of a specific individual: "Angus Young wears school uniforms on stage" • A habitual sentence is stative, although its main verb can be dynamic • Episodic sentences report on a finite number of specific events: "Mary ate a steak"; "Angus Young wore a school uniform twice this week" • Why the distinction matters: • Event extraction • Document summarization

  3. Scope • Determine automatically whether a sentence is habitual or episodic on the basis of sentence-internal information: "John smoked cigarettes when he was young" → habitual; "John smoked a cigarette this morning" → episodic • Note: lexically stative predicates are excluded ("Italians like wine") • They do not exhibit the habitual/specific ambiguity

  4. Related Work • Sangweon Suh (2006) • Distinguishes generic from specific NP reference in context: "Cats like tuna" (generic) vs. "A cat ate the tuna" (specific) • Eric Siegel (1995), Michael Brent (1990) • Determine whether a verb is stative or eventive: "He called his father" (eventive) vs. "He resembles his father" (stative) • On the basis of the distribution of verbs with overt features • Siegel (1995) uses co-occurrence frequencies of 14 features

  5. Approach • Supervised Classification • Built training corpus • Selected features for machine learning • Evaluated features • Applied Machine Learning algorithms

  6. Annotation of Corpus • Generated a set of 1,816 sentences with 72 verb types by: • Randomly selecting sentences from the Penn Treebank (WSJ & Brown) • Ignoring sentences with a lexically stative predicate • Adding all sentences in the Penn Treebank whose main verb was a morphological variant of a verb from the initial set

  7. Annotation of Corpus • Annotated each sentence as habitual/episodic by: • Checking for explicit attribution • Frequency adverbs (usually, often) → habitual • Quantificational temporals (every night) → habitual • Habitual past (used to) → habitual • Definite temporals (yesterday) → episodic • Testing whether the sentence meaning changed when the modifier "usually" was added • No change in meaning indicated habitual • Examining discourse context • Assumed bunching of categories in a discourse • Applying intuitive semantic judgment • Single event or habit
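The explicit-attribution check on this slide can be sketched as a small cue matcher. This is only an illustrative sketch: the function name `explicit_attribution` and the regular expressions are assumptions, and the cue lists contain nothing beyond the examples the slide itself gives. The annotation described in the talk was done by hand and also relied on discourse context and semantic judgment.

```python
import re

# Cue patterns taken from the slide's examples only (assumed regex forms).
HABITUAL_CUES = [
    r"\b(usually|often)\b",  # frequency adverbs
    r"\bevery night\b",      # quantificational temporal
    r"\bused to\b",          # habitual past
]
EPISODIC_CUES = [
    r"\byesterday\b",        # definite temporal
]

def explicit_attribution(sentence):
    """Return 'habitual', 'episodic', or None when no overt cue is found."""
    text = sentence.lower()
    if any(re.search(pattern, text) for pattern in HABITUAL_CUES):
        return "habitual"
    if any(re.search(pattern, text) for pattern in EPISODIC_CUES):
        return "episodic"
    return None  # fall back to the other annotation steps
```

A cue inside an embedded clause can mislead such a matcher. The slide 13 sentence "The women indicated which family member usually did household chores" contains "usually" but is annotated as episodic, which is one reason the remaining annotation steps are needed.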

  8. Data • Verbs varied significantly in lexical bias • "report" is almost only episodic; "require" is almost only habitual • Final step: • Eliminated highly biased lexical verbs • Final data set • 1,052 sentences • 57 verb forms • Baseline distribution

  9. Features • Selected 14 sentence-internal features • Features that can be derived from the annotation scheme of the Penn Treebank • Evaluated each feature's relevance to classification • Compared the feature's distribution by category against the baseline
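Comparing a feature's distribution by category against the baseline can be sketched as follows. The rows, the feature name `tense`, and the helper `habitual_share` are hypothetical and exist only to show the comparison; they are not data or code from the study.

```python
def habitual_share(rows, feature=None, value=None):
    """Fraction of habitual sentences, optionally restricted to feature == value."""
    if feature is not None:
        rows = [row for row in rows if row.get(feature) == value]
    if not rows:
        return None
    habitual = sum(1 for row in rows if row["label"] == "habitual")
    return habitual / len(rows)

# Toy rows, invented for illustration only.
rows = [
    {"tense": "present", "label": "habitual"},
    {"tense": "present", "label": "habitual"},
    {"tense": "present", "label": "episodic"},
    {"tense": "past", "label": "episodic"},
    {"tense": "past", "label": "episodic"},
    {"tense": "past", "label": "habitual"},
    {"tense": "past", "label": "episodic"},
    {"tense": "past", "label": "episodic"},
]

baseline = habitual_share(rows)                      # share over all rows
present = habitual_share(rows, "tense", "present")   # share given present tense
past = habitual_share(rows, "tense", "past")         # share given past tense
```

A feature value whose habitual share differs clearly from the baseline share, in either direction, is a candidate indicator for one of the two categories.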

  10. Tense • "Hungarian Radio saves its most politically outspoken broadcasts for around midnight" → habitual • "Mickie laughed" → episodic

  11. Aspect • "Everyone else was running" → episodic • "The school has received letters from parents" → episodic

  12. Temporals • "Every time I closed my eyes, I saw gray eyes rushing at me with a knife" → habitual • "On Tuesday, Trellborg’s directors announced plans to spin off two big divisions as separately quoted companies on Stockholm’s stock exchange" → episodic

  13. Subject Features • "Commands go only from an office to the man of nearest lower rank" → habitual • "The women indicated which family member usually did household chores" → episodic

  14. Object Features • "Not surprisingly, he sometimes bites" → habitual • "In Los Angeles, in our lean years, we gave parties" → habitual • "Robert Bernstein, chairman and president of Random House Inc., announced his resignation from the publishing house he has run for 23 years" → episodic

  15. Conditionals • "After all, gold prices soar when inflation is high" → habitual

  16. Prepositional Features • "Anheuser-Busch announced its plan at the same time it reported third quarter net income rose a lower-than-anticipated 5.2% to $238.3 million" → episodic • "Treasury prices ended mixed in light trading" → episodic • "You’ve got blood on your cheek" → episodic

  17. Feature Analysis Summary • Reliable features for episodicity • Less reliable features for habituality

  18. Feature Limitations • Problem areas • Semantics of predicate arguments: "She was moving like a ballet dancer" vs. "She was moving in café society as Lady Diana Harrington" • Semantics of predicate: "He is meeting a girl from Brooklyn" vs. "He is seeing a girl from Brooklyn" • Sentence-external factors (discourse): "John rarely ate fruit. He just ate oranges" vs. "John didn’t eat much at breakfast. He just ate oranges" • Sentences with a ‘dual’ category, e.g. "After all, in all five recessions since 1960, stocks declined" • Too rare to analyze statistically

  19. Machine Learning • Considered three classifiers • Rule-based • Association Rule Classifier • Decision Tree (J48) Classifier • Probabilistic • Naïve Bayes • Evaluated against a baseline in which all sentences are blindly labeled with the majority class (episodic) • 73.1% overall precision
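The majority-class baseline mentioned on this slide can be sketched in a few lines. The label list below is a toy example, not the study's data, and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the majority class and the score obtained by always predicting it."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)

# Toy labels, invented for illustration only.
toy_labels = ["episodic", "episodic", "habitual", "episodic"]
majority_class, score = majority_baseline(toy_labels)
```

Because the baseline always predicts the same class, its score equals the share of that class in the data, which is how the 73.1% figure on the slide is to be read.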

  20. Association Rule Classifier • Applied Predictive Apriori algorithm (Scheffer 2004) for multivariate analysis • Algorithm generates n-best feature patterns predicting a category • Manually pruned results • Only patterns selecting for episodicity > 85% • Only patterns selecting for habituality > 80% • If R1 ⊂ R2, discard R2 • If sorted list {R1, R2, ..., Rn} has the same coverage as {R1, R2, ..., Rn+1} for a category, discard Rn+1 • Model • 4 patterns (213) are habitual 173 times • 11 patterns (882) are episodic 735 times
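The two structural pruning rules on this slide can be sketched as below. The pruning in the talk was done manually, so this is only one possible reading of it: each pattern is modeled as a `frozenset` of feature conditions, `covers` maps a pattern to the ids of the sentences it matches, and the subset test is applied against every pattern in the list. The pattern names and coverage sets are hypothetical.

```python
def prune(rules, covers):
    """Prune a best-first list of patterns for one category.

    rules:  list of frozensets of conditions, sorted best-first
    covers: dict mapping each pattern to the set of sentence ids it matches
    """
    kept, covered = [], set()
    for rule in rules:
        # Rule 1: a pattern that properly contains another pattern is discarded.
        if any(other < rule for other in rules):
            continue
        # Rule 2: a pattern that adds no new coverage is discarded.
        if covers[rule] <= covered:
            continue
        kept.append(rule)
        covered |= covers[rule]
    return kept

# Hypothetical patterns and coverage, for illustration only.
r1 = frozenset({"tense=past"})
r2 = frozenset({"tense=past", "aspect=perfect"})
r3 = frozenset({"temporal=definite"})
r4 = frozenset({"subject=definite"})
covers = {r1: {1, 2, 3}, r2: {1, 2}, r3: {3, 4}, r4: {2, 4}}

pruned = prune([r1, r2, r3, r4], covers)
```

Here `r2` is dropped because it contains `r1`, and `r4` is dropped because the sentences it matches are already covered by `r1` and `r3`. If the model figures on the slide are read as matched sentences and correct matches, they correspond to 173/213 ≈ 81.2% for the habitual patterns and 735/882 ≈ 83.3% for the episodic patterns.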

  21. Association Rule based Classifier

  22. Decision Tree (J48) Classifier • Weka’s implementation of C4.5 • Used ten-fold cross validation for evaluation • Model • 2 patterns (184) are habitual 161 times • 2 patterns (829) are episodic 727 times
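The fold structure behind ten-fold cross validation can be sketched in plain Python. This is not Weka's implementation: Weka typically also randomizes and stratifies the data, which this sketch omits, and `k_fold_indices` is a hypothetical helper name. The value 1,052 is the size of the final data set reported on slide 8.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Each sentence lands in exactly one test fold and in nine training folds.
folds = list(k_fold_indices(1052, k=10))
```

Each model is trained on the training indices of one fold and scored on that fold's held-out test indices, and the ten scores are then combined.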

  23. Decision Tree (J48) Classifier • Impact of feature groups (J48) • All select roughly the same number of episodic sentences • Variation is more on habitual/incorrect sentences

  24. Results • Classifier Performance (footnote 1: not evaluated using an independent validation set) • Habituality Recall • Tense and the presence of a quantificational temporal are the best indicators of habituality • However, neither provides sufficient coverage of habitual examples by itself

  25. Conclusion • Using syntactic features is a viable method for category disambiguation • Identification of episodic sentences outperforms identification of habitual sentences • There are more overt markers of habituality; however, more features show a bias for episodicity • Performance • Impact of lexical verb and sentence-external features • The feature extraction process is in some cases an approximation • Annotation errors/consistency in corpus

  26. Future Work • Impact of discourse • Independently annotate sentence, predecessor, successor in isolated context • Weighting factor for ambiguous situations • Annotate sentence, predecessor, successor conscious of context

  27. Questions?
