Identifying Comparative Sentences in Text Documents

Presentation Transcript

  1. Identifying Comparative Sentences in Text Documents • Nitin Jindal and Bing Liu, University of Illinois • SIGIR 2006

  2. Introduction • Comparison is one of the most convincing ways of evaluating something. • Much of this information is available on the Web in customer reviews, forum discussions, and blogs. • It is useful for product manufacturers and for potential customers making purchasing decisions.

  3. Comparisons vs. Opinions • Comparisons can be either objective or subjective. • Comparative sentences use different language constructs from typical opinion sentences. • Comparative sentences may contain some indicators. Examples: "Car X is much better than Car Y" (subjective); "Car X is two feet longer than Car Y" (objective).

  4. Related Work • Linguistics: based on grammars (syntax and semantics) and logic (gradability); this work is aimed more at human consumption than at automatic identification. • Opinion tasks: opinion extraction and classification, which are quite different problems from comparison identification.

  5. Comparatives (Linguistic) • Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property. Example: "John is taller than he was" => John is tall to degree d.

  6. Comparatives (Linguistic) • Two broad types: • Metalinguistic comparatives: compare properties of one entity. Example: "Ronaldo is angrier than upset." • Propositional comparatives: compare two propositions. Three subcategories:

  7. Comparatives (Propositional) • Nominal comparatives (two sets of entities): "Paul ate more grapes than bananas." • Adjectival comparatives (than, as good as): "Ford is cheaper than Volvo." • Adverbial comparatives (occur after a verb phrase): "Tom ate more quickly than Jane."

  8. Superlatives • Adjectival superlatives: "John is the tallest person." • Adverbial superlatives: "Jill did her homework most frequently." • Equality: conjunctions like and, or, … "John and Sue both like sushi."

  9. POS Tags Involved • NN: Noun • NNP: Proper noun • VBZ: Verb, present tense, 3rd person singular • JJ: Adjective • RB: Adverb • JJR: Adjective, comparative • JJS: Adjective, superlative • RBR: Adverb, comparative • RBS: Adverb, superlative

  10. Limitations of Linguistic Classification • Non-comparatives with comparative words: many non-comparative sentences contain comparative words. Examples: "In the context of speed, faster means better." "John has to try his best to win this game." • Limited coverage: many comparative sentences contain no comparative words. Examples: "In market capital, Intel is way ahead of AMD." "Nokia, Samsung: both cell phones perform badly on heat dissipation index." "The M7500 earned a WorldBench score of 85, whereas Asus A3V posted a mark of 89."

  11. Enhancements • First limitation: use machine learning methods to distinguish comparatives from non-comparatives. • Second limitation: also cover • User preferences: "I prefer Intel to AMD" = "Intel is better than AMD" • Implicit comparatives: "Camera X has 2 MP, whereas camera Y has 5 MP."

  12. Types of Comparatives • Non-Equal Gradable: "greater than" or "less than" type, including user preferences. • Equative (Gradable): "equal to" type. • Superlative (Gradable): "greater or less than all others" type. • Non-Gradable: A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn't.

  13. Tasks • Identifying comparative sentences from a given text data set. • Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)

  14. Class Sequential Rules with Multiple Minimum Supports • A class sequential rule (CSR) has a sequential pattern on the left and a class label on the right. • Patterns are selected around keywords: POS tags (JJR, RBR, JJS, RBS), words (favor, prefer, win, beat, but, …), and phrases (number one, up against). • Using keywords alone gives precision P = 32% and recall R = 94%.
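
The keyword-only baseline on this slide can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: the function names `parse_tagged` and `is_candidate` are hypothetical, the input is assumed to be already POS-tagged in `word/TAG` form, and the keyword sets contain only the few items named on the slide, whereas the real list is longer.

```python
# Comparative/superlative POS tags and the few keywords named on the slide.
# The actual keyword list is longer; these sets are illustrative only.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
KEYWORDS = {"favor", "prefer", "win", "beat", "but"}
PHRASES = [("number", "one"), ("up", "against")]


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into lowercase (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def is_candidate(sentence):
    """Return True if the tagged sentence contains any comparative keyword."""
    pairs = parse_tagged(sentence)
    words = [w for w, _ in pairs]
    if any(tag in COMPARATIVE_TAGS for _, tag in pairs):
        return True
    if any(w in KEYWORDS for w in words):
        return True
    # Two-word phrases are matched against adjacent word pairs.
    bigrams = set(zip(words, words[1:]))
    return any(p in bigrams for p in PHRASES)


print(is_candidate("ford/NNP is/VBZ cheaper/JJR than/IN volvo/NNP"))
print(is_candidate("this/DT camera/NN is/VBZ small/JJ"))
```

A filter like this accepts any sentence that merely contains a keyword, which is consistent with the high recall and low precision reported on the slide.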

  15. Support and Confidence • Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:
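
As a rough illustration of how support and confidence are computed for a class sequential rule X -> y, the sketch below uses the usual definitions: support is the fraction of all sequences that contain X and carry class y, and confidence is that count divided by the number of sequences that contain X. The toy database, the pattern, and the function names are made up for illustration, and each sequence element is simplified to a single item rather than an itemset.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered (possibly gapped) subsequence."""
    it = iter(sequence)
    # 'item in it' advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support_confidence(database, pattern, label):
    """Return (support, confidence) of the rule pattern -> label."""
    covered = [cls for seq, cls in database if contains(seq, pattern)]
    matched = sum(1 for cls in covered if cls == label)
    support = matched / len(database)
    confidence = matched / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy sequence database: (sequence, class) pairs.
TOY_DB = [
    (["NN", "VBZ", "moreJJR", "NN", "thanIN"], "comparative"),
    (["NN", "VBZ", "JJR", "thanIN", "NN"], "comparative"),
    (["NN", "VBZ", "moreJJR", "NN"], "non-comparative"),
    (["DT", "NN", "VBZ", "JJ"], "non-comparative"),
]

print(support_confidence(TOY_DB, ["moreJJR", "NN"], "comparative"))  # (0.25, 0.5)
```

With thresholds like those on the slide (minimum support 20%, minimum confidence 40%), this toy rule would be kept, since its support is 25% and its confidence is 50%.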

  16. Building the Sequence DB • Tagged sentence: this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD • Resulting sequence: {NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative • Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%. • 13 manual rules cover conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
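
The example on this slide is consistent with taking a window of three tokens on each side of the keyword, replacing ordinary words by their POS tags and keeping the keyword together with its tag. The sketch below reproduces the slide's sequence under that assumption. The window radius of 3 is inferred from the example rather than stated on the slide, the function name `build_sequences` is hypothetical, and only POS-tag keywords are handled here.

```python
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def build_sequences(tagged_sentence, radius=3):
    """Build one sequence per keyword from a 'word/TAG ...' sentence.

    radius=3 is an assumption inferred from the slide's example.
    """
    pairs = []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))

    sequences = []
    for i, (_, tag) in enumerate(pairs):
        if tag not in COMPARATIVE_TAGS:
            continue
        start = max(0, i - radius)
        window = pairs[start:i + radius + 1]
        seq = []
        for j, (w, t) in enumerate(window, start=start):
            # Keep the keyword itself joined to its tag; other words become tags.
            seq.append(w + t if j == i else t)
        sequences.append(seq)
    return sequences


sentence = ("this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN "
            "at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD")
print(build_sequences(sentence))
# [['NN', 'VBZ', 'RB', 'moreJJR', 'NN', 'IN', 'NN']]
```

Each resulting sequence would then be stored with the class label of its sentence (comparative or non-comparative) to form the sequence database.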

  17. Classification Learning • Machine learning methods: Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
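
One plausible way to turn the feature set above into learner input is a binary vector with one entry per rule pattern, set to 1 when the sentence's sequence contains that pattern. The sketch below assumes this binary encoding, which the slide does not spell out; the patterns and the function names are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered (possibly gapped) subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def to_features(sequence, patterns):
    """Map a sequence to a 0/1 vector: one feature per rule pattern."""
    return [1 if contains(sequence, p) else 0 for p in patterns]


# Hypothetical left-hand sides of CSRs and manual rules.
PATTERNS = [
    ["moreJJR", "NN"],   # assumed to come from a mined CSR
    ["whereasIN"],       # assumed to come from a manual rule
    ["JJS"],
]

seq = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
print(to_features(seq, PATTERNS))  # [1, 0, 0]
```

Vectors of this kind, paired with the sentence labels, can be passed to any standard classifier.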

  18. Data Preparation • Consumer reviews on products such as digital cameras, DVD players, MP3 players, and cellular phones. • Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google. • News articles on topics such as automobiles, iPods, and soccer vs. football.

  19. Number of Sentences in Data Sets

  20. Experimental Results (1)

  21. Experimental Results (2) • Reviews: low recall, high precision. Sentences are short, so patterns are hard to find. • Articles and forums: high recall, low precision. Sentences are long, so patterns match too easily or too many patterns are found.

  22. Conclusion and Future Work • Identifying comparative sentences. • Analyzing different types of comparative sentences. • Studying how to automatically classify subjective and objective comparisons.