Identifying Subjective Language
Janyce Wiebe, University of Pittsburgh
Overview • General area: acquire knowledge of evaluative and speculative language and use it in NLP applications • Primarily corpus-based work • Today: results of exploratory studies
Collaborators • Rebecca Bruce, Vasileios Hatzivassiloglou, Joseph Phillips • Matthew Bell, Melanie Martin, Theresa Wilson
Subjectivity Tagging Recognizing opinions and evaluations (Subjective sentences) as opposed to material objectively presented as true (Objective sentences) Banfield 1985, Fludernik 1993, Wiebe 1994, Stein & Wright 1995
Examples • "At several different levels, it's a fascinating tale." → subjective • "Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share." → objective
Subjectivity ? “Enthused” “Wonderful!” “Great product” “Complained” “You Idiot!” “Terrible product” “Speculated” “Maybe”
Examples Strong addressee-oriented negative evaluation • Recognizing flames (Spertus 1997) • Personal e-mail filters (Kaufer 2000) I had in mind your facts, buddy, not hers. Nice touch. "Alleges" whenever facts posted are not in your persona of what is "real."
Examples Opinionated, editorial language • IR, text categorization (Kessler et al. 1997) • Do the writers purport to be objective? Look, this is a man who has great numbers. We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.
Examples Belief and speech reports • Information extraction, summarization, intellectual attribution (Teufel & Moens 2000) Northwest Airlines settled the remaining lawsuits, a federal judge said. “The cost of health care is eroding our standard of living and sapping industrial strength”, complains Walter Maher.
Other Applications • Review mining (Terveen et al. 1997) • Clustering documents by ideology (Sack 1995) • Style in machine translation and generation (Hovy 1987)
Potential Subjective Elements sap: a potential subjective element. "The cost of health care is eroding standards of living and sapping industrial strength," complains Walter Maher. Here the instance of sapping, in context, is a subjective element.
Subjectivity • Multiple types, sources, and targets Somehow grown-ups believed that wisdom adhered to youth. We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.
Outline • Data and annotation • Sentence-level classification • Individual words • Collocations • Combinations
Annotations Manually tagged + existing annotations. Three levels: expression level, sentence level, document level
Expression Level Annotations [Perhaps you’ll forgive me] for reposting his response They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]
Expression Level Annotations • Probably the most natural level • Difficult for manual and automatic tagging: detailed; no predetermined classification unit • To date: used for training and bootstrapping
Document Level Annotations • Manual: flames in newsgroups • Existing: opinion pieces in the WSJ (editorials, letters to the editor, arts & leisure reviews, * to ***** reviews) • More directly related to applications, but …
Document Level Annotations • Opinion pieces contain objective sentences, and non-opinion pieces contain subjective sentences • News reports present reactions (van Dijk 1988): "Critics claim …" "Supporters argue …" • Editorials contain facts supporting the argument • Reviews contain information about the product
Document Level Annotations In a WSJ data set: • opinion pieces: 74% subjective, 26% objective • non-opinion pieces: 43% subjective, 57% objective
Data in this Talk • Sentence level: 1000 WSJ sentences; 3 judges reached good agreement after rounds of tagging; used for training and evaluation • Expression level: 1000 WSJ sentences (2 judges), 462 newsgroup messages (2 judges) + 15,413 words (1 judge); single round, results promising; used to generate features, not for evaluation
Data in this Talk • Document level: existing opinion-piece annotations used to generate features; manually refined classifications used for evaluation (identified editorials not marked as such; only clear instances labeled; to date, 1 judge) • Distinct from the other data: 3 editions, each more than 150K words
Sentence Level Annotations A sentence is labeled subjective if any significant expression of subjectivity appears “The cost of health care is eroding our standard of living and sapping industrial strength,’’ complains Walter Maher. “What an idiot,’’ the idiot presumably complained.
Sentence Classification • Probabilistic classifier • Binary features: pronoun, adjective, number, modal (excluding "will"), adverb (excluding "not"), new paragraph • Lexical feature: good for subjective; good for objective; good for neither • 10-fold cross validation; 51% baseline • 72% average accuracy across folds • 82% average accuracy on sentences rated certain
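A minimal sketch of how such a binary-feature sentence classifier could be set up, assuming a Bernoulli Naive Bayes model and Penn Treebank style part-of-speech tags; the talk does not specify the exact model or feature extraction, and the three-valued lexical feature is omitted here, so everything below is illustrative.

```python
# Illustrative sketch only: binary sentence features roughly matching the slide
# (pronoun, adjective, number, modal other than "will", adverb other than "not",
# new paragraph), evaluated with 10-fold cross validation.
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score
import numpy as np

def sentence_features(tokens, pos_tags, starts_paragraph):
    """Binary feature vector for one sentence (Penn Treebank tags assumed)."""
    return [
        int(any(t in ("PRP", "PRP$") for t in pos_tags)),        # pronoun present
        int(any(t.startswith("JJ") for t in pos_tags)),          # adjective present
        int(any(t == "CD" for t in pos_tags)),                   # number present
        int(any(t == "MD" and w.lower() != "will"
                for w, t in zip(tokens, pos_tags))),             # modal, excluding "will"
        int(any(t.startswith("RB") and w.lower() != "not"
                for w, t in zip(tokens, pos_tags))),             # adverb, excluding "not"
        int(starts_paragraph),                                   # sentence opens a new paragraph
    ]

# Assumed usage, given a corpus of (tokens, tags, starts_paragraph) and labels:
# X = np.array([sentence_features(toks, tags, para) for toks, tags, para in corpus])
# y = np.array(labels)  # 1 = subjective, 0 = objective
# print(cross_val_score(BernoulliNB(), X, y, cv=10).mean())
```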
Identifying PSEs There are few high precision, high frequency potential subjective elements
Identifying Individual PSEs Classifications correlated with adjectives. Good subsets: • dynamic adjectives (Quirk et al. 1985) • positive and negative polarity, and gradability, automatically identified in corpora (Hatzivassiloglou & McKeown 1997) • results from distributional similarity
Distributional Similarity • Word similarity based on the distributional patterns of words • Much work in NLP (see Lee 1999, Lee and Pereira 1999) • Purposes: improve estimates of unseen events; thesaurus and dictionary construction from corpora
Lin's Distributional Similarity (Lin 1998) [Figure: dependency relations R1-R4 over the sentence "I have a brown dog", yielding (Word, R, W) triples such as (I, R1, have), (have, R2, dog), (brown, R3, dog), …]
Lin's Distributional Similarity Each word is represented by the (R, W) pairs statistically correlated with it. The similarity of Word1 and Word2 sums the information values of the pairs the two words share, normalized by the information of each word's own pairs:

sim(Word1, Word2) = Σ_{(R,W) shared} [ I(Word1,R,W) + I(Word2,R,W) ] / ( Σ_{(R,W) of Word1} I(Word1,R,W) + Σ_{(R,W) of Word2} I(Word2,R,W) )
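A rough sketch of that computation, assuming the (R, W) features and their information values have already been extracted from a parsed corpus:

```python
# Sketch of Lin-style similarity. `features[w]` is assumed to map a word to a
# dict {(relation, other_word): information_value} collected from parse triples.
def lin_similarity(word1, word2, features):
    f1, f2 = features[word1], features[word2]
    shared = set(f1) & set(f2)                        # (R, W) pairs both words have
    numerator = sum(f1[rw] + f2[rw] for rw in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0
```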
Words distributionally similar to bizarre: strange, similar, scary, unusual, fascinating, interesting, curious, tragic, different, contradictory, peculiar, silly, sad, absurd, poignant, crazy, funny, comic, compelling, odd
Filtering Seed words → words + clusters → filtered set. A word and its cluster are removed if precision on the training set < threshold.
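One plausible reading of the filter as code; `pse_precision_on_training` is an assumed helper that measures how often a seed word and its cluster occur inside subjective elements of the annotated training data.

```python
# Sketch of the cluster filter: keep a seed word and its similarity cluster only
# if their precision on the annotated training set reaches the threshold.
def filter_clusters(seed_clusters, pse_precision_on_training, threshold):
    """seed_clusters: {seed_word: [distributionally similar words]}"""
    return {
        seed: cluster
        for seed, cluster in seed_clusters.items()
        if pse_precision_on_training(seed, cluster) >= threshold
    }
```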
Parameters The pipeline (seed words → words + clusters → filtered set) has two parameters: cluster size and filtering threshold.
Seeds from Annotations 1000 WSJ sentences with sentence level and expression level annotations They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]. "It's [e? 3 really] [e- 3 bizarre]," says Albert Lerman, creative director at the Wells agency.
Experiments • 1/10 of the data used for training, 9/10 for testing • Parameters: cluster size fixed at 20; filtering threshold set to the precision of the baseline adjective feature on the training data • +7.5% average over 10-fold cross validation [More improvements with other adjective features]
Opinion Pieces • 3 WSJ data sets, over 150K words each • For measuring precision: Prec(S) = # instances of S in opinion pieces / total # instances of S • Baseline for comparison: # words in opinion pieces / total # words • Skewed distribution: 13-17% of words are in opinion pieces
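The precision measure could be computed roughly as below; `in_opinion_piece` is an assumed lookup derived from the document-level opinion-piece labels.

```python
# Sketch of Prec(S): the fraction of S's corpus instances that fall inside
# opinion pieces. The word-level baseline is computed the same way over all words.
def pse_precision(instance_positions, in_opinion_piece):
    """instance_positions: corpus positions where the PSE S occurs;
    in_opinion_piece(pos): True if that position lies in an opinion piece."""
    if not instance_positions:
        return 0.0
    hits = sum(1 for pos in instance_positions if in_opinion_piece(pos))
    return hits / len(instance_positions)
```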
Parameters Ranges explored: filtering threshold 1-70%; cluster size 2-40
Results • Varies with parameter settings, but there are smooth regions of the space • Here: training/validation/testing split
Low Frequency Words Single instance in a corpus ~ low frequency Analysis of expression level annotations: there are many more single-instance words in subjective elements than outside them
Unique Words • Replace all words that appear once in the test data with "UNIQUE" • +5-10 percentage points
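A minimal sketch of the mapping, assuming the test data is available as a token list:

```python
# Sketch: replace every word that occurs exactly once in the test data with the
# placeholder "UNIQUE", so rare words can still participate in collocation PSEs.
from collections import Counter

def map_unique(tokens):
    counts = Counter(tokens)
    return [w if counts[w] > 1 else "UNIQUE" for w in tokens]
```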
Collocations Start with the observation that low-precision words often compose higher-precision collocations: here we go again; get out of here; what a; well and good; rocket science; for the last time; just as well … !
Collocations Identify n-gram PSEs as sequences whose precision is higher than the maximum precision of their constituents: • W1,W2 is a PSE if prec(W1,W2) > max(prec(W1), prec(W2)) • W1,W2,W3 is a PSE if prec(W1,W2,W3) > max(prec(W1,W2), prec(W3)) or prec(W1,W2,W3) > max(prec(W1), prec(W2,W3))
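A sketch of that criterion, assuming a `prec` function that returns the opinion-piece precision (as in Prec(S)) of any word sequence:

```python
# Sketch of the collocation test: an n-gram is accepted as a PSE only if its
# precision beats the best precision of its constituent parts.
def is_bigram_pse(w1, w2, prec):
    return prec((w1, w2)) > max(prec((w1,)), prec((w2,)))

def is_trigram_pse(w1, w2, w3, prec):
    p = prec((w1, w2, w3))
    return (p > max(prec((w1, w2)), prec((w3,))) or
            p > max(prec((w1,)), prec((w2, w3))))
```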
Collocations Moderate improvements: +3-10% points But with all unique words mapped to “UNIQUE”: +13-24% points
Example Collocations with Unique • highly||adverb UNIQUE||adj: highly unsatisfactory, highly unorthodox, highly talented, highly conjectural, highly erotic
Example Collocations with Unique • UNIQUE||verb out||IN: farm out, chuck out, ruling out, crowd out, flesh out, blot out, spoken out, luck out
Collocations • UNIQUE||adj to||TO UNIQUE||verb: impervious to reason, strange to celebrate, wise to temper • they||pronoun are||verb UNIQUE||noun: they are fools, they are noncontenders • UNIQUE||noun of||IN its||pronoun: sum of its, usurpation of its, proprietor of its
Opinion Results: Summary

            Best (baseline 17%)   Worst (baseline 13%)
            +prec / freq          +prec / freq
Adjs        +21 / 373             +09 / 2137
Verbs       +16 / 721             +07 / 3193
2-grams     +10 / 569             +04 / 525
3-grams     +07 / 156             +03 / 148
1-U-grams   +10 / 6065            +06 / 6045
2-U-grams   +24 / 294             +14 / 288
3-U-grams   +27 / 138             +13 / 144

• Disparate features have consistent performance • Collocation sets largely distinct
Does it add up? Good preliminary results classifying opinion pieces using density and feature count features.
Future Work • Mutual bootstrapping (Riloff & Jones 1999) • Co-training (Collins & Singer 1999) to learn both PSEs and contextual features • Integration into a probabilistic model • Text classification and review mining
References • Banfield, A. (1982). Unspeakable Sentences. Routledge and Kegan Paul. • Collins, M. & Singer, Y. (1999). Unsupervised Models for Named Entity Classification. EMNLP-VLC-99. • van Dijk, T.A. (1988). News as Discourse. Lawrence Erlbaum. • Fludernik, M. (1993). The Fictions of Language and the Languages of Fiction. Routledge. • Hovy, E. (1987). Generating Natural Language Under Pragmatic Constraints. PhD dissertation. • Kaufer, D. (2000). Flaming. www.eudora.com • Kessler, B., Nunberg, G., & Schutze, H. (1997). Automatic Detection of Genre. ACL-EACL-97. • Riloff, E. & Jones, R. (1999). Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. AAAI-99.