An Overview of Opinionated Tasks and Corpus Preparation Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan http://research.nii.ac.jp/ntcir/ntcir-ws6/opinion/ntcir5-opinionws-en.html
What is an opinion? • An opinion is subjective information. • An opinion usually contains an opinion holder, an attitude, and a target, though none of these is obligatory. • A sentential clause or a meaningful unit (in Chinese) is the smallest unit of an opinion.
Why is opinion processing important? • Information on the Internet is growing explosively, and it is hard for humans to extract opinions manually. • Public opinion is an important index for companies and governments. • Opinions change over time, so tracking them automatically is an important issue.
Fact-based vs. Opinion-based • Examples: • Circular vs. Happy • He is an engineer. vs. He thinks that his boss is a kind person. • Why is the sky blue? vs. Do people support the government?
Previous Work (1) • English: • Sentiment words (Wiebe et al., Kim and Hovy, Takamura et al.) • Opinion sentence extraction (Riloff and Wiebe, Kim and Hovy) • Opinion document extraction (Wiebe et al., Pang et al.) • Opinion summarization: reviews and products (Hu and Liu, Dave et al.)
Previous Work (2) • Japanese • Opinion extraction (Kobayashi et al.: reviews, at word/sentence level) • Opinion summarization (Morinaga et al.: product reputations; Seki, Eguchi, and Kando) • Chinese • Opinion extraction (Ku, Wu, Li and Chen) • Opinion summarization (Ku, Li, Wu and Chen) • News and Blog Corpora (Ku, Liang and Chen) • Korean?
Corpus Preparation (1) • Quantity • How much material should we collect? • Words/Sentences/Documents • Source • Which sources should we pick? Mine opinions from general documents or from obviously opinionated documents (e.g., discussion groups)? • News, Reviews, Blogs, …
Corpus Preparation (2) • Different granularities • Word level • Sentence level • Clause level • Document level • Multi-document (summarization) • Different sources • Different languages
Previous Work (Corpus Preparation 1/5) • Example: NRRC Summer Workshop on Multiple-Perspective QA • People involved: 1 researcher, 3 graduate students, 6 professors • Collected 270,000 documents over an 11-month period; retrieved documents relevant to 8 topics, with more than 200 documents per topic • Workshop: MPQA (Multi-Perspective Question Answering); Host: Northeast Regional Research Center (NRRC), 2002; Leader: Prof. Janyce Wiebe; Participants: Eric Breck, Chris Buckley, Claire Cardie, Paul Davis, Bruce Fraser, Diane Litman, David Pierce, Ellen Riloff, Theresa Wilson
Previous Work (Corpus Preparation 2/5) • Source: news documents (World News Connection - WNC) • In another work at the word level: 2,615 words
Previous Work (Corpus Preparation 3/5) • Example: Using NTCIR Corpus (Chinese) • Reusable • NTCIR2, news documents • Retrieve documents relevant to 6 topics • On average, 34 documents per topic • At the word level: 838 words • Experiments using NTCIR3 are ongoing
Previous Work (Corpus Preparation 5/5) • Example: Using reviews from the Web (Japanese) • Specific domains: cars and games • 15,000 reviews (230,000 sentences) for cars, 9,700 reviews (90,000 sentences) for games • Using topic words (e.g., names of car and game companies) • Semi-automatic methods for collecting opinion terms (with patterns)
Corpus Annotation • Annotation types (1) • Support/Non-support • Sentiment/Non-sentiment • Positive/Neutral/Negative • Strong/Medium/Weak • Annotation types (2) • Opinion holder/Attitude/Target • Nested opinions
Previous Work (Corpus Annotation 1/4) • Example: NRRC Summer Workshop on Multiple-Perspective QA (English) • A total of 114 documents annotated • 57 with deep annotations, 57 with shallow annotations • 7 annotators
Previous Work (Corpus Annotation 2/4) • Tags • Opinion: on=implicit/formally declared • Fact: onlyfactive=yes/no • Subjectivity: strength=high/medium/low • Attitude: neg-attitude/pos-attitude • Writer: opinion holder information
Previous Work (Corpus Annotation 3/4) • Example: Using NTCIR Corpus (Chinese) • A total of 204 documents are annotated • 3 annotators • Using XML-style tags • Types are defined, but no strength (considering the agreement issue)
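As a purely illustrative example, a hypothetical XML-style annotation in the spirit described above; the tag and attribute names are invented, not the actual NTCIR scheme:

```python
# A hypothetical illustration of XML-style opinion tags; the tag and
# attribute names are invented, not the actual NTCIR annotation scheme.
import xml.etree.ElementTree as ET

snippet = ('<sentence opinion="yes" polarity="positive" '
           'holder="the public">Citizens welcomed the new policy.</sentence>')
elem = ET.fromstring(snippet)
print(elem.get("holder"), elem.get("polarity"), "-", elem.text)
# -> the public positive - Citizens welcomed the new policy.
```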
Corpus Evaluation (1) • How do we choose materials? • Filter out candidates whose annotations are too diverse among annotators? (agreement?) • How many annotators are needed per candidate? (more annotators, lower agreement) • How do we build the gold standard? • Voting (see the sketch below) • Use only instances with consistent annotations
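A minimal sketch of the voting approach, assuming each candidate receives one label per annotator; the labels and the vote threshold are hypothetical:

```python
# A minimal sketch of building a gold standard by majority voting.
from collections import Counter

def gold_standard(annotations, min_votes=2):
    """annotations: {item_id: [label per annotator]}.
    Keep an item only if some label reaches min_votes."""
    gold = {}
    for item, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes >= min_votes:  # drop items whose annotations are too diverse
            gold[item] = label
    return gold

votes = {"s1": ["pos", "pos", "neg"], "s2": ["pos", "neg", "neu"]}
print(gold_standard(votes))  # -> {'s1': 'pos'}; s2 is filtered out
```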
Corpus Evaluation (2) • How do we evaluate a corpus for a subjective task? • Agreement (is it enough?) • Kappa value (to what agreement level?) • Almost perfect agreement • Substantial agreement • Moderate agreement • Fair agreement • Slight agreement • Less than chance agreement
Kappa coefficient (wiki) • Cohen's kappa coefficient is a statistical measure of inter-rater agreement. • It is generally considered a more robust measure than a simple percent-agreement calculation, since κ takes into account the agreement occurring by chance. • Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. • The first mention of a kappa-like statistic in print is attributed to Galton (1892).
Kappa coefficient (wiki) • The equation for κ is: κ = (Pr(a) − Pr(e)) / (1 − Pr(e)) • Pr(a) is the relative observed agreement among raters • Pr(e) is the hypothetical probability of chance agreement • If the raters are in complete agreement, then κ = 1 • If there is no agreement among the raters (other than what would be expected by chance), then κ ≤ 0.
Kappa coefficient • Two raters are asked to classify objects into categories 1 and 2; Pij denotes the cell probability that rater A chooses category i and rater B chooses category j in the 2-by-2 table. • P0 = P11 + P22, the observed level of agreement • This value needs to be compared to the value you would expect if the two raters were totally independent: • Pe = PA1·PB1 + PA2·PB2, where PAi and PBi are the marginal probabilities that raters A and B choose category i http://www.childrensmercy.org/stats/definitions/kappa.htm
Example • Hypothetical example: 29 patients are examined by two independent doctors. 'Yes' denotes that a doctor diagnoses the patient with disease X; 'No' denotes that a doctor classifies the patient as free of disease X. • Counts (table reconstructed from the probabilities below): Doctor A Yes / Doctor B Yes: 10; Doctor A Yes / Doctor B No: 7; Doctor A No / Doctor B Yes: 0; Doctor A No / Doctor B No: 12 • P0 = P11 + P22 = (10 + 12)/29 = 0.76 • Pe = 0.586 × 0.345 + 0.414 × 0.655 = 0.474 • Kappa = (0.76 − 0.474)/(1 − 0.474) = 0.54 http://www.dmi.columbia.edu/homepages/chuangj/kappa/
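To make the computation concrete, here is a minimal Python sketch (not one of the cited online tools) that reproduces the doctors example:

```python
# A minimal sketch of Cohen's kappa for two raters on a C-by-C table.

def cohens_kappa(table):
    """table[i][j] = number of items rater A put in category i
    and rater B put in category j."""
    total = sum(sum(row) for row in table)
    # observed agreement: proportion of items on the diagonal
    p_o = sum(table[i][i] for i in range(len(table))) / total
    # chance agreement: sum over categories of the product of
    # the two raters' marginal proportions
    p_e = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(len(table))
    )
    return (p_o - p_e) / (1 - p_e)

# Doctor A rows (Yes, No) vs. Doctor B columns (Yes, No)
print(round(cohens_kappa([[10, 7], [0, 12]]), 2))  # -> 0.54
```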
Online Kappa Calculator http://justus.randolph.name/kappa
Previous Work (Corpus Evaluation) • Different languages/annotations may yield different agreement levels. • Kappa: 0.32-0.65 (factivity only, English) • Kappa: 0.40-0.68 (word level, Chinese) • Annotators with different backgrounds may also reach different agreement levels.
What is needed for this task? • What kind of documents? News? Others? • All relevant documents? • Provide only the type of documents, or fully annotated documents for training? • Provide some sentiment words as clues? • At what granularity? Word, clause, sentence, document, or multi-document? • In which language? Mono-lingual, multi-lingual, or cross-lingual?
Natural Language Processing Lecture 15: Opinionated Applications Hsin-Hsi Chen Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan
Opinionated Applications • Opinion extraction • Sentiment word mining • Opinionated sentence extraction • Opinionated document extraction • Opinion summarization • Opinion tracking • Opinionated question answering • Multi-lingual/Cross-lingual opinionated issues
Opinion Mining • Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarity. • Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the non-supportive evidence. • Opinion tracking captures subjective information from various genres and monitors the development of opinions along spatial and temporal dimensions.
Opinion extraction • Extract opinion evidence from words, sentences, and documents, and then determine their polarities. • The composition of semantics and the composition of opinions in documents are very much alike: • Word -> Sentence -> Document • The algorithm is designed based on this composition across granularities (see the sketch below).
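A minimal sketch of this word -> sentence -> document composition; the toy lexicon and plain averaging are illustrative assumptions, not the actual algorithm:

```python
# Word scores are aggregated into sentence scores, and sentence
# scores into a document score. Lexicon entries are hypothetical.

LEXICON = {"good": 1.0, "kind": 0.8, "bad": -1.0}  # toy word scores

def word_score(word):
    return LEXICON.get(word, 0.0)

def sentence_score(sentence):
    words = sentence.lower().split()
    return sum(word_score(w) for w in words) / len(words)

def document_score(sentences):
    return sum(sentence_score(s) for s in sentences) / len(sentences)

doc = ["His boss is a kind person", "The service is bad"]
print(document_score(doc))  # -> about -0.06 (slightly negative overall)
```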
Seeds • Sentiment words in the General Inquirer (GI) and the Chinese Network Sentiment Dictionary (CNSD) are collected as seeds. • GI is in English, while CNSD is in Chinese; GI is translated into Chinese. • A total of 10,542 qualified seeds are collected in NTUSD.
Thesaurus Expansion • The seed vocabulary is enlarged with • 同義詞詞林 (Tongyici Cilin, a Chinese thesaurus) • 中央研究院中英雙語知識本體詞網 (The Academia Sinica Bilingual Ontological WordNet) • Words in the same cluster may not always have the same opinion tendency: • 寬恕 (forgive) vs. 姑息 (appease) • How do we distinguish words with different polarities within the same cluster/synset? • By the opinion tendency of a word and its strength
Sentiment Tendency of a Word • The sentiment degree of a Chinese word w is the average of the sentiment scores of its composing characters c1, c2, ..., cp. • A positive score denotes a positive word. • A negative score denotes a negative word. • A score of zero denotes a non-sentiment or neutral word.
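A minimal sketch of the character-averaging formula; the character scores here are hypothetical placeholders (in the actual work they would be derived from the NTUSD seeds):

```python
# Sentiment degree of word w = average of its characters' scores.
# CHAR_SCORES values are invented for illustration only.

CHAR_SCORES = {"寬": 0.6, "恕": 0.7, "姑": -0.2, "息": -0.4}

def word_sentiment(word):
    scores = [CHAR_SCORES.get(c, 0.0) for c in word]
    return sum(scores) / len(scores)

print(word_sentiment("寬恕"))  # -> about 0.65 (positive: forgive)
print(word_sentiment("姑息"))  # -> about -0.3 (negative: appease)
```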
Evaluation Corpus Preparation • Sources: TREC (English; news) / NTCIR (Chinese; news) / Blogs (Chinese; casual writing) • The corpus is prepared for multi-genre and multi-lingual issues. • The corpus is prepared to evaluate opinion extraction, summarization, and tracking.
Opinion Summarization • Find the important topics of a document set. • Find sentences relevant to the important topics. • Find the opinions embedded in those sentences. • Summarize the opinions on the important topics (see the sketch below).
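A minimal, runnable sketch of these four steps; every component (substring topic matching, lexicon polarity, the verdict format) is a toy placeholder, not the actual system:

```python
# Toy pipeline for the four steps above. Step 1 (topic detection)
# is assumed done; topics are passed in as keywords.

LEXICON = {"support": 1, "oppose": -1}  # toy sentiment lexicon

def summarize_opinions(documents, topics):
    summary = {}
    for topic in topics:
        # step 2: sentences relevant to the topic (naive substring match)
        sents = [s for d in documents for s in d.split(". ")
                 if topic in s.lower()]
        # step 3: opinion polarity of those sentences (lexicon lookup)
        polarity = sum(LEXICON.get(w.strip("."), 0)
                       for s in sents for w in s.lower().split())
        # step 4: condense into a supportive / non-supportive verdict
        summary[topic] = ("supportive" if polarity > 0 else "non-supportive",
                          len(sents))
    return summary

docs = ["Many citizens support the tax reform. Others oppose it."]
print(summarize_opinions(docs, ["tax reform"]))
# -> {'tax reform': ('supportive', 1)}
```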
Opinion Tracking • Opinion tracking is a kind of graph-based opinion summarization. • We are concerned with how opinions change over time. • An opinion tracking system tells how people change their opinions as time goes by. • To track opinions, opinion extraction and summarization are necessary. • Opinion extraction reveals changes in opinion polarity, while opinion summarization identifies the correlated events.