The Semantics and Pragmatics of Natural Language, Daniela GÎFU, daniela.gifu@info.uaic.ro


Presentation Transcript


  1. “ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI, FACULTY OF COMPUTER SCIENCE. The Semantics and Pragmatics of Natural Language. Daniela GÎFU, daniela.gifu@info.uaic.ro

  2. SENTIMENT ANALYSIS – AN OVERVIEW

  3. What is Sentiment Analysis?

  4. IMPACT OF TOPIC • Sentiment Analysis (SA) is one of the most current topics in NLP. • SA offers the possibility to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly. • SA is very popular in social media. • Target: academia and industry.

  5. IMPACT IN SOCIAL MEDIA • Social media deals with personal and socially related opinions. • SA plays a vital role in understanding the opinions expressed in such conversations, posts, blogs, etc., and in deriving a sensible short summary of the most relevant opinions. • SA helps to: • take quick decisions • change the strategy and tactics used • understand the mood of the market • keep up with changing trends • improve one’s product

  6. VALIDITY OF S.A. - evaluated by comparing the sentiment scores of specific comments with their respective star ratings, which are common clues that individuals use to filter what they read during information acquisition.
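A minimal sketch of this validity check, assuming each comment already has a sentiment score in [-1, +1] and a 1-5 star rating. The data below are invented for illustration, and Pearson correlation is our choice of agreement measure, not necessarily the one used in the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: sentiment scores and star ratings of the same five comments.
scores = [-0.8, -0.2, 0.1, 0.6, 0.9]
stars = [1, 2, 3, 4, 5]
print(round(pearson(scores, stars), 3))  # prints 0.993
```

A value close to +1 means the sentiment scores rank comments much like their star ratings do; since stars are ordinal, a rank correlation (Spearman) is a reasonable alternative.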

  7. RESEARCH QUESTIONS... • How comparable are sentiment scores for reviews/comments to their respective star ratings? • How do sentiment scores impact decision outcomes?

  8. PURPOSE AND MOTIVATION • to create a complete state of the art (SOTA) in SA, with a focus on social media posts. • to enhance the results of context-based SA. • to clarify the behavior of the receiver (the reader), who is affected by the large amount of information on forums. • to improve the performance of SA classifiers based on two approaches (machine learning & lexicon).

  9. CONTENT 1. Introduction 2. A general view on the subject 3. SA levels 3.1. SA at document level 3.2. SA at clause/sentence level 3.3. Feature-based SA 3.4. Comparative sentiment analysis 3.5. Sentiment lexicon acquisition 3.6. Conclusions 4. Applications 4.1. Business and government 4.2. Review sites 4.3. Other domains: politics and sociology 4.4. Conclusions 5. Conclusions and discussions

  10. 2. A general view on the subject. SA is the process of detecting the contextual polarity of text. SA terminology: - subjectivity [Lyons, 1981; Langacker, 1985]; - evidentiality [Chafe and Nichols, 1986]; - analysis of stance [Biber and Finegan, 1988; Conrad and Biber, 2000]; - affect [Batson, Shaw, and Oleson, 1992]; - point of view [Wiebe, 1994; Scheibman, 2002]; - evaluation [Hunston and Thompson, 2001]; - appraisal [Martin and White, 2005]; - opinion mining [Pang and Lee, 2008]; - politeness [Gîfu and Topor, 2014].

  11. 3. Sentiment classification techniques Fig. 1 Sentiment classification techniques

  12. 3. SA levels – document a) supervised approach Fig. 2 Supervised learning for three classes (positive, negative, neutral)
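To make the three-class supervised setting concrete, here is a minimal sketch of a multinomial Naive Bayes document classifier with add-alpha (Laplace) smoothing, written from scratch. The training sentences, class names and the `NaiveBayesSentiment` class are hypothetical; this is one standard supervised technique, not necessarily the classifier behind Fig. 2.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())

class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        v = len(self.vocab)
        lp = self.log_prior[c]
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are ignored
                lp += math.log((self.counts[c][w] + self.alpha) / (self.totals[c] + self.alpha * v))
        return lp

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))

# Hypothetical toy training set.
train_docs = ["great battery and great screen", "I love this phone",
              "terrible battery, awful screen", "I hate this phone",
              "the phone ships in a box", "the battery is removable"]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
clf = NaiveBayesSentiment().fit(train_docs, train_labels)
print(clf.predict("great screen"))  # positive
```

Working in log space avoids numerical underflow on long documents; in practice one would train on thousands of labeled reviews rather than six sentences.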

  13. 3. SA levels - document a) supervised approach Fig. 2 Python NLTK Demos for Natural Language Text Processing http://text-processing.com/demo/

  14. 3. SA levels – document b) unsupervised approach • based on determining the semantic orientation (SO) of specific words/phrases; • sentiment lexicon (words/expressions) [Taboada et al., 2011]; • set of predefined POS patterns [Turney, 2002].
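Turney's semantic orientation compares how strongly a phrase is associated with the reference words "excellent" and "poor": SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"). Turney estimated the counts from web search hits with a NEAR operator; the sketch below, as an assumption, approximates co-occurrence by "appears in the same sentence" of a tiny invented corpus and adds a small constant `eps` to avoid division by zero.

```python
import math

def hits(corpus, *terms):
    """Number of sentences that contain every term (terms may be multi-word phrases)."""
    return sum(all(f" {t} " in f" {doc} " for t in terms) for doc in corpus)

def semantic_orientation(phrase, corpus, pos_ref="excellent", neg_ref="poor", eps=0.01):
    """SO = PMI(phrase, pos_ref) - PMI(phrase, neg_ref), estimated from co-occurrence counts:
    log2( hits(phrase, pos) * hits(neg) / (hits(phrase, neg) * hits(pos)) )."""
    num = (hits(corpus, phrase, pos_ref) + eps) * (hits(corpus, neg_ref) + eps)
    den = (hits(corpus, phrase, neg_ref) + eps) * (hits(corpus, pos_ref) + eps)
    return math.log2(num / den)

# Hypothetical lowercased, punctuation-free corpus.
corpus = ["battery life is excellent", "excellent battery life overall",
          "screen quality is poor", "poor screen quality again"]
print(semantic_orientation("battery life", corpus) > 0)    # True
print(semantic_orientation("screen quality", corpus) < 0)  # True
```

A review is then classified as positive if the average SO of the phrases extracted by the POS patterns is positive, and negative otherwise.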

  15. 3. SA levels – clause/sentence • more complex: identifying whether a sentence is opinionated and establishing the nature of the opinion; • using supervised methods: • 1. classifying clauses/sentences into two classes, facts vs. opinions [Yu and Hatzivassiloglou, 2003]; • 2. an approach based on minimum cuts [Pang and Lee, 2004]. • The problem: how can we classify questions, sarcasm, metaphor, humor, etc.?
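The minimum-cut idea of Pang and Lee can be sketched as follows: each sentence gets an individual probability of being subjective (from any sentence classifier), and pairs of sentences get an association weight that penalizes labeling them differently. The minimum source-sink cut of the resulting graph gives the cheapest joint labeling. The scores below are hypothetical, and Edmonds-Karp is our choice of max-flow algorithm.

```python
from collections import deque

def min_cut_subjectivity(ind_sub, assoc):
    """ind_sub[i]: probability that sentence i is subjective (source side = subjective).
    assoc[(i, j)]: penalty paid if sentences i and j end up in different classes.
    Returns one boolean per sentence: True = subjective."""
    n = len(ind_sub)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, p in enumerate(ind_sub):
        cap[s][i] = p        # cut (paid) if i is labeled objective
        cap[i][t] = 1.0 - p  # cut (paid) if i is labeled subjective
    for (i, j), w in assoc.items():
        cap[i][j] += w
        cap[j][i] += w
    eps = 1e-12
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > eps:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break
        flow, v = float("inf"), t
        while v != s:  # bottleneck capacity of the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    seen, queue = {s}, deque([s])  # nodes reachable from s form the subjective side
    while queue:
        u = queue.popleft()
        for v in range(n + 2):
            if v not in seen and cap[u][v] > eps:
                seen.add(v)
                queue.append(v)
    return [i in seen for i in range(n)]

# Sentence 1 leans objective on its own (0.45) but is strongly tied to subjective sentence 0.
print(min_cut_subjectivity([0.9, 0.45, 0.2], {(0, 1): 0.5}))  # [True, True, False]
```

Without the association edge, sentence 1 would be labeled objective; the cut trades a small individual cost for keeping related sentences together.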

  16. 3. SA levels – features • several entities per analyzed text, or several attributes per entity; • extraction of the attributes of an object; • Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a lot 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.) • extract and store all NPs; • keep only the NPs whose frequency is above an experimentally learned threshold [Hu and Liu, 2004].
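The last two steps (store all NPs, keep only the frequent ones) can be sketched as a support filter. As an assumption, the NPs are taken as already extracted by a chunker, one list per sentence; `min_support` stands in for the experimentally learned threshold, and the data are invented.

```python
from collections import Counter

def frequent_features(sentences_nps, min_support):
    """sentences_nps: one list of noun phrases per sentence.
    Keep the NPs that occur in at least min_support (a fraction) of the sentences."""
    counts = Counter()
    for nps in sentences_nps:
        counts.update(set(nps))  # count each NP at most once per sentence
    n = len(sentences_nps)
    return sorted(phrase for phrase, c in counts.items() if c / n >= min_support)

# Hypothetical NPs extracted from four review sentences.
data = [["battery life", "screen"], ["battery life"], ["price", "battery life"], ["screen", "box"]]
print(frequent_features(data, 0.5))  # ['battery life', 'screen']
```

Counting each NP once per sentence measures how widely an attribute is discussed rather than how often a single reviewer repeats it.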

  17. 3. SA levels – comparative • when a user does not offer a direct opinion about a product, but compares it with other products [Jindal and Liu, 2006]; • Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.) • comparative adverbs/adjectives: mai mult, mai puţin (En. - more, less); • superlative adjectives and adverbs: mai, cel puţin (En. - more, at least); • additional clauses: decât, împotriva (En. - than, against); • these keywords cover 98% of the comparative opinions.
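A keyword filter in the spirit of Jindal and Liu can flag candidate comparative sentences before any deeper analysis. The keyword list is the one on the slide; treating it as a plain token-sequence match is our simplifying assumption, and since "mai" alone is very frequent in Romanian, this is a high-recall filter, not a classifier. The normalization maps the older cedilla diacritics (ş, ţ) to the standard comma-below forms (ș, ț).

```python
import re

COMPARATIVE_KEYWORDS = ["mai mult", "mai puțin", "cel puțin", "mai", "decât", "împotriva"]

def normalize(text):
    """Lowercase and map cedilla diacritics (ş, ţ) to comma-below (ș, ț)."""
    return text.lower().replace("ş", "ș").replace("ţ", "ț")

def comparative_keywords(sentence, keywords=COMPARATIVE_KEYWORDS):
    """Return the comparative keywords that occur as whole words/word sequences."""
    tokens = re.findall(r"\w+", normalize(sentence))
    padded = " " + " ".join(tokens) + " "
    return [kw for kw in keywords if f" {normalize(kw)} " in padded]

print(comparative_keywords("Dacia Logan arată mult mai bine decât Dacia Solenza."))
# ['mai', 'decât']
```

Sentences with at least one match would then be passed to a classifier that decides whether they really express a comparison.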

  18. 3. SA levels – sentiment lexicon Manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], Balkanet [Tufiş et al., 2004]. Our work: AnaDiP-2010, inspired by LIWC-2007 [Pennebaker et al., 2001], with 9 emotional classes:
      <classes>
        <class name="emotional" id="1"/>
        <class name="positive" id="2" parent="1"/>
        <class name="negative" id="3" parent="1"/>
        <class name="anxiety" id="4" parent="3"/>
        <class name="anger" id="5" parent="3"/>
        <class name="sadness" id="6" parent="3"/>
        <class name="spectacular" id="7" parent="2"/>
        <class name="firmness" id="8" parent="2"/>
        <class name="moderation" id="9" parent="2"/>
      </classes>

  19. 3. SA levels – sentiment lexicon Our software performs part-of-speech (POS) tagging and lemmatization of words. For example (clevetitor, En. - slanderer; genial, En. - brilliant):
      <lexic name="Politic" lang="ro">
        <word lemma="clevetitor" classes="1,3,6"/>
        <word lemma="genial" classes="1,2,7"/>
        …
      </lexic>
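A minimal sketch of how a lexicon in this XML format could be used: read the class hierarchy and the lexicon with Python's standard `xml.etree.ElementTree`, then count how many lemmas of an already lemmatized text fall into each emotional class. The function names and the sample lemmas are hypothetical; only the two lexicon entries and the class list come from the slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CLASSES_XML = """<classes>
  <class name="emotional" id="1"/>
  <class name="positive" id="2" parent="1"/>
  <class name="negative" id="3" parent="1"/>
  <class name="anxiety" id="4" parent="3"/>
  <class name="anger" id="5" parent="3"/>
  <class name="sadness" id="6" parent="3"/>
  <class name="spectacular" id="7" parent="2"/>
  <class name="firmness" id="8" parent="2"/>
  <class name="moderation" id="9" parent="2"/>
</classes>"""

# Two entries from the slide; a real lexicon contains many more words.
LEXICON_XML = """<lexic name="Politic" lang="ro">
  <word lemma="clevetitor" classes="1,3,6"/>
  <word lemma="genial" classes="1,2,7"/>
</lexic>"""

def load_classes(xml_text):
    """Map class id -> class name."""
    return {c.get("id"): c.get("name") for c in ET.fromstring(xml_text).iter("class")}

def load_lexicon(xml_text, class_names):
    """Map lemma -> list of class names."""
    return {w.get("lemma"): [class_names[i] for i in w.get("classes").split(",")]
            for w in ET.fromstring(xml_text).iter("word")}

def class_profile(lemmas, lexicon):
    """Count how many lemmas of a lemmatized text fall into each emotional class."""
    counts = Counter()
    for lemma in lemmas:
        counts.update(lexicon.get(lemma, []))
    return counts

lexicon = load_lexicon(LEXICON_XML, load_classes(CLASSES_XML))
profile = class_profile(["politician", "genial", "fi", "clevetitor"], lexicon)
print(profile["emotional"], profile["positive"], profile["negative"])  # 2 1 1
```

Because each entry already lists its parent classes (e.g. 1,2,7), a hit on a fine-grained class is automatically counted for its ancestors too.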

  20. 3. SA levels – sentiment lexicon • corpus-based approaches: a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents from a single domain. • a classical work [Hatzivassiloglou and McKeown, 1997] uses a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, neither/nor, either/or). • Examples: • bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man) • femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?) • băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)
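The intuition behind the connector-based approach is that adjectives joined by a connector usually share orientation (e.g. "and") or have opposite orientation (e.g. "but"), as the examples show. Hatzivassiloglou and McKeown learn these links with a log-linear model and then cluster the adjectives; the sketch below swaps in a simpler technique, label propagation from seed words over links whose relation is given as input (our assumption).

```python
from collections import deque

def propagate_polarity(seeds, links):
    """seeds: {word: +1 or -1}; links: (w1, w2, relation), relation in {"same", "opposite"}.
    Breadth-first propagation; the first label assigned to a word wins."""
    sign_of = {"same": 1, "opposite": -1}
    graph = {}
    for a, b, rel in links:
        graph.setdefault(a, []).append((b, sign_of[rel]))
        graph.setdefault(b, []).append((a, sign_of[rel]))
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour, sign in graph.get(word, []):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word] * sign
                queue.append(neighbour)
    return polarity

# Links read off the slide examples; seed polarities are hypothetical.
links = [("puternic", "armonios", "same"),
         ("senzuală", "inteligentă", "same"),
         ("sărmană", "înstărită", "opposite")]
print(propagate_polarity({"puternic": 1, "înstărită": 1}, links))
# {'puternic': 1, 'înstărită': 1, 'armonios': 1, 'sărmană': -1}
```

Words in a connected component without any seed ("senzuală", "inteligentă") stay unlabeled, which is why a larger corpus and more seeds improve coverage.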

  21. 4. Applications – business and government • “Why aren’t consumers buying our laptop?” even though the price is good and the weight clearly matches consumers' wishes [Lee, 2004]. • Two kinds of answers: • subjective reasons about intangible qualities (e.g. "the physical keyboard is tacky") • or • misperceptions (opinions that are factually wrong) • Solution: by tracking consumers' opinions, one can predict sales trends, etc. [Mishne & Glance, 2006].

  22. 4. Applications – business and government • Solution based on a dictionary + the semantic role of negations and pragmatic connectors: • classification of emotionally charged words into two classes, positive and negative (also a neutral class); • more classes, associating each word with a value in the range -5 to +5; • [Gîfu and Cristea, 2012a]: a scale on the interval -3 to +3; • [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
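A minimal sketch of dictionary-based scoring with negation, assuming word values on the -5..+5 scale and a rescaling of the average to -1..+1. The dictionary entries, the negator set and the rule "a negator flips the next sentiment word" are hypothetical simplifications, not the authors' actual rules.

```python
# Hypothetical dictionary entries on the -5..+5 scale.
LEXICON = {"genial": 5, "bun": 3, "prost": -3, "oribil": -5}
NEGATORS = {"nu", "nici"}

def polarity_score(lemmas, lexicon=LEXICON, negators=NEGATORS, max_value=5):
    """Average polarity of the sentiment words in a lemmatized text, rescaled to [-1, +1].
    A negator flips the sign of the next sentiment word (simplified negation scope)."""
    values = []
    negated = False
    for lemma in lemmas:
        if lemma in negators:
            negated = True  # Romanian negative concord ("nu ... nici") counts as one negation
        elif lemma in lexicon:
            value = lexicon[lemma]
            values.append(-value if negated else value)
            negated = False
    if not values:
        return 0.0
    return sum(values) / (len(values) * max_value)

print(polarity_score(["film", "fi", "genial"]))      # 1.0
print(polarity_score(["film", "nu", "fi", "bun"]))   # -0.6
```

Setting, rather than toggling, the negation flag keeps "nu e nici prost" (En. - "is not stupid") as a single negation, as Romanian negative concord requires.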

  23. 4. Process phases: POS-tagger & NER & Anaphora Resolution. Output for "Nimic mai odios, mai oribil decât pantofii sport cu platformă" (En. - "Nothing more odious, more horrible than platform sneakers"):
      <DOCUMENT>
      <P ID="1">
      <S ID="1">
      <W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
      <NP HEADID="11.2" ID="0" ref="0">
        <W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
        <W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
        <W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
        <W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
        <W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
        <W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
        <W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
      </NP>
      <NP HEADID="11.9" ID="1" ref="1">
        <W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
        <NP HEADID="11.10" ID="2" ref="2">
          <W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
          <W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
          <NP HEADID="11.12" ID="3" ref="3">
            <W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
          </NP>
        </NP>
      </NP>
      </S>
      </P>
      </DOCUMENT>
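Downstream steps (lexicon lookup, rule matching) need the tagger output as a flat token list and as NP chunks. A minimal sketch with Python's standard `xml.etree.ElementTree`, using an abridged copy of the XML above; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the tagger output (first NP only).
TAGGED_XML = """<DOCUMENT><P ID="1"><S ID="1">
  <NP HEADID="11.2" ID="0" ref="0">
    <W ID="11.2" LEMMA="nimic" POS="PRONOUN" offset="1">Nimic</W>
    <W ID="11.3" LEMMA="mai" POS="ADVERB" offset="7">mai</W>
    <W ID="11.4" LEMMA="odios" POS="ADJECTIVE" offset="11">odios</W>
  </NP>
</S></P></DOCUMENT>"""

def tokens_from_xml(xml_text):
    """One dict per <W> element, in document order (words inside nested NPs included)."""
    root = ET.fromstring(xml_text)
    return [{"form": w.text or "", "LEMMA": w.get("LEMMA", ""),
             "POS": w.get("POS", ""), "offset": int(w.get("offset", "-1"))}
            for w in root.iter("W")]

def np_chunks(xml_text):
    """Lemmas of the words directly inside each <NP>; nested NPs are reported separately."""
    root = ET.fromstring(xml_text)
    return [[w.get("LEMMA") for w in chunk.findall("W")] for chunk in root.iter("NP")]

tokens = tokens_from_xml(TAGGED_XML)
print([t["LEMMA"] for t in tokens])  # ['nimic', 'mai', 'odios']
```

Keeping the `offset` attribute lets later annotations (e.g. a sentiment value) be mapped back onto the original text.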

  24. 4. Process phases: POS-tagger & NER & Anaphora Resolution Fig. 3 The interface of the EAT system

  25. 4. Applications – business and government • 46 rules for assigning values, e.g.:
      <rule>
        <word attribute="LEMMA" value="cel"/>
        <word attribute="LEMMA" value="mai"/>
        <word attribute="POS" value="ADJECTIVE"/>
      </rule>
      Ex: cel mai bun (En. - the best)
      <rule>
        <word attribute="LEMMA" value="cel"/>
        <word attribute="LEMMA" value="mai"/>
        <word attribute="LEMMA" value="bun"/>
      </rule>
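A minimal sketch of how such rules could be matched against tagged tokens: each rule is a sequence of (attribute, value) constraints that must hold for consecutive tokens. The `<rules>` wrapper, the function names and the POS tag "DETERMINER" in the example are hypothetical; we read the `bun` constraint of the second rule as a LEMMA test, since `bun` is a lemma rather than a POS tag.

```python
import xml.etree.ElementTree as ET

RULES_XML = """<rules>
  <rule>
    <word attribute="LEMMA" value="cel"/>
    <word attribute="LEMMA" value="mai"/>
    <word attribute="POS" value="ADJECTIVE"/>
  </rule>
  <rule>
    <word attribute="LEMMA" value="cel"/>
    <word attribute="LEMMA" value="mai"/>
    <word attribute="LEMMA" value="bun"/>
  </rule>
</rules>"""

def load_rules(xml_text):
    """Each rule becomes a list of (attribute, value) constraints, one per token."""
    root = ET.fromstring(xml_text)
    return [[(w.get("attribute"), w.get("value")) for w in rule.iter("word")]
            for rule in root.iter("rule")]

def find_matches(tokens, rules):
    """Return (rule_index, start, end) for every window of tokens satisfying a rule."""
    matches = []
    for r_idx, rule in enumerate(rules):
        n = len(rule)
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            if all(tok.get(attr) == val for tok, (attr, val) in zip(window, rule)):
                matches.append((r_idx, start, start + n))
    return matches

rules = load_rules(RULES_XML)
tokens = [{"LEMMA": "fi", "POS": "VERB"}, {"LEMMA": "cel", "POS": "DETERMINER"},
          {"LEMMA": "mai", "POS": "ADVERB"}, {"LEMMA": "bun", "POS": "ADJECTIVE"},
          {"LEMMA": "film", "POS": "NOUN"}]
print(find_matches(tokens, rules))  # [(0, 1, 4), (1, 1, 4)]
```

When several rules match the same span, the more specific one (here, the lexicalized "cel mai bun") would typically take precedence when assigning a value.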

  26. 4. Applications – review sites • to assess the reviews and ratings about your company or yourself; • to summarize reviews. • Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]. • 6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter. • We established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.

  27. 4. Applications – politics/sociology Two dimensions in politics: 1. to know what voters think about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008]; 2. to clarify the politicians’ positions, in order to enhance the quality of the information voters have access to [Bansal et al., 2008; Gîfu, 2013b]. In sociology: - how ideas and innovations are propagated [Rosen, 1974]. Ex: polls on different issues.

  28. CONCLUSIONS AND DISCUSSIONS SA is a complex task; SA is an emerging discipline with promising academic and, most importantly, industrial applications; ... the sentiment classification problem is more challenging. Future work: - to develop an independent sentiment classifier using machine learning methods; - to compare the results of machine-learning sentiment classification with those of traditional topic-based categorization; - to analyse the sentiment lexicon of Old Romanian in terms of diachronic semantics.

  29. Thank you for your attention! ?
