Extending SASI to Satirical Product Reviews: A Preview

Extending SASI to Satirical Product Reviews: A Preview. Bernease Herman University of Michigan Monday, April 22, 2013. Satirical Amazon Reviews. For a fun list: http://www.geekosystem.com/funny-amazon-reviews/.

Presentation Transcript


  1. Extending SASI to Satirical Product Reviews: A Preview Bernease Herman University of Michigan Monday, April 22, 2013

  2. Satirical Amazon Reviews For a fun list: http://www.geekosystem.com/funny-amazon-reviews/ Extending SASI to Satirical Product Reviews: A Preview April 22, 2013 2

  3. Defining Irony, Sarcasm and Satire
     • Irony: “the use of words to convey a meaning that is the opposite of its literal meaning”
     • Sarcasm: “a sharply ironical taunt; sneering or cutting remark”
     • Satire: “the use of irony, sarcasm, ridicule, or the like, in exposing, denouncing, or deriding vice, folly, etc.”

  4. Sarcastic Review: Shure SE110 Sound Isolating Earphones

  5. Satirical Review: BIC Cristal For Her ballpoint pens

  6. Satirical Review: Zenith Men’s Defy Xtreme Titanium Watch

  7. Semi-supervised Algorithm for Sarcasm Identification (SASI)
     SASI detects sarcasm in individual sentences using a k-nearest-neighbors (k-NN) style algorithm. Its features include pattern matching and punctuation. Satire calls for additional features that are not present in the sarcasm model, and a classification baseline still needs to be chosen from several options. SASI is a sentence-level sarcasm detector, not a full-document one.
     Outline: Overview • Data preprocessing • Data enrichment • Pattern features • Punctuation features • Additional features • Classification • Baseline options • Summary

  8. SASI: Data preprocessing
     Jindal and Liu (2008) provide a data set of 66,000 book and product reviews. Filatova (2012) provides corpora of Amazon reviews labeled ironic, sarcastic, both, or regular.
     • Specific products, authors, companies, and book titles were replaced with [product], [author], etc.
     • HTML and special symbols were removed from the text.
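The two preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the original pipeline: the entity-to-tag mapping `ENTITY_TAGS` is hypothetical (real data would need product, author, and company metadata), and which characters count as "special symbols" is an assumption.

```python
import html
import re

# Hypothetical entity-to-placeholder mapping; a real pipeline would build this
# from product, author, and company metadata.
ENTITY_TAGS = {
    "571B Banana Slicer": "[product]",
    "Jane Doe": "[author]",
}

TAG_RE = re.compile(r"<[^>]+>")  # crude HTML tag matcher
# Assumption: anything other than word characters, whitespace, basic
# punctuation, and the brackets used by placeholder tags is a "special symbol".
SPECIAL_RE = re.compile(r"[^\w\s.,!?'\"\[\]-]")
SPACE_RE = re.compile(r"\s+")


def preprocess(text: str, entity_tags: dict[str, str] = ENTITY_TAGS) -> str:
    """Remove HTML, replace known entities with tags, and drop special symbols."""
    text = html.unescape(TAG_RE.sub(" ", text))
    # Replace longer names first so a short name cannot break a longer one.
    for name in sorted(entity_tags, key=len, reverse=True):
        text = text.replace(name, entity_tags[name])
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


print(preprocess("<p>What can I say about the 571B Banana Slicer&hellip; &#9733;</p>"))
# prints: What can I say about the [product]
```

Replacing entities before dropping symbols keeps the bracketed tags intact, since brackets are on the allowed list.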

  9. SASI: Data enrichment
     Tsur et al. (2010) posited that sarcastic sentences tend to co-occur with other sarcastic sentences. They gathered additional sentences via web searches with the Yahoo! BOSS API, using labeled sentences as seeds. This assumption holds for satirical reviews, but not for sarcastic ones.
     [Figure: example reviews labeled “Sarcasm” and “Satire”]
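The enrichment step can be sketched as label propagation from seed sentences to the sentences retrieved alongside them. This is a hedged illustration: `search` is a hypothetical callable standing in for Yahoo! BOSS, and the crude sentence splitter, the 50-snippet cap, and the integer label scale (assumed here to run from 1 to 5) are all assumptions.

```python
import re
from typing import Callable, Iterable

# Hypothetical search backend: query string -> result snippets. Tsur et al.
# (2010) used the Yahoo! BOSS API; any replacement service would plug in here.
SearchFn = Callable[[str], Iterable[str]]


def split_sentences(snippet: str) -> list[str]:
    """Rough sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", snippet) if s.strip()]


def enrich(seeds: dict[str, int], search: SearchFn, max_snippets: int = 50) -> dict[str, int]:
    """Add sentences found alongside each seed, copying the seed's label to them."""
    enriched = dict(seeds)
    for sentence, label in seeds.items():
        for i, snippet in enumerate(search(f'"{sentence}"')):
            if i >= max_snippets:
                break
            for found in split_sentences(snippet):
                # Never overwrite a label that is already known.
                enriched.setdefault(found, label)
    return enriched


def fake_search(query: str) -> list[str]:
    # Stand-in for a real search API, for demonstration only.
    return ["Best pen ever. My delicate hands can finally write."]


print(enrich({"Best pen ever.": 5}, fake_search))
# {'Best pen ever.': 5, 'My delicate hands can finally write.': 5}
```

The propagation is only as good as the co-occurrence assumption, which is why it suits whole satirical reviews better than reviews containing a single sarcastic sentence.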

  10. SASI: Pattern features
      Following Davidov and Rappoport (2006, 2008), words are split into:
      • High-frequency words (HFWs)
      • Content words (CWs)
      Example sentence: What can I say about the 571B Banana Slicer that hasn't already been said about the wheel, penicillin or the iPhone…
      Some patterns extracted from it:
      • “What can I CW CW the”
      • “I CW CW the [product]”
      • “[product] that hasn’t CW been CW about”
      • “about the CW”
      • “CW or the CW”
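A simplified sketch of pattern extraction follows. Davidov and Rappoport use separate frequency thresholds for HFWs and CWs; this sketch collapses them into a single cutoff, so every non-HFW becomes a CW. The bounds of 2 to 6 HFWs and 1 to 6 CWs per pattern reflect our reading of Tsur et al. (2010), and treating placeholder tags as HFWs is likewise an assumption. The HFW set in the usage lines is chosen by hand so the output reproduces two of the slide's patterns.

```python
from collections import Counter

# Assumption: placeholder tags from preprocessing always count as HFWs.
PLACEHOLDERS = {"[product]", "[author]", "[company]", "[title]"}


def high_frequency_words(corpus: list[list[str]], per_million: float) -> set[str]:
    """Lower-cased words occurring more than `per_million` times per million tokens."""
    counts = Counter(w.lower() for sentence in corpus for w in sentence)
    total = sum(counts.values())
    return {w for w, c in counts.items() if c * 1_000_000 / total > per_million}


def abstract(tokens: list[str], hfws: set[str]) -> list[str]:
    """Keep HFWs and placeholder tags verbatim; replace every other word with CW."""
    return [t if t in PLACEHOLDERS or t.lower() in hfws else "CW" for t in tokens]


def extract_patterns(tokens: list[str], hfws: set[str],
                     hfw_range: tuple[int, int] = (2, 6),
                     cw_range: tuple[int, int] = (1, 6)) -> set[str]:
    """All contiguous windows whose HFW and CW counts fall inside the given ranges."""
    symbols = abstract(tokens, hfws)
    patterns = set()
    for start in range(len(symbols)):
        for end in range(start + 1, len(symbols) + 1):
            window = symbols[start:end]
            n_cw = window.count("CW")
            n_hfw = len(window) - n_cw
            if n_hfw > hfw_range[1] or n_cw > cw_range[1]:
                break  # extending the window can only raise the counts
            if n_hfw >= hfw_range[0] and n_cw >= cw_range[0]:
                patterns.add(" ".join(window))
    return patterns


tokens = "What can I say about the [product]".split()
hfws = {"what", "can", "i", "the"}
found = extract_patterns(tokens, hfws)
print("What can I CW CW the" in found, "I CW CW the [product]" in found)  # True True
```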

  11. SASI (continued) [slide content not captured in the transcript]
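Pattern features in SASI are graded rather than binary. As we understand Tsur et al. (2010), a pattern scores 1 for an exact match, α for a sparse match (extra words inserted between its components), γ·n/N for an incomplete match where only n > 1 of its N components appear in order and at least one of them is an HFW, and 0 otherwise. The sketch below works on sentences already abstracted into HFWs and `CW` slots; the 0.1 defaults for `alpha` and `gamma` are assumptions.

```python
def contains_contiguous(pattern: list[str], symbols: list[str]) -> bool:
    k = len(pattern)
    return any(symbols[i:i + k] == pattern for i in range(len(symbols) - k + 1))


def is_subsequence(pattern: list[str], symbols: list[str]) -> bool:
    it = iter(symbols)
    return all(p in it for p in pattern)  # `in` consumes the iterator, keeping order


def longest_partial_with_hfw(pattern: list[str], symbols: list[str]) -> float:
    """Longest in-order partial match containing at least one HFW (-inf if none)."""
    m, n = len(pattern), len(symbols)
    any_len = [[0] * (n + 1) for _ in range(m + 1)]                 # any common subsequence
    hfw_len = [[float("-inf")] * (n + 1) for _ in range(m + 1)]    # ... containing an HFW
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            any_len[i][j] = max(any_len[i - 1][j], any_len[i][j - 1])
            hfw_len[i][j] = max(hfw_len[i - 1][j], hfw_len[i][j - 1])
            if pattern[i - 1] == symbols[j - 1]:
                any_len[i][j] = max(any_len[i][j], any_len[i - 1][j - 1] + 1)
                hfw_len[i][j] = max(hfw_len[i][j], hfw_len[i - 1][j - 1] + 1)
                if pattern[i - 1] != "CW":  # matching an HFW satisfies the constraint
                    hfw_len[i][j] = max(hfw_len[i][j], any_len[i - 1][j - 1] + 1)
    return hfw_len[m][n]


def match_value(pattern: list[str], symbols: list[str],
                alpha: float = 0.1, gamma: float = 0.1) -> float:
    """Feature value of one pattern for one abstracted sentence."""
    if contains_contiguous(pattern, symbols):
        return 1.0                       # exact match
    if is_subsequence(pattern, symbols):
        return alpha                     # sparse match: extra words in between
    n = longest_partial_with_hfw(pattern, symbols)
    if n > 1:
        return gamma * n / len(pattern)  # incomplete match
    return 0.0                           # no match


pattern = "I CW CW the [product]".split()
print(match_value(pattern, "What can I CW CW the [product]".split()))  # 1.0
print(match_value(pattern, "I CW so CW the [product]".split()))        # 0.1
print(match_value(pattern, "I CW the".split()))                        # approximately 0.06
```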

  12. SASI: Punctuation features
      Generic punctuation-related features, all normalized to [0, 1]:
      • Sentence length in words
      • Number of “!” characters
      • Number of “?” characters
      • Number of quotes in the sentence
      • Number of capitalized words or words in all capitals
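The five features can be computed and scaled as sketched below. The slide only says the features are normalized to [0, 1]; dividing each column by its corpus maximum is an assumed stand-in, not necessarily SASI's exact normalization, and counting quote characters (rather than quoted spans) is likewise an assumption.

```python
import string

QUOTE_CHARS = "\"“”"  # assumption: which characters count as quotes


def raw_features(sentence: str) -> list[float]:
    """Unnormalized punctuation features for one sentence."""
    words = sentence.split()
    # A word counts if its first non-punctuation character is uppercase,
    # which covers both "Capitalized" and "ALL-CAPS" words.
    capitalized = sum(
        1 for w in words if w.lstrip(string.punctuation + "“”")[:1].isupper()
    )
    return [
        float(len(words)),                                   # sentence length in words
        float(sentence.count("!")),                          # "!" characters
        float(sentence.count("?")),                          # "?" characters
        float(sum(sentence.count(q) for q in QUOTE_CHARS)),  # quote characters
        float(capitalized),                                  # capitalized / all-caps words
    ]


def normalize(rows: list[list[float]]) -> list[list[float]]:
    """Scale each feature column to [0, 1] by dividing by its maximum (an assumption)."""
    if not rows:
        return []
    maxima = [max(column) for column in zip(*rows)]
    return [[v / m if m > 0 else 0.0 for v, m in zip(row, maxima)] for row in rows]


sentences = ["Best pen EVER!!!", "Why would I need this?"]
for row in normalize([raw_features(s) for s in sentences]):
    print(row)
# [0.6, 1.0, 0.0, 0.0, 1.0]
# [1.0, 0.0, 1.0, 0.0, 1.0]
```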

  13. SASI: Additional features
      Burfoot and Baldwin (2009) introduced the notion of validity, which models absurdity via a measure close to pointwise mutual information (PMI). It is related to the number of made-up or mismatched named entities. It works well for satire, but not in this setting. Features to consider:
      • Absurdity of the product
      • Relevancy of the product
      • How often the product is reviewed
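To make the PMI connection concrete, here is a PMI score computed from co-occurrence counts (for example, web hit counts), using PMI(a, b) = log(P(a, b) / (P(a)·P(b))). This is a hedged sketch, not Burfoot and Baldwin's exact validity measure: taking the minimum PMI over entity pairs as an absurdity proxy, and the count dictionaries in the usage lines, are illustrative assumptions.

```python
import math
from itertools import combinations


def pmi(count_ab: int, count_a: int, count_b: int, total: int) -> float:
    """PMI estimated from raw counts; -inf if a and b never co-occur."""
    if count_ab == 0:
        return float("-inf")
    # log((c_ab / N) / ((c_a / N) * (c_b / N))) simplifies to log(c_ab * N / (c_a * c_b))
    return math.log(count_ab * total / (count_a * count_b))


def least_plausible_pair_score(entities, counts, pair_counts, total):
    """Lowest PMI among pairs of distinct entities in a review, or None if fewer than 2.

    A very low (or -inf) score marks entities that rarely appear together,
    one possible proxy for absurdity.
    """
    unique = sorted(set(entities))
    if len(unique) < 2:
        return None
    return min(
        pmi(pair_counts.get(frozenset((a, b)), 0), counts[a], counts[b], total)
        for a, b in combinations(unique, 2)
    )


# Hypothetical counts for three entities A, B, C in a 10,000-document collection.
counts = {"A": 100, "B": 100, "C": 50}
pair_counts = {frozenset(("A", "B")): 10, frozenset(("B", "C")): 5}
print(least_plausible_pair_score(["A", "B", "C"], counts, pair_counts, 10_000))  # -inf
```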

  14. SASI: Classification
      Each sentence in the training set is represented as a feature vector, with one feature per pattern. A new sentence is classified using the Euclidean distance to each training vector that shares at least one pattern with it; its label is then derived from its k nearest neighbors.
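A sketch of the classification step as a k-nearest-neighbors vote. Assumptions to flag: the first `n_patterns` vector entries are taken to be pattern features, labels are plain numbers (for example, a 1 to 5 sarcasm scale), the label is a simple mean of the k nearest labels (Tsur et al. (2010) describe a weighted average, as we understand it), and `default` is a hypothetical fallback for sentences that share no pattern with any training vector.

```python
import math
from collections.abc import Sequence


def shares_pattern(u: Sequence[float], v: Sequence[float], n_patterns: int) -> bool:
    """True if both vectors have a nonzero value for at least one pattern feature."""
    return any(u[i] > 0 and v[i] > 0 for i in range(n_patterns))


def knn_label(query: Sequence[float],
              training: list[tuple[Sequence[float], float]],
              n_patterns: int, k: int = 5, default: float = 1.0) -> float:
    """Mean label of the k nearest training vectors that share a pattern with `query`."""
    candidates = sorted(
        (math.dist(query, vec), label)          # Euclidean distance
        for vec, label in training
        if shares_pattern(query, vec, n_patterns)
    )
    if not candidates:
        return default  # assumed fallback when nothing shares a pattern
    nearest = candidates[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical vectors: two pattern features followed by one punctuation feature.
training = [
    ([1.0, 0.0, 0.2], 5.0),
    ([1.0, 0.0, 0.9], 4.0),
    ([0.0, 1.0, 0.1], 1.0),
    ([0.0, 0.0, 0.0], 1.0),
]
print(knn_label([1.0, 0.0, 0.3], training, n_patterns=2, k=2))  # 4.5
```

Restricting candidates to vectors that share a pattern keeps punctuation features alone from pulling a sentence toward unrelated neighbors.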

  15. SASI: Baseline options
      Because SASI is semi-supervised, its classification takes advantage of the definition of sarcasm: it assumes a low star rating paired with text whose literal meaning is positive. This is not as clear-cut for satire. Options include:
      • Variation in ratings for the product
      • Purchases vs. page views of the product
      • People finding the review helpful
      • Other heuristics
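The sarcasm assumption and the satire options above can be sketched as simple signals. All of this is illustrative: the lexicon `POSITIVE_WORDS`, the two-star cutoff, and the input fields (page views in particular, which may not be publicly available) are hypothetical, and the sketch deliberately stops at raw signals because deciding how to turn them into a satire baseline is the open question.

```python
from statistics import pvariance

# Hypothetical, deliberately tiny positive-sentiment lexicon.
POSITIVE_WORDS = {"amazing", "best", "great", "love", "perfect"}


def sarcasm_candidate(stars: int, text: str, max_stars: int = 2) -> bool:
    """Low star rating combined with literally positive wording."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return stars <= max_stars and bool(words & POSITIVE_WORDS)


def satire_signals(ratings: list[int], purchases: int, page_views: int,
                   helpful_votes: int, total_votes: int) -> dict[str, float]:
    """Raw values for the candidate satire signals (combining them is left open)."""
    return {
        "rating_variance": float(pvariance(ratings)) if ratings else 0.0,
        "purchases_per_view": purchases / page_views if page_views else 0.0,
        "helpful_ratio": helpful_votes / total_votes if total_votes else 0.0,
    }


print(sarcasm_candidate(1, "Best pen ever!"))  # True
print(satire_signals([1, 5, 1, 5], purchases=10, page_views=1000,
                     helpful_votes=90, total_votes=100))
```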

  16. SASI: Summary
      Satire appears to have a distinct advantage over sarcasm in the data enrichment phase, but a large disadvantage in the baseline options for classification. The baseline is the detail that must be worked out before moving forward with implementation.

  17. Future Goals
      After the course ends, I plan to implement SASI, taking into account the features mentioned today, and to extend the model to sarcasm in other domains.
      Any questions or comments?
