
Dispute Finder



Presentation Transcript


  1. Dispute Finder. Rob Ennals (robert.ennals@intel.com). Outline: The Problem; Our Solution; Possible Class Projects.

  2. Not everything on the web is true, balanced, and objective. The great thing about the web is that anyone can say whatever they want. The bad thing is that they have.

  3. Newspapers are Dying: the end of objective, fact-checked media sources that needed to appeal to an audience with a wide range of views?

  4. False Consensus: People like to fit in, so they tend to believe whatever they think is the consensus. Groups spend a lot of money trying to create a false consensus by sending their message from multiple seemingly independent sources.

  5. Dispute Finder: Encourage skepticism by telling users when information they encounter in their lives is disputed. Reduce false consensus by showing people evidence that other points of view are socially acceptable and well justified.

  6. Highlight Disputed Claims on the Web. Show Other Points of View from sources you trust.

  7. Future: Find Disputes on TV (e.g. a claim flagged as "opposed by: NY Times").

  8. Future: Find Disputes in Audio

  9. How Dispute Finder Works
     Client: Browser Extension
     - Runs a textual entailment NLP algorithm, looking for known disputed claims.
     - Low-compute JavaScript.
     - Sees the pages the user browses.
     Server: Web site with API
     - Stores a user-editable set of disputed claims, paraphrases, etc.
     - Arbitrary computation.
     - Can't see the user's browsing.

  10. Project Suggestions: Find disputes on the Web; Textual entailment of claims [Beth]; User-guided textual entailment; Duplicate detection for claims; Sentiment analysis; Whatever else you think of...

  11. Common Tools
      - Yahoo BOSS: Simple API access to Yahoo Search. A Python interface is available.
      - Amazon Mechanical Turk: Upload a CSV file with your data and pay users to mark it.
      - Our Database of Claims and Paraphrases: Entered by users + Snopes + Politifact.
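
The Mechanical Turk step above starts from a CSV file. As a minimal sketch of preparing one, the snippet below writes one row per question using only Python's standard library. The column names `snippet` and `candidate_claim` are hypothetical and would need to match whatever task template is actually used.

```python
import csv
import io

def build_turk_csv(candidates):
    """Serialise (snippet, claim) pairs as CSV text, one row per question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical column names; adapt them to the task template in use.
    writer.writerow(["snippet", "candidate_claim"])
    for snippet, claim in candidates:
        writer.writerow([snippet, claim])
    return buffer.getvalue()

if __name__ == "__main__":
    rows = [
        ("The report falsely claimed that there were 46 million Americans "
         "who lacked health insurance",
         "there were 46 million Americans who lacked health insurance"),
    ]
    print(build_turk_csv(rows))
```

The `csv` module handles quoting, so snippets that contain commas or quotation marks survive the round trip.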

  12. Task: Find Disputed Claims on the Web. Possible approach: Search BOSS for phrases like “falsely claimed that *”.
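
To make the pattern idea above concrete, here is a minimal sketch that pulls the text following a trigger phrase out of search-result snippets. It assumes the snippets have already been fetched (no search API is called here), and both the regular expression and the function name `extract_disputed_claims` are illustrative choices, not the project's actual code.

```python
import re

# Trigger phrase that usually precedes a disputed claim. A single pattern is
# used here for illustration; a real system would likely need many more.
TRIGGER = re.compile(
    r"\bfalsely\s+claim(?:ed|s)?(?:\s+that)?\s+(?P<claim>[^.!?]+)",
    re.IGNORECASE,
)

def extract_disputed_claims(snippets):
    """Return the text that follows the trigger phrase in each snippet."""
    claims = []
    for snippet in snippets:
        for match in TRIGGER.finditer(snippet):
            claim = match.group("claim").strip(" ,;:\"'")
            if claim:
                claims.append(claim)
    return claims

if __name__ == "__main__":
    snippets = [
        "The report falsely claimed that there were 46 million Americans "
        "who lacked health insurance.",
        "Obama Falsely Claims There Are 47 Million Uninsured Americans",
        "Nothing disputed is mentioned here.",
    ]
    for claim in extract_disputed_claims(snippets):
        print(claim)
```

The sketch stops at the first sentence-ending punctuation mark and does nothing about pronouns, time context, or extra qualifiers in the captured text.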

  13. Where it gets interesting:
      “Olbermann falsely claimed that Watters lied when he denied taping the meeting”
      - Are “Olbermann” and “Watters” ambiguous? Do we need a topic?
      - Are we sure “he” resolves to “Watters”?
      “Obama Falsely Claims There Are 47 Million Uninsured Americans”
      - Do we need to know the time context?
      “Pleasanton Man Brice Carrington Who Falsely Claimed That He Was A Three-Time Oscar-Winner Pleads Guilty”
      - Can we resolve “he”? Does this claim matter?
      “Today in Jordan, he falsely claimed that the predominantly Sunni terrorist organization Al-Qaida was receiving training from predominantly Shia Iran”
      - We want “Al-Qaida was receiving training from Iran”, with no extras.
      “The report falsely claimed that there were 46 million Americans who lacked health insurance”
      - We already have this one! Can we detect duplicates? See next...
      Evaluation/Training: Ask Turk users.

  14. Task: Textual Entailment of Claims
      Given a set of web pages and a set of claims, find phrases on the pages that entail the claims.
      Examples for “Cap and trade would cause job losses”:
      “<title>Cap and trade...</title> [new paragraph] which results in fewer jobs created or higher unemployment.”
      - The subject may not be in the sentence itself.
      - Can we translate “higher unemployment” to “job losses”?
      “a "cap and trade scheme" that "would suppress our economic recovery, cost jobs across our economy, ...”
      - There is other text between subject and object.
      “The claim that cap and trade will create many ‘green collar’ jobs overlooks the massive job losses caused by draconian energy rationing policies”
      - There is other text between subject and object.
      “.. rejects the cap and trade bill... [start bullets] ... 1.2 - 1.8 million jobs lost”
      - Big gap between subject and object.
      - Needs stemming.
      Beth is working on this.

  15. The classifier must be simple enough to run in JavaScript inside a web browser. However, the training can be complex, since it runs on the server.
      Possible approaches:
      - Bag of words + stemming + WordNet synonyms.
      - Use an existing textual entailment tool to derive simpler rules.
      Data set:
      - Our database of claims, mined from users and web sites.
      - Pages pulled from Yahoo BOSS.
      - Data sets from the RTE task.
      Evaluation: Ask Turk users.
      Beth is working on this.
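
The first approach listed above (bag of words + stemming + synonyms) could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's classifier: the plural-stripping `stem` function stands in for a real stemmer, the one-entry `SYNONYMS` table stands in for WordNet, and the stopword list is an arbitrary choice. It is written in Python for brevity, although the deployed classifier would have to run in JavaScript.

```python
import re

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "that",
             "would", "will", "which", "or"}

# Hand-written stand-in for WordNet synonym lookup (keys are stemmed forms).
SYNONYMS = {"lost": "loss"}

def stem(word):
    """Crude plural stripping; a stand-in for a real stemmer."""
    if word.endswith(("sses", "xes", "ches", "shes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def terms(text):
    """Lower-case, tokenise, drop stopwords, stem, then map synonyms."""
    result = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            continue
        stemmed = stem(token)
        result.add(SYNONYMS.get(stemmed, stemmed))
    return result

def coverage(claim, passage):
    """Fraction of the claim's terms that also appear in the passage."""
    claim_terms = terms(claim)
    if not claim_terms:
        return 0.0
    return len(claim_terms & terms(passage)) / len(claim_terms)

if __name__ == "__main__":
    claim = "Cap and trade would cause job losses"
    passage = "rejects the cap and trade bill ... 1.2 - 1.8 million jobs lost"
    print(coverage(claim, passage))
```

A passage would be treated as entailing the claim when the coverage exceeds some threshold learned on the server. Because word order is ignored, a passage that negates the claim scores the same as one that asserts it.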

  16. Task: User-Guided Textual Entailment
      Ask the user questions that help us do accurate textual entailment. What is the minimum set of easy questions that improves coverage the most?
      Examples for “global warming does not exist”:
      “Global warming is just another scam for the government to think they can control you”
      - Does “scam” imply “does not exist”? Should we ask the user?
      “Man-made global warming does not exist”
      - Is “man-made” global warming the same as “global warming”?
      - Should we ask the user?
      “it does not mean that global warming does not exist”
      - Are they disagreeing with the statement?
      - Should we ask the user?
      Possible approach:
      - Use BOSS to get a large number of pages about the topic.
      - Use bag-of-words to cluster likely phrases into common patterns.
      - Ask the user about a minimal example from each cluster.
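
The cluster-then-ask approach above can be illustrated with a small sketch. It assumes the candidate phrases have already been collected, groups them by greedy "leader" clustering on Jaccard similarity of their word sets, and picks the shortest phrase in each cluster as the question to show the user. The similarity measure, the 0.5 threshold, and the function names are all assumptions made for illustration.

```python
import re

def tokens(phrase):
    """Bag-of-words view of a phrase (lower-cased, letters only)."""
    return set(re.findall(r"[a-z]+", phrase.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_phrases(phrases, threshold=0.5):
    """Greedy clustering: join the first cluster whose leader is similar enough."""
    clusters = []
    for phrase in phrases:
        for cluster in clusters:
            if jaccard(tokens(cluster[0]), tokens(phrase)) >= threshold:
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters

def questions_for_user(clusters):
    """Pick the phrase with the fewest distinct words from each cluster."""
    return [min(cluster, key=lambda p: len(tokens(p))) for cluster in clusters]

if __name__ == "__main__":
    phrases = [
        "Global warming is a scam",
        "Global warming is just another scam",
        "Man-made global warming does not exist",
        "Global warming does not exist",
    ]
    for question in questions_for_user(cluster_phrases(phrases)):
        print("Does this mean the claim is being made?", question)
```

Greedy clustering depends on input order and only compares against each cluster's first member, so it is a cheap baseline rather than a recommendation.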

  17. Duplicate Detection for Claims: Given a huge set of claims mined from the web, how do we work out which ones are saying the same thing? Like textual entailment, except we can run entirely on the server, and the data set is a set of claims, rather than the whole web.
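
As a crude baseline for the duplicate-detection task above, the sketch below compares every pair of claims with `difflib.SequenceMatcher` from Python's standard library after normalising case and punctuation. The 0.85 threshold and the function names are assumptions. Character-level similarity only catches near-verbatim repeats; paraphrases that say the same thing in different words would need entailment-style methods.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def normalise(claim):
    """Lower-case the claim and collapse punctuation and whitespace."""
    return " ".join(re.findall(r"[a-z0-9]+", claim.lower()))

def find_duplicates(claims, threshold=0.85):
    """Return index pairs (i, j) of claims whose normalised text is similar."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(claims), 2):
        ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if ratio >= threshold:
            pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    claims = [
        "Cap and trade will cost jobs.",
        "cap and trade WILL cost jobs",
        "The moon landing was faked",
    ]
    print(find_duplicates(claims))
```

The pairwise loop is quadratic in the number of claims, which is tolerable on the server for a few thousand claims but would need blocking or indexing for a much larger set.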

  18. Task: Sentiment Analysis
      Some, but not all, claims are largely sentiments: “X is good” vs “X is bad”. Can we automatically infer contrasting sentiments about something?
      Examples:
      - “cap and trade will ruin america”: negative
      - “cap and trade will create jobs”: positive
      - “the folly of cap and trade”: negative
      - “cap and trade is essential”: positive
      Possible method:
      - Pick a topic, e.g. “Cap and Trade”.
      - Find pages that support it and pages that oppose it.
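
A toy illustration of labelling the example claims above is a word-list scorer like the one below. The two word lists are invented for this sketch and are fitted to the slide's examples. A real project would learn polarity from the supporting and opposing pages rather than hard-code it, and words such as "create" are only positive in some contexts.

```python
import re

# Hypothetical polarity word lists, chosen to cover the examples only.
POSITIVE = {"essential", "create", "good"}
NEGATIVE = {"ruin", "folly", "bad"}

def sentiment(claim):
    """Label a claim by counting positive and negative words."""
    words = re.findall(r"[a-z]+", claim.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for claim in [
        "cap and trade will ruin america",
        "cap and trade will create jobs",
        "the folly of cap and trade",
        "cap and trade is essential",
    ]:
        print(claim, ":", sentiment(claim))
```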

  19. What we can provide
      - A database of claims and paraphrases. Currently ~2000 claims, user-entered + Snopes; Politifact + others coming soon.
      - Funding and support for Mechanical Turk tagging.
      - Example web pages to analyse + help with Yahoo BOSS.
      - ~6k examples of tagged entailments from web snippets. But: fairly low quality. Many are web snippets that repeat the same phrase.
      - An interesting problem space :-)
