
Automated Bias Detection in Journalism

Richard Lee, James Chen, Jason Cho. Assuming we're given parallel corpora, identify biased sentences; from there, identify biased articles.





Presentation Transcript


  1. Automated Bias Detection in Journalism
  Richard Lee, James Chen, Jason Cho

  MOTIVATION
  • Be able to detect bias from parallel corpora (collections of machine-readable texts).
  • To the best of our knowledge, automated bias detection has never been done.
  • Previous research in NLP, e.g., annotation systems and sarcasm detection. Applications in journalism, politics, rhetoric, linguistics, law, and education.
  • Increased objectivity in news articles and educational resources.
  • Assisted grading and bias-flagging.

  PROBLEM STATEMENT
  • Assuming we're given parallel corpora, identify biased sentences. From there, identify biased articles.
  • We assume that we've already been given annotated corpora to work with. In the future, and in practice, we won't have such a luxury.
  • Data borrowed from an earlier study [1]:
  • Parallel topics: the 2008 political atmosphere.
  • Drawn from political blog posts.
  • Sentences/articles matched with three types of bias: liberal, neutral, or conservative.

  RESULTS, CONT.
  Chart 1. Sentiment–bias correlation analysis; shows little to no correlation.

  APPROACH
  • Used existing software to streamline the design:
  • AlchemyAPI for simple sentiment analysis.
  • SVMlight handles classification of texts: in lay terms, it does the guesswork, provided it is given sufficient training data (patterns to build from).
  • Bag-of-words approach: all the unique words in a document are given an index, and the documents are then reconstructed from these indexes as vectors. For example:
  • "John likes to watch movies. Mary likes too." "John also likes to watch football games."
  • {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
  • [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
  • [1, 1, 1, 1, 0, 1, 1, 1, 0, 0] [5]

  BACKGROUND
  • Some existing research is directed toward related areas:
  • Lexical bias indicators [1]
  • Quantification of media bias using conservative and liberal blogs [2]
  • Automated annotation [3]
  • Detection of other linguistic constructs, such as sarcasm [4]
  • However, no existing research directly tackles automated bias detection.
  • Borrowed data (with permission) from "Shedding (a thousand points of) light on biased language" [1].
  • A variety of NLP suites are already popular, e.g., NLTK.
  • Consulted with Professor Eric K. Meyer of the UIUC College of Media.
  • Forms of bias include:
  • Topic framing, e.g., "estate tax" vs. "death tax."
  • Choice of source, such as cherry-picked quotes and data points.
  • Choice of words, e.g., "fetus" vs. "unborn baby."

  Chart 2. Bias–sentiment with word-count analysis; sentiment has little impact.

  CONCLUSION
  • With three categories, the baseline accuracy is 33.33% (assuming completely random guessing).
  • Using a simple bag-of-words approach, we achieved ~40% accuracy for three-way classification.
  • Using the bag-of-words approach for binary classification, we achieved ~60% accuracy.

  FUTURE WORKS
  • Bias != subjectivity. E.g., "The second amendment protects gun owners' rights in America," versus "I think Inception was quite overrated."
  • More advanced bias detection techniques:
  • A smarter "dictionary" of strongly biased topics, i.e., a list of words that give their sentences added priority when weighing sentiment.
  • Summing sentiment by topic, etc.

  RESULTS
  Table 1. Errors from the classifier; bag-of-words experiences difficulties during textual classification due to the large number of overlapping features present in all datasets.

  References
  [1] T. Yano, P. Resnik, and N. Smith. 2010. Shedding (a thousand points of) light on biased language.
  [2] Y. R. Lin, J. P. Bagrow, and D. Lazer. 2011. More voices than ever? Quantifying media bias in networks.
  [3] L. Herzig, A. Nunes, and B. Snir. 2011. An annotation scheme for automated bias detection in Wikipedia.
  [4] R. González-Ibáñez, S. Muresan, and N. Wacholder. 2011. Identifying sarcasm in Twitter: A closer look.
  [5] Wikipedia: Bag-of-words model. https://en.wikipedia.org/wiki/Bag-of-words_model
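The bag-of-words encoding described under APPROACH can be sketched in a few lines of Python. This is an illustration only — the `vectorize` helper is ours, not part of SVMlight or AlchemyAPI — and the vocabulary is hard-coded to match the poster's example indices:

```python
from collections import Counter

# Vocabulary exactly as on the poster (word -> 1-based index).
vocab = {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
         "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}

def vectorize(document, vocab):
    """Bag of words: count how often each vocabulary word occurs,
    ordered by the word's index. Word order in the text is discarded."""
    counts = Counter(document.replace(".", "").split())
    return [counts[word] for word, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

print(vectorize("John likes to watch movies. Mary likes too.", vocab))
# [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
print(vectorize("John also likes to watch football games.", vocab))
# [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

Note that "likes" appears twice in the first document, which is why its count is 2: the representation keeps word frequencies but throws away word order.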
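SVMlight reads training examples as sparse lines of the form `<label> <index>:<value> ...`, with feature indices in increasing order and zero-valued features omitted, so dense count vectors like the ones above must be serialized before training. A minimal sketch, assuming an illustrative label scheme (liberal = 1, neutral = 2, conservative = 3) that the poster does not specify:

```python
def to_svmlight(label, vector):
    """Write one dense count vector as an SVMlight sparse line:
    '<label> <index>:<count> ...' with 1-based indices, zeros omitted."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(vector, start=1) if v)
    return f"{label} {feats}"

# e.g. a sentence labeled "liberal" under the assumed scheme:
print(to_svmlight(1, [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]))
# 1 1:1 2:2 3:1 4:1 5:1 9:1 10:1
```

Dropping the zeros matters in practice: with a vocabulary of thousands of unique words, almost every entry of a sentence's count vector is zero, and the sparse format keeps training files small.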
