Explore how General Inquirer and Yoshikoder help analyze news content by categorizing words and identifying sentiments. Learn about using categories from Harvard IV-4 and Lasswell dictionaries for a comprehensive analysis.
Yoshikoder & General Inquirer Jonathan Simon Elizabeth Langdon COM 633, Fall 2010
General Inquirer - Basics • The function of GI is to generate a count of words falling into various dictionary-supplied categories • Uses categories from the Harvard IV-4 dictionary and the Lasswell dictionary, as well as five categories based on the social cognition work of Semin and Fiedler • 182 categories in all • Each category is a list of words and word senses
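As a rough illustration of the counting idea described above, the following minimal Python sketch tallies how many words of a text fall into each category. The tiny `TOY_DICTIONARY` and the function names are hypothetical and are not the real Harvard IV-4 or Lasswell word lists. The sketch also ignores word senses and disambiguation, which GI handles.

```python
import re

# Hypothetical mini-dictionary (category -> words); NOT the real GI word lists.
TOY_DICTIONARY = {
    "Pstv": {"good", "support", "win"},
    "Ngtv": {"bad", "attack", "lose"},
    "Strong": {"win", "attack", "power"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_categories(text, dictionary):
    """Count how many tokens belong to each category's word list."""
    counts = {category: 0 for category in dictionary}
    for token in tokenize(text):
        for category, words in dictionary.items():
            if token in words:
                # A word may belong to several categories at once.
                counts[category] += 1
    return counts


print(count_categories("Good people support a win, not an attack.", TOY_DICTIONARY))
```

Note that a single word (here "win" or "attack") can add to more than one category, which matches the idea of each category being its own list of words.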
General Inquirer - Dictionaries • Examples of Harvard IV-4 categories: • Pstv 1045 positive words, plus a subset of 557 words tagged Affil for words indicating affiliation or supportiveness • Ngtv 1160 negative words, plus a subset of 833 words tagged Hostile for words indicating an attitude or concern with hostility or aggressiveness • Strong 1902 words implying strength, plus a subset of 689 words tagged Power, indicating a concern with power, control or authority • Weak 755 words implying weakness, plus a subset of 284 words tagged Submit, indicating submission to authority or power, dependence on others, vulnerability to others, or withdrawal
General Inquirer - Dictionaries • Examples of Lasswell categories: • PowGain = 65 words about power increasing • PowLoss = 109 words about power decreasing • PowEnds = 30 words about the goals of the power process • PowAren = 53 words referring to political places and environments • PowCon = 228 words for ways of conflicting
General Inquirer - Dictionaries • For names and basic descriptions of each category: http://www.wjh.harvard.edu/~inquirer/homecat.htm • For a list of all words contained in each of the 182 categories: http://www.webuse.umd.edu:9090/tags/
General Inquirer - Dictionaries • Users CAN add new categories • Considerations for adding categories: • “Somewhat comparable to producing a set of survey questions that everyone agrees has validity in measuring a well-specified construct” • To map categories with accuracy requires attention to word use, word senses, and disambiguation routines
General Inquirer – Application & Use • Purpose: Analyze content of news articles from three different sources • Articles are about the same Ted Strickland fundraiser • Include a newscast (via closed captioning) from WKYC, an online article from FOX8, and an online article from The Plain Dealer
General Inquirer – Application & Use Beginning Screens:
General Inquirer – Application & Use • Input: • Select the content you wish to analyze • Use plain text format (.txt) • Analyze a single file or multiple files at one time • To analyze multiple files simultaneously, save them to a directory (e.g. F:\NewsArticles) • In output, each file will have its own line of data within your Excel file (one row for single files, multiple rows for multiple files)
General Inquirer – Application & Use • Dictionary: • You will not need to change this! GI will analyze your content using all of its 182 categories • Output: • Specify where you want the data output to be saved, name the file and add the .xls extension
General Inquirer – Application & Use • Tags: • Output is a matrix of counts and percentages of words falling into the dictionaries’ semantic categories • Format column includes r (raw count, or simple count of words) and s (scaled count, or percentage of words in each category) • Wordcount column is total number of words in the file • Leftovers column shows words not found in any dictionary
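To make the r and s rows concrete, here is a small Python sketch that builds one raw row and one scaled row for a file. The column names mirror the slide (Format, Wordcount, Leftovers), but the exact layout of GI's spreadsheet, the `tag_rows` helper, and the two-category dictionary are assumptions made for illustration only.

```python
import re


def tag_rows(name, text, dictionary):
    """Return (raw_row, scaled_row) for one file.

    Assumed layout: 'r' holds simple counts, 's' holds counts as a
    percentage of all words in the file.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    wordcount = len(tokens)
    known = set().union(*dictionary.values())
    # Words that appear in none of the category lists.
    leftovers = sum(1 for t in tokens if t not in known)

    raw = {"File": name, "Format": "r", "Wordcount": wordcount, "Leftovers": leftovers}
    scaled = {"File": name, "Format": "s", "Wordcount": wordcount, "Leftovers": leftovers}
    for category, words in dictionary.items():
        n = sum(1 for t in tokens if t in words)
        raw[category] = n
        scaled[category] = round(100 * n / wordcount, 2) if wordcount else 0.0
    return raw, scaled


# Hypothetical two-category dictionary and text, for illustration only.
toy = {"Pstv": {"good", "win"}, "Ngtv": {"bad"}}
for row in tag_rows("example.txt", "good bad win today", toy):
    print(row)
```

The scaled values make files of different lengths comparable, which is why the results later in the deck are discussed as percentages rather than raw counts.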
General Inquirer – Application & Use • Words: • Output is a count of all words appearing in your file • Rows are words, columns are file names
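The word-by-file layout described above can be sketched in a few lines of Python. The `word_matrix` helper and the file names are hypothetical. The sketch only shows the shape of the table (rows are words, columns are files), not GI's actual output format.

```python
import re
from collections import Counter


def word_matrix(files):
    """Build a table with one row per word and one column per file.

    files: mapping of file name -> text.
    Returns (header, rows).
    """
    counts = {
        name: Counter(re.findall(r"[a-z']+", text.lower()))
        for name, text in files.items()
    }
    vocabulary = sorted(set().union(*counts.values()))
    header = ["word"] + list(counts)
    rows = [[word] + [counts[name][word] for name in counts] for word in vocabulary]
    return header, rows


# Hypothetical inputs, for illustration only.
header, rows = word_matrix({"a.txt": "jobs jobs tax", "b.txt": "tax schools"})
print(header)
for row in rows:
    print(row)
```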
General Inquirer – Results • Overall, the WKYC article can be viewed as being more positive and affiliative when compared to the FOX and PD articles • WKYC story showed highest percentages of all positively valenced categories • FOX or Plain Dealer showed higher percentages of all negatively valenced categories • CATA / GI findings are reflective of the overall tone of the articles, as experienced by readers (e.g. pulled quotes, emphasis on political / economic climates, etc.)
Yoshikoder - Basics • Yoshikoder provides a general word count, custom dictionary word count, KWIC, and reading highlight function • The program can handle multiple documents and analyze them individually or side by side • All dictionaries must be either custom built or downloaded from an external source – several dictionaries are available on the Yoshikoder website
Yoshikoder - Dictionaries • Dictionaries consist of 2 levels: Categories and Patterns • Categories are concept words that fall into a larger construct • Patterns are individual words or phrases that fall into a category and are actually searched for • Yoshikoder dictionaries allow wild cards (*)
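The category/pattern structure with wild cards can be pictured with a short Python sketch. The `ISSUES` dictionary below is hypothetical, and the matching uses Python's `fnmatch` module as a stand-in. Yoshikoder's own wild card rules may differ, and this sketch handles single-word patterns only, not phrases.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical two-level dictionary: category -> list of patterns.
ISSUES = {
    "Jobs": ["job*", "employ*", "unemployment"],
    "Education": ["school*", "teacher*", "education"],
}


def count_patterns(text, dictionary):
    """Count matches per pattern, grouped by category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    report = {}
    for category, patterns in dictionary.items():
        report[category] = {
            pattern: sum(1 for token in tokens if fnmatchcase(token, pattern))
            for pattern in patterns
        }
    return report


sample = "Jobs and employment matter; teachers want schools funded, and unemployment is high."
print(count_patterns(sample, ISSUES))
```

A trailing `*` matches any ending, so `job*` picks up "jobs", while "unemployment" is only caught by its own pattern because it does not begin with "employ".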
Yoshikoder – Application & Use • Purpose: Analyze content of news articles from three different sources • Articles are about the same Ted Strickland fundraiser • Include a newscast (via closed captioning) from WKYC, an online article from FOX8, and an online article from The Plain Dealer • This analysis will identify which issues were most frequently mentioned in these stories given a list of predetermined possible issues
Yoshikoder – Application & Use Beginning Screen:
Yoshikoder – Application & Use • Add Document: • Documents must be .txt files
Yoshikoder – Application & Use Multiple documents can be uploaded
Yoshikoder – Building a Dictionary (screenshot with numbered callouts 5–9)
Yoshikoder – Building a Dictionary It is important to make sure that the proper level is highlighted when adding a category or pattern. Yoshikoder can nest categories within one another.
Yoshikoder – Import a Dictionary Pre-made or downloaded dictionaries can be imported
Yoshikoder – Analysis • A Yoshikoder “concordance” is a KWIC analysis • Concordance > Make Concordance • Results can be exported to HTML or Excel
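For readers unfamiliar with KWIC (keyword in context), the following Python sketch shows the general idea: each hit is listed with a few words of context on either side. The `kwic` function, its `window` parameter, and the sample sentence are illustrative assumptions, not Yoshikoder's implementation.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, for illustration only.
sample = "The governor promised new jobs for Ohio. Jobs remain the top issue."
for left, hit, right in kwic(sample, "jobs", window=2):
    print(f"{left:>20} | {hit} | {right}")
```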
Yoshikoder - Analysis • Report • Document Word Frequencies reports the frequencies of all words in an individual document • All Word Frequencies reports the frequencies of all words in all documents, sorted by document • Unified Word Frequencies reports the frequencies of all words in all selected documents
Yoshikoder - Analysis • Report • Dictionary Report shows the frequencies of dictionary words, by category or pattern, for an individual document • A unified dictionary report exports the category frequencies into an Excel spreadsheet • Document Comparison will compare any two documents • Statistical Comparison Report will compare any two documents in terms of percent difference
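The slide does not say how the percent difference is computed, so the Python sketch below is only one plausible reading: each category's count is converted to a percentage of the document's total words, and the two percentages are subtracted. The `share_difference` function and all numbers are hypothetical and are not results from the study.

```python
def share_difference(counts_a, total_a, counts_b, total_b):
    """Difference in category share (percentage points), document A minus B.

    Assumption: 'share' means category count as a percentage of all words.
    """
    categories = sorted(set(counts_a) | set(counts_b))
    result = {}
    for category in categories:
        share_a = 100 * counts_a.get(category, 0) / total_a
        share_b = 100 * counts_b.get(category, 0) / total_b
        result[category] = round(share_a - share_b, 2)
    return result


# Made-up counts for two documents of 200 and 100 words.
print(share_difference({"Jobs": 4, "Education": 6}, 200, {"Jobs": 5}, 100))
```

A positive value means the category takes up a larger share of the first document, and a negative value means it is more prominent in the second.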
Yoshikoder – Analysis Results The Channel 3 newscast contained more issue keywords than the FOX 8 and PD stories, with the biggest difference in focus being education issues. The “Jobs” issue was the most frequently mentioned; however, it was emphasized more in the FOX 8 and PD stories than in Channel 3’s coverage. The remaining issue mentions were sporadic, with little overlap between the sources.