AI Seminar: Identifying Ideological Point of View

AI Seminar
Our web page is at: www.cs.nmsu.edu/~gradrep (under “Events” in the left frame)
Melanie Martin - AI Seminar

Identifying Ideological Point of View, Part II
Melanie Martin
September 5, 2001

Outline of this presentation
• Where are we???
• Ideology
• Statistical NLP and Machine Learning
• Discourse features
• Internet
• Conclusion

Where are we???
• Let’s recall what we want to do:
  • Build a system that could take information from web pages and Usenet newsgroups on a given topic and segment, classify, or cluster it by ideological point of view.

The Proposed System
[Diagram] User inputs topic → Search Engine (Internet: Web pages, Usenet) → Topic Clustering, Filtering → Set of documents on topic → Ideological Clustering → Docs on topic clustered by IPV (ideological point of view)

Where are we???
• What do we need?
  • A computationally feasible definition of ideological point of view
  • A search engine, possibly with additional processing, to produce a collection of documents on the topic specified by the user

Where are we???
• What else do we need?
  • A module to cluster documents by ideological point of view
  • A user interface
  • A way to evaluate the system

Where are we???
• Why do we need this?
• Some examples using Google:
  • query: back pain ~2,220,000
    • scoliosis ~121,000
  • query: Lyme disease ~163,000
  • query: zoning shopping center ~65,100
    • (add) Clark County NV ~299
  • query: UN racism conference ~74,000

Ideology
• Working definition from van Dijk: “Ideologies are the fundamental beliefs of a group and its members.”
• instantiated as Us vs. Them
• predefined ideologies will not work across domains
• want to avoid researcher bias
• definition likely needs more work

Ideology
• Linguistics
  • van Dijk (1998)
  • Blommaert & Verschueren (1998)
  • Wang (1993)
  • Wortham & Locher (1996)

Ideology
• The Systems
  • Ideology Machine - 1965 to 1973 - Abelson et al.
  • Politics - 1979 - Carbonell
  • Pauline - 1987 - Hovy
  • Tracking Point of View in Narrative - 1994 - Wiebe
  • Spin Doctor - 1994 - Sack
  • Terminal Time - 2000 - Mateas et al.

Ideology
• Some issues
  • Evaluation!!!
  • Hard-coded knowledge
  • Domain dependence
  • Cognitive plausibility
  • More precise definitions

Statistical NLP and ML
• Two techniques we will consider
  • Latent Semantic Analysis
  • Probabilistic Classification

Statistical NLP and ML
• Issues
  • clustering versus classification
    • categories may not be predefined
    • may want to take a variety of features into account
    • favor learning over hard-coding knowledge
  • supervised versus unsupervised
    • cost of annotated training data

Statistical NLP and ML
• Latent Semantic Analysis
  • text represented as a matrix
  • entries are weighted frequency of word in context
  • semantic space obtained through SVD
  • words appearing in similar context have similar feature vectors
  • characterizes semantic content of words in context
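The LSA steps above can be illustrated with a minimal sketch. It makes several assumptions that are not in the slides: the four toy documents are invented, matrix entries are raw counts rather than weighted frequencies, and the rank-2 document coordinates are obtained by power iteration on the document Gram matrix (mathematically equivalent to keeping the top singular values and vectors of a truncated SVD) so that the example needs only the standard library. A real system would more likely call a library SVD routine.

```python
import math

def term_document_matrix(docs):
    """Rows are terms, columns are documents; entries are raw counts.
    (Assumption: a real system would weight these, e.g. with tf-idf.)"""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    a = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            a[row[w]][j] += 1.0
    return a

def gram(a):
    """A^T A: inner products between document columns."""
    n = len(a[0])
    return [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]

def top_eigenpairs(g, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(g)
    g = [r[:] for r in g]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):          # deflate: remove the component just found
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs

def document_vectors(docs, k=2):
    """Coordinates of each document in the k-dimensional latent space."""
    pairs = top_eigenpairs(gram(term_document_matrix(docs)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(len(docs))]

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0

# Hypothetical toy corpus: two documents per topic.
DOCS = [
    "tax cuts help growth economy",
    "tax cuts growth jobs economy",
    "forest river wildlife habitat",
    "river wildlife habitat protect",
]
VECS = document_vectors(DOCS, k=2)
```

In this toy corpus the two tax documents end up close together in the latent space and far from the two habitat documents, which is the property a clustering step would rely on.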

Statistical NLP and ML
• Why LSA is a good choice here
  • semantics is key component of ideological discourse
  • clustering without need for predefined categories
  • already shown useful for:
    • summarization (Ando 2000)
    • text segmentation (Choi 2001)
    • measuring text coherence (Foltz 1998)

Statistical NLP and ML
• We want to look a little more closely at Ando’s work
  • uses term, sentence, and document vectors
  • modified SVD algorithm
  • interesting interface
• Multi-document summarization by visualizing topical content. Rie Kubota Ando, Branimir Boguraev, Roy Byrd, and Mary Neff. ANLP/NAACL '00 Workshop on Automatic Summarization.

Statistical NLP and ML
• Another option is a probabilistic classifier
  • assigns the most probable class to an object based on a probability model
  • can we get around predefined classes?

Statistical NLP and ML
• Probability model
  • defines joint distribution of variables
  • set of feature variables and a class variable
• Wiebe and Bruce (1995) got around the issue of not knowing the classes in advance by breaking up the problem and using a series of classifiers
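As a rough illustration of a probability model over feature variables and a class variable, here is a naive Bayes sketch. Naive Bayes is chosen only because it is the simplest such model; the slides do not say which model would be used, and it is not the series-of-classifiers approach of Wiebe and Bruce. The class labels "A" and "B", the training examples, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (feature_list, class_label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        class_counts[label] += 1
        for f in feats:
            feature_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feature_counts, vocab

def classify(model, feats):
    """Return the class with the highest joint log-probability."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)                      # log P(class)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in feats:
            if f in vocab:                               # skip unseen features
                # add-one smoothed log P(feature | class)
                score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data with two placeholder classes.
EXAMPLES = [
    (["we", "protect", "freedom"], "A"),
    (["they", "threaten", "freedom"], "A"),
    (["we", "defend", "workers"], "B"),
    (["they", "exploit", "workers"], "B"),
]
MODEL = train(EXAMPLES)
```

Note that this sketch still requires the classes to be known in advance, which is exactly the limitation the slide raises.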

Statistical NLP and ML
• We need to come up with a set of features… our next topic
• Which features to use can then be determined statistically, using the goodness of fit of graphical models

Statistical NLP and ML
• Both methods seem to have a lot of potential
• LSA would be easier to implement
  • possibly a baseline for evaluation of probabilistic classifiers
• Less linguistic knowledge is likely to be gained with LSA

Discourse features
• If we use probabilistic classifiers we need features, so we look at:
  • linguistics
  • previous systems
  • discourse theory
  • literary theory

Discourse features
• From linguistics and discourse:
• General strategy of most ideological discourse (van Dijk’s Ideological Square):
  • Emphasize positive things about Us
  • Emphasize negative things about Them
  • De-emphasize negative things about Us
  • De-emphasize positive things about Them

Discourse features
• How are these strategies instantiated in discourse? (van Dijk)
• What is there:
  • argument structure
  • syntactic patterns
  • style and non-literal language
  • actor descriptions
  • thematic structure
  • topoi (standardized topics)

Discourse features
• What is not there
  • implication
  • presupposition
  • inference
  • goals and plans

Discourse features
• Disclaimers, selected examples:
  • Apparent Negation: I have nothing against X, but...
  • Apparent Concession: They may be very smart, but...
  • Apparent Empathy: They may have had problems, but...
  • Apparent Effort: We do everything we can, but...
• Positive self-representation and face keeping
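One way disclaimers like these might become classifier features is simple surface pattern matching. The sketch below is only an assumption about how that could look: the regular expressions are hand-written around the four example phrasings on the slide and would miss most real variants.

```python
import re

# Hypothetical patterns built directly from the slide's example phrasings.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(r"\bI have nothing against\b.*\bbut\b", re.IGNORECASE),
    "apparent_concession": re.compile(r"\bthey may be\b.*\bbut\b", re.IGNORECASE),
    "apparent_empathy": re.compile(r"\bthey may have had\b.*\bbut\b", re.IGNORECASE),
    "apparent_effort": re.compile(r"\bwe do everything we can\b.*\bbut\b", re.IGNORECASE),
}

def disclaimer_features(sentence):
    """Return the sorted names of all disclaimer patterns found in the sentence."""
    return sorted(name for name, pattern in DISCLAIMER_PATTERNS.items()
                  if pattern.search(sentence))
```

For example, `disclaimer_features("They may be very smart, but they are wrong.")` flags the sentence as an apparent concession, while a sentence with no disclaimer yields an empty list.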

Discourse features
• Some discourse theories from Computational Linguistics
  • Mann & Thompson (RST) (1988)
  • Grosz & Sidner (G&S) (1986)
  • Morris & Hirst (Lexical chains) (1991)

Discourse features
• Issues
  • implementation
    • G&S, RST
  • finite number of fixed primitives
    • RST
  • domain specific
    • RST depends on training

Discourse features
• A reasonable first approach: Lexical Chains (Morris & Hirst)
  • Sequences of related words spanning a topical unit in the text
  • based on lexical cohesion
  • encapsulates context
  • helps identify key phrases

Discourse features
• Idea of Algorithm
  • read next word
  • if candidate
    • check chains within suitable span
    • check thesaurus or WordNet
    • check other knowledge sources
  • if found
    • include in chain
    • recalculate chain
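The algorithm idea above can be sketched as follows. Several details are assumptions rather than part of the slide: a tiny hand-made `RELATED` dictionary stands in for a thesaurus or WordNet, "candidate" is taken to mean "not a stopword", the "suitable span" is a maximum gap in word positions, a word that fits no chain starts a new one, and "recalculate chain" is reduced to updating the chain's last position.

```python
# Hypothetical toy thesaurus standing in for WordNet or a real thesaurus.
RELATED = {
    "car": {"engine", "vehicle", "driver"},
    "river": {"bank", "water", "floods"},
}
STOPWORDS = {"the", "a", "new", "needs", "while"}

def related(a, b):
    """True if the two words are identical or linked in the toy thesaurus."""
    return a == b or b in RELATED.get(a, ()) or a in RELATED.get(b, ())

def build_chains(words, stopwords=STOPWORDS, max_gap=10):
    """Greedy lexical chaining: attach each candidate word to the first
    recent chain containing a related word, else start a new chain."""
    chains = []  # each chain: {"words": [...], "last": position of newest word}
    for pos, word in enumerate(words):
        if word in stopwords:            # not a candidate
            continue
        for chain in chains:
            within_span = pos - chain["last"] <= max_gap
            if within_span and any(related(word, w) for w in chain["words"]):
                chain["words"].append(word)   # include in chain
                chain["last"] = pos           # "recalculate" the chain
                break
        else:
            chains.append({"words": [word], "last": pos})
    return [c["words"] for c in chains]

TEXT = "the car needs a new engine while the river floods the bank".split()
CHAINS = build_chains(TEXT)
```

On the toy sentence this yields one chain about the car and one about the river, which is the kind of grouping that could feed topic segmentation or supply lexical features.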

Discourse features
• Lexical chains could help us in:
  • topic segmentation
  • intentional structure
  • lexical features for a classifier

Discourse features
• Lexical chains are easy to implement, but are unlikely to be sufficient…
• For the next approximation: RST
  • Marcu’s implementation incorporating G&S
  • Mostly used for summarization and generation
  • Would help get at the argument structure of the text

Discourse features
• RST Basics
  • about 23 rhetorical relations
    • account for discourse coherence
    • link adjacent spans of text
  • 5 schemas
    • defined in terms of relations
    • specify how spans can co-occur
  • nucleus and satellite spans
  • end up with tree structure

Discourse features
• Would most likely use RST to generate features for a classifier or as input to a pattern recognizer
• Nucleus spans help pick out the more important segments of text
• Produces a tree that gives the rhetorical structure of the text
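To make the tree idea concrete, here is a small data-structure sketch. It is an illustration under stated assumptions, not Marcu's implementation: the `Span` class and both helper functions are hypothetical, only nucleus-satellite relations are modelled (multinuclear relations are left out), and the example tree is hand-built rather than produced by a parser.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    """A node in a simplified RST tree: either a leaf with text,
    or an internal node joining a nucleus and a satellite by a relation."""
    text: Optional[str] = None
    relation: Optional[str] = None
    nucleus: Optional["Span"] = None
    satellite: Optional["Span"] = None

def most_nuclear_text(node):
    """Follow nucleus links from the root down to a leaf."""
    while node.text is None:
        node = node.nucleus
    return node.text

def relation_counts(node):
    """Count rhetorical relations in the tree, e.g. as classifier features."""
    counts = Counter()
    if node.text is None:
        counts[node.relation] += 1
        counts += relation_counts(node.nucleus)
        counts += relation_counts(node.satellite)
    return counts

# Hand-built example tree (hypothetical analysis of three text spans).
TREE = Span(
    relation="Concession",
    nucleus=Span(
        relation="Evidence",
        nucleus=Span(text="the policy failed."),
        satellite=Span(text="Unemployment rose."),
    ),
    satellite=Span(text="They may be very smart, but"),
)
```

Following nucleus links picks out "the policy failed." as the central span, and the relation counts give one possible feature vector for a downstream classifier.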

Internet
• We would like to mine the structure of the Internet
  • see if there is a correspondence with groups
  • improved IR by topic
  • figure out what search engine to use as a base for our system

Internet
• Issues
  • topic or query disambiguation
  • what is a minimal unit
  • how to use the structure of the web
  • finding authorities
  • communities and subgraphs
  • Evaluation!!!

Internet
• Kleinberg (1997)
  • link based model
  • hub - links to many related authorities
  • authority
  • iterative weighting algorithm that converges (rapidly in practice)
  • can disambiguate authorities by sense
  • can be used to trawl for cyber communities
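The iterative weighting can be sketched in a few lines. This is a simplified illustration of the hub and authority updates on a tiny invented link graph; the page names, the fixed iteration count, and the use of L2 normalization are assumptions made for the example, and the step of building a focused subgraph from search results is omitted.

```python
import math

def _normalize(scores):
    """Scale scores in place to unit L2 norm (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm:
        for key in scores:
            scores[key] /= norm

def hits(links, iterations=50):
    """links maps each page to the list of pages it links to.
    Returns (hub_scores, authority_scores)."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        _normalize(auth)
        # Hub score: sum of authority scores of pages it links to.
        hub = {n: sum(auth[t] for t in links.get(n, ())) for n in nodes}
        _normalize(hub)
    return hub, auth

# Hypothetical link graph: three hub-like pages pointing at two targets.
LINKS = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
HUB, AUTH = hits(LINKS)
```

Here `a1` receives the highest authority score because all three hub-like pages link to it, and `h1` and `h2` outrank `h3` as hubs because they point to both authorities.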

Conclusion
• It seems that such a system can be built
  • find a good search engine
  • use Kleinberg’s algorithm to improve the collection of documents retrieved
  • use LSA and/or a probabilistic classifier to handle the ideological point of view
  • with a probabilistic classifier, use linguistic and discourse features
  • develop evaluation methodology

The End
Thanks for listening!
If you want to know more, my Comprehensive Exam paper is at: www.CS.NMSU.Edu/~mmartin/courses/comps_all.html
