
AI Seminar

Identifying Ideological Point of View, Part II. Melanie Martin, September 5, 2001. Our web page is at www.cs.nmsu.edu/~gradrep, under “Events” in the left frame. Outline of this presentation: Where are we???; Ideology; Statistical NLP and Machine Learning; Discourse features.


Presentation Transcript


  1. AI Seminar Our web page is at: www.cs.nmsu.edu/~gradrep Under “Events” in left frame Melanie Martin - AI Seminar

  2. Identifying Ideological Point of View, Part II. Melanie Martin, September 5, 2001

  3. Outline of this presentation • Where are we??? • Ideology • Statistical NLP and Machine Learning • Discourse features • Internet • Conclusion

  4. Where are we??? • Let’s recall what we want to do: • Build a system that could take information from web pages and Usenet newsgroups on a given topic and segment, classify, or cluster it by ideological point of view.

  5. The Proposed System (diagram; labels only) • User inputs topic • Search Engine • Internet: Web pages, Usenet • Topic Clustering, Filtering • Set of documents on topic • Ideological Clustering • Docs on topic clustered by IPV (ideological point of view)

  6. Where are we??? • What do we need? • A computationally feasible definition of ideological point of view • A search engine, possibly with additional processing, to produce a collection of documents on the topic specified by the user

  7. Where are we??? • What else do we need? • A module to cluster documents by ideological point of view • A user interface • A way to evaluate the system

  8. Where are we??? • Why do we need this? • Some examples using Google: • query: back pain ~2,220,000 • scoliosis ~121,000 • query: lyme disease ~163,000 • query: zoning shopping center ~65,100 • (add) clark county nv ~299 • query: un racism conference ~74,000

  9. Outline of this presentation • Where are we??? • Ideology • Statistical NLP and Machine Learning • Discourse features • Internet • Conclusion

  10. Ideology • Working definition from van Dijk: “Ideologies are the fundamental beliefs of a group and its members.” • instantiated as Us vs. Them • predefined ideologies will not work across domains • want to avoid researcher bias • definition likely needs more work

  11. Ideology • Linguistics • van Dijk (1998) • Blommaert & Verschueren (1998) • Wang (1993) • Wortham & Locher (1996)

  12. Ideology • The Systems • Ideology Machine - 1965 to 1973 - Abelson et al. • Politics - 1979 - Carbonell • Pauline - 1987 - Hovy • Tracking Point of View in Narrative - 1994 - Wiebe • Spin Doctor - 1994 - Sack • Terminal Time - 2000 - Mateas et al.

  13. Ideology • Some issues • Evaluation!!! • Hard-coded knowledge • Domain dependence • Cognitive plausibility • More precise definitions

  14. Outline of this presentation • Where are we??? • Ideology • Statistical NLP and Machine Learning • Discourse features • Internet • Conclusion

  15. Statistical NLP and ML • Two techniques we will consider • Latent Semantic Analysis • Probabilistic Classification

  16. Statistical NLP and ML • Issues • clustering versus classification • categories may not be predefined • may want to take a variety of features into account • favor learning over hard-coding knowledge • supervised versus unsupervised • cost of annotated training data

  17. Statistical NLP and ML • Latent Semantic Analysis • text represented as a matrix • each entry is the weighted frequency of a word in a context • semantic space obtained through SVD • words appearing in similar contexts have similar feature vectors • characterizes semantic content of words in context
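
A minimal sketch of the LSA steps on slide 17, under stated assumptions: each document is treated as one "context", the weighting is log(1 + count), and the truncated SVD is obtained by power iteration on the document Gram matrix instead of a library SVD routine (the two give the same document coordinates). All function names here are hypothetical and are not taken from any of the cited systems.

```python
import math
import random

def term_document_matrix(docs):
    """Rows are words, columns are documents; entries are log-weighted counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            counts[row[w]][j] += 1.0
    # Assumed weighting: log(1 + count). Other schemes (e.g. tf-idf) are common.
    return [[math.log1p(x) for x in r] for r in counts]

def top_eigenpairs(sym, k, iters=500, seed=0):
    """Power iteration with deflation on a symmetric positive semidefinite matrix."""
    n = len(sym)
    rng = random.Random(seed)
    work = [r[:] for r in sym]
    pairs = []
    for _ in range(min(k, n)):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass finds the next-largest eigenpair.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(docs, k=2):
    """Return one k-dimensional vector per document in the reduced space."""
    a = term_document_matrix(docs)
    n = len(docs)
    # Gram matrix A^T A: its eigenvectors are the right singular vectors of A
    # and its eigenvalues are the squared singular values.
    gram = [[sum(r[i] * r[j] for r in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Calling `lsa_document_vectors(docs, k=2)` on a small corpus and comparing rows with `cosine` gives a similarity that could feed a clustering step. In a toy corpus where two documents share most of their vocabulary and a third shares none, the first pair comes out close and the third far away.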

  18. Statistical NLP and ML • Why LSA is a good choice here • semantics is a key component of ideological discourse • clustering without need for predefined categories • already shown useful for: • summarization (Ando 2000) • text segmentation (Choi 2001) • measuring text coherence (Foltz 1998)

  19. Statistical NLP and ML • We want to look a little more closely at Ando’s work • uses term, sentence, and document vectors • modified SVD algorithm • interesting interface • Reference: Rie Kubota Ando, Branimir Boguraev, Roy Byrd, and Mary Neff. "Multi-document summarization by visualizing topical content." ANLP/NAACL '00 Workshop on Automatic Summarization.

  20. Statistical NLP and ML • Another option is a probabilistic classifier • assigns the most probable class to an object based on a probability model • can we get around predefined classes?

  21. Statistical NLP and ML • Probability model • defines joint distribution of variables • set of feature variables and a class variable • Wiebe and Bruce (1995) got around the issue of not knowing the classes in advance by breaking up the problem and using a series of classifiers
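
To make "assigns the most probable class based on a probability model" concrete, here is a small sketch that uses naive Bayes as the probability model. This is a swapped-in, simpler model chosen for illustration; it is not the Wiebe and Bruce approach, and it does assume predefined classes. The class name, the labels, and the word features are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Joint model P(class) * product of P(feature | class), with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (list_of_features, class_label) pairs
        for features, label in examples:
            self.class_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def log_joint(self, features, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
        for f in features:
            if f in self.vocab:  # features never seen in training are ignored
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
        return score

    def predict(self, features):
        # Most probable class = class with the highest joint log-probability.
        return max(self.class_counts, key=lambda c: self.log_joint(features, c))

training = [
    (["we", "defend", "freedom"], "group_a"),
    (["they", "threaten", "freedom"], "group_b"),
    (["we", "protect", "jobs"], "group_a"),
    (["they", "destroy", "jobs"], "group_b"),
]
model = NaiveBayes().fit(training)
```

With this toy training set, `model.predict(["we", "defend"])` picks `group_a`, because those words were only seen with that label.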

  22. Statistical NLP and ML • We need to come up with a set of features…our next topic • Which features to use can then be determined statistically, using the goodness of fit of graphical models

  23. Statistical NLP and ML • Both methods seem to have a lot of potential • LSA would be easier to implement • possibly a baseline for evaluation of probabilistic classifiers • LSA is likely to yield less linguistic knowledge

  24. Outline of this presentation • Where are we??? • Ideology • Statistical NLP and Machine Learning • Discourse features • Internet • Conclusion

  25. Discourse features • If we use probabilistic classifiers we need features, so we look at: • linguistics • previous systems • discourse theory • literary theory

  26. Discourse features • From linguistics and discourse: • General strategy of most ideological discourse (van Dijk’s Ideological Square): • Emphasize positive things about Us • Emphasize negative things about Them • De-emphasize negative things about Us • De-emphasize positive things about Them

  27. Discourse features • How are these strategies instantiated in discourse? (van Dijk) • What is there: • argument structure • syntactic patterns • style and non-literal language • actor descriptions • thematic structure • topoi (standardized topics)

  28. Discourse features • What is not there • implication • presupposition • inference • goals and plans

  29. Discourse features • Disclaimers, selected examples: • Apparent Negation: I have nothing against X, but... • Apparent Concession: They may be very smart, but... • Apparent Empathy: They may have had problems, but... • Apparent Effort: We do everything we can, but... • Positive self-representation and face keeping
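
One way the disclaimer examples on slide 29 might be turned into classifier features is surface pattern matching. The sketch below is a rough illustration only: the regular expressions are hypothetical, are derived directly from the four example phrasings, and would miss most real variation.

```python
import re

# Hypothetical patterns, one per disclaimer type from the slide's examples.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(r"\bnothing against\b.*\bbut\b", re.I),
    "apparent_concession": re.compile(r"\bmay be\b.*\bbut\b", re.I),
    "apparent_empathy": re.compile(r"\bmay have had\b.*\bbut\b", re.I),
    "apparent_effort": re.compile(r"\bdo everything we can\b.*\bbut\b", re.I),
}

def disclaimer_features(sentence):
    """Return a boolean feature per disclaimer type for one sentence."""
    return {name: bool(p.search(sentence)) for name, p in DISCLAIMER_PATTERNS.items()}
```

For example, `disclaimer_features("They may be very smart, but they are wrong.")` sets only `apparent_concession`. Such booleans could sit alongside lexical features as input to a probabilistic classifier.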

  30. Discourse features • Some discourse theories from Computational Linguistics • Mann & Thompson (RST) (1988) • Grosz & Sidner (G&S) (1986) • Morris & Hirst (Lexical chains) (1991)

  31. Discourse features • Issues • implementation (G&S, RST) • finite number of fixed primitives (RST) • domain specific (RST depends on training)

  32. Discourse features • A reasonable first approach: Lexical Chains (Morris & Hirst) • Sequences of related words spanning a topical unit in the text • based on lexical cohesion • encapsulates context • helps identify key phrases

  33. Discourse features • Idea of Algorithm • read next word • if candidate • check chains within suitable span • check thesaurus or WordNet • check other knowledge sources • if found • include in chain • recalculate chain
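
The algorithm idea on slide 33 can be sketched as a single greedy pass. Assumptions in this sketch: a small hand-written `related` dictionary stands in for a thesaurus or WordNet, "suitable span" is a fixed number of word positions, "recalculate chain" is reduced to updating the chain's last position, and a candidate that matches no chain starts a new one (the slide does not say what happens in that case). The function and parameter names are hypothetical.

```python
def build_lexical_chains(words, related, stopwords=frozenset(), span=10):
    """Greedy chaining: attach each candidate word to the first recent chain
    holding the same word or a word listed as related in the toy thesaurus."""
    chains = []  # each chain: {"words": [...], "last": position of newest member}
    for pos, word in enumerate(words):
        word = word.lower()
        if word in stopwords or not word.isalpha():
            continue  # not a candidate
        for chain in chains:
            if pos - chain["last"] > span:
                continue  # chain lies outside the allowed span
            if any(word == w
                   or word in related.get(w, ())
                   or w in related.get(word, ())
                   for w in chain["words"]):
                chain["words"].append(word)  # include in chain
                chain["last"] = pos          # simplified "recalculate chain"
                break
        else:
            # Assumption: an unmatched candidate opens a new chain.
            chains.append({"words": [word], "last": pos})
    return [c["words"] for c in chains]

text = "the senator praised the tax bill while the congressman attacked the levy"
toy_thesaurus = {"senator": {"congressman"}, "tax": {"levy", "bill"}}
chains = build_lexical_chains(text.split(), toy_thesaurus, stopwords={"the", "while"})
```

Here `chains` groups "senator" with "congressman" and "tax" with "bill" and "levy", while unrelated words end up in singleton chains. Shrinking `span` breaks chains whose members are far apart.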

  34. Discourse features • Lexical chains could help us in: • topic segmentation • intentional structure • lexical features for a classifier

  35. Discourse features • Lexical chains are easy to implement, but are unlikely to be sufficient… • For the next approximation: RST • Marcu’s implementation incorporating G&S • Mostly used for summarization and generation • Would help get at the argument structure of the text

  36. Discourse features • RST Basics • about 23 rhetorical relations • account for discourse coherence • link adjacent spans of text • 5 schemas • defined in terms of relations • specify how spans can co-occur • nucleus and satellite spans • end up with tree structure

  37. Discourse features • Would most likely use RST to generate features for a classifier or as input to a pattern recognizer • Nucleus spans help pick out the more important segments of text • Produces a tree that gives the rhetorical structure of the text
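
As an illustration of how an RST tree might yield classifier features, the sketch below assumes a tree has already been produced by some parser (none is implemented here). It shows two hypothetical helpers: one follows nucleus links to pick out the more important text, the other counts rhetorical relations. The `Span` type, the relation labels, and the example sentence are chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Span:
    role: str                             # "nucleus" or "satellite" relative to the parent
    text: Optional[str] = None            # set on leaves only
    relation: Optional[str] = None        # relation joining the children (internal nodes)
    children: Optional[List["Span"]] = None

def nuclear_text(node):
    """Collect leaf texts reachable from this node through nucleus links only."""
    if not node.children:
        return [node.text]
    out = []
    for child in node.children:
        if child.role == "nucleus":
            out.extend(nuclear_text(child))
    return out

def relation_counts(node):
    """Count how often each rhetorical relation occurs in the tree."""
    counts = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        if current.children:
            counts[current.relation] += 1
            stack.extend(current.children)
    return counts

tree = Span(role="nucleus", relation="concession", children=[
    Span(role="satellite", text="They may be very smart,"),
    Span(role="nucleus", relation="evidence", children=[
        Span(role="nucleus", text="but their plan will fail."),
        Span(role="satellite", text="Costs doubled last year."),
    ]),
])
```

On this tree, `nuclear_text(tree)` keeps only the central claim, and `relation_counts(tree)` gives one count each for the two relations. Either result could be passed to a classifier as features.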

  38. Outline of this presentation • Where are we??? • Ideology • Statistical NLP and Machine Learning • Discourse features • Internet • Conclusion

  39. Internet • We would like to mine the structure of the Internet • see if there is a correspondence with groups • improved IR by topic • figure out what search engine to use as a base for our system

  40. Internet • Issues • topic or query disambiguation • what is a minimal unit • how to use the structure of the web • finding authorities • communities and subgraphs • Evaluation!!!

  41. Internet • Kleinberg (1997) • link based model • hub - links to many related authorities • authority • iterative weighting algorithm that converges (rapidly in practice) • can disambiguate authorities by sense • can be used to trawl for cyber communities
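
The iterative weighting in Kleinberg's link-based model can be sketched as follows. In that model an authority is a page pointed to by many good hubs, and a hub is a page that points to many good authorities, so the two scores are updated in turn and normalized. The function name, the fixed iteration count, and the tiny example graph are assumptions made for illustration. A real system would run this on the link neighborhood of a query's search results.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the list of pages it points to.
    Returns (hub_scores, authority_scores) as dictionaries."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of the pages each page links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize both score vectors to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

example_links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub_scores, auth_scores = hits(example_links)
```

In this example graph, `a1` receives the highest authority score because all three hubs point to it, and `h1` outranks `h3` as a hub because it points to both authorities.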

  42. Outline of this presentation • Where are we??? • Ideology • Statistical NLP and Machine Learning • Discourse features • Internet • Conclusion

  43. Conclusion • It seems that such a system can be built • find a good search engine • use Kleinberg’s algorithm to improve the collection of documents retrieved • use LSA and/or a probabilistic classifier to handle the ideological point of view • with a probabilistic classifier, use linguistic and discourse features • develop an evaluation methodology

  44. The End Thanks for listening! If you want to know more, my Comprehensive Exam paper is at: www.CS.NMSU.Edu/~mmartin/courses/comps_all.html
