
Research on Intelligent Text Information Management

Research on intelligent text information management, including personalized retrieval models, topic maps, recommender systems, contextual text mining, opinion integration, and information quality.



Presentation Transcript


  1. Research on Intelligent Text Information Management. ChengXiang Zhai, Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, Statistics, University of Illinois, Urbana-Champaign. http://www.cs.uiuc.edu/homes/czhai, czhai@cs.uiuc.edu. Contains joint work with Xuehua Shen, Bin Tan, Qiaozhu Mei, Yue Lu, Hongning, Vinod, and other members of the TIMan group.

  2. Research Roadmap. [Diagram labels: Web, Email, and Bioinformatics. Current focus: Personalized, Retrieval models, Topic map, Recommender, Contextual text mining, Opinion integration, Information quality. Other labels: Entity/Relation Extraction, Search Applications, Summarization, Visualization, Mining Applications, Filtering, Mining, Information Organization, Information Access, Knowledge Acquisition, Search, Extraction, Categorization, Clustering, Natural Language Content Analysis, Text.]

  3. Sample Projects
     • Optimization of Retrieval Models
     • User-Centered Adaptive Information Retrieval
     • Multi-Resolution Topic Map for Browsing
     • Contextual Text Mining
     • Opinion Integration and Summarization
     • Information Trustworthiness

  4. Project 1: Optimization of Retrieval Models
     • Content-based matching is a critical component in any search system
     • Developed a number of retrieval models to optimize content matching
       • Language models: various LMs supporting proximity, word translations, feedback, …
       • Axiomatic framework: theoretical analysis of retrieval models
     • Recently looking into optimal interactive retrieval and domain-specific retrieval models (feedback, exploitation-exploration tradeoff, medical case retrieval, forum retrieval, …)
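
As an illustration of the language-model family mentioned above, here is a minimal sketch of query-likelihood scoring with Dirichlet smoothing. The slide does not give a formula, so the choice of Dirichlet smoothing, the function name `query_likelihood`, the toy documents, and the value of `mu` are all assumptions made for the example.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log-likelihood of `query` under a Dirichlet-smoothed unigram model of `doc`."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        p_coll = collection_counts.get(w, 0) / collection_len
        if p_coll == 0.0:
            continue  # simplification: ignore words never seen in the collection
        # Smoothed estimate: document count plus mu pseudo-counts from the collection.
        p = (doc_counts[w] + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score

# Toy collection (hypothetical data).
docs = {
    "d1": "jaguar car dealer sells luxury car".split(),
    "d2": "jaguar is a big cat animal".split(),
}
collection_counts = Counter(w for d in docs.values() for w in d)
collection_len = sum(collection_counts.values())

query = ["jaguar", "car"]
ranked = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], collection_counts, collection_len, mu=10.0),
    reverse=True,
)
print(ranked)
```

Both toy documents contain "jaguar", so the ranking is decided by the smoothed probability of "car", which is higher in `d1`.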

  5. Project 2: User-Centered Adaptive IR (UCAIR)
     • A novel retrieval strategy emphasizing
       • user modeling (“user-centered”)
       • search context modeling (“adaptive”)
       • interactive retrieval
     • Implemented as a personalized search agent that
       • sits on the client side (owned by the user)
       • integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users)
       • collaborates with other agents
       • goes beyond search toward task support

  6. Non-Optimality of Document-Centered Search Engines. Query = “Jaguar” (as of Oct. 17, 2005). Result categories shown: Car, Car, Software, Car, Animal, Car. Mixed results, unlikely to be optimal for any particular user.

  7. The UCAIR Project (NSF CAREER). [Diagram labels: Web, Email, ..., Viewed Web pages, Query History, Desktop Files, Search Engine (three instances), Personalized search agent, query “jaguar”.]

  8. Potential Benefit of Personalization. Suppose we know:
     1. Previous query = “racing cars” vs. “Apple OS”
     2. “car” occurs far more frequently than “Apple” in pages browsed by the user in the last 20 days
     3. User just viewed an “Apple OS” document
     Result categories shown: Car, Car, Software, Car, Animal, Car.

  9. Intelligent Re-ranking of Unseen Results When a user clicks on the “back” button after viewing a document, UCAIR reranks unseen results to pull up documents similar to the one the user has viewed
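
The reranking idea on this slide can be sketched as follows. This is a minimal illustration, not UCAIR's actual implementation: the bag-of-words cosine similarity, the function names `cosine` and `rerank_unseen`, and the toy result snippets are all assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank_unseen(viewed_text, unseen):
    """Order unseen results by similarity to the document the user just viewed.

    sorted() is stable, so results with equal similarity keep their original rank.
    """
    viewed = Counter(viewed_text.lower().split())
    return sorted(
        unseen,
        key=lambda doc: cosine(viewed, Counter(doc.lower().split())),
        reverse=True,
    )

# Hypothetical snippets for the query "jaguar".
viewed = "apple os software update"
unseen = ["jaguar car dealer", "jaguar apple os release", "jaguar animal habitat"]
reranked = rerank_unseen(viewed, unseen)
print(reranked)
```

In this toy case the software-related result moves to the top because it shares terms with the viewed document, while the other two keep their original relative order.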

  10. UCAIR Outperforms Google [Shen et al. 05]. [Chart: precision at N documents.] UCAIR toolbar available at http://sifaka.cs.uiuc.edu/ir/ucair/

  11. Future: Personal Information Agent. [Diagram labels: Desktop, WWW, Intranet, Email, User Profile, Active Info Service, E-COM, IM, …, Task Support, Security Handler, Personal Content Index, Sports, Blog, Frequently Accessed Info, …, Literature.]

  12. Ongoing Work
      • UCAIR system
      • Recommendation and advertising on social networks

  13. Project 3: Multi-Resolution Topic Map for Browsing
      • Promoting browsing as a “first-class citizen”
      • Multi-resolution topic map for browsing
        • Enable a user to find information through navigation
        • Very useful when a user can’t formulate effective queries or uses a small-screen device
      • Search log as information footprints
        • Organize search log into a topic map
        • Allow a user to follow information footprints of previous users
        • Enable social surfing

  14. Querying vs. Browsing

  15. Information Seeking as Sightseeing
      • Know the address of an attraction site?
        • Yes: take a taxi and go directly to the site
        • No: walk around or take a taxi to a nearby place then walk around
      • Know what exactly you want to find?
        • Yes: use the right keywords as a query and find the information directly
        • No: browse the information space or start with a rough query and then browse
      When a query fails, browsing comes to the rescue…

  16. Current Support for Browsing is Limited
      • Hyperlinks
        • Only page-to-page
        • Mostly manually constructed
        • Browsing step is very small
      • Web directories
        • Manually constructed
        • Fixed categories
        • Only support vertical navigation
      [Image label: ODP]
      Beyond hyperlinks? Beyond fixed categories? How to promote browsing as a “first-class citizen”?

  17. Sightseeing Analogy Continues… [Diagram labels: Region, Horizontal navigation, Zoom in, Zoom out.]

  18. Topic Map for Touring Information Space. [Diagram labels: Topic regions, Multiple resolutions, Horizontal navigation, Zoom in, Zoom out; edge weights 0.03, 0.05, 0.03, 0.02, 0.01.]

  19. Topic-Map based Browsing Demo

  20. How can we construct such a multi-resolution topic map? Multiple possibilities…

  21. Search Logs as Information Footprints. Footprints in information space:
      User 2722 searched for "national car rental" [!] at 2006-03-09 11:24:29
      User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.valoans.com)
      User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://benefits.military.com)
      User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.avis.com)
      User 2722 searched for "enterprise rent a car" [!] at 2006-04-05 23:37:42 (found http://www.enterprise.com)
      User 2722 searched for "meineke car care center" [!] at 2006-05-02 09:12:49 (found http://www.meineke.com)
      User 2722 searched for "car rental" [!] at 2006-05-25 15:54:36
      User 2722 searched for "autosave car rental" [!] at 2006-05-25 23:26:54 (found http://eautosave.com)
      User 2722 searched for "budget car rental" [!] at 2006-05-25 23:29:53
      User 2722 searched for "alamo car rental" [!] at 2006-05-25 23:56:13
      ……

  22. Information Footprints → Topic Map
      • Challenges
        • How to define/construct a topic region
        • How to control granularities/resolutions of topic regions
        • How to connect topic regions to support effective browsing
      • Two approaches
        • Multi-granularity clustering
        • Query editing
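
To make the "multi-granularity clustering" approach concrete, here is a small sketch that clusters logged queries at two similarity thresholds, giving a coarse and a fine resolution. The slides do not specify the clustering algorithm; the single-link merging, the Jaccard word-overlap similarity, the thresholds, and the function names are assumptions for illustration only.

```python
def jaccard(q1, q2):
    """Word-overlap similarity between two query strings."""
    a, b = set(q1.split()), set(q2.split())
    return len(a & b) / len(a | b)

def cluster(queries, threshold):
    """Single-link clustering: queries at least `threshold` similar share a region."""
    parent = list(range(len(queries)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            if jaccard(queries[i], queries[j]) >= threshold:
                parent[find(i)] = find(j)

    regions = {}
    for i, q in enumerate(queries):
        regions.setdefault(find(i), []).append(q)
    return list(regions.values())

# Queries taken from the search-log example on the previous slide.
queries = ["car rental", "budget car rental", "alamo car rental", "meineke car care center"]

# A lower threshold gives a coarser resolution (zoomed out), a higher one a finer resolution.
topic_map = {t: cluster(queries, t) for t in (0.15, 0.6)}
for t, regions in topic_map.items():
    print(t, regions)
```

At the coarse threshold all four queries fall into one car-related region; at the fine threshold the rental queries separate from the car-care query, which is the kind of zoom-in step the topic map is meant to support.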

  23. Collaborative Surfing
      • New queries become new footprints
      • Navigation trace enriches map structures
      • Clickthroughs become new footprints
      • Browse logs offer more opportunities to understand user interests and intents

  24. Project 4: Contextual Text Mining
      • Documents are often associated with context (meta-data)
        • Direct context: time, location, source, authors, …
        • Indirect context: events, policies, …
      • Many applications require “contextual text analysis”:
        • Discovering topics from text in a context-sensitive way
        • Analyzing variations of topics over different contexts
        • Revealing interesting patterns (e.g., topic evolution, topic variations, topic communities)

  25. Example 1: Comparing News Articles. [Diagram labels: Before 9/11, During Iraq war, Current; US blog, European blog, Others; CNN, Fox, Blog; Vietnam War, Afghan War, Iraq War.] What’s in common? What’s unique?

  26. More Contextual Analysis Questions
      • What positive/negative aspects did people say about X (e.g., a person, an event)? Trends?
      • How does an opinion/topic evolve over time?
      • What are emerging topics? What topics are fading away?
      • How can we characterize a social network?

  27. Research Questions
      • Can we model all these problems generally?
      • Can we solve these problems with a unified approach?
      • How can we bring humans into the loop?

  28. Contextual Probabilistic Latent Semantic Analysis ([KDD 2006]…)
      • Themes (word distributions): government (government 0.3, response 0.2, ..); donation (donate 0.1, relief 0.05, help 0.02, ..); New Orleans (city 0.2, new 0.1, orleans 0.05, ..)
      • Views: View1, View2, View3 (context labels shown: Texas, July 2005, sociologist)
      • Theme coverages: Texas, July 2005, document, ……
      • Document context: Time = July 2005, Location = Texas, Author = xxx, Occup. = Sociologist, Age Group = 45+, …
      • Generation steps shown: choose a view, choose a coverage, choose a theme, draw a word from theme i
      • Example document text: “Criticism of government response to the hurricane primarily consisted of criticism of its response to … The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production … Over seventy countries pledged monetary donations or other assistance. …”
      • Example words drawn: government, donate, new, response, help, aid, Orleans
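
The generation steps named on this slide (choose a view, choose a coverage, choose a theme, draw a word) can be sketched as a sampling procedure. This is an illustrative sketch only: the global-view word weights are the truncated values from the slide, while the second view, the coverage weights, the selection probabilities, and all identifiers are hypothetical.

```python
import random

# Relative word weights per theme, copied from the truncated lists on the slide.
# rng.choices() accepts unnormalized weights, so they need not sum to 1.
GLOBAL_VIEW = {
    "government": {"government": 0.3, "response": 0.2},
    "donation": {"donate": 0.1, "relief": 0.05, "help": 0.02},
    "new_orleans": {"city": 0.2, "new": 0.1, "orleans": 0.05},
}
# Hypothetical context-specific view: same themes, different word weights.
TEXAS_VIEW = {
    "government": {"government": 0.2, "response": 0.3},
    "donation": {"donate": 0.05, "relief": 0.1, "help": 0.05},
    "new_orleans": {"city": 0.1, "new": 0.1, "orleans": 0.1},
}
VIEWS = {"global": GLOBAL_VIEW, "texas": TEXAS_VIEW}

# Hypothetical theme coverages: how much each theme is discussed in a context.
COVERAGES = {
    "document": {"government": 0.6, "donation": 0.3, "new_orleans": 0.1},
    "july_2005": {"government": 0.2, "donation": 0.2, "new_orleans": 0.6},
}

def pick(weighted, rng):
    """Draw one key from a {key: weight} mapping."""
    keys = list(weighted)
    return rng.choices(keys, weights=[weighted[k] for k in keys])[0]

def sample_word(view_probs, coverage_probs, rng):
    view = pick(view_probs, rng)              # choose a view
    coverage = pick(coverage_probs, rng)      # choose a coverage
    theme = pick(COVERAGES[coverage], rng)    # choose a theme under that coverage
    return pick(VIEWS[view][theme], rng)      # draw a word from the theme in that view

rng = random.Random(0)
words = [
    sample_word({"global": 0.7, "texas": 0.3}, {"document": 0.5, "july_2005": 0.5}, rng)
    for _ in range(10)
]
print(words)
```

Fitting the model runs this process in reverse: given the observed words and document contexts, the view, coverage, and theme parameters are estimated rather than fixed by hand as they are here.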

  29. Comparing News Articles: Iraq War (30 articles) vs. Afghan War (26 articles)
      • The common theme indicates that “United Nations” is involved in both wars
      • Collection-specific themes indicate different roles of “United Nations” in the two wars

  30. Spatiotemporal Patterns in Blog Articles
      • Query = “Hurricane Katrina”
      • Topics in the results:
      • Spatiotemporal patterns

  31. Theme Life Cycles (“Hurricane Katrina”)
      • Oil Price: price 0.0772, oil 0.0643, gas 0.0454, increase 0.0210, product 0.0203, fuel 0.0188, company 0.0182, …
      • New Orleans: city 0.0634, orleans 0.0541, new 0.0342, louisiana 0.0235, flood 0.0227, evacuate 0.0211, storm 0.0177, …

  32. Theme Snapshots (“Hurricane Katrina”)
      • Week 1: The theme is the strongest along the Gulf of Mexico
      • Week 2: The discussion moves towards the north and west
      • Week 3: The theme distributes more uniformly over the states
      • Week 4: The theme is again strong along the east coast and the Gulf of Mexico
      • Week 5: The theme fades out in most states

  33. Theme Life Cycles (KDD Papers)
      • gene 0.0173, expressions 0.0096, probability 0.0081, microarray 0.0038, …
      • marketing 0.0087, customer 0.0086, model 0.0079, business 0.0048, …
      • rules 0.0142, association 0.0064, support 0.0053, …

  34. Theme Evolution Graph: KDD. [Graph over a time axis: 1999, 2000, 2001, 2002, 2003, 2004.] Theme nodes:
      • web 0.009, classification 0.007, features 0.006, topic 0.005, …
      • SVM 0.007, criteria 0.007, classification 0.006, linear 0.005, …
      • mixture 0.005, random 0.006, cluster 0.006, clustering 0.005, variables 0.005, …
      • topic 0.010, mixture 0.008, LDA 0.006, semantic 0.005, …
      • decision 0.006, tree 0.006, classifier 0.005, class 0.005, Bayes 0.005, …
      • classification 0.015, text 0.013, unlabeled 0.012, document 0.008, labeled 0.008, learning 0.007, …
      • information 0.012, web 0.010, social 0.008, retrieval 0.007, distance 0.005, networks 0.004, …

  35. Multi-Faceted Sentiment Summary (query=“Da Vinci Code”)

  36. Separate Theme Sentiment Dynamics. [Chart panels: “religious beliefs”, “book”.]

  37. Event Impact Analysis: IR Research. Theme: retrieval models (SIGIR papers). [Timeline by year.]
      • Events: 1992, starting of the TREC conferences; 1998, publication of the paper “A language modeling approach to information retrieval”
      • term 0.1599, relevance 0.0752, weight 0.0660, feedback 0.0372, independence 0.0311, model 0.0310, frequent 0.0233, probabilistic 0.0188, document 0.0173, …
      • xml 0.0678, email 0.0197, model 0.0191, collect 0.0187, judgment 0.0102, rank 0.0097, subtopic 0.0079, …
      • vector 0.0514, concept 0.0298, extend 0.0297, model 0.0291, space 0.0236, boolean 0.0151, function 0.0123, feedback 0.0077, …
      • model 0.1687, language 0.0753, estimate 0.0520, parameter 0.0281, distribution 0.0268, probable 0.0205, smooth 0.0198, markov 0.0137, likelihood 0.0059, …
      • probabilist 0.0778, model 0.0432, logic 0.0404, ir 0.0338, boolean 0.0281, algebra 0.0200, estimate 0.0119, weight 0.0111, …

  38. Topic Modeling + Social Networks
      • Authors writing about the same topic form a community
      • Separation of 3 research communities: IR, ML, Web
      [Figure panels: Topic Model Only; Topic Model + Social Network.]

  39. Next Step in Contextual Text Mining
      • Combining contextual text analysis with visualization
      • More detailed semantic modeling (entities, relations, …)
      • Integration of search and contextual text analysis to develop an analyst’s workbench:
        • Interactive semantic navigation and probing
        • Synthesis of information/knowledge
        • Personalized/customized service

  40. Project 5: Opinion Integration and Summarization
      • Increasing popularity of Web 2.0 applications
      • More people express opinions on the Web
      [Figure labels: 190,451 posts; 4,773,658 results.] How to digest all?

  41. Motivation: Two kinds of opinions. [Figure labels: 4,773,658 results; 190,451 posts.] How to benefit from both?

  42. Problem Definition
      • Input: topic (iPod); expert review with aspects (Design, Battery, Price, ..); text collection of ordinary opinions, e.g. Weblogs
      • Output: integrated summary, with similar opinions and supplementary opinions for each review aspect (Design, Battery, Price), plus extra aspects

  43. Methods
      • Semi-Supervised Probabilistic Latent Semantic Analysis (PLSA)
        • The aspects extracted from expert reviews serve as clues to define a conjugate prior on topics
        • Maximum a Posteriori (MAP) estimation
      • Repeated applications of PLSA to integrate and align opinions in blog articles to expert review
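
The effect of a conjugate prior on the MAP estimate can be written out as follows. This is a sketch of the commonly used form of the update for PLSA with a Dirichlet prior, in assumed notation rather than notation taken from the slides: $c(w,d)$ is the count of word $w$ in document $d$, $p(z_{d,w}=j)$ is the E-step posterior that $w$ in $d$ was generated by topic $j$, $p(w \mid r_j)$ is a word distribution estimated from the $j$-th expert-review aspect, and $\mu$ is a prior-strength parameter.

```latex
p^{(n+1)}(w \mid \theta_j)
  = \frac{\sum_{d} c(w,d)\, p(z_{d,w}=j) \;+\; \mu\, p(w \mid r_j)}
         {\sum_{w'} \sum_{d} c(w',d)\, p(z_{d,w'}=j) \;+\; \mu}
```

Under this form the prior acts like $\mu$ pseudo-counts distributed according to the review aspect: with $\mu = 0$ the update reduces to the ordinary maximum-likelihood M-step, and as $\mu$ grows the topic is pulled toward the aspect's word distribution.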

  44. Results: Product (iPhone). Opinion integration with review aspects. [Table labels: aspects Activation, Battery; callouts: Unlock/hack iPhone; Confirm the opinions from the review; Additional info under real usage.]

  45. Results: Product (iPhone). Opinions on extra aspects. [Callouts: Another way to activate iPhone; iPhone trademark originally owned by Cisco; A better choice for smart phones?]

  46. Results: Product (iPhone). Support statistics for review aspects. [Callouts: People care about price; Controversy: activation requires contract with AT&T; People comment a lot about the unique wi-fi feature.]

  47. Summarization of Contradictory Opinions [Kim & Zhai CIKM 09]. How can we help analysts digest and interpret contradictory opinions?

  48. Contrastive Opinion Summarization. [Diagram: two sets of opinion sentences, X = {x1, x2, …, xn} and Y = {y1, y2, …, ym}.]

  49. Contrastive Opinion Summarization. [Diagram: from X = {x1, …, xn} and Y = {y1, …, ym}, a contrastive opinion summary of k sentence pairs (u1, v1), (u2, v2), …, (uk, vk).]

  50. Problem Formulation. [Diagram: X = {x1, …, xn}, Y = {y1, …, ym}, and summaries U = {u1, …, uk}, V = {v1, …, vk}, annotated with two criteria: Representativeness and Contrastiveness.]
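
The two criteria named on this slide can be combined into a single score for a candidate summary. The slide gives only the names of the criteria, so the sketch below is one possible reading: the averaging, the weight `lam`, the word-overlap similarity, the small polarity word list, and all function names are assumptions, not the formulation from the cited paper.

```python
# Hypothetical polarity words, ignored when judging whether two sentences
# discuss the same thing from opposite sides.
POLARITY = {"good", "bad", "great", "poor"}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def sim(s1, s2):
    """Content similarity between two sentences (word overlap)."""
    return jaccard(set(s1.split()), set(s2.split()))

def contrast(u, v):
    """Similarity of two sentences after dropping polarity words."""
    return jaccard(set(u.split()) - POLARITY, set(v.split()) - POLARITY)

def representativeness(X, Y, pairs):
    """How well each original sentence is covered by its side of the summary."""
    us = [u for u, _ in pairs]
    vs = [v for _, v in pairs]
    rx = sum(max(sim(x, u) for u in us) for x in X) / len(X)
    ry = sum(max(sim(y, v) for v in vs) for y in Y) / len(Y)
    return rx + ry

def contrastiveness(pairs):
    """How well the paired sentences line up on the same point."""
    return sum(contrast(u, v) for u, v in pairs) / len(pairs)

def score(X, Y, pairs, lam=0.5):
    return lam * representativeness(X, Y, pairs) + (1 - lam) * contrastiveness(pairs)

# Toy opinion sets (hypothetical data).
X = ["battery life is great", "screen is good"]
Y = ["battery life is poor", "price is bad"]
aligned = [("battery life is great", "battery life is poor")]
misaligned = [("screen is good", "price is bad")]
print(score(X, Y, aligned), score(X, Y, misaligned))
```

Both candidate summaries cover the original sentences equally well in this toy case, so the pair that discusses the same aspect from opposite sides wins on contrastiveness.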
