Contextual Text Mining Qiaozhu Mei email@example.com University of Illinois at Urbana-Champaign
Knowledge Discovery from Text Text Mining System
Trend of Text Content - Ramakrishnan and Tomkins 2007
Text on the Web (Unconfirmed) ~100B 10B Gold? ~3M /day ~750k /day ~150k /day 6M 1M Where to Start? Where to Go?
Context Information in Text Check Lap Kok, HK Location Author Author’s occupation self designer, publisher, editor … Time 3:53 AM Jan 28th Source Sentiment Sentiment From Ping.fm Language Social Network
Rich Context in Text ~150k bookmarks/day 5M users 500M URLs ~3M msgs/day ~2M users 73 years ~400k authors ~4k sources ~300M words/month 8M contributors 100+ languages 100M users >1M groups 750K posts/day 1B queries? Per hour? Per IP? 102M blogs
Text + Context = ? + = I Have A Guide! Context = Guidance
Query + User = Personalized Search Metropolis Street Racer MSR Magnetic Stripe Reader Molten salt reactor Modern System Research Mars sample return Wikipedia definitions If you know me, you should give me Microsoft Research… Medical simulation Montessori School of Raleigh MSR Racing Mountain Safety Research How much can personalization help?
Customer Review + Brand = Comparative Product Summary IBM Laptop Reviews APPLE Laptop Reviews DELL Laptop Reviews Can we compare Products?
Literature + Time = Topic Trends Hot Topics in SIGMOD What’s hot in literature?
Blogs + Time & Location = Spatiotemporal Topic Diffusion One Week Later How does discussion spread?
Blogs + Sentiment = Faceted Opinion Summary The Da Vinci Code Tom Hanks, who is my favorite movie star act the leading role. protesting... will lose your faith by watching the movie. ... so sick of people making such a big deal about a fiction book a good book to past time. What is good and what is bad?
Publications + Social Network = Topical Community Coauthor Network Information retrieval Data mining Machine learning Who works together on what?
A General Solution for All ….. Query log + User = Personalized Search Literature + Time = Topic Trends Review + Brand = Comparative Opinion Blog + Time & Location = Spatiotemporal Topic Diffusion Blog + Sentiment = Faceted Opinion Summary Publications + Social Network = Topical Community Text + Context = Contextual Text Mining
Contextual Text Mining Generative Model of Text Modeling Simple Context Modeling Implicit Context Modeling Complex Context Applications of Contextual Text Mining
Generative Model of Text Word distribution: the 0.1, is 0.07, harry 0.05, potter 0.04, movie 0.04, plot 0.02, time 0.01, rowling 0.01 Text: the.. movie.. harry .. potter is .. based.. on.. j..k..rowling Generation: harry the is potter movie harry Inference, Estimation
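The two directions on this slide (generation and estimation) can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: it assumes the word probabilities pair up with the words in the order they appear on the slide, and it adds a hypothetical "<other>" token to absorb the remaining probability mass.

```python
import random
from collections import Counter

# Word probabilities read off the slide (pairing by order is an assumption).
unigram = {"the": 0.10, "is": 0.07, "harry": 0.05, "potter": 0.04,
           "movie": 0.04, "plot": 0.02, "time": 0.01, "rowling": 0.01}
# Hypothetical catch-all token so the distribution sums to 1.
unigram["<other>"] = 1.0 - sum(unigram.values())

def generate(model, n, seed=0):
    """Generation: draw n words independently from the word distribution."""
    rng = random.Random(seed)
    words = list(model)
    weights = [model[w] for w in words]
    return rng.choices(words, weights=weights, k=n)

def estimate(tokens):
    """Estimation: maximum likelihood, i.e. relative word frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

sample = generate(unigram, 20000)
estimated = estimate(sample)
```

With enough generated text, the estimated probabilities approach the ones used for generation.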
Contextualized Models Sentiment = + • Inference: • How to estimate contextual models? • How to reveal contextual patterns? Year = 1998 book harry potter rowling movie harry potter director 0.15 0.10 0.08 0.05 0.18 0.09 0.08 0.04 Year = 2008 book harry potter is • Generation: • How to select contexts? • How to model the relations of contexts? Source = official Location = China Location = US
Topics in Text search 0.2 engine 0.15 query 0.08 user 0.07 ranking 0.06 …… Data Mining Web Search Machine Learning • Many text mining tasks: • Extracting topics from text • Reveal contextual topic patterns Topic (Theme) = the subject of a discourse A topic covers multiple documents A document has multiple topics Topic = a soft cluster of documents Topic = a multinomial distribution of words
Probabilistic Topic Models Topic 1 (Apple iPod): ipod 0.15, nano 0.08, music 0.05, download 0.02, apple 0.01 Topic 2 (Harry Potter): movie 0.10, harry 0.09, potter 0.05, actress 0.04, music 0.02 Document: I downloaded the music of the movie harry potter to my ipod nano
Parameter Estimation Document: I downloaded the music of the movie harry potter to my ipod nano Unknown parameters (?): word probabilities of Topic 1 (ipod, nano, music, download, apple) and Topic 2 (movie, harry, potter, actress, music) • Guess the affiliation of each word • Estimate the params (pseudo-counts) • Maximizing data likelihood: parameter estimation using the EM algorithm
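The "guess the affiliation, then estimate the params" loop can be sketched as EM for a simple topic mixture. This is a PLSA-style sketch under stated assumptions: the function name `plsa_em`, the toy documents, and the random initialization are all illustrative, and the slide does not specify this exact model.

```python
import random
from collections import Counter

def plsa_em(docs, num_topics, iters=30, seed=0):
    """EM for a mixture of topics: returns p(w|z) per topic and p(z|d) per doc."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    # Random initial guess for each topic's word distribution.
    p_w_z = []
    for _ in range(num_topics):
        raw = {w: rng.random() + 0.1 for w in vocab}
        norm = sum(raw.values())
        p_w_z.append({w: v / norm for w, v in raw.items()})
    p_z_d = [[1.0 / num_topics] * num_topics for _ in docs]

    for _ in range(iters):
        new_wz = [{w: 0.0 for w in vocab} for _ in range(num_topics)]
        new_zd = [[0.0] * num_topics for _ in docs]
        for d, cnt in enumerate(counts):
            for w, c in cnt.items():
                # E-step: guess the affiliation p(z | d, w) of this word.
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)]
                norm = sum(joint)
                for z in range(num_topics):
                    resp = c * joint[z] / norm
                    new_wz[z][w] += resp
                    new_zd[d][z] += resp
        # M-step: re-estimate the parameters from the expected counts.
        for z in range(num_topics):
            norm = sum(new_wz[z].values())
            p_w_z[z] = {w: v / norm for w, v in new_wz[z].items()}
        for d in range(len(docs)):
            norm = sum(new_zd[d])
            p_z_d[d] = [v / norm for v in new_zd[d]]
    return p_w_z, p_z_d

# Toy documents (illustrative only).
docs = [
    "ipod nano music download apple ipod".split(),
    "movie harry potter actress music harry".split(),
    "i downloaded the music of the movie harry potter to my ipod nano".split(),
]
p_w_z, p_z_d = plsa_em(docs, num_topics=2)
```

Each EM iteration does not decrease the data likelihood, but the result depends on the initial guess, so different seeds may give different topics.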
How Context Affects Topics • Topics in science literature: 16th century vs. 21st century • When do a computer scientist and a gardener use “tree, root, prune” in text? • What does “tree” mean in “algorithm”? • In Europe, “football” appears a lot in a soccer report. What about in the US? Text is generated according to the context!
Simple Contextual Topic Model Context 1: 2004 Context 2: 2007 Topic 1 ipod iphone nano harry prisoner azkaban ipod mini 4gb potter order phoenix Apple iPod I downloaded the music of the movie Topic 2 harry potter to Harry Potter my iphone
Contextual Topic Patterns • Compare contextualized versions of topics: contextual topic patterns • Contextual topic patterns ≈ conditional distributions • z: topic; c: context; w: word • p(z|c): strength of topics in context • p(w|z,c): content variation of topics
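Both kinds of pattern are conditional distributions, so once each word occurrence has a topic and a context they reduce to counting. The sketch below assumes hard (word, topic, context) assignments, which are hypothetical toy data; a real model would use soft affiliations from EM. Here the strength of topics in a context is written p(z|c) and the content variation of a topic is written p(w|z,c).

```python
from collections import Counter, defaultdict

# Hypothetical (word, topic, context) assignments; context = year.
assignments = [
    ("ipod", "apple", 2004), ("mini", "apple", 2004),
    ("potter", "harry_potter", 2004),
    ("iphone", "apple", 2007), ("ipod", "apple", 2007),
    ("phoenix", "harry_potter", 2007), ("potter", "harry_potter", 2007),
]

def normalize(counter):
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}

def topic_strength(rows):
    """p(z|c): how strong each topic is within each context."""
    by_context = defaultdict(Counter)
    for _, z, c in rows:
        by_context[c][z] += 1
    return {c: normalize(cnt) for c, cnt in by_context.items()}

def content_variation(rows):
    """p(w|z,c): the word distribution of a topic within a context."""
    by_topic_context = defaultdict(Counter)
    for w, z, c in rows:
        by_topic_context[(z, c)][w] += 1
    return {zc: normalize(cnt) for zc, cnt in by_topic_context.items()}

p_z_c = topic_strength(assignments)
p_w_zc = content_variation(assignments)
```

Comparing `p_z_c[2004]` with `p_z_c[2007]` shows a change in topic strength; comparing `p_w_zc[("apple", 2004)]` with `p_w_zc[("apple", 2007)]` shows how the content of one topic varies across contexts.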
Example: Topic Life Cycles (Mei and Zhai KDD’05) Context = time Comparing
Example: Spatiotemporal Theme Pattern (Mei et al. WWW’06) About Government Response in Hurricane Katrina Week 1: The theme is the strongest along the Gulf of Mexico Week 2: The discussion moves towards the north and west Week 3: The theme distributes more uniformly over the states Week 4: The theme is again strong along the east coast and the Gulf of Mexico Week 5: The theme fades out in most states Context = time & location Comparing
Example: Evolutionary Topic Graph (Mei and Zhai KDD’05) 1999 2000 2001 2002 2003 2004 T KDD web 0.009, classification 0.007, features 0.006, topic 0.005 … SVM 0.007, criteria 0.007, classification 0.006, linear 0.005 … mixture 0.005, random 0.006, cluster 0.006, clustering 0.005, variables 0.005 … topic 0.010, mixture 0.008, LDA 0.006, semantic 0.005 … decision 0.006, tree 0.006, classifier 0.005, class 0.005, Bayes 0.005 … … Classification 0.015, text 0.013, unlabeled 0.012, document 0.008, labeled 0.008, learning 0.007 … Information 0.012, web 0.010, social 0.008, retrieval 0.007, distance 0.005, networks 0.004 … Context = time … Comparing
Example: Event Impact Analysis (Mei and Zhai KDD’06) Theme: retrieval models SIGIR papers term 0.1599, relevance 0.0752, weight 0.0660, feedback 0.0372, model 0.0310, probabilistic 0.0188, document 0.0173 … Events (by year): 1992: Starting of the TREC conferences 1998: Publication of the paper “A language modeling approach to information retrieval” xml 0.0678, email 0.0197, model 0.0191, collect 0.0187, judgment 0.0102, rank 0.0097 … vector 0.0514, concept 0.0298, model 0.0291, space 0.0236, boolean 0.0151, function 0.0123 … model 0.1687, language 0.0753, estimate 0.0520, parameter 0.0281, distribution 0.0268, smooth 0.0198, likelihood 0.0059 … probabilist 0.0778, model 0.0432, logic 0.0404, boolean 0.0281, algebra 0.0200, estimate 0.0119, weight 0.0111 … Context = event Comparing
Implicit Context in Text • Some contexts are hidden • Sentiments; intents; impact; etc. • Document contexts: don’t know for sure • Need to infer this affiliation from the data • Train a model M for each implicit context • Provide M to the topic model as guidance
Modeling Implicit Context ? ? Positive Negative hate awful disgust good like perfect 0.21 0.03 0.01 0.10 0.05 0.02 ? actress music visual color size quality price scratch problem director accent plot Topic 1 Apple iPod I like the song of the movie on my perfect ipod but Topic 2 Harry Potter hate the accent
Semi-supervised Topic Model (Mei et al. WWW’07) • Maximum Likelihood Estimation (MLE) • Add Dirichlet priors: Maximum a Posteriori (MAP) Estimation • Guidance from the user: priors r1 (love, great) and r2 (hate, awful) • Similar to adding pseudo-counts to the observation [Diagram: topics 1 … k, document d, word w]
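The remark that a Dirichlet prior behaves like pseudo-counts can be made concrete. The sketch below assumes the conjugate-prior parameterization in which a prior word distribution, weighted by a confidence `mu`, contributes `mu * p(w|prior)` pseudo-counts to the observed counts; `map_estimate`, `mu`, and the toy counts are illustrative names and values, not taken from the slides.

```python
def map_estimate(counts, prior, mu):
    """MAP estimate of a word distribution with a Dirichlet prior.

    counts: observed (or expected) word counts for one topic
    prior:  user-provided word distribution, assumed to sum to 1
    mu:     prior confidence; mu = 0 gives the maximum likelihood estimate
    """
    vocab = set(counts) | set(prior)
    total = sum(counts.values())
    return {w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / (total + mu)
            for w in vocab}

# Toy counts for one topic, plus user guidance that it should look "positive".
observed = {"love": 1, "hate": 3}
guidance = {"love": 0.5, "great": 0.5}

mle = map_estimate(observed, guidance, mu=0)
guided = map_estimate(observed, guidance, mu=4)
```

With `mu=4` the guidance acts like four extra observations split between "love" and "great", pulling the topic toward the user's prior while the data still counts.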
Example: Faceted Opinion Summarization (Mei et al. WWW’07) Context = topic & sentiment
Results: Sentiment Dynamics Facet: the book “ the da vinci code”. ( Bursts during the movie, Pos > Neg ) Facet: the impact on religious beliefs. ( Bursts during the movie, Neg > Pos )
Results: Topic with User’s Guidance Guidance from the user: I know two topics should look like this • Topics for iPod:
Complex Context in Text • Complex context: structure of contexts • Many contexts have latent structure • Time; location; social network • Why model context structure? • Reveal novel contextual patterns; • Regularize contextual models; • Alleviate data sparseness: smoothing;
Modeling Complex Context Contexts A and B are closely related • Two Intuitions: • Regularization: Model(A) and Model(B) should be similar • Smoothing: Look at B if A doesn’t have enough data [Diagram: related contexts A and B, each with Topic 1 and Topic 2]
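One way to act on both intuitions is to pull each context's model toward the average of its related contexts. This is only an illustrative sketch, not the method from the talk: the update rule, the function name `regularize`, the weight `alpha`, and the toy models are all assumptions.

```python
def regularize(models, neighbors, alpha=0.5, iters=10):
    """Iteratively mix each context's own estimate with its neighbors' models.

    models:    context -> word distribution estimated from that context alone
    neighbors: context -> list of closely related contexts
    alpha:     how strongly a context is pulled toward its neighbors
    """
    vocab = sorted({w for m in models.values() for w in m})
    current = {c: {w: m.get(w, 0.0) for w in vocab} for c, m in models.items()}
    for _ in range(iters):
        updated = {}
        for c, base in models.items():
            related = neighbors.get(c, [])
            updated[c] = {}
            for w in vocab:
                own = base.get(w, 0.0)
                if related:
                    avg = sum(current[n][w] for n in related) / len(related)
                    updated[c][w] = (1 - alpha) * own + alpha * avg
                else:
                    updated[c][w] = own
        current = updated
    return current

# Context A has very little data; context B is closely related to A.
models = {"A": {"ipod": 1.0}, "B": {"ipod": 0.5, "iphone": 0.5}}
neighbors = {"A": ["B"], "B": ["A"]}
smoothed = regularize(models, neighbors)
```

After the update, A assigns non-zero probability to "iphone" even though it never saw the word (smoothing), and the two models end up closer to each other than they started (regularization).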
Applications of Contextual Text Mining • Personalized Search • Personalization with backoff • Social Network Analysis (for schools) • Finding Topical communities • Information Retrieval (for industry labs) • Smoothing Language Models
Personalization with Backoff (Mei and Church WSDM’08) • Ambiguous query: MSG • Madison Square Garden • Monosodium Glutamate • Disambiguate based on user’s prior clicks • We don’t have enough data for everyone! • Backoff to classes of users • Proof of Concept: • Context = Segments defined by IP addresses • Other Market Segmentation (Demographics)
Apply Contextual Text Mining to Personalized Search • The text data: Query Logs • The generative model: P(Url| Query) • The context: Users (IP addresses) • The contextual model: P(Url| Query, IP) • The structure of context: • Hierarchical structure of IP addresses
Evaluation Metric: Entropy (H) • Difficulty of encoding information (a distr.) • Size of search space; difficulty of a task • H = 20 bits ≈ 1 million items distributed uniformly • Powerful tool for sizing challenges and opportunities • How hard is search? • How much does personalization help?
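The "H = 20 for 1 million uniform items" rule of thumb follows from the definition of entropy in bits, since log2(1,000,000) is about 19.93. A minimal check, with `entropy` as an illustrative helper name:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 1_000_000
uniform_bits = entropy([1.0 / n] * n)       # about 19.93 bits
skewed_bits = entropy([0.9, 0.05, 0.05])    # far easier to predict
```

A uniform distribution is the hardest case; the more skewed the distribution, the lower the entropy and the easier the task.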
How Hard Is Search? • Traditional Search • H(URL | Query) • 2.8 (= 23.9 – 21.1) • Personalized Search • H(URL | Query, IP) • 1.2 (= 27.2 – 26.0) Personalization cuts H in Half!
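The subtractions on this slide use the chain rule: H(URL | Query) = H(URL, Query) − H(Query). The sketch below computes a conditional entropy that way from a click log; the log entries and helper names are hypothetical toy data, not the data behind the slide's numbers.

```python
import math
from collections import Counter

def entropy_of_counts(counter):
    """Entropy in bits of the empirical distribution given by the counts."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def conditional_entropy(pairs):
    """H(Y | X) = H(X, Y) - H(X) for a list of (x, y) observations."""
    joint = Counter(pairs)
    given = Counter(x for x, _ in pairs)
    return entropy_of_counts(joint) - entropy_of_counts(given)

# Hypothetical (query, clicked URL) log.
log = [
    ("msg", "madison-square-garden"),
    ("msg", "monosodium-glutamate"),
    ("msr", "microsoft-research"),
    ("msr", "microsoft-research"),
]
h_url_given_query = conditional_entropy(log)
```

In this toy log the query "msg" leaves one bit of uncertainty about the click and "msr" leaves none, so the average is 0.5 bits. Adding the user (for example the IP address) to the conditioning side can only keep the conditional entropy the same or lower it.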
Context = First k bytes of IP Full personalization: every context has a different model: sparse data! 184.108.40.206 156.111.188.* Personalization with backoff: similar contexts have similar models 156.111.*.* 156.*.*.* *.*.*.* No personalization: all contexts share the same model
Backing Off by IP • λs estimated with EM • λ4: weights for first 4 bytes of IP • λ3: weights for first 3 bytes of IP • λ2: weights for first 2 bytes of IP • …… • A little bit of personalization • Better than too much (sparse data) • Or too little (missed opportunity)
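Backing off by IP prefix can be sketched as a weighted mixture of click models, one per prefix length. This is a rough sketch under stated assumptions: the λ weights are fixed by hand here rather than estimated with EM, levels with no data are skipped and the remaining weights renormalized (a design choice of this sketch), and the click log, IP addresses, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def ip_prefix(ip, k):
    """First k bytes of a dotted IPv4 address; k = 0 gives the empty prefix."""
    return ".".join(ip.split(".")[:k])

def train(click_log):
    """counts[k][(prefix, query)] is a Counter of clicked URLs, for k = 0..4."""
    counts = [defaultdict(Counter) for _ in range(5)]
    for ip, query, url in click_log:
        for k in range(5):
            counts[k][(ip_prefix(ip, k), query)][url] += 1
    return counts

def p_url(counts, lambdas, ip, query, url):
    """Interpolate P(url | query, first k bytes of IP) over k = 0..4."""
    weighted, used = 0.0, 0.0
    for k, lam in enumerate(lambdas):
        urls = counts[k].get((ip_prefix(ip, k), query))
        if urls:  # skip levels that have no data for this prefix and query
            weighted += lam * urls[url] / sum(urls.values())
            used += lam
    return weighted / used if used else 0.0

# Hypothetical click log: (ip, query, clicked url).
log = [
    ("156.111.188.10", "msg", "madison-square-garden"),
    ("156.111.188.10", "msg", "madison-square-garden"),
    ("156.111.20.7", "msg", "madison-square-garden"),
    ("10.0.0.1", "msg", "monosodium-glutamate"),
    ("10.0.0.2", "msg", "monosodium-glutamate"),
    ("10.0.0.3", "msg", "monosodium-glutamate"),
]
counts = train(log)
lambdas = [0.1, 0.2, 0.3, 0.3, 0.1]  # hand-picked weights for k = 0..4
backed_off = p_url(counts, lambdas, "156.111.188.99", "msg",
                   "madison-square-garden")
no_personalization = p_url(counts, [1, 0, 0, 0, 0], "156.111.188.99", "msg",
                           "madison-square-garden")
```

The address 156.111.188.99 has never been seen, so full personalization has nothing to offer, but its prefixes have, and the backed-off estimate favors Madison Square Garden far more than the global model does.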
Context Market Segmentation • Traditional Goal of Marketing: • Segment Customers (e.g., Business v. Consumer) • By Need & Value Proposition • Need: Segments ask different questions at different times • Value: Different advertising opportunities • Segmentation Variables • Queries, URL Clicks, IP Addresses • Geography & Demographics (Age, Gender, Income) • Time of day & Day of Week
Business Days v. Weekends: More Clicks and Easier Queries
Harder Queries at TV Time Harder queries
Application: Text Retrieval Document d: a text mining paper Query q: data mining Doc Language Model (LM) θd: p(w|d) text 4/100 = 0.04, mining 3/100 = 0.03, clustering 1/100 = 0.01, …, data = 0, computing = 0, … Smoothed Doc LM θd': p(w|d’) text = 0.039, mining = 0.028, clustering = 0.01, …, data = 0.001, computing = 0.0005, … Query Language Model θq: p(w|q) data ½ = 0.5, mining ½ = 0.5 p(w|q’)? data = 0.4, mining = 0.4, clustering = 0.1, … Similarity function
Smoothing a Document Language Model • Estimate a more accurate distribution from sparse data • Assign non-zero prob. to unseen words • Retrieval performance estimate LM smoothing LM Unsmoothed: text 4/100 = 0.04, mining 3/100 = 0.03, Assoc. 1/100 = 0.01, clustering 1/100 = 0.01, …, data = 0, computing = 0, … Smoothed: text = 0.039, mining = 0.028, Assoc. = 0.009, clustering = 0.01, …, data = 0.001, computing = 0.0005, … Smoothed: text = 0.038, mining = 0.026, Assoc. = 0.008, clustering = 0.01, …, data = 0.002, computing = 0.001, …
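The effect shown here (seen words lose a little probability, unseen words gain some) can be reproduced with a standard smoothing method. The slide does not name one, so the sketch below swaps in Jelinek-Mercer interpolation with a collection model as an example; the toy texts, `lam`, and the function names are assumptions.

```python
from collections import Counter

def mle(tokens):
    """Maximum likelihood language model: relative word frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def jelinek_mercer(doc_tokens, collection_tokens, lam=0.1):
    """p(w|d') = (1 - lam) * p_ml(w|d) + lam * p(w|collection).

    Assumes every document word also occurs in the collection.
    """
    p_doc = mle(doc_tokens)
    p_col = mle(collection_tokens)
    return {w: (1 - lam) * p_doc.get(w, 0.0) + lam * p_col[w] for w in p_col}

# Toy document and collection (the collection includes the document).
doc = "text mining text clustering".split()
collection = doc + "data mining computing data".split()

unsmoothed = mle(doc)
smoothed = jelinek_mercer(doc, collection)
```

After smoothing, "data" gets a small non-zero probability although the document never mentions it, so a query such as "data mining" no longer scores the document as impossible.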
Apply Contextual Text Mining to Smoothing Language Models • The text data: collection of documents • The generative model: P(word) • The context: Document • The contextual model: P(w|d) • The structure of context: • Graph structure of documents • Goal: use the graph of documents to estimate a good P(w|d)