On Frequent Chatters Mining

On Frequent Chatters Mining Claudio Lucchese 1st HPC Lab Workshop 1st HPC Workshp - Claudio Lucchese

Frequent Patterns Mining
Claudio Lucchese, Salvatore Orlando, Raffaele Perego: Mining Top-K Patterns from Binary Datasets in Presence of Noise. SDM 2010.
How many patterns do you see in the following dataset?

Frequent Patterns Mining

Frequent Patterns Mining
• Usually rows and columns are not in a “good-looking” order

State of the art
• Most recent approaches try to discover the top-k patterns that optimize different cost functions:
  • Minimize noise (“holes”), or
  • Minimize MDL: encoding(Patterns) + encoding(Data|Patterns)
  • Maximize information ratio: number of bits of information w.r.t. the maximum entropy model built on the row and column marginal distributions
  • Minimize the length of the patterns and the amount of noise (our approach)
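
As a rough illustration of the last cost function (pattern length plus noise), the following is a minimal sketch, not the authors' implementation. It assumes a pattern is a pair (set of rows, set of columns), that its length is the number of rows plus the number of columns, and that noise is the number of cells where the data disagrees with the union of the patterns. The name `pattern_cost` is hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: a pattern is a (row ids, column ids) pair covering a rectangle of 1s.
Pattern = Tuple[Set[int], Set[int]]


def pattern_cost(data: List[List[int]], patterns: List[Pattern]) -> Tuple[int, int]:
    """Return (total pattern length, noise) for a binary matrix."""
    n_rows, n_cols = len(data), len(data[0])
    covered = [[0] * n_cols for _ in range(n_rows)]
    length = 0
    for rows, cols in patterns:
        # Assumed length of a pattern: number of rows plus number of columns.
        length += len(rows) + len(cols)
        for r in rows:
            for c in cols:
                covered[r][c] = 1
    # Noise: cells where the data and the pattern cover disagree
    # (missed 1s as well as "holes" inside a pattern).
    noise = sum(
        data[r][c] != covered[r][c]
        for r in range(n_rows)
        for c in range(n_cols)
    )
    return length, noise


data = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],  # the 0 in column 1 is a "hole" in the first pattern
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
patterns = [({0, 1}, {0, 1}), ({2, 3}, {2, 3})]
length, noise = pattern_cost(data, patterns)
print(length, noise, length + noise)
```

Under these assumptions a top-k miner would look for the k patterns that make `length + noise` as small as possible.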

Evaluation
• Unsupervised:
  • Measure how well the proposed algorithm optimizes the proposed cost function
  • What is the best cost function?
• We are investigating supervised measures:
  • Unsupervised extraction: extract patterns from a classification/clustering dataset without using the class/cluster labels
  • Supervised evaluation: measure how well the patterns predict/match the classes/clusters
• Preliminary result:
  • Fancy cost functions might not be the best ones

Information Overload in News
Gianmarco De Francisci Morales, Aristides Gionis, Claudio Lucchese: From chatter to headlines: harnessing the real-time web for personalized news recommendation. WSDM 2012.

Can we exploit Twitter?
• Timeliness
• Personalization
[Figure: number of mentions of “Osama Bin Laden”]

News Get Old Soon
• 90% of the clicks happen within 2 days of publication
• Only a few occur early!

T.Rex (Twitter-based news recommendation system)
• Builds a user model from Twitter
  • Signals from user-generated content, social neighbors, and popularity across Twitter and news
  • Entity-based representation (overcomes vocabulary mismatch)
• Learns a personalized news ranking function:
  • Picks candidates from a pool of related or popular fresh news, ranks them, and presents the top-k to the user

Recommendation Model
• The ranking function is user- and time-dependent
• Social model + Content model + Popularity model
• The popularity model tracks entity popularity by the number of mentions in Twitter and news (with exponential forgetting)
• The content model measures the relatedness between the bag-of-entities representations of a user’s tweet stream and of a news article
• The social model weights the content model of every social neighbor by a truncated PageRank on the Twitter network
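
The three components above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the T.Rex code: the half-life parameter, the use of cosine similarity as the relatedness measure, the precomputed neighbor weights (standing in for the truncated PageRank values), and the linear combination in `score` are all assumptions, and every name is hypothetical.

```python
import math
from typing import Dict


class DecayedCounter:
    """Per-entity mention counter with exponential forgetting."""

    def __init__(self, half_life: float) -> None:
        # Assumed parameter: time after which a mention counts half as much.
        self.rate = math.log(2.0) / half_life
        self.value: Dict[str, float] = {}
        self.last: Dict[str, float] = {}

    def get(self, entity: str, now: float) -> float:
        if entity not in self.value:
            return 0.0
        elapsed = now - self.last[entity]
        return self.value[entity] * math.exp(-self.rate * elapsed)

    def add(self, entity: str, now: float) -> None:
        # Decay the stored count to the current time, then count the mention.
        self.value[entity] = self.get(entity, now) + 1.0
        self.last[entity] = now


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Relatedness of two bag-of-entities vectors (cosine is an assumption)."""
    dot = sum(w * b.get(e, 0.0) for e, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def social_relatedness(
    neighbor_weights: Dict[str, float],
    profiles: Dict[str, Dict[str, float]],
    article: Dict[str, float],
) -> float:
    """Content relatedness of each neighbor, weighted by a given neighbor weight."""
    return sum(w * cosine(profiles[v], article) for v, w in neighbor_weights.items())


def score(social: float, content: float, popularity: float,
          weights=(1.0, 1.0, 1.0)) -> float:
    """Assumed linear combination of the three models."""
    w_social, w_content, w_popularity = weights
    return w_social * social + w_content * content + w_popularity * popularity
```

Because `DecayedCounter` stores only one number and one timestamp per entity, it can be updated in constant time per mention, which fits a streaming setting.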

System Overview
• Designed to be streaming and lightweight (just counting)
• User model is updated continuously

Learning the Weights
• Learning-to-rank approach with SVM
• Each time the user clicks on a news article, we learn a set of preferences (clicked_news > non_clicked_news)
• Prune the number of constraints for scalability:
  • only news published in the last 2 days
  • only the top-k news for each ranking component
• Can optionally include additional features for news articles:
  • click count, age, etc. (T.Rex+)
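
The constraint generation and pruning described above might look like the sketch below. It only builds the (clicked, non-clicked) preference pairs; training the ranking SVM on them is left out. The function name, the data layout, and the use of seconds for timestamps are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

TWO_DAYS = 2 * 24 * 3600  # seconds; the time unit is an assumption


def preference_pairs(
    clicked: str,
    published: Dict[str, float],
    component_scores: Dict[str, Dict[str, float]],
    now: float,
    k: int,
    max_age: float = TWO_DAYS,
) -> List[Tuple[str, str]]:
    """Return pairs (clicked, other), each meaning 'clicked is preferred to other'.

    published:        news id -> publication time
    component_scores: ranking component name -> {news id -> score}
    """
    # Pruning 1: keep only news published within the last max_age.
    fresh = {n for n, t in published.items() if now - t <= max_age}
    # Pruning 2: keep only the top-k fresh news of each ranking component.
    kept = set()
    for scores in component_scores.values():
        ranked = sorted(
            (n for n in fresh if n in scores),
            key=lambda n: scores[n],
            reverse=True,
        )
        kept.update(ranked[:k])
    return [(clicked, n) for n in sorted(kept) if n != clicked]


published = {"a": 0.0, "b": 100.0, "c": 100.0, "d": 100.0}
component_scores = {
    "content": {"a": 9.0, "b": 3.0, "c": 2.0, "d": 1.0},
    "popularity": {"b": 1.0, "c": 5.0, "d": 2.0},
}
pairs = preference_pairs("b", published, component_scores, now=120.0, k=1, max_age=50.0)
print(pairs)
```

In a pairwise learning-to-rank setup, each pair would typically become one training constraint on the difference between the feature vectors of the two articles.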

Predicting Clicked News
• User-generated content is a very good predictor, albeit very sparse
• Click count is a strong baseline but does not help T.Rex+

Predicting Clicked Entities

Future works (?)
• Explain a set of news showing how the main topics interacted with each other over time.

Future works (?)
• Explain a set of news showing how the main topics interacted with each other over time.
• Example: European sovereign-debt crisis
[Figure: topics interacting over time. Labels: Fiscal Compact, EuroBond, Berlusconi, Obama, New Italian government, Monti, Merkel, Loan, EU, France, Greece; axis: time]

Future works (?)
• Explain a set of news showing how the main topics interacted with each other over time.
• Applications:
  • Given the news the user is currently reading, provide an explanation of the related facts that precede that news
  • Given a query, provide an explanation of the documents related to that query
  • Given a set of topics, explain their relations over time
  • Browse a collection of news by changing the topics of interest, the time window, and the granularity

Future works (?)
• Explain a set of news showing how the main topics interacted with each other over time.
• A topic is a named entity relevant over time
• An interaction is a cluster of news related to some event and relevant in a small time window
• It might be important to cover the given time window, but recent events might be more interesting

Future works (?)
• Explain a set of news showing how the main topics interacted with each other over time.
• Given a maximum number of main topics and interactions, maximize:
  • Topic coverage and diversity
  • Events time coverage
  • Cluster similarity
  • Main topics connectivity

Future works (?)
• Explain a set of news showing how the main topics interacted with each other over time.
• It is different from news clustering:
  • Even with a good clustering, it might not be trivial to select which events and which topics to show in order to maximize the amount of information delivered to the user
• There is some interesting related work:
  • It is aimed at finding chains of news, whereas we are more interested in topic evolution

Thank you!
