Mining Social Networks for Personalized Email Prioritization

Mining Social Networks for Personalized Email Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon [KDD’09] Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2009/08/25

Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

Introduction • Email • One of the most prevalent personal and business communication tools • Asynchronous • Processing a large volume of email messages of differing importance is a BURDEN!

Introduction • Information overload problem • Need to develop systems that automatically • Learn personal priorities for each user • Identify personally interesting messages • Identify important messages for the user’s attention

Introduction • Many statistical learning techniques have been studied in support of email-based prediction tasks • Spam identification, folder recommendation, recipient reminding, action-item identification, social group analysis • BUT, personalized email prioritization • Remains an under-explored problem • Mainly due to privacy issues in collecting personal data

Introduction • This paper • Creates a new collection of anonymized personal email data with importance levels • Proposes a fully personalized methodology for technical development and evaluation • Develops a supervised classification framework • For modeling personal priorities over messages and predicting importance levels for new messages

Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

Motivation • Sender information • One of the most indicative features • Messages sent by members of the same group tend to share similar priority levels • Capturing sender groups would be informative for predicting the importance of messages • If a sender does not have any labeled instances • Based on unsupervised clustering, infer that sender’s importance from other group members

Personalized Social Network • For each user, a personalized social network is constructed by using the email data of that user • Practicality • Personalization • Email contact network • Represented by graph G=(V, E) • V: email contacts (users) • E: message sending among users, unweighted (Eij=1 if there is a message from user i to user j, Eij=0 otherwise)
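
The graph construction on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_contact_graph` and the toy mailbox are hypothetical, and each message is assumed to be available as a (sender, recipients) pair.

```python
def build_contact_graph(messages):
    """Return (V, E) for a list of (sender, recipients) pairs.

    V is the set of contacts. E is a set of directed pairs (i, j), meaning
    at least one message went from i to j. Edges are unweighted, so
    repeated messages between the same pair add nothing.
    """
    contacts = set()
    edges = set()
    for sender, recipients in messages:
        contacts.add(sender)
        for recipient in recipients:
            contacts.add(recipient)
            edges.add((sender, recipient))
    return contacts, edges


# Hypothetical toy mailbox: (sender, [recipients]).
toy_messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("alice", ["bob"]),  # same pair again: no extra edge, no weight
]
V, E = build_contact_graph(toy_messages)
```

Storing E as a set of pairs mirrors the unweighted definition: membership of (i, j) in E plays the role of Eij=1.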

Clustering • Newman Clustering • Has been used successfully to find social structures • Defines edge-betweenness • A high score for a link means that the link is crucial between two boundary nodes of two clusters • Deleting links with high edge-betweenness scores results in disconnected components, which serve as clusters [Figure: example contact graph with labeled nodes]
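
The edge-removal idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: the graph is treated as undirected and unweighted, edge betweenness is computed with a Brandes-style breadth-first search, and the hypothetical helper `split_once` stops as soon as one extra component appears. The slides do not say how the number of clusters is chosen.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an undirected, unweighted graph (Brandes-style)."""
    score = {}
    for source in adj:
        # BFS from source, counting shortest paths (sigma) to every node.
        dist = {source: 0}
        sigma = {v: 0 for v in adj}
        sigma[source] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge used.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = tuple(sorted((v, w)))
                score[edge] = score.get(edge, 0.0) + credit
                delta[v] += credit
    # Every pair of endpoints was counted once from each side.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components of an undirected graph given as node -> set."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """Delete highest-betweenness edges until one more component appears."""
    adj = {v: set(neighbors) for v, neighbors in adj.items()}  # work on a copy
    start_count = len(components(adj))
    while len(components(adj)) == start_count:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph: two triangles joined by the single bridge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
clusters = split_once(toy)
```

In the toy graph every shortest path between the two triangles crosses the bridge c-d, so that edge has the highest score and is deleted first, leaving the two triangles as clusters.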

Measuring Social Importance • Link relations provide useful information about the centrality of each contact

Measuring Social Importance • In-degree centrality • Out-degree centrality • Total-degree centrality [Figure: example graph with nodes A-E]
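
A minimal sketch of the three degree measures, assuming raw (unnormalized) counts over directed edges and assuming total degree is the sum of in-degree and out-degree; the slides do not state whether the scores are normalized. The function name and toy edges are illustrative.

```python
def degree_centralities(edges):
    """Map each node to (in_degree, out_degree, total_degree).

    `edges` is an iterable of directed (sender, receiver) pairs.
    """
    in_deg, out_deg = {}, {}
    for i, j in edges:
        out_deg[i] = out_deg.get(i, 0) + 1
        in_deg[j] = in_deg.get(j, 0) + 1
    result = {}
    for v in set(in_deg) | set(out_deg):
        d_in = in_deg.get(v, 0)
        d_out = out_deg.get(v, 0)
        result[v] = (d_in, d_out, d_in + d_out)
    return result


# Toy directed edges: A mails B and C, B replies to A.
degrees = degree_centralities([("A", "B"), ("A", "C"), ("B", "A")])
```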

Measuring Social Importance • Clustering Coefficient • Measures connectivity among the neighborhood of the node • Clique Count • Clique: fully connected sub-graph • A large clique count of node v means • It connects to large and well-connected sub-graphs • It is located in the center of the sub-graphs [Figure: example graph with nodes A-F]
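
The clustering coefficient can be illustrated with the common local definition: the fraction of pairs of a node's neighbors that are themselves connected. This sketch assumes an undirected graph stored as node -> set of neighbors; whether the paper uses exactly this variant is not stated on the slide. Clique counting is left out to keep the example short.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of neighbor pairs of v that are directly connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to compare
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Toy graph: triangle A-B-C plus a pendant node D attached to A.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
cc_a = clustering_coefficient(toy, "A")
cc_b = clustering_coefficient(toy, "B")
cc_d = clustering_coefficient(toy, "D")
```

Node B scores highest because its only two neighbors are linked to each other, while A's neighborhood is diluted by the pendant node D.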

Measuring Social Importance • Betweenness centrality • Percentage of existing shortest paths, out of all possible paths, that go through node i • σjk: number of shortest paths between j and k • σjk(i): number of shortest paths between j and k that go through i
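
A minimal sketch of node betweenness, assuming the usual unnormalized form (the sum over pairs j, k of σjk(i)/σjk) on an undirected, unweighted graph. The slide text does not show the exact normalization, so treat the scaling here as an assumption; the function name and toy graph are illustrative.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' accumulation."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # BFS from source: sigma[w] counts shortest paths source -> w.
        dist = {source: 0}
        sigma = {v: 0 for v in adj}
        sigma[source] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path fractions from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Undirected graph: each pair (j, k) was visited from both ends.
    return {v: value / 2.0 for v, value in score.items()}


# Toy path graph a - b - c: only b lies between another pair.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = betweenness(path)
```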

Measuring Social Importance • HITS Authority • Hyperlink-Induced Topic Search, also known as hubs and authorities • Measures the global importance of a node • Definition: given the N-by-N adjacency matrix X, authority scores can be calculated by • Finding the principal eigenvector r of the matrix $X^T X$, where • r satisfies $X^T X r = \lambda r$, • λ is the largest eigenvalue
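
The authority computation can be sketched with power iteration, using the standard HITS formulation in which authority scores are the principal eigenvector of X^T X. This is an illustrative sketch, not the authors' implementation: the function name, the fixed iteration count, and the sum-to-one normalization are assumptions.

```python
def hits_authority(X, iterations=100):
    """Approximate the principal eigenvector of X^T X by power iteration.

    X[i][j] = 1 if there is a link (message) from node i to node j.
    Returns authority scores normalized to sum to 1.
    """
    n = len(X)
    r = [1.0] * n
    for _ in range(iterations):
        # Hub step h = X r, then authority step r = X^T h = X^T X r.
        h = [sum(X[i][j] * r[j] for j in range(n)) for i in range(n)]
        r = [sum(X[i][j] * h[i] for i in range(n)) for j in range(n)]
        norm = sum(r) or 1.0  # guard against an edgeless graph
        r = [value / norm for value in r]
    return r


# Toy graph: nodes 0 and 1 both send to node 2.
X = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
authority = hits_authority(X)
```

Node 2 collects all of the authority here because it is the only node that receives links.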

Measuring Social Importance • PCC Analysis • Pearson Correlation Coefficient • Compute the PCC of each social metric with the human-labeled importance levels of email messages • Indicates how useful each metric is for predicting the importance of email messages
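
As a small illustration of the PCC analysis, the sketch below computes the Pearson correlation between one metric and a list of importance labels. The numbers are made up for the example and are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)


# Hypothetical values: sender in-degree per message vs. labeled importance.
in_degree = [1, 2, 3, 4]
importance = [2, 3, 4, 5]
pcc = pearson(in_degree, importance)
```

A value near 1 or -1 would suggest the metric is a useful predictor; a value near 0 would suggest it carries little linear signal.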

Semi-supervised Importance Propagation • Semi-supervised Importance Propagation (SIP) • Propagates the importance values of labeled email messages (the training examples) to other messages and the corresponding contact persons

SIP Algorithm • Use a bipartite graph to represent the interactions between email contacts and email messages • Let N = number of email contacts, M = number of messages • Use two matrices to represent the two types of edges: matrix A (N by M) and matrix B (N by M) • Ai,j=1 if person i sent message j, and Ai,j=0 otherwise • Bi,j=1 if person i received message j, and Bi,j=0 otherwise

SIP Algorithm • Treat each importance label (1~5) as a category • Use a vector $x_k$ (M by 1) to indicate the labels of messages for category k • xk,i=1 if message i belongs to category k, xk,i=0 otherwise • Importance propagation from messages to persons (receivers) is calculated as $y_k = B x_k$ • Importance propagation from persons (senders) to messages is calculated as $x_k = A^T y_k$
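
One round of propagation can be sketched on a toy mailbox. The two steps are assumed to be a multiplication by B (messages to receivers) followed by a multiplication by the transpose of A (senders to messages), which is what the N-by-M shapes of A and B suggest. The helper names and the toy data are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def indicator(labels, k):
    """x_k: 1.0 where the message is labeled with importance k, else 0.0."""
    return [1.0 if label == k else 0.0 for label in labels]


# Toy data: 3 contacts (rows) and 4 messages (columns).
A = [  # A[i][j] = 1 if contact i sent message j
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
B = [  # B[i][j] = 1 if contact i received message j
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
labels = [5, 3, None, None]  # None marks an unlabeled message

x5 = indicator(labels, 5)
y5 = matvec(B, x5)                   # receivers of importance-5 messages
x5_next = matvec(transpose(A), y5)   # messages sent by those receivers
```

Contact 1 received the message labeled 5, so the unlabeled message that contact 1 later sent (message 2) picks up a score for category 5.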

Propagation Example • Messages to persons (receivers) • Persons (senders) to messages [Figure: propagation example]

SIP Algorithm • Updating of the importance values for contact persons at each time step (t) is calculated by: $y_k(t+1) = B A^T y_k(t)$

SIP Algorithm • $y_k(t+1)$ is a linear transformation of $y_k(t)$: $y_k(t+1) = C y_k(t)$ with $C = B A^T$ • If C is irreducible and t is large, $y_k(t)$ stabilizes at the principal eigenvector of C • The irreducibility property is not always guaranteed • If C is irreducible, its principal eigenvector is insensitive to the starting vector

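SIP Algorithm • A linear interpolation • Define an importance vector for category k, and normalize it by the sum of the vector • Define an importance-sensitive matrix • Its columns are identical, each column is equivalent to the normalized importance vector • Normalize matrix C to C’ • Ek is a linear interpolation of C’ and the importance-sensitive matrix, with α ∈ [0,1] • Ek is irreducible and importance-sensitive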

SIP Algorithm • Finally, the SIP method is defined iteratively as: $y_k(t+1) = E_k y_k(t)$ • Because Ek is irreducible, yk stabilizes when t is large • yk consists of the expected importance score of each person after iterative SIP
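
The iteration can be sketched in the style of personalized PageRank. Several details are assumptions, because the formulas did not survive in the slide text: C is taken to be B times the transpose of A, C is column-normalized, the importance vector is B times x_k normalized to sum to 1, and α weights the propagation term while (1 - α) weights the importance-sensitive term. The paper may assign α the other way round. The function name `sip_scores` is hypothetical.

```python
def sip_scores(A, B, x_k, alpha=0.5, steps=50):
    """Iterate y <- alpha * C' y + (1 - alpha) * v for one importance level.

    A, B are N-by-M sent/received matrices; x_k is the M-vector of labels
    for category k. Returns one score per contact, summing to 1.
    """
    n, m = len(A), len(x_k)
    # C[i][j]: number of messages sent by contact j and received by i.
    C = [[sum(B[i][q] * A[j][q] for q in range(m)) for j in range(n)]
         for i in range(n)]
    col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
    C_norm = [[C[i][j] / col_sums[j] if col_sums[j] else 0.0
               for j in range(n)] for i in range(n)]
    # Importance vector: receivers of messages labeled k, normalized.
    v = [sum(B[i][q] * x_k[q] for q in range(m)) for i in range(n)]
    total = sum(v)
    if total == 0:
        return [0.0] * n  # no labeled message of this category
    v = [value / total for value in v]
    y = v[:]
    for _ in range(steps):
        spread = [sum(C_norm[i][j] * y[j] for j in range(n)) for i in range(n)]
        # A matrix whose columns all equal v maps any y with sum 1 to v.
        y = [alpha * spread[i] + (1 - alpha) * v[i] for i in range(n)]
        norm = sum(y) or 1.0
        y = [value / norm for value in y]
    return y


# Toy data: 3 contacts, 4 messages; only message 0 has importance k.
A = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
B = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
scores = sip_scores(A, B, [1.0, 0.0, 0.0, 0.0])
```

Contacts 1 and 2 are symmetric in the message graph, so the only reason contact 1 ends up with a higher score is that it received the labeled message. That is the importance-sensitive behavior the interpolation is meant to add.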

Experiments • Data • Recruited 25 experimental subjects • Each subject was requested to label non-spam messages • Preprocessing • Email address canonicalization • Word tokenization and stemming • Stop words were not removed from the title and body text

Experiments • Features • Basic features are the tokens in the from, to, cc, title, and body text fields, represented by a v-dimensional vector • Social-network-based features • An m-dimensional sub-vector represents the NC features • A 7-dimensional sub-vector represents the social importance (SI) features • A 5-dimensional sub-vector represents the five SIP scores per user

Experiments • Classifiers • Use five linear SVM classifiers to predict the importance level of each email message • Use the standard SVMlight software package • Metric • N = number of messages • yi = the true importance level of message i • ŷi = the predicted importance level for that message
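
The metric formula itself is missing from the slide text. The sketch below assumes it is the mean absolute error (MAE) between true and predicted importance levels, which the listed symbols suggest but do not confirm. The example levels are made up.

```python
def mean_absolute_error(true_levels, predicted_levels):
    """Average of |y_i - y_hat_i| over all N messages (assumed metric)."""
    n = len(true_levels)
    total = sum(abs(y - y_hat)
                for y, y_hat in zip(true_levels, predicted_levels))
    return total / n


# Hypothetical importance levels on the 1-5 scale.
mae = mean_absolute_error([1, 3, 5], [2, 3, 3])
```

Under this assumption a lower value is better, and an error of 1.0 means predictions are off by one importance level on average.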

Experiments

Conclusions and Future Work • Future work • Collection of more data from a larger number of users over a longer time period • Comparative study of different clustering algorithms and graph-mining techniques with respect to effectiveness
