
Mining Social Networks for Personalized Email Prioritization

This report explores a methodology for personalized email prioritization, leveraging social networks to determine the importance of messages. It introduces a supervised classification framework and proposes a semi-supervised importance propagation algorithm.

mmartini


Presentation Transcript


  1. Mining Social Networks for Personalized Email Prioritization • Shinjae Yoo, Yiming Yang, Frank Lin, Il-Chul Moon [KDD’09] • Advisor: Dr. Koh Jia-Ling • Reporter: Che-Wei, Liang • Date: 2009/08/25

  2. Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

  3. Introduction • Email • One of the most prevalent personal and business communication tools • Asynchronous • Processing a large volume of email messages of differing importance is a BURDEN!

  4. Introduction • Information overload problem • Need to develop systems that automatically • Learn personal priorities for each user • Identify personally interesting messages • Identify important messages for the user’s attention

  5. Introduction • Many statistical learning techniques have been studied in support of email-based prediction tasks • Spam identification, folder recommendation, recipient reminding, action-item identification, social group analysis • BUT personalized email prioritization • Remains an under-explored problem • Mainly due to privacy issues in collecting personal data

  6. Introduction • This paper • Creates a new collection of anonymized personal email data with importance levels • Proposes a fully personalized methodology for technical development and evaluation • Develops a supervised classification framework • For modeling personal priorities over messages and predicting importance levels for new messages

  7. Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

  8. Motivation • Sender information • One of the most indicative features • Messages sent by members of the same group tend to share similar priority levels • Capturing sender groups would be informative for predicting the importance of messages • If a sender does not have any labeled instances • Based on unsupervised clustering, infer that user’s importance from other group members

  9. Personalized Social Network • For each user, a personalized social network is constructed by using the email data of that user • Practicality • Personalization • Email contact network • Represented by a graph G=(V, E) • V: email contacts (users) • E: message sending among users, unweighted (Eij=1 if there is a message from user i to user j, Eij=0 otherwise)
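
As a minimal sketch of the graph definition above, the snippet below builds the unweighted edge set E from a list of (sender, recipients) pairs. The function name `build_contact_graph`, the message format, and the sample mailbox are hypothetical; the slides do not say how the authors stored their data.

```python
from collections import defaultdict

def build_contact_graph(messages):
    """Return directed, unweighted edges: j is in edges[i] exactly when E_ij = 1."""
    edges = defaultdict(set)
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:      # assumption: ignore self-addressed copies
                edges[sender].add(recipient)
    return dict(edges)

# Hypothetical mailbox: (sender, list of recipients)
messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("alice", ["bob"]),                  # a repeated contact adds no weight
]
graph = build_contact_graph(messages)
```

Using sets keeps the graph unweighted: a second message from alice to bob leaves the edge unchanged.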

  10. Clustering • Newman Clustering • Has been used successfully to find social structures • Defines edge-betweenness • A link with a high score is crucial between two boundary nodes of two clusters • Deleting links with high edge-betweenness scores results in disconnected components, which serve as clusters
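
The deletion procedure described on this slide can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the graph is treated as undirected, edge-betweenness is computed with a Brandes-style breadth-first accumulation, and the helper names (`edge_betweenness`, `components`, `split_once`) and the toy graph are made up for the example.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness for an undirected, unweighted graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}    # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # push path counts back toward s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score  # every pair is counted from both ends; only the ranking is used

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_once(adj):
    """Delete highest-betweenness links until the graph gains a component."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        score = edge_betweenness(adj)
        if not score:                    # no links left to delete
            break
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Toy graph: two triangles joined by the single bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
clusters = split_once(adj)
```

In the toy graph every shortest path between the two triangles crosses c-d, so that link scores highest and is deleted first, leaving the two triangles as clusters.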

  11. Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

  12. Measuring Social Importance • Link relations provide useful information about the centrality of each contact

  13. Measuring Social Importance • In-degree centrality • Out-degree centrality • Total-degree centrality
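
A short sketch of the three degree measures on a directed contact graph follows. It returns raw counts; whether the authors normalize by the number of contacts is not stated on the slide, so that choice, the function name `degree_centralities`, and the sample edges are assumptions.

```python
def degree_centralities(edges, nodes):
    """Raw in-, out-, and total-degree for each node of a directed graph.

    edges maps a sender to the set of contacts it has sent mail to.
    """
    out_deg = {v: len(edges.get(v, ())) for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for sender, targets in edges.items():
        for target in targets:
            in_deg[target] += 1
    total = {v: in_deg[v] + out_deg[v] for v in nodes}
    return in_deg, out_deg, total

# Hypothetical contact graph
nodes = ["A", "B", "C", "D"]
edges = {"A": {"B", "C"}, "B": {"A"}, "D": {"A"}}
in_deg, out_deg, total = degree_centralities(edges, nodes)
```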

  14. Measuring Social Importance • Clustering Coefficient • Measures connectivity among the neighborhood of the node • Clique Count • Clique: fully connected sub-graph • A large clique count of node v means • It connects to large and well-connected sub-graphs • It is located in the center of the sub-graphs
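
Both measures can be illustrated on a small undirected graph. The sketch below uses the usual local clustering coefficient (links among a node's neighbors divided by the number of neighbor pairs) and counts the maximal cliques that contain a node, enumerated with a basic Bron–Kerbosch recursion. Counting maximal cliques of at least three nodes is an assumption; the slide does not say which cliques the authors count.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def maximal_cliques(adj):
    """Enumerate maximal cliques with a basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def clique_count(adj, v, min_size=3):
    """Number of maximal cliques of at least min_size nodes that contain v."""
    return sum(1 for c in maximal_cliques(adj) if v in c and len(c) >= min_size)

# Hypothetical graph: triangle A-B-C plus a pendant contact D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
cc_a = clustering_coefficient(adj, "A")   # one linked pair out of three
cc_b = clustering_coefficient(adj, "B")   # both neighbors are linked
```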

  15. Measuring Social Importance • Betweenness centrality • Percentage of existing shortest paths, out of all possible paths, that go through the node v • σjk: number of shortest paths between j and k • σjk(i): number of shortest paths between j and k that go through i
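
The formula itself is not preserved in this transcript. With the two symbols defined on the slide, the standard unnormalized betweenness centrality of a node i would read as follows; whether the authors also divide by the number of node pairs cannot be recovered here, so treat the normalization as an assumption.

```latex
BC(i) = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}
```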

  16. Measuring Social Importance • HITS Authority • Hyperlink-Induced Topic Search, also known as hubs and authorities • Measures the global importance of a node • Definition: given the N-by-N adjacency matrix X, the authority score can be calculated by • Finding the principal eigenvector r of the matrix XᵀX, where • r satisfies (XᵀX) r = λ r • λ is the largest eigenvalue
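
One common way to find that principal eigenvector is power iteration, alternating the hub and authority updates of HITS. The sketch below assumes X[i][j] = 1 when contact i has sent a message to contact j and rescales the scores to sum to 1; the function name, the iteration count, and the example matrix are illustrative choices, not taken from the paper.

```python
def hits_authority(X, iterations=100):
    """Authority scores by power iteration on X^T X, scaled to sum to 1."""
    n = len(X)
    a = [1.0] * n
    for _ in range(iterations):
        # Hub update: h = X a
        h = [sum(X[i][j] * a[j] for j in range(n)) for i in range(n)]
        # Authority update: a = X^T h, which equals (X^T X) a
        a = [sum(X[i][j] * h[i] for i in range(n)) for j in range(n)]
        norm = sum(a)
        if norm == 0:                    # graph without links
            return a
        a = [value / norm for value in a]
    return a

# Hypothetical graph: contacts 0 and 1 both write to 2, and 2 writes to 0.
X = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
scores = hits_authority(X)
```

Contact 2 receives mail from two senders, so it ends up with nearly all of the authority mass, while contact 1, who receives nothing, scores zero.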

  17. Measuring Social Importance • PCC Analysis • Pearson Correlation Coefficient • Compute the PCC of each social metric with the human-labeled importance levels of email messages • Indicates how useful each metric is for predicting the importance of email messages
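
For reference, the Pearson correlation between one social metric and the labeled importance levels can be computed as below. The two input lists are made-up numbers chosen only to show the call; they are not results from the paper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:         # a constant series has no defined PCC
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical values: a sender metric per message and the labeled importance
metric_per_message = [1, 2, 3, 4, 5]
importance_labels = [2, 4, 6, 8, 10]
r = pearson(metric_per_message, importance_labels)
```

Returning 0.0 for a constant series is a convenience of this sketch; the coefficient is undefined in that case.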

  18. Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

  19. Semi-supervised Importance Propagation • Semi-supervised Importance Propagation (SIP) • Propagate the importance values of labeled email messages (the training examples) to other messages and corresponding contact persons

  20. SIP Algorithm • Use a bipartite graph to represent the interactions between email contacts and email messages • Let N = number of email contacts, M = number of messages • Use two matrices to represent the two types of edges: matrix A (N by M) and matrix B (N by M) • Ai,j=1 if person i sent message j, and Ai,j=0 otherwise • Bi,j=1 if person i received message j, and Bi,j=0 otherwise

  21. SIP Algorithm • Treat each importance label (1~5) as a category • Use a vector xk (M by 1) to indicate the labels of the messages • xk,i=1 if message i belongs to category k, xk,i=0 otherwise • Importance propagates from messages to persons (receivers) • Importance propagates from persons (senders) to messages
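
The two propagation formulas are not preserved in this transcript. Given the shapes of A, B, and the label vector, the only dimensionally consistent reading is that receivers collect importance through B (a vector of length N) and messages then collect it from their senders through the transpose of A (a vector of length M). The sketch below follows that reading as an assumption; the helper names and the three-person, two-message example are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]

# Hypothetical mailbox: 3 persons, 2 messages.
# Message 0: sent by person 0, received by persons 1 and 2.
# Message 1: sent by person 1, received by person 2.
A = [[1, 0], [0, 1], [0, 0]]     # A[i][j] = 1 if person i sent message j
B = [[0, 0], [1, 0], [1, 1]]     # B[i][j] = 1 if person i received message j

x_k = [1, 0]                      # only message 0 carries label k

# Assumed step 1: messages -> persons (receivers)
y_k = mat_vec(B, x_k)

# Assumed step 2: persons (senders) -> messages
x_next = mat_vec(transpose(A), y_k)
```

In this example the label on message 0 reaches its receivers, persons 1 and 2, and then flows to the unlabeled message 1 because person 1 is its sender.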

  22. Propagation Example • Messages to persons (receivers) • Persons (senders) to messages

  23. SIP Algorithm • The importance values for contact persons are updated at each time step (t)

  24. SIP Algorithm • The updated vector yk is a linear transformation of its previous value • If the matrix C of that transformation is irreducible and t is large, yk stabilizes at the principal eigenvector of C • The irreducible property is not always guaranteed • When C is irreducible, its principal eigenvector is insensitive to the starting vector

  25. SIP Algorithm • A linear interpolation • Define a vector, and normalize it by the sum of its elements • Define an importance-sensitive matrix • Its columns are identical, each column is equal to the normalized vector • Normalize matrix C to C’ • α ∈ [0,1] • Ek is irreducible and importance-sensitive

  26. SIP Algorithm • Finally, the SIP method is defined iteratively as: yk(t+1) = Ek · yk(t) • Ek is irreducible, so yk stabilizes when t is large • yk consists of the expected importance score of each person after iterative SIP
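
Because several formulas on the SIP slides are missing from this transcript, the following is only a sketch of one plausible reading, in the style of personalized PageRank. Every modeling choice in it is an assumption: C is taken as B·Aᵀ, C’ as C with columns scaled to sum to 1, the importance-sensitive matrix as repeated copies of the normalized vector B·xk, and Ek as α·C’ plus (1 − α) times that matrix. The function name `sip_scores`, the default α, and the example data are hypothetical.

```python
def sip_scores(A, B, x_k, alpha=0.85, iterations=50):
    """Iterate y <- E_k y under the assumptions stated above this block."""
    n, m = len(A), len(x_k)
    # Assumed C = B A^T: C[i][j] counts messages sent by j and received by i.
    C = [[sum(B[i][q] * A[j][q] for q in range(m)) for j in range(n)]
         for i in range(n)]
    # Assumed C': scale every non-empty column of C to sum to 1.
    col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
    Cn = [[C[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
          for i in range(n)]
    # Assumed seed: receivers of the messages labeled k, normalized by its sum.
    u = [sum(B[i][q] * x_k[q] for q in range(m)) for i in range(n)]
    total = sum(u)
    if total == 0:                       # no labeled message in category k
        return [0.0] * n
    u = [value / total for value in u]
    y = u[:]
    for _ in range(iterations):
        # While y sums to 1, the identical-column matrix times y equals u.
        y = [alpha * sum(Cn[i][j] * y[j] for j in range(n)) + (1 - alpha) * u[i]
             for i in range(n)]
        s = sum(y)
        if s == 0:
            break
        y = [value / s for value in y]   # renormalize: silent senders leak mass
    return y

# Hypothetical mailbox: 3 persons, 2 messages (person 0 -> 1, 2; person 1 -> 2)
A = [[1, 0], [0, 1], [0, 0]]
B = [[0, 0], [1, 0], [1, 1]]
scores = sip_scores(A, B, x_k=[1, 0])
```

The interpolation term keeps the result tied to the labeled messages, which matches the slides' point that the plain principal eigenvector would not depend on the starting vector. The renormalization step is an addition of this sketch for contacts who never send mail.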

  27. Outline • Introduction • Social Clustering • Measuring Social Importance • Semi-supervised Importance Propagation • Experiments • Conclusions and Future Work

  28. Experiments • Data • Recruited 25 experimental subjects • Each subject was requested to label non-spam messages • Preprocessing • Email address canonicalization • Word tokenization and stemming • Stop words were not removed from the title and body text

  29. Experiments • Features • Basic features are the tokens in the from, to, cc, title, and body text fields, represented by a v-dimensional vector • Social-network-based features • An m-dimensional sub-vector represents the Newman clustering (NC) features • A 7-dimensional sub-vector represents the social importance (SI) features • A 5-dimensional sub-vector represents the five SIP scores per user

  30. Experiments • Classifiers • Use five linear SVM classifiers to predict the importance level of each email message • Use the standard SVMlight software package • Metric • N = number of messages • yi = the true importance level of message i • ŷi = the predicted importance level for that message
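
The metric's name and formula are not preserved in this transcript. The three symbols listed on the slide are the ones needed for mean absolute error, so the sketch below assumes that metric; the label values are invented for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted importance levels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical importance levels on the 1-5 scale
true_levels = [5, 3, 1, 4]
predicted_levels = [4, 3, 2, 4]
mae = mean_absolute_error(true_levels, predicted_levels)
```

Here two of the four predictions are off by one level, so the error is 0.5.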

  31. Experiments

  32. Conclusions and Future Work • Future work • Collect more data from a larger number of users over a longer time period • Comparative study of different clustering algorithms and graph-mining techniques with respect to effectiveness
