Harnessing the Relationships in Dynamic Networks

This dissertation proposal explores the relationships in dynamic networks through network mining techniques, such as link prediction, community finding, and attribute prediction. The proposed work aims to use community knowledge to assign roles to nodes in order to maximize influence in the network.

Presentation Transcript


  1. Jerry Scripps • Ph.D. Dissertation Proposal • Harnessing the Relationships in Dynamic Networks

  2. Overview • Introduction • Background – network mining • Previous work • Finding overlapping communities in networks • Using community knowledge to assign roles to nodes • Proposed work • Data characteristics • Link prediction in dynamic networks • Attribute prediction in dynamic networks

  3. Network Basics • Node • Link • Community • Attributes [Figure: example network whose nodes are labeled with a name and attributes, e.g. schroeder (male, married, fine arts), pigpen (male, single, soil mgt), lucy (female, married, business), marcie (female, engaged, psych.), pattie (female, single, phy. ed.), charles (male, engaged, data min.), snoop (male, ?, commun.), jim (male, single, english); linus (male, philos.) appears as both engaged and single, and woodstock (?, single) as both zoology and pre med.]

  4. Dynamic Networks: Examples

  5. Network Mining Techniques • link-based classification (Chakrabarti ’98, Lu ’03) • link prediction (Liben-Nowell ’03, Al-Hasan ’06, Taskar ’03,…) • community finding (Newman ’04, Shi ’00, Kubica ’02,…) • ranking (Page ’98, Kleinberg ‘) • influence maximization (Kempe ’03, Domingos ’01)

  6. Networks: Static and Dynamic [Figure: network snapshots at times t1, t2, t3, …, tk] • link-based classification • link prediction • community finding • ranking • influence maximization

  7. Objective: Harness the relationships inherent in networks • Nodes • Attributes • Communities • Links

  8. Overview • Introduction • Background – link mining • Previous work • Finding overlapping communities in networks • Using community knowledge to assign roles to nodes • Proposed work • Data characteristics • Link prediction in dynamic networks • Attribute prediction in dynamic networks

  9. Community Finding [SDM 06] [Figure: example network with nodes Arnold, Jamie, Lee, George and Dan; Arnold and Dan each appear twice]

  10. Choosing the Minimum Number of Nodes to Clone

  11. Finding Minimum is NP-Complete • Hitting Set • Set S = {a, b, c, …} • C = {A, B, C, …}, subsets of S • Is there a subset S’ of S, where |S’| < k and S’ contains at least one element from every set in C? • For each subset (e.g. {a, b, d}) create an ML path with a CL edge • Finding Minimum # of Nodes to Clone • Is there a minimum number of nodes that, when cloned, will separate every pair of nodes involved in a CL link? [Figure: example graph with nodes a, b, c, x, y]
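
The Hitting Set question on this slide can be made concrete with a brute-force check. The sketch below is only an illustration: the function name `find_hitting_set` and the example sets are hypothetical, and it uses the common "size at most k" form of the question rather than the strict inequality written on the slide. It runs in exponential time, which is in line with the problem being NP-complete.

```python
from itertools import combinations

def find_hitting_set(universe, subsets, k):
    """Return a set of at most k elements that intersects every subset, or None.

    Brute force over all candidate sets, smallest first (exponential time).
    """
    elements = sorted(universe)
    for size in range(k + 1):
        for candidate in combinations(elements, size):
            chosen = set(candidate)
            # A hitting set shares at least one element with every subset.
            if all(chosen & s for s in subsets):
                return chosen
    return None

# Hypothetical example: S has four elements, C has three subsets.
S = {"a", "b", "c", "d"}
C = [{"a", "b", "d"}, {"b", "c"}, {"c", "d"}]

print(find_hitting_set(S, C, 1))  # no single element hits all three subsets
print(find_hitting_set(S, C, 2))  # a two-element hitting set exists
```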

  12. Approximation Algorithms • Objective function: Q = λ1∙|E_incomplete| + λ2∙|E_impure| + λ3∙|V_additional| • Graph Min-Cut [SDM 2006] • separate all Cannot Link edges • connected components become communities • Genetic Algorithms [TKDE] • Tr(X^T L X) + λ3(||X|| − n) • Binary Quadratic Programming problem

  13. Min-Cut Approach • Find the min-cut of every CL pair and clone one of the nodes associated with each of the min-cut edges. • O(VE^2)
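
As a rough illustration of the first half of this step, the sketch below finds the min-cut edges between one CL pair in an undirected, unit-capacity graph. It uses the Edmonds-Karp max-flow algorithm, which is an assumption on my part: the slide does not name the algorithm, although Edmonds-Karp does have the O(VE^2) bound quoted. The function name and the example graph are hypothetical, and the cloning of an endpoint of each cut edge is not shown.

```python
from collections import defaultdict, deque

def min_cut_edges(edges, source, sink):
    """Return the edges crossing a minimum cut between source and sink.

    Edmonds-Karp max flow on an undirected graph with unit capacities.
    """
    cap = defaultdict(int)   # residual capacities, keyed by (u, v)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u

    # Nodes still reachable from the source form one side of the cut.
    reachable = set(parent)
    return {(u, v) for u, v in edges if (u in reachable) != (v in reachable)}

# Hypothetical example: two triangles joined by the single edge (c, d),
# with a cannot-link constraint between a and f.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
print(min_cut_edges(edges, "a", "f"))
```

In this example the only cut edge is the bridge between the two triangles, so cloning either `c` or `d` would be the candidate move under the slide's rule.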

  14. Genetic Algorithm Solution • mutation • selection • fitness function = λ1∙|E_incomplete| + λ2∙|E_impure| + λ3∙|V_additional|

  15. Experimental Results

  16. Overview • Introduction • Background – link mining • Previous work • Finding overlapping communities in networks • Using community knowledge to assign roles to nodes • Proposed work • Data characteristics • Link prediction in dynamic networks • Attribute prediction in dynamic networks

  17. Incorporating Community Information [WEBKDD ’07, ICDM ’07] • Influence Maximization • Link-based Classification [Figure: two example networks with labeled nodes, some labels unknown (?)]

  18. Typical Roles in Networks (Role: Metric) • Popularity: degree • Authority: PageRank or HITS • Influence: influence maximization • Centrality: closeness/betweenness • Community Role: rawComm

  19. Community-Based Roles [Figure: quadrant chart; x-axis: Community Metric (0 to 1); y-axis: Relative Degree (0 to max); top row: Big Fish, Ambassadors; bottom row: Loners, Bridges]
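
One way to read the four roles is as quadrants of a plot with the community metric on the horizontal axis and relative degree on the vertical axis. The sketch below assumes that reading, assumes both values are scaled to [0, 1], and uses hypothetical cut-offs of 0.5; none of these choices is stated on the slide.

```python
def assign_role(relative_degree, community_metric,
                degree_cut=0.5, community_cut=0.5):
    """Map a node to one of the four community-based roles.

    Assumed layout: high degree + high community metric -> ambassador,
    high degree + low metric -> big fish, low degree + high metric -> bridge,
    low degree + low metric -> loner. The cut-offs are illustrative only.
    """
    high_degree = relative_degree >= degree_cut
    high_community = community_metric >= community_cut
    if high_degree and high_community:
        return "ambassador"
    if high_degree:
        return "big fish"
    if high_community:
        return "bridge"
    return "loner"

for degree, metric in [(0.9, 0.9), (0.9, 0.1), (0.1, 0.9), (0.1, 0.1)]:
    print(degree, metric, assign_role(degree, metric))
```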

  20. Community metric rawComm [Figure: node A with neighbor contributions 1/4, 1/4, 1/4, 1/4, 1/3, 1/3, 1/3, 1/2, 1/2; Στ = 3]

  21. Calculating Community Metric • Assumption: link and community structures are aligned [Figure: example network with nodes A to G; Στ = 3; edge value 1/3]

  22. Re-calculating Community Metric • p = 0.9, q = 0.9 • Στ = 2.73 [Figure: example network with nodes A to G]

  23. Comparison of rawComm to actual: Karate data set • p=1, q=1: SSE = 53.5 • p=0.73, q=0.82: SSE = 6.68

  24. Application of Roles to Influence Maximization
      algorithm | Movie Data: nodes | Movie Data: coverage | FaceBook: nodes | FaceBook: group cov. | time (ms)
      random    | 11 | 60  | 12 | 3.5 | 10
      degree    | 19 | 83  | 21 | 3.7 | 60
      greedy    | 22 | 259 | 23 | 4.1 | 4,920,000
      comm      | 18 | 261 | 18 | 5.6 | 371
      ambass    | 20 | 288 | 20 | 4.3 | 411

  25. Overview • Introduction • Background – link mining • Previous work • Finding overlapping communities in networks • Using community knowledge to assign roles to nodes • Proposed work • Data characteristics • Link prediction in dynamic networks • Attribute prediction in dynamic networks

  26. Characterization of Static Networks • Regular • Random • Small world • Scale-free • Models describe network generation that includes: • initial configuration (no links, regular graph or some initial random links) • link strategy that leads to the specific network type • with continued link addition eventually network becomes complete

  27. Characterization of Dynamic Networks • DBLP – bibliographic database of computer science articles spanning more than 30 years • nodes: authors • links: coauthorship • communities: groups of conferences • 382,000 authors • 523,000 papers

  28. Characterization of Dynamic Networks: Objectives • Identify/verify network and community structural invariants • existing: power law, clustering coefficient, average path length • other metrics: open triads, fringe nodes, etc. • study the changes in network/community type • Investigate the patterns of link attachment/detachment • effect of common neighbors, path length, degree on new links • effect of time, pair metrics on link removal • effect of community on link attachment/detachment • Study the patterns of community dynamics • node movement – fringe, outside fringe, outside network • growth and shrinkage of network
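
Two of the existing invariants named here, the clustering coefficient and the average path length, can be computed per snapshot and then compared across time periods. The sketch below uses the standard textbook definitions on an adjacency-set representation; the function names and the four-node example graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves linked."""
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * linked / (len(neighbors) * (len(neighbors) - 1))

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs."""
    total = pairs = 0
    for start in adj:
        # Breadth-first search gives shortest hop counts from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical snapshot: a triangle a-b-c with a pendant node d attached to c.
snapshot = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(snapshot, "a"))
print(clustering_coefficient(snapshot, "c"))
print(average_path_length(snapshot))
```

Running the same two functions on each snapshot t1 … tk gives a time series per metric, which is one simple way to look for the invariants this slide refers to.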

  29. Dynamic Network Characterization Process: synthetic and real data comparison [Diagram: a network generator and real network data (FaceBook, DBLP, Enron email) feed into association analysis; compare and deduce relationships]

  30. Dynamic Network Characterization Summary • Characterization of dynamic networks is an important problem but challenging: • many metrics, many time periods, many networks • finding trends in so much data is daunting • Use network generator and association analysis to • deduce network and community invariants • find patterns of link attachment / detachment • understand the impact that attributes, links and communities have on each other

  31. Overview • Introduction • Background – link mining • Previous work • Finding overlapping communities in networks • Using community knowledge to assign roles to nodes • Proposed work • Data characteristics • Link prediction in dynamic networks • Attribute prediction in dynamic networks

  32. Link Prediction: Problem Definitions • Existing definitions: given a network of nodes, attributes and links: • predict the next most likely link • with some links missing, find the missing links • Applications: • recommending new friends or collaborators • finding missing hyper-links in web pages • Special case: • predict the links for a new node with no existing links • similar to cold-start problem in recommender systems • must rely on only the attributes

  33. Link Prediction: Existing Solutions • Liben-Nowell and Kleinberg (2003) • ranked node-pairs and selected the top k • metrics used to rank node-pairs include: • graph distance • common neighbors • Jaccard, Adamic/Adar, pref. attachment • Katz, hitting time, PageRank, etc. • limitation: uses only link topology – not valid for new nodes
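
The neighborhood-based metrics in this list have simple closed forms. The sketch below implements four of them with their usual definitions and then ranks unlinked node pairs to select the top k, as the slide describes. The helper names and the five-node example graph are hypothetical. Note that every score is zero or undefined for a node with no links, which is the limitation the slide points out for new nodes.

```python
import math

def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    # Each shared neighbor z counts more when z itself has few links.
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)

def preferential_attachment(adj, u, v):
    return len(adj[u]) * len(adj[v])

def top_k_links(adj, score, k):
    """Score every currently unlinked pair and return the k best."""
    nodes = sorted(adj)
    candidates = [(score(adj, u, v), u, v)
                  for i, u in enumerate(nodes)
                  for v in nodes[i + 1:] if v not in adj[u]]
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return candidates[:k]

# Hypothetical example network.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
       "d": {"b", "c", "e"}, "e": {"d"}}
print(top_k_links(adj, common_neighbors, 2))
print(top_k_links(adj, jaccard, 2))
```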

  34. Link Prediction: Existing Solutions • Al Hasan, Chaoji, Salem, and Zaki (2006) • classifier approach • create an instance for each node pair • use attributes and link metrics for features • class is link: 1 for link, 0 for no link • limitation: reliability of training data, class skew [Figure: instance table with columns "features" and "links"]
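
To show what "one instance per node pair" looks like, and why class skew arises, the sketch below builds such a table for a small graph. The two features used (number of matching attributes and number of common neighbors) are illustrative choices rather than the feature set of the cited paper, and the graph and attribute values are hypothetical.

```python
from itertools import combinations

def build_pair_instances(adj, attrs):
    """Return one (pair, features, class) tuple per unordered node pair."""
    instances = []
    for u, v in combinations(sorted(adj), 2):
        shared_attrs = sum(1 for a, b in zip(attrs[u], attrs[v]) if a == b)
        common = len(adj[u] & adj[v])
        label = 1 if v in adj[u] else 0   # class: 1 for link, 0 for no link
        instances.append(((u, v), [shared_attrs, common], label))
    return instances

# Hypothetical 6-node path graph n0 - n1 - ... - n5 with two attributes per node.
adj = {f"n{i}": set() for i in range(6)}
for i in range(5):
    adj[f"n{i}"].add(f"n{i + 1}")
    adj[f"n{i + 1}"].add(f"n{i}")
attrs = {f"n{i}": (i % 2, i // 3) for i in range(6)}

instances = build_pair_instances(adj, attrs)
positives = sum(label for _, _, label in instances)
print(len(instances), "instances,", positives, "with class 1")
```

The number of pairs grows quadratically with the number of nodes while sparse networks have far fewer links, so the "no link" class dominates as the network grows.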

  35. Link Prediction: Existing Solutions • probabilistic models • Taskar, Wong, Abbeel and Koller (2003) • Limitations: does not consider communities; cliques must be specified • Neville and Jensen (2005) • Limitations: dependencies between links, groups and attributes; joint prob. model rather than a conditional model of the links given the attributes

  36. Link Prediction for New Nodes: Naïve Approach • calculate similarity between nodes using traditional method such as cosine • rank the potential links and select the top k • problem: all attributes are weighted the same – some may not be valuable for predicting links
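
A minimal sketch of the naïve approach, assuming each node is described by a numeric attribute vector: compute the cosine similarity between the new node and every existing node, rank, and keep the top k. The node names and attribute vectors below are hypothetical.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def rank_links_for_new_node(new_attrs, existing, k):
    """Return the k existing nodes most similar to the new node."""
    scored = sorted(((cosine(new_attrs, x), name) for name, x in existing.items()),
                    key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]

# Hypothetical attribute vectors; every attribute counts equally.
existing = {"n1": [1, 0, 1], "n2": [0, 1, 1], "n3": [1, 1, 0]}
print(rank_links_for_new_node([1, 0, 0], existing, 2))
```

Because every attribute enters the dot product with the same weight, an attribute that has nothing to do with link formation influences the ranking as much as one that does.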

  37. New Approach: Defining the model • need a model that will weight the attributes to find the best alignment between the attributes and links • n x n adjacency matrix A: aij=1 when nodes i and j are linked, zero otherwise • d x d weight matrix W where W=diag(w1…wd) • n x d attribute matrix X where xi=attributes of node i
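
One reading of these definitions, consistent with the expression Ai = XWXiT used later in the attribute prediction slides, is that the predicted strength of a link is a weighted inner product of the two nodes' attribute vectors. The notation below (the hat for predicted values) is an assumption and not taken from the slides.

```latex
\hat{a}_{ij} \;=\; x_i \, W \, x_j^{T} \;=\; \sum_{k=1}^{d} w_k \, x_{ik} \, x_{jk},
\qquad
\hat{A} \;=\; X \, W \, X^{T}
```

Under this reading, a large w_k means that sharing attribute k makes a link more likely, and w_k near zero means the attribute carries little information about links.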

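  38. Link Prediction for New Nodes: Solving for W • the objective can be written: [equation not in transcript] • then, create the vector Y and matrix Z as follows: [equations not in transcript] • the objective function becomes: [equation not in transcript] • which is a least squares regression model with solution: [equation not in transcript]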
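
The equations for this slide did not survive in the transcript, so the following is a sketch under stated assumptions, not the author's derivation. It assumes the objective is the squared error between each a_ij and the weighted product sum_k w_k x_ik x_jk, that Y stacks the a_ij values over node pairs, and that each row of Z holds the products x_ik x_jk for one pair. The least squares solution is then w = (Z^T Z)^(-1) Z^T Y. The function names and the tiny data set are hypothetical.

```python
from itertools import combinations

def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; some attributes may be redundant")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def learn_attribute_weights(A, X):
    """Least squares estimate of the diagonal of W from links A and attributes X."""
    n, d = len(X), len(X[0])
    Z, Y = [], []
    for i, j in combinations(range(n), 2):
        Z.append([X[i][k] * X[j][k] for k in range(d)])  # one row per node pair
        Y.append(A[i][j])
    # Normal equations: (Z^T Z) w = Z^T Y
    ZtZ = [[sum(z[p] * z[q] for z in Z) for q in range(d)] for p in range(d)]
    ZtY = [sum(z[p] * y for z, y in zip(Z, Y)) for p in range(d)]
    return solve_linear(ZtZ, ZtY)

# Hypothetical data: nodes link exactly when they share attribute 0.
X = [[1, 1], [1, 0], [0, 1], [1, 1]]
A = [[0, 1, 0, 1],
     [1, 0, 0, 1],
     [0, 0, 0, 0],
     [1, 1, 0, 0]]
print(learn_attribute_weights(A, X))
```

In this constructed example the first attribute fully explains the links, so its learned weight is 1 and the second attribute's weight is 0.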

  39. Link Prediction for New Nodes: Extensions • Heterogeneous communities • models can be trained on the different communities • new node can be assigned to one or more communities • links assigned based on the model for that community (or those communities) • Temporal data • historical data can be used to determine the number of new links for the new node • incorporate historical data into the model to learn the weights

  40. Link Prediction for New Nodes: Summary • link prediction is an important and growing area • existing solutions are inadequate for dynamic networks • weighted attribute matrix solution proposed: • finds the best alignment between attributes and links • can be efficiently solved • extendable in the temporal dimension • can also be extended to exploit heterogeneous nature of communities

  41. Overview • Introduction • Background – link mining • Previous work • Finding overlapping communities in networks • Using community knowledge to assign roles to nodes • Proposed work • Data characteristics • Link prediction in dynamic networks • Attribute prediction in dynamic networks

  42. Attribute Prediction: Problem statement • problem: • link structure is known • most attributes of some nodes are known • want to predict the hidden attributes of one node • applications • discovering the hidden information about a terrorism suspect • inferring a customer’s preferences • recommender systems

  43. Attribute Prediction: Previous Work • link based classification can be used, one attribute at a time • Chakrabarti, Dom and Indyk (1998) • included class from neighboring nodes • limitation: not very effective • Lu and Getoor (2003) • created two separate models for links and attributes • more effective but still assumes alignment between attributes and links

  44. Attribute Prediction: First Approach • Given a network with node oi with hidden attributes: • calculate weights using the link prediction model • create a system of linear equations: A_i = X W X_i^T • where aij, wj, and xkj are given values and the xi’s are variables when attribute i is hidden • when d<n, overdetermined system can be solved using least squares
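
A hedged sketch of how this system could be solved when all d attributes of node i are hidden. Here A_i is taken to be the column of known link indicators between node i and the other nodes, the rows of X are the known attribute vectors of those other nodes, and W comes from the link prediction model; the symbol M is introduced only for this illustration.

```latex
M = X W \in \mathbb{R}^{n \times d}, \qquad
A_i \approx M \, x_i^{T}, \qquad
\hat{x}_i^{T} \;=\; \arg\min_{x} \lVert A_i - M x \rVert_2^2
\;=\; (M^{T} M)^{-1} M^{T} A_i
```

The closed form requires M^T M to be invertible, which fails if some weight w_k is zero; in that case the links carry no information about attribute k under this model.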

  45. Attribute Prediction: Second Approach • Problem: with some attributes the corresponding weight could be small, meaning that the links do not provide adequate information to predict the attribute • Solution: create a series of alternative link structures using pair-wise metrics [Figure: link structures labeled original, common neigh., graph dist. and Katz, with the values 0.5, 0.1, 0.2, 0.7]

  46. Attribute Prediction: Model description • Create a number of alternate link structures: A1…Am • Use a vector of weights v, one for each Ai • Learn the weights using the objective function: [equation not in transcript] where X is a matrix of attribute values and W = diag(w1…wd), where attributes are selected by setting wi=1 • weights can be learned using least squares

  47. Attribute Prediction: Summary • predicting attributes is an important and interesting problem • applications in criminal and social networks • predicting multiple variables in network setting is new • link-based classification can only predict one attribute at a time • proposed two solutions • simple linear equations model to predict attributes using weights learned from link prediction model • model to make use of alternate link structures • both models can be extended to use multiple adjacency matrices for temporal data

  48. Proposal Summary • Network data is accumulating • Relationships between nodes, attributes, links and communities are complex but rich in meaning • Applications to harness these relationships will become valuable and necessary, but there are many open issues • Understanding the underlying forces of dynamic networks will become fundamental to creating the new applications

  49. It is all over • Thanks for your attention • Any questions?

  50. Additional Slides
