
A Framework For Community Identification in Dynamic Social Networks

A Framework For Community Identification in Dynamic Social Networks. Chayant Tantipathananandh Tanya Berger-Wolf David Kempe Presented by Victor Lee. Outline of Presentation. The Challenge: Dynamic Social Networks Framework and Problem Formulation Individual and Group Colorings


Presentation Transcript


  1. A Framework For Community Identification in Dynamic Social Networks • Chayant Tantipathananandh, Tanya Berger-Wolf, David Kempe • Presented by Victor Lee

  2. Outline of Presentation • The Challenge: Dynamic Social Networks • Framework and Problem Formulation • Individual and Group Colorings • Group Coloring Heuristics • Experimental Results • Future Directions

  3. The Problem • Many well-known approaches to identify communities in social networks • Graph Partitioning • Clustering • Various measures of closeness or density • But, these approaches generally assume static networks • Most social networks are dynamic

  4. Dynamic Social Networks • Social Networks change over time • Membership changes • Interaction changes • Most community identification techniques: • Use a single snapshot • Or use time-averaged measurements • Lose important information

  5. Importance of Dynamic Information • Networks 1 and 2: same average characteristics, but… • Network 1 shows an oscillation • Network 2 suggests that C joins the community [Figure: Network 1 and Network 2, each drawn as six snapshots T1 to T6 of the interactions among A, B, and C along a time axis]

  6. Proposal • New framework for modeling social networks over time • Algorithms and Heuristics to identify dynamic communities • Experiments to verify the concept and the computational performance

  7. Problem Formulation • Given: • A set of individuals • A sequence of snapshot observations • Find: • A best-fit set of time-varying communities C(t) • Best-fit time-varying community membership for each individual • Approach: • Combinatorial optimization • Graph coloring

  8. Model: Individuals and Groups • Set of individuals X = {i_1, i_2, …, i_n} • Sequence of observations <P_1, P_2, …, P_T> • Discrete time • Record interaction between individuals • The individuals interacting with one another at time t define a group. • If A interacts with B, and B interacts with C, then {A,B,C} ⊆ a group [Figure: individuals A, B, and C]
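
The grouping rule on this slide (interaction is transitive, so a group is a connected component of the snapshot graph) can be sketched with a small union-find. This is only an illustration: the function name `groups_from_interactions` and the input layout (a list of individuals plus a list of interacting pairs for one time step) are assumptions, not part of the presentation.

```python
from collections import defaultdict

def groups_from_interactions(individuals, interactions):
    """Return the groups of one snapshot as a list of sets.

    Hypothetical helper: a group is a connected component of the
    graph whose edges are the observed pairwise interactions.
    """
    parent = {i: i for i in individuals}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in interactions:
        parent[find(a)] = find(b)  # merge the two components

    components = defaultdict(set)
    for i in individuals:
        components[find(i)].add(i)
    # Sort for a deterministic order of groups.
    return sorted(components.values(), key=lambda grp: sorted(grp))

# A interacts with B, B with C, and D with E: two disjoint groups.
snapshot = groups_from_interactions(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
```

In this sketch an individual with no recorded interaction ends up in a singleton group; whether such individuals should instead be treated as unobserved is a modeling choice the slide does not settle.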

  9. Group vs Community • Snapshot Graph • Individual is a vertex • Interaction is an edge • Group is a connected subgraph • Assumption: interaction is sufficiently limited so that the graph is not connected (we have disjoint groups) • Group ≠ Community • Groups capture observed interaction at a point in time • Communities extend over time

  10. Graphing the Observations • Each time slice is one observation • Edges within a time slice show observed interaction at time t • Add edges joining all observations of the same individual • No edges between groups from one time to another ○ = individual □ = group

  11. Refine the Problem • A community appears as a sequence of groups, of at most one group per time slice. • Tasks: • Assign each group to a community(color the group vertices) • Assign each individual to a community, for each time step (color individual vertices) • More Assumptions: • Individuals belong to one community at a time • Individuals don’t change community frequently • Individuals frequently appear in their community

  12. Cost Model • Quantify a “good” community identification • Assign costs to undesirable behavior: • I-cost: a when an individual changes color. • G-costs: • b1 when an individual is absent from its community. • b2 when an individual is present in a different community. • C-cost: g for each color that an individual uses • Find a coloring with minimum cost

  13. Coloring Choices and Costs • At time T3, C temporarily changes its interaction. • Coloring 1: C changes community and then changes back. • Cost = 2*a (+ g if this color hasn’t been used before) • Coloring 2: C stays in its original community and just visits. • Cost = b1 + b2 • Which coloring is optimal depends on whether (b1 + b2) < (2*a + g), or (b1 + b2) < (2*a) when the color has already been used [Figure: Coloring 1 and Coloring 2 of individuals A, B, C, D over time steps T1 to T4]
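
The comparison between the two colorings can be checked numerically. The sketch below is one reading of the cost model, not the authors' code: the function `individual_cost` and its input layout are hypothetical, and it assumes that b1 is charged when a group with the individual's color exists at that time step but the individual is not in it, and that b2 is charged when the individual is observed in a group of another color.

```python
def individual_cost(ind_colors, observed_colors, present_colors, a, b1, b2, g):
    """Total cost of one individual's coloring (illustrative sketch).

    ind_colors[t]      : color assigned to the individual at time t
    observed_colors[t] : color of the group it was observed in (None if unobserved)
    present_colors[t]  : set of colors of all groups present at time t
    """
    # I-cost: a per change of color between consecutive time steps.
    switches = sum(1 for p, c in zip(ind_colors, ind_colors[1:]) if p != c)
    # G-cost b1: own community is present but the individual is not in it.
    absences = sum(
        1 for c, obs, pres in zip(ind_colors, observed_colors, present_colors)
        if c in pres and obs != c
    )
    # G-cost b2: observed in a group of a different color.
    visits = sum(
        1 for c, obs in zip(ind_colors, observed_colors)
        if obs is not None and obs != c
    )
    # C-cost: g per distinct color the individual uses.
    colors_used = len(set(ind_colors))
    return a * switches + b1 * absences + b2 * visits + g * colors_used

# Individual C sits in the "red" group except at T3, where it joins "blue".
observed = ["red", "red", "blue", "red"]
present = [{"red", "blue"}] * 4

cost_coloring_1 = individual_cost(["red", "red", "blue", "red"], observed, present, 1, 1, 1, 1)
cost_coloring_2 = individual_cost(["red"] * 4, observed, present, 1, 1, 1, 1)
```

With all cost factors set to 1, Coloring 1 pays 2*a plus g for each of its two colors, while Coloring 2 pays b1 + b2 plus g for its single color, so the difference is exactly the (b1 + b2) versus (2*a + g) comparison on the slide.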

  14. Finding Optimal Colorings • Finding the optimal solution is NP-hard • Partition the problem: • Find an optimal set of communities • Find optimal assignment of individuals to communities • If Phase 1 (Group Coloring) is completed first: • Phase 2 is reduced from O(2^N) to O(2^G), where N = # of individuals, G = # of groups • The cost incurred by one individual’s coloring is independent of the colors chosen by others.

  15. Independence of Individual Color Choice • Proof: • Cost of an individual’s behavior = A * (I-cost) + B * (G-cost) + C * (C-cost) • Costs are assessed individually: • I-cost = a * (# of color changes) • G-cost = b1 * (# absences from its group) + b2 * (# visits to other groups) • C-cost = g * (# of colors that an individual uses) • So, we can solve for each individual one at a time. • Moreover, we can assess cost incrementally, from time t to time t+1…

  16. Individual Coloring Algorithm • C = set of all colors observed to be used by an individual i • F(t) = {S ⊆ C: 1 ≤ |S| ≤ t}, all possible subsets of colors up to time t • G(t,x) = G-cost to use color x at time t • I(t,x,y) = I-cost to use color x at time t-1 and color y at time t • C(x,R) = C-cost to use color x when color set R has been used • Min. cost at time t, using color x, with color set S used: • At time = 1: G(1, {x}, x) = G(1, x) • At time = t: G(t, S, x) = G(t, x) + min [ G(t-1, R, y) + I(t, y, x) + C(x, R) ], where the minimum is over all R ∈ F(t-1) and y ∈ R such that R ∪ {x} = S • Legend: i-cost = changing color; g-cost = wrong group; c-cost = new color

  17. Optimal Individual Coloring • Given a group coloring, the minimum cost of coloring the individual i is min G(T, S, x) over all S ∈ F(T), x ∈ S • Time complexity is O( n T |C|^2 2^|C| ) • Space requirement is O( |C| 2^|C| ) • If the number of colors |C| is not large, the complexity is tractable.
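
The dynamic program over states (color set used so far, current color) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `min_individual_cost` and the input layout are hypothetical, the G-cost uses the same interpretation of b1 and b2 as an absence from a present community and a visit to another community, and the base case charges g for the first color (a constant the slide's base case leaves out) so that the result is a full total cost.

```python
def min_individual_cost(observed_colors, present_colors, a, b1, b2, g):
    """Minimum cost of coloring one individual, given a group coloring.

    observed_colors[t] : color of the group the individual is in at time t (or None)
    present_colors[t]  : set of colors of all groups present at time t
    Assumes the individual is observed at least once.
    """
    # C: the colors of the groups this individual was ever observed in.
    colors = {c for c in observed_colors if c is not None}

    def g_cost(t, x):
        cost = 0
        if x in present_colors[t] and observed_colors[t] != x:
            cost += b1  # absent from its own community
        if observed_colors[t] is not None and observed_colors[t] != x:
            cost += b2  # present in a different community
        return cost

    # best[(S, x)] = min cost up to the current time, using color set S, ending in x.
    best = {(frozenset([x]), x): g_cost(0, x) + g for x in colors}

    for t in range(1, len(observed_colors)):
        nxt = {}
        for (used, y), cost in best.items():
            for x in colors:
                new_cost = (
                    cost
                    + g_cost(t, x)
                    + (a if x != y else 0)        # I-cost: color change
                    + (g if x not in used else 0)  # C-cost: new color
                )
                key = (used | {x}, x)
                if new_cost < nxt.get(key, float("inf")):
                    nxt[key] = new_cost
        best = nxt

    return min(best.values())

observed = ["red", "red", "blue", "red"]
present = [{"red", "blue"}] * 4
cheap_visit = min_individual_cost(observed, present, a=1, b1=1, b2=1, g=1)
costly_visit = min_individual_cost(observed, present, a=1, b1=1, b2=3, g=1)
```

Only the previous time step's table is kept, which matches the O( |C| 2^|C| ) space bound; recovering the coloring itself would require storing back-pointers as well.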

  18. Optimal Group Coloring • Determine the best mapping of groups at time t to groups at time t+1 • Groups that are mapped across time are part of the same community and have the same color • A coloring is good if most individuals can retain their color from step to step. [Figure: a possible coloring]

  19. Bipartite Matching Heuristic • Matching Graph • For each pair of groups g, g’ at times t, t’ = t+1, add a weighted edge from v_{g,t} to v_{g’,t’} • Weight = |g ∩ g’| (similarity of g to g’) • Find the maximum weight bipartite matching • Evaluation • Weighs i-cost more heavily than g-cost • Performs well if membership is fairly stable • No long range perspective • More efficient heuristics? • Legend: i-cost = changing color; g-cost = wrong group; c-cost = new color
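
For small snapshots, the matching step can be illustrated by brute force. The sketch below is a stand-in, not the method used in the presentation: it enumerates all assignments with `itertools.permutations` instead of running a proper maximum weight bipartite matching algorithm (such as the Hungarian method), and the function name `match_groups` is hypothetical.

```python
from itertools import permutations

def match_groups(groups_t, groups_next):
    """Maximum-weight matching between groups at time t and time t+1.

    Weight of a pair is |g ∩ g'|. Brute force, so only suitable for a
    handful of groups per time step. Returns (pairs, total_weight),
    where pairs are (index in groups_t, index in groups_next).
    """
    swap = len(groups_t) > len(groups_next)
    small, large = (groups_next, groups_t) if swap else (groups_t, groups_next)

    best_weight, best_pairs = -1, []
    # Try every way of assigning each group on the smaller side a distinct partner.
    for perm in permutations(range(len(large)), len(small)):
        # Keep only pairs that actually share members.
        pairs = [(i, j) for i, j in enumerate(perm) if small[i] & large[j]]
        weight = sum(len(small[i] & large[j]) for i, j in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs

    if swap:
        best_pairs = [(j, i) for i, j in best_pairs]
    return sorted(best_pairs), best_weight

groups_t = [{"A", "B", "C"}, {"D", "E"}]
groups_next = [{"A", "B"}, {"C", "D", "E"}]
pairs, weight = match_groups(groups_t, groups_next)
```

Matched groups would then share a color, and unmatched groups at time t+1 would start a new one. Because only adjacent time steps are compared, a community that disappears for one step cannot be reconnected, which is the "no long range perspective" limitation noted on the slide.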

  20. Greedy Heuristics for Group Coloring • Approach: Maximize pairwise similarity between groups, for all pairs of groups over all timesteps • Jaccard’s index: Jac(g, g′) = |g ∩ g′| / |g ∪ g′| (overlap between g and g′, scaled to size of g and g′) • Weighted for temporal proximity: JacD(g, g′) = Jac(g, g′) / |t - t′|
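
Both similarity measures are short enough to write out directly. The function names below are hypothetical; groups are assumed to be Python sets of individuals, and the time-weighted variant is assumed to be applied only to groups from different time steps.

```python
def jaccard(g1, g2):
    """Jaccard index |g1 ∩ g2| / |g1 ∪ g2| of two groups (sets)."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0

def jaccard_time_weighted(g1, t1, g2, t2):
    """Jaccard index divided by the time distance |t1 - t2|."""
    if t1 == t2:
        raise ValueError("groups must come from different time steps")
    return jaccard(g1, g2) / abs(t1 - t2)

sim = jaccard({"A", "B", "C"}, {"B", "C", "D"})                   # 2 shared of 4 total
sim_far = jaccard_time_weighted({"A", "B", "C"}, 1, {"B", "C", "D"}, 3)
```

Dividing by the time distance means that the same overlap counts for less the further apart the two observations are.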

  21. Greedy Heuristics for Group Coloring • Greedy Heuristic 1 (time is not a factor) • Construct a square similarity matrix of size #groups × #groups • Cluster the groups using agglomerative clustering • Greedy Heuristic 2 (look backwards in time) • For t = 1 to T do • Match most similar pairs g, g′ for any time t′ < t • If similarity = 0 or all colors have been used, add a new color • Greedy Heuristic 3 (look back the shortest interval) • Like Heuristic 2, but compare only against t′, the time closest to t at which some group g′ has similarity(g, g′) > 0
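
Greedy Heuristic 2 is described only in outline, so the following is one possible interpretation rather than the authors' procedure. It assumes plain Jaccard similarity, non-empty groups, integer color labels, and that "all colors have been used" means every candidate color is already taken by another group in the same time step. The function name `greedy_color_groups` is hypothetical.

```python
def greedy_color_groups(snapshots):
    """Color groups by looking backwards in time (illustrative sketch).

    snapshots : list of time steps, each a list of non-empty sets (groups)
    Returns a parallel list of lists of integer colors.
    """
    colors = []      # colors[t][k] = color of group k at time t
    next_color = 0

    for t, groups in enumerate(snapshots):
        # Collect (similarity, group index, candidate color) against all earlier groups.
        candidates = []
        for gi, grp in enumerate(groups):
            for tp in range(t):
                for hi, earlier in enumerate(snapshots[tp]):
                    sim = len(grp & earlier) / len(grp | earlier)
                    if sim > 0:
                        candidates.append((sim, gi, colors[tp][hi]))
        # Most similar pairs first; ties keep their insertion order.
        candidates.sort(key=lambda c: -c[0])

        assigned = [None] * len(groups)
        used = set()  # a color may appear at most once per time step
        for sim, gi, color in candidates:
            if assigned[gi] is None and color not in used:
                assigned[gi] = color
                used.add(color)

        # Groups with no usable match start a new community.
        for gi in range(len(groups)):
            if assigned[gi] is None:
                assigned[gi] = next_color
                next_color += 1
        colors.append(assigned)

    return colors

snapshots = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C", "D"}, {"E", "F"}],
]
coloring = greedy_color_groups(snapshots)
```

Here C's brief move at the second time step does not create a new community, and the previously unseen group {E, F} receives a fresh color.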

  22. Experiment 1: Verify the Framework • Does the framework capture the intuitive concept of dynamic community? • Procedure • Construct small, synthetic datasets • Use exhaustive search to get a truly optimal coloring

  23. Experiment 1A: “Assembly Line” • At each time step, 1 member leaves and 1 enters a group, resulting in a complete membership change in 3 steps. • (A) (a,b1,b2,g) = (1,0,1,1) • (B) (a,b1,b2,g) = (1,0,3,1) • Results change as costs change. (A) favors stable membership. (B) allows for more fluid membership.

  24. Experiment 1B: “Dutiful Children” • 2, 3, and 4 are Children. 0 and 1 are Parents that visit a different child each timestep. • (A) (a,b1,b2,g) = (1,0,1,1) • (B) (a,b1,b2,g) = (1,0,3,1) • Results: Framework succeeds at detecting the individual children as well as the visitation pattern.

  25. Experiment 2: Quality of Heuristic Results • Do the heuristics obtain colorings similar to those of an exhaustive search? • Procedure • Re-test the synthetic datasets using the various heuristics • Results: At least one heuristic method obtains the same coloring and total cost as exhaustive search

  26. Experiment 3: Real World Datasets • Do the framework and heuristics together obtain expected results using real-world datasets?

  27. Experiment 3A: “Southern Women” • Eighteen women in 1933 in Natchez, Mississippi • Tracks their attendance at 14 social events

  28. Experiment 3A: Prior Results • Twenty-one analyses (1941 to 2001) all show similar results • Two clear communities • The membership of individuals 8, 9, and 16 is less certain.

  29. Experiment 3A: Results • (a,b1,b2,g) = (1,1,1,1) • Detects 4 communities, which are subsets of the traditional 2 communities • Individuals 6 and 10 change membership over time • By adjusting cost factors, the results of most of the 21 prior analyses can be duplicated

  30. Experiment 3B: “Grevy’s Zebra” • 28-member zebra herd observed 44 times over 3 months in 2002 • The graph on the slide shows the aggregate interaction; temporal information is lost.

  31. Experiment 3B: Results • Inferred communities agree with manual results obtained by biologists. • 4 stable communities • Some short-lived communities and some visiting

  32. Conclusions • We present a framework for identifying communities in dynamic social networks • The framework produces meaningful results compared to traditional methods • Heuristic methods produce near-optimal solutions • Future Directions • Develop an approximation algorithm which guarantees the quality of the result • Investigate scalability over network size and time • Relax assumptions about interaction and dynamics
