Diffusion & Visualization in Dynamic Networks By James Moody Duke University

Thanks to Dan McFarland, Skye Bender-deMoll, Martina Morris, the network modeling group at UW, and the Social Structure Reading Group at OSU. Supported by NIH grants DA12831 and HD41877.

What is a network?
• “We will refer to the presence of regular patterns in relationship as structure.” (Wasserman & Faust, p. 3)
• As a description of the social network perspective:
  2) “Relational ties … between actors are channels for transfer or flow of resources”
  4) “Network models conceptualize structure … as lasting patterns of relations among actors.” (Wasserman & Faust, p. 4)
But how does a structural approach work when the patterns are transient?

When is a network? Source: Bender-deMoll & McFarland “The Art and Science of Dynamic Network Visualization” JoSS Forthcoming

When is a network?
• At the finest levels of aggregation networks disappear, but at higher levels of aggregation we mistake momentary events for long-lasting structure.
• Is there a principled way to analyze and visualize networks where the edges are not stable?
• There is unlikely to be a single answer for all questions, but the set of types of questions might be manageable:
  • Diffusion and flow (networks as resources or constraints for actors): The timing of relations affects flow in a way that changes many of our standard measures. If our interest is in “Relational ties [as] channels for transfer or flow of resources” (W&F p.4), then we can use the diffusion process to shape our analyses.
  • Structural change (networks as dynamic objects of study): The interest is in mapping changes in the topography of the network, to see and model how the field itself changes over time. Ultimately, this has to be linked to questions about how network macro-structures emerge as the result of actor behavior rules.

Network Dynamics & Flow
• The key element that makes a network a system is the path: it’s how sets of actors are linked together indirectly.
• A walk is a sequence of nodes and lines, starting and ending with nodes, in which each node is incident with the lines following and preceding it in the sequence.
• A path is a walk where all of the nodes and lines are distinct.
• Paths are the routes through networks that make diffusion possible.
• In a dynamic network, the timing of edges affects whether a good can flow across a path. A good cannot pass along a relation that ends prior to the actor receiving the good: goods can only flow forward in time.
• A time-ordered path exists between i and j if a graph-path from i to j can be identified where the starting time for each edge step precedes the ending time for the next edge.
• The notion of a time-ordered path must change our understanding of the system structure of the network. Networks exist both in relation-space and time-space.

Network Dynamics & Flow A time-ordered path exists between i and j if a graph-path from i to j can be identified where the starting time for each edge step precedes the ending time for the next edge. Note that this allows for non-intuitive non-transitivity. Consider this simple example:
[Figure: chain A, B, C, D with edge times A-B: 1-2, B-C: 3-4, C-D: 1-2]
Here A can reach B, B can reach C, and C can reach D. But A cannot reach D, since any flow from A to C would have happened after the relation between C and D ended.
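The non-transitivity in this example can be checked with a minimal sketch. It assumes contacts are undirected intervals `(u, v, start, end)`, that a node keeps the good once it has it, and that passing the good takes no time. The name `earliest_arrival` and the tuple layout are illustrative, not from the slides. The sketch tracks the earliest time each node could receive the good, which is a slightly stricter reading than comparing adjacent edges pairwise.

```python
import heapq
from math import inf

def earliest_arrival(edges, source, t0=-inf):
    """Earliest time each node could hold a good that `source` holds from t0 on."""
    adj = {}
    for u, v, s, e in edges:
        adj.setdefault(u, []).append((v, s, e))
        adj.setdefault(v, []).append((u, s, e))
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, inf):
            continue  # stale heap entry
        for v, s, e in adj.get(u, []):
            if t <= e:  # the tie must still be active once u has the good
                tv = max(t, s)
                if tv < arrival.get(v, inf):
                    arrival[v] = tv
                    heapq.heappush(heap, (tv, v))
    return arrival

# Edge times as read from the slide's chain example.
chain = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
print(sorted(earliest_arrival(chain, "A")))  # ['A', 'B', 'C']: D is never reached
print(sorted(earliest_arrival(chain, "C")))  # ['B', 'C', 'D']: C itself can reach D
```

Nodes missing from the returned dictionary are unreachable through time, so one call per source fills one row of a temporal reach matrix.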

Network Dynamics & Flow This can also introduce a new dimension for “shortest” paths:
[Figure: edge times A-B: 1-2, B-C: 3-4, C-D: 5-6, A-E: 5-6, E-D: 7-9]
The geodesic from A to D is AE, ED and is two steps long. But the fastest path would be AB, BC, CD, which, while three steps long, could get there by day 5 compared to day 7.
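The contrast between the fewest-steps path and the earliest-arriving path can be sketched as follows. This is one possible implementation, not the author's: it assumes undirected interval contacts `(u, v, start, end)`, no transmission delay, and hypothetical helper names `arrival_by_hops` and `fewest_steps`. Layer k holds the earliest arrival time at each node using at most k edges.

```python
from math import inf

def arrival_by_hops(edges, source):
    """layers[k][v] = earliest arrival at v over time-ordered paths of at most k edges."""
    nodes = {n for u, v, _, _ in edges for n in (u, v)} | {source}
    best = {n: inf for n in nodes}
    best[source] = -inf  # the source holds the good from the start
    layers = [dict(best)]
    for _ in range(len(nodes) - 1):
        nxt = dict(best)
        for u, v, s, e in edges:
            for a, b in ((u, v), (v, u)):
                if best[a] <= e:  # the tie is still active after a has the good
                    nxt[b] = min(nxt[b], max(best[a], s))
        best = nxt
        layers.append(dict(best))
    return layers

def fewest_steps(layers, target):
    """(steps, arrival time) for the time-ordered path with the fewest edges, or None."""
    for k, layer in enumerate(layers):
        if layer[target] < inf:
            return k, layer[target]
    return None

example = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 5, 6),
           ("A", "E", 5, 6), ("E", "D", 7, 9)]
layers = arrival_by_hops(example, "A")
print(fewest_steps(layers, "D"))  # (2, 7): two steps via E, arriving on day 7
print(layers[-1]["D"])            # 5: the three-step route via B and C arrives on day 5
```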

Network Dynamics & Flow Direct Contact Network of 8 people in a ring

Network Dynamics & Flow Implied Contact Network of 8 people in a ring: All relations Concurrent

Network Dynamics & Flow Implied Contact Network of 8 people in a ring: Mixed Concurrent, reachability = 0.57
[Figure: edge time labels, in extraction order: 2, 3, 2, 1, 1, 2, 2, 3]

Network Dynamics & Flow Implied Contact Network of 8 people in a ring: Serial Monogamy (1), reachability = 0.71
[Figure: edge time labels, in extraction order: 1, 8, 2, 7, 3, 6, 5, 4]

Network Dynamics & Flow Implied Contact Network of 8 people in a ring: Serial Monogamy (2), reachability = 0.51
[Figure: edge time labels, in extraction order: 1, 8, 2, 7, 3, 6, 1, 4]

Network Dynamics & Flow Minimum Contact Network of 8 people in a ring: Serial Monogamy (3), reachability = 0.43
[Figure: edges labeled t1 or t2; labels in extraction order: 1, 2, 2, 1, 1, 2, 1, 2]

Network Dynamics & Flow In this graph, timing alone can change mean reachability from 2.0 when all ties are concurrent to 0.43: a factor of ~4.7. In general, ignoring time order is equivalent to assuming all relations occur simultaneously; that is, it assumes perfect concurrency across all relations.
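A rough sketch of how timing alone moves reachability on an 8-person ring. It assumes reachability means the share of ordered pairs (i, j) for which i can reach j through time, and it uses a hypothetical alternating schedule rather than the exact timings on the slides. Under this reading the all-concurrent ring scores 1.0, so the scale behind the 2.0 baseline quoted above may be defined differently.

```python
import heapq
from math import inf

def earliest_arrival(edges, source, t0=-inf):
    """Earliest time each node could hold a good that `source` holds from t0 on."""
    adj = {}
    for u, v, s, e in edges:
        adj.setdefault(u, []).append((v, s, e))
        adj.setdefault(v, []).append((u, s, e))
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, inf):
            continue  # stale heap entry
        for v, s, e in adj.get(u, []):
            if t <= e:
                tv = max(t, s)
                if tv < arrival.get(v, inf):
                    arrival[v] = tv
                    heapq.heappush(heap, (tv, v))
    return arrival

def mean_reachability(edges):
    """Share of ordered pairs (i, j), i != j, where i can reach j through time."""
    nodes = {n for u, v, _, _ in edges for n in (u, v)}
    reached = sum(len(earliest_arrival(edges, i)) - 1 for i in nodes)
    return reached / (len(nodes) * (len(nodes) - 1))

ring = [(i, (i + 1) % 8) for i in range(8)]
concurrent = [(u, v, 1, 1) for u, v in ring]                  # every tie at time 1
alternating = [(u, v, 1 + u % 2, 1 + u % 2) for u, v in ring]  # ties alternate t1, t2

print(mean_reachability(concurrent))             # 1.0
print(round(mean_reachability(alternating), 2))  # 0.43
```

With alternating times each person reaches only three of the seven others, which gives 3/7 and is consistent with the 0.43 reported for the last ring.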

Network Dynamics & Flow
The distribution of paths is important for many of the measures we typically construct on networks, and these will change if timing is taken into consideration:
• Centrality: closeness centrality, path centrality, information centrality, betweenness centrality
• Network topography: clustering, path distance
• Groups & roles: correspondence between degree-based position and reach-based position; structural cohesion & embeddedness; opportunities for time-based block-models (similar reachability profiles)
In general, any measure that takes the systems nature of the graph into account will differ.

Network Dynamics & Flow
• New versions of classic reachability measures:
  1) Temporal reach: The ij cell = 1 if i can reach j through time.
  2) Temporal geodesic: The ij cell equals the number of steps in the shortest path linking i to j over time.
  3) Temporal paths: The ij cell equals the number of time-ordered paths linking i to j.
• These will only equal the standard versions when all ties are concurrent.
• Duration explicit measures:
  4) Quickest path: The ij cell equals the shortest time within which i could reach j.
  5) Earliest path: The ij cell equals the real-clock time when i could first reach j.
  6) Latest path: The ij cell equals the real-clock time when i could last reach j.
  7) Exposure duration: The ij cell equals the longest (shortest) interval of time over which i could transfer a good to j.
• Each of these also implies different types of “betweenness” roles for nodes or edges, such as a “limiting time” edge, which would be the edge whose comparatively short duration places the greatest limits on other paths.

Network Dynamics & Flow Define time-dependent closeness as the inverse of the sum of the distances needed for an actor to reach others in the network.* Actors with high time-dependent closeness centrality are those that can reach others in few steps. Note that this is directed, since Dij ≠ Dji (in most cases) once you take time into account. *If i cannot reach j, I set the distance to n+1.
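A minimal sketch of this closeness score, under stated assumptions: distance is taken to be the fewest steps on any time-ordered path, contacts are undirected intervals `(u, v, start, end)`, and unreachable pairs get distance n+1 as in the footnote. The function names are illustrative, not from the slides.

```python
from math import inf

def temporal_geodesics(edges, source):
    """Fewest steps on any time-ordered path from `source` to each reachable node."""
    nodes = {n for u, v, _, _ in edges for n in (u, v)} | {source}
    best = {n: inf for n in nodes}  # earliest arrival found so far
    best[source] = -inf
    steps = {source: 0}
    for k in range(1, len(nodes)):
        nxt = dict(best)
        for u, v, s, e in edges:
            for a, b in ((u, v), (v, u)):
                if best[a] <= e:
                    nxt[b] = min(nxt[b], max(best[a], s))
        for n in nodes:
            if n not in steps and nxt[n] < inf:
                steps[n] = k  # first reached with k edges
        best = nxt
    return steps

def temporal_closeness(edges):
    """Inverse of summed temporal distances; unreachable pairs count as n + 1."""
    nodes = sorted({n for u, v, _, _ in edges for n in (u, v)})
    n = len(nodes)
    scores = {}
    for i in nodes:
        steps = temporal_geodesics(edges, i)
        total = sum(steps.get(j, n + 1) for j in nodes if j != i)
        scores[i] = 1.0 / total
    return scores

chain = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
print(temporal_geodesics(chain, "A"))  # A reaches C in 2 steps, never D
print(temporal_geodesics(chain, "C"))  # C never reaches A: distances are directed
print(temporal_closeness(chain))
```

In the chain, A reaches C in two steps but C cannot reach A at all, so the two directions contribute 2 and n+1 = 5 respectively.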

Network Dynamics & Flow Define fastness centrality as the average of the clock-time needed for an actor to reach others in the network: Actors with high fastness centrality are those that would reach the most people early. These are likely important for any “first mover” problem.

Network Dynamics & Flow Define quickness centrality as the average of the minimum amount of time needed for an actor to reach others in the network: Where Tjit is the time that j receives the good sent by i at time t, and Tit is the time that i sent the good. This then represents the shortest duration between transmission and receipt between i and j. Note that this is a time-dependent feature, depending on when i “transmits” the good out into the population. Note min is one of many functions, since the time-to-target speed is really a profile over the duration of t.
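One way to read the quickness definition, sketched under explicit assumptions: contacts are undirected intervals `(u, v, start, end)`, passing a good takes no time, and the candidate send times t are limited to the observed start and end times of edges. The helper names are hypothetical, and how unreachable nodes enter the average is left open here.

```python
import heapq
from math import inf

def earliest_arrival(edges, source, t0=-inf):
    """Earliest time each node could hold a good that `source` holds from t0 on."""
    adj = {}
    for u, v, s, e in edges:
        adj.setdefault(u, []).append((v, s, e))
        adj.setdefault(v, []).append((u, s, e))
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, inf):
            continue  # stale heap entry
        for v, s, e in adj.get(u, []):
            if t <= e:
                tv = max(t, s)
                if tv < arrival.get(v, inf):
                    arrival[v] = tv
                    heapq.heappush(heap, (tv, v))
    return arrival

def quickest_durations(edges, source):
    """For each reachable j: min over send times t of (receipt time at j) - t."""
    send_times = sorted({t for _, _, s, e in edges for t in (s, e)})
    best = {}
    for t in send_times:
        for j, tj in earliest_arrival(edges, source, t0=t).items():
            if j != source:
                best[j] = min(best.get(j, inf), tj - t)
    return best

example = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 5, 6),
           ("A", "E", 5, 6), ("E", "D", 7, 9)]
q = quickest_durations(example, "A")
print(q)                         # shortest send-to-receipt gap for each reachable node
print(sum(q.values()) / len(q))  # average over the nodes A can reach
```

Replacing `min` with another summary of the send-time profile changes only the update line in `quickest_durations`.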

Network Dynamics & Flow Define exposure centrality as the average of the amount of time that actor j is at risk to a good introduced by actor i. Where Tijl is the last time that j could receive the good from i and Tijf is the first time that j could receive the good from i, so the difference is the interval of time during which j is at risk from i.
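A rough sketch of the first and last receipt times for one pair (i, j). It assumes undirected interval contacts `(u, v, start, end)`, that a holder keeps the good and can pass it at any moment while a tie is active, and that routes to j do not pass through j itself. The function `exposure_window` is a hypothetical name, and the exposure length is taken to be last minus first.

```python
import heapq
from math import inf

def earliest_arrival(edges, source, t0=-inf):
    """Earliest time each node could hold a good that `source` holds from t0 on."""
    adj = {}
    for u, v, s, e in edges:
        adj.setdefault(u, []).append((v, s, e))
        adj.setdefault(v, []).append((u, s, e))
    arrival = {source: t0}
    heap = [(t0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, inf):
            continue  # stale heap entry
        for v, s, e in adj.get(u, []):
            if t <= e:
                tv = max(t, s)
                if tv < arrival.get(v, inf):
                    arrival[v] = tv
                    heapq.heappush(heap, (tv, v))
    return arrival

def exposure_window(edges, i, j):
    """(first, last) times at which j could receive a good introduced by i, or None."""
    # Route the good to j's neighbours without passing through j itself.
    others = [ed for ed in edges if j not in ed[:2]]
    arrival = earliest_arrival(others, i)
    first, last = inf, -inf
    for u, v, s, e in edges:
        if j not in (u, v):
            continue
        a = v if u == j else u  # j's partner on this tie
        if arrival.get(a, inf) <= e:
            first = min(first, max(arrival[a], s))
            last = max(last, e)
    return (first, last) if first < inf else None

chain = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
print(exposure_window(chain, "A", "B"))  # (1, 2)
print(exposure_window(chain, "A", "C"))  # (3, 4)
print(exposure_window(chain, "A", "D"))  # None: D is never at risk from A
```

Averaging `last - first` over the targets j would give a per-actor score in the spirit of the definition above; the window can contain gaps when j's ties are not continuous.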

Network Dynamics & Flow How do these centrality scores compare? Here I compare the duration-dependent measures to the standard measures on this example graph. Based only on the structure of the ties, not the timing, the most central nodes are nodes 13, 16 and 4. Since this is a simulation, I permute the observed time-ranges on this graph to test the general relation between the fixed and temporal measures.

Network Dynamics & Flow How do these centrality scores compare? Here I compare the duration-dependent measures to the standard measures on this example graph. Box plots based on 500 permutations of the observed time durations. This holds constant the duration distribution and the number of edges active at any given time.

Network Dynamics & Flow How do these centrality scores compare? Here I compare the duration-dependent measures to the standard measures on this example graph. Box plots based on 500 separate permutations of the start and end times. This changes the duration distribution and the number of edges active at any given time.
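The two permutation schemes can be sketched as follows, as one reading of the slides rather than the author's actual procedure. The first reassigns whole observed (start, end) intervals to different edges, which keeps the duration distribution and the number of active edges at each time fixed. The second shuffles start and end times independently and, as an added assumption, reorders each pair so that start comes before end. The edge list is hypothetical.

```python
import random

def permute_intervals(edges, rng):
    """Reassign the observed (start, end) intervals to edges at random."""
    intervals = [(s, e) for _, _, s, e in edges]
    rng.shuffle(intervals)
    return [(u, v, s, e) for (u, v, _, _), (s, e) in zip(edges, intervals)]

def permute_endpoints(edges, rng):
    """Shuffle start and end times separately, then order each pair (an assumption)."""
    starts = [s for _, _, s, _ in edges]
    ends = [e for _, _, _, e in edges]
    rng.shuffle(starts)
    rng.shuffle(ends)
    return [(u, v, min(s, e), max(s, e))
            for (u, v, _, _), s, e in zip(edges, starts, ends)]

# Hypothetical contacts: (node, node, first day, last day).
edges = [(1, 2, 40, 72), (2, 3, 10, 15), (3, 4, 60, 200), (4, 1, 5, 90)]
rng = random.Random(0)  # seeded so a run can be repeated
print(permute_intervals(edges, rng))
print(permute_endpoints(edges, rng))
```

Calling either function 500 times and recomputing the temporal measures on each result gives the kind of distribution the box plots summarize, while the contact structure itself never changes.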

Network Dynamics & Flow How do these centrality scores compare? The “most important actors” in the graph depend crucially on when they are active. The correlations can range wildly over the exact same contact structure. The “centrality” scores described here are low-hanging fruit: simple extensions of graph-based ideas. But the crucial features for population interests will be creating aggregations of these features – something like “centralization” that captures the regularity, asymmetry and temporal role-structure of the network.

Network Dynamics & Flow How can we visualize such graphs? Animation of the edges, when the graph is sparse, helps us see the emergence of the graph, but diffusion paths are difficult to see: Consider an example: Romantic Relations at “Jefferson” high school

Network Dynamics & Flow How can we visualize such graphs? Animation of the edges, even when the graph is sparse, does not typically help us see the potential flow space, as it’s just too hard to follow the implication paths with our eyes, so it seems better to plot the implied paths directly. Consider an example: Plotting the reachability matrix can be informative if the graph has clear pockets of reachability:

Network Dynamics & Flow How can we visualize such graphs? Animation of the edges, even when the graph is sparse, does not typically help us see the potential flow space, as it’s just too hard to follow the implication paths with our eyes, so it seems better to plot the implied paths directly. Consider an example: Plotting the reachability matrix can be informative if the graph has clear pockets of reachability: (Good readability example)

Network Dynamics & Flow How can we visualize such graphs? Animation of the edges, even when the graph is sparse, does not typically help us see the potential flow space, as it’s just too hard to follow the implication paths with our eyes, so it seems better to plot the implied paths directly. Consider an example: Edges have discrete start and end times, tagged as days over a 2-year window: so first contact between nodes 10 and 4 was on day 40, last contact on day 72.

Network Dynamics & Flow How can we visualize such graphs? Animation of the edges, even when the graph is sparse, does not typically help us see the potential flow space, as it’s just too hard to follow the implication paths with our eyes, so it seems better to plot the implied paths directly. Consider an example: Here we plot the reachability matrix over the coordinates for the direct network. Direct ties are retained as green lines. If node i can reach node j, then a directed arrow joins the two nodes. Here I mark cases where two nodes can reach each other with red, and purely asymmetric reach with blue. This is accurate, but hard to read when reachability paths are long. (Poor readability example)

Network Dynamics & Flow How can we visualize such graphs? Animation of the edges, even when the graph is sparse, does not typically help us see the potential flow space, as it’s just too hard to follow the implication paths with our eyes, so it seems better to plot the implied paths directly. Consider an example: Various weightings of the indirect paths also don’t help in an example like this one. Here I weight the edges of the reachability graph as 1/d, and plot using the Fruchterman-Reingold (FR) layout. You get some sense of which nodes reach many others (size is proportional to out-reach). But you really miss the asymmetry in reach (the correlation between number reached and number reached by is nearly 0).

Network Dynamics & Flow
How can we visualize such graphs? Another tack is to shift our attention from nodes to edges, by plotting the line graph (thanks to Scott Feld for making this suggestion). The idea is to identify an ordering to the vertical dimension of the graph to capture the flow through the network. Consider an example:
• So now we:
  • Convert every edge to a node
  • Draw a directed arc between edges that (a) share a node and (b) precede one another in time.

Network Dynamics & Flow
How can we visualize such graphs? Another tack is to shift our attention from nodes to edges, by plotting the line graph (thanks to Scott Feld for making this suggestion). The idea is to identify an ordering to the vertical dimension of the graph to capture the flow through the network. Consider an example:
• So now we:
  • Convert every edge to a node
  • Draw a directed arc between edges that (a) share a node and (b) precede one another in time.
  • Concurrent edges (such as {13-8 and 13-5} or {1-16, 2-16}) will be connected with a bi-directed edge (they will form completely connected cliques), while the remainder of the graph will be asymmetric and ordered in time.
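The line-graph construction can be sketched as below. This is a minimal version under stated assumptions: contacts are tuples `(u, v, start, end)`, each pair of actors appears only once, and an arc runs from edge x to edge y when they share an actor and x starts no later than y ends. The function name `temporal_line_graph` and the overlapping X-Y, X-Z contacts are hypothetical.

```python
from itertools import permutations

def temporal_line_graph(edges):
    """Directed arcs between contacts: x -> y if they share an actor and flow x to y is possible."""
    arcs = set()
    for x, y in permutations(edges, 2):
        shares_actor = set(x[:2]) & set(y[:2])
        if shares_actor and x[2] <= y[3]:  # x starts no later than y ends
            arcs.add((x[:2], y[:2]))
    return arcs

# Sequential contacts give one-way arcs.
chain = [("A", "B", 1, 2), ("B", "C", 3, 4), ("C", "D", 1, 2)]
print(sorted(temporal_line_graph(chain)))

# Overlapping (concurrent) contacts give arcs in both directions.
overlap = [("X", "Y", 1, 5), ("X", "Z", 2, 4)]
print(sorted(temporal_line_graph(overlap)))
```

In the chain, both A-B and C-D point into B-C but nothing points out of it, which is the same non-transitivity seen in the node-level example: flow entering B-C has nowhere later to go.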

Network Dynamics & Flow
• Further complications that ultimately link us back to the question of “When is a network?”
1) Range of temporal activity
  • When the graph is globally sparse (like the example above), the path-structure will also be sparse. Increasing density will lead to lots of repeated interactions, and thus reachability cycles.
  • Consider email exchange networks or classroom communication networks vs. sexual networks. In sexual or romantic networks, returning to a partner once the relation has ended is rare; in communication networks it is common.
2) Observed vs. real
  • We will often have discrete observations of real-time processes. How do we account for between-wave temporal ordering? What are the limits of observed measures to such inter-wave activity?
  • The Snijders et al. Siena modeling approach is an obvious first step here.

Network Dynamics & Flow
• Further complications that ultimately link us back to the question of “When is a network?”
3) Temporal reachability as a higher-order model feature
  • As the capacity of ERGM models continues to expand, we can start to build temporal sequence rules into the local models (such as communication triplets, or avoidance of past relations once ended), which then makes it sensible to ask whether the models fit the time-structure of the data.
4) Optimal observation windows
  • Either for data collection or visualization, we often have to decide on a time-range for our analyses. What should that range be?
5) Relational temporal asymmetry
  • For many types of relations, it is difficult to decide when relations end. This taps a distinction between activated and potential relations.
