Flow Processes and the Structural Importance of Nodes • Steve Borgatti, Boston College [Figure: network diagram with the node for Mohamed Atta labeled.] Data courtesy of Valdis Krebs
Attacking Terrorist Nets • Find and eliminate structurally important nodes and lines • bridges, cut-points; minimum weight cutsets • measures of centrality • closeness, betweenness, eigenvector, etc.
Terrorist Network [Figure: network diagram; labeled nodes include Usman Bandukra, Djamal Beghal, Essid Sami Ben Khemais, Mohamed Atta, Mamoun Darkazanli, Nawaf Alhazmi, Raed Hijazi.] Data courtesy of Valdis Krebs
Many Problems • Data not good enough • Mostly known after an event • Sensitive to error • Benefits are short-term at best • Must address recruitment, training • it is precisely those organizations that make heavy use of suicide bombers that are organized as networks
Was al Qaeda incapacitated by removal of 19 hijackers? [Figure: network diagram; legend: Dead / Alive?] Data courtesy of Valdis Krebs
One Additional Problem • Centrality measures make certain assumptions about how things flow • and may produce poor estimates when misapplied • need to work that out before deciding which node to remove
Objective • Enumerate kinds of flow processes • Analyze properties • Relate to structural importance of nodes • Relate to existing measures of centrality
Types of Flow Processes • Gift process • Currency process • Transport process • Postal process • Gossip process • E-mail process • Infection process • Influence process • (several others)
Gift Process • Canonical example: • passing along used paperback novel • Single object in only one place at a time • Doesn’t travel between same pair twice • Could be received by the same person twice • A--B--C--B--D--E--B--F--C ...
Currency Process • Canonical example: • specific dollar bill moving through the economy • Single object in only one place at a time • Can travel between same pair more than once • A--B--C--B--C--D--E--B--C--B--C ...
Gossip Process • Example: • juicy story moving through informal network • Multiple copies exist simultaneously • Person tells only one person at a time* • Doesn’t travel between same pair twice • Can reach same person multiple times * More generally, they tell a very limited number at a time.
E-Mail Process • Example: • forwarded jokes and virus warnings • e-mail viruses themselves • Multiple copies exist simultaneously • All (or many) connected nodes told simultaneously (except the immediate source?)
Influence Process • Example: • attitude formation • Multiple “copies” exist simultaneously • Multiple simultaneous transmission, even between the same pairs of nodes
Infection Process • Example: • virus which activates effective immunological response • Multiple copies may exist simultaneously • Cannot revisit a node • A--B--C--E--D--F...
Postal Process • Example: • package delivered by postal service • Single object at only one place at one time • Map of network enables the intelligent object to select only the shortest paths to all destinations
Uncovering Flow Properties • Take componential analysis approach • identify a set of flow processes • compare and contrast to discover the minimum set of attributes (properties) that distinguish them from each other • view each distinct flow process as a unique bundle of properties -- typology
Properties of Flow Processes • Sequence type: path, trail, walk • path: can’t revisit a node or an edge (tie) • trail: can revisit nodes but not edges • walk: can revisit edges & nodes • Deterministic vs non-deterministic • blind vs guided • guided: always chooses best route; aware of map • Combine into 4-way “pattern” property: • geodesics, paths, trails, walks
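The three sequence types can be checked mechanically. The sketch below is illustrative only: the function name `classify_sequence` is hypothetical, and it assumes each transmission is a directed tie, so that B to C and C to B count as different ties. Under that assumption the gift example A--B--C--B--D--E--B--F--C qualifies as a trail; if ties are treated as undirected it does not.

```python
def classify_sequence(nodes, directed=True):
    """Return the most restrictive type ('path', 'trail', 'walk') that fits."""
    nodes = list(nodes)
    steps = list(zip(nodes, nodes[1:]))
    # Assumption: with directed=True, (B, C) and (C, B) are different ties.
    ties = steps if directed else [frozenset(step) for step in steps]
    if len(set(nodes)) == len(nodes):
        return "path"   # no node revisited, hence no tie revisited either
    if len(set(ties)) == len(ties):
        return "trail"  # some node revisited, but no tie used twice
    return "walk"       # at least one tie used more than once


print(classify_sequence("ABCBDEBFC"))    # gift example from the slides
print(classify_sequence("ABCBCDEBCBC"))  # currency example from the slides
print(classify_sequence("ABCEDF"))       # infection example from the slides
```

Returning the most restrictive type is a design choice: every path is also a trail and a walk, so the function reports the tightest label that still holds.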
Properties -- cont. • Duplication vs transfer (copy vs move) • transfer/move: only one place at one time • duplication/copy: multiple copies exist • Serial vs parallel duplication • serial: only one transmission at a time • parallel: broadcast to all surrounding nodes • Combine into “method” 3-way property: • parallel dup., serial dup., transfer
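One way to make the 3-way “method” property concrete is a single propagation step on a toy network. This is a minimal sketch, not the author's implementation: the function `step`, the adjacency-list format, and the rule that every current holder transmits once per step are assumptions made for illustration.

```python
import random


def step(graph, holders, method, rng):
    """Advance one time step and return the set of nodes holding the item."""
    if method not in ("transfer", "serial", "parallel"):
        raise ValueError(f"unknown method: {method}")
    if method == "transfer":
        (holder,) = holders                       # a moved item is in one place only
        return {rng.choice(graph[holder])}        # the previous holder loses it
    new_holders = set(holders)                    # copies stay where they are
    for holder in holders:
        if method == "serial":
            new_holders.add(rng.choice(graph[holder]))  # tell one neighbour
        else:
            new_holders.update(graph[holder])           # broadcast to all neighbours
    return new_holders


# Hypothetical three-node line network A - B - C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
rng = random.Random(0)
for method in ("transfer", "serial", "parallel"):
    print(method, sorted(step(graph, {"B"}, method, rng)))
```

Starting from B, transfer leaves exactly one holder, serial duplication leaves two, and parallel duplication reaches every neighbour at once.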
Simplified Typology goods information
So What? • The properties of a flow process (together w/ node position) determine which nodes are structurally important • a node that is important in one process is not necessarily important in another • off-the-shelf centrality measures implicitly assume certain flow properties and are only interpretable for certain flow processes (à la Friedkin)
Closeness Centrality [Figure: example network with nodes T, L, M, P, X, Q, S.] • A node’s centrality is the sum of geodesic distances (lengths of shortest paths) to all others. • Is an index of expected time until arrival of that-which-flows for consistent processes: • non-deterministic (e.g., postal) • parallel duplication (e.g., e-mail, nameserver)
Calculating Closeness Centrality How long does a token take to reach a node?
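As a rough illustration of the calculation, the sketch below computes the sum of geodesic distances with a breadth-first search. The function names and the three-node example network are hypothetical, and the graph is assumed to be undirected, unweighted, and connected. The BFS level of a node is also the time step at which a broadcast (parallel duplication) token would first reach it, which is why the measure fits that process.

```python
from collections import deque


def geodesic_distances(graph, source):
    """Breadth-first search: shortest-path length from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(graph):
    """Sum of geodesic distances to all others (a lower sum is more central)."""
    return {v: sum(geodesic_distances(graph, v).values()) for v in graph}


# Hypothetical line network A - B - C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(closeness(graph))
```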
Betweenness Centrality [Figure: example network with nodes T, L, M, P, X, Q, S.] • Count no. of geodesic paths from each node to every other node that pass through X • if there is more than one geodesic from S to T, count the proportion that pass through X • Interpret as • how often node utilized by others • potential for control & synthesis
Betweenness Flow Processes • Consistent processes • postal process • Nearly consistent • parallel processes (all routes at same time) • but ... needn’t choose between geodesics • Implication • better for modeling transportation of goods than information
Calculating Betweenness Centrality How often does a token pass through a node?
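The geodesic-counting definition can be sketched directly. This is a brute-force illustration under stated assumptions (undirected, unweighted graph given as an adjacency list; hypothetical function names), not an efficient algorithm. It uses the fact that the number of geodesics from s to t passing through v is the number from s to v times the number from v to t, whenever v lies on a shortest route.

```python
from collections import deque


def bfs_counts(graph, source):
    """Geodesic distance and number of geodesics from source to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]      # every geodesic to u extends to v
    return dist, sigma


def betweenness(graph):
    """Share of geodesics between other pairs that pass through each node."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s in graph:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in dist_s:
                if v in (s, t):
                    continue
                dist_v, sigma_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    # Each unordered pair was visited twice (s to t and t to s).
    return {v: total / 2 for v, total in score.items()}


# Hypothetical 4-cycle: two geodesics link each pair of opposite nodes.
cycle = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
print(betweenness(cycle))
```

In the 4-cycle each node carries half of the geodesics between its two neighbours, which illustrates the proportional counting rule for tied geodesics.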
Eigenvector Centrality • Eigenvector of adjacency matrix • in effect, counts number of walks of all lengths emanating from node, weighted inversely by length: row sums of $\sum_k \beta^k A^k = \beta^1 A^1 + \beta^2 A^2 + \beta^3 A^3 + \dots$ (β: a length-attenuation weight) • Interpreted as popularity or being in the thick of things • Assumes flow can return to same nodes & lines
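The walk-counting interpretation can be illustrated with a truncated sum. This is a sketch under assumptions: the attenuation weight `beta`, the cutoff `max_length`, and the function name are hypothetical choices, not values from the slides. The infinite sum only converges when `beta` is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, so a finite cutoff is used here.

```python
def weighted_walk_scores(graph, beta, max_length):
    """Row sums of sum_{k=1..max_length} beta**k * A**k for an adjacency list."""
    walks = {v: 1.0 for v in graph}   # walks of length 0: one per node
    score = {v: 0.0 for v in graph}
    for k in range(1, max_length + 1):
        # Walks of length k from v = walks of length k - 1 from each neighbour.
        walks = {v: sum(walks[u] for u in graph[v]) for v in graph}
        for v in graph:
            score[v] += (beta ** k) * walks[v]
    return score


# Hypothetical line network A - B - C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(weighted_walk_scores(graph, beta=0.5, max_length=2))
```

Because walks may reuse nodes and lines, the same tie is counted again and again at longer lengths, which is the assumption about flow that the slide points out.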
“Cross-Platform” Centrality • How far off are these centrality measures when used with wrong flow process? • How can we correctly measure closeness and betweenness concepts in different flow contexts? • Simulation modeling
Realized Centrality • Essence of closeness is the expected time until arrival of fluenda • realized closeness is an empirical measurement of the avg time until arrival • Freeman closeness is an estimator of this • model-based formula that should correspond to actual long-term values if the model fits • Betweenness is expected number of times a fluendum passes through node
Simulation Procedure (for deterministic flow processes) • For each of 10,000 trials* ... • For each node, • let token originate at the node & propagate according to flow process rules until it can go no further • record which nodes are visited along way and # of units of time needed to arrive at each node for first time • Cumulate realized closeness and realized betweenness *NOTE: Parallel processes only require 1 trial -- no randomness
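The procedure above can be sketched for one process. The example uses the gift process (transfer along trails) and rests on several labeled assumptions: ties are directed, so each direction of a tie may be used once; the next recipient is chosen uniformly at random; average arrival time is taken only over runs in which the node is actually reached; and the trial count is far below 10,000. All function names are hypothetical.

```python
import random


def gift_trial(graph, source, rng):
    """Move one token along a random trail from source until it is stuck."""
    used = set()                         # directed ties already travelled
    current, t = source, 0
    first_arrival = {source: 0}
    visits = {v: 0 for v in graph}
    while True:
        options = [v for v in graph[current] if (current, v) not in used]
        if not options:
            return first_arrival, visits  # token can go no further
        nxt = rng.choice(options)
        used.add((current, nxt))
        current, t = nxt, t + 1
        visits[current] += 1
        first_arrival.setdefault(current, t)


def simulate_gift(graph, trials=1000, seed=0):
    """Cumulate realized closeness (mean arrival time) and betweenness (visits)."""
    rng = random.Random(seed)
    total_time = {v: 0 for v in graph}
    arrivals = {v: 0 for v in graph}
    passes = {v: 0 for v in graph}
    for _ in range(trials):
        for source in graph:
            first_arrival, visits = gift_trial(graph, source, rng)
            for v, t in first_arrival.items():
                if v != source:
                    total_time[v] += t
                    arrivals[v] += 1
            for v, n in visits.items():
                passes[v] += n
    runs = trials * len(graph)
    mean_time = {v: (total_time[v] / arrivals[v] if arrivals[v] else None)
                 for v in graph}
    mean_passes = {v: passes[v] / runs for v in graph}
    return mean_time, mean_passes


# Hypothetical line network A - B - C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
mean_time, mean_passes = simulate_gift(graph, trials=200, seed=1)
print(mean_time)
print(mean_passes)
```

Averaging only over runs that reach a node sidesteps the cases where the token gets stuck first; treating those cases as infinite time would make the mean undefined.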
Simulation Procedure (for non-deterministic processes) • For each of 10,000 trials ... • For each ordered pair of (source,target) nodes • let token originate at source node & propagate according to flow process rules until it either reaches target node or can go no further • record which nodes are visited and # of units of time needed to arrive at each node for first time • Cumulate realized closeness and realized betweenness
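A source-to-target version of the loop might look like the sketch below. As a stand-in flow rule it uses a token that moves to a uniformly random neighbour and may reuse ties; that rule, the `max_steps` guard, the small trial count, and all names are assumptions for illustration only. Any other flow rule could be substituted inside `run_to_target`.

```python
import random


def run_to_target(graph, source, target, rng, max_steps=100_000):
    """Propagate a token until it reaches target; return (steps, nodes passed)."""
    current, steps, passed = source, 0, []
    while current != target and steps < max_steps:
        current = rng.choice(graph[current])
        steps += 1
        if current not in (source, target):
            passed.append(current)       # intermediate node on the way
    return (steps if current == target else None), passed


def simulate_pairs(graph, trials=200, seed=0):
    """Mean arrival time per ordered pair and mean passes per node."""
    rng = random.Random(seed)
    pairs = [(s, t) for s in graph for t in graph if s != t]
    total_steps = {pair: 0 for pair in pairs}
    reached = {pair: 0 for pair in pairs}
    passes = {v: 0 for v in graph}
    for _ in range(trials):
        for s, t in pairs:
            steps, passed = run_to_target(graph, s, t, rng)
            if steps is not None:
                total_steps[(s, t)] += steps
                reached[(s, t)] += 1
            for v in passed:
                passes[v] += 1
    mean_steps = {p: (total_steps[p] / reached[p] if reached[p] else None)
                  for p in pairs}
    mean_passes = {v: passes[v] / (trials * len(pairs)) for v in graph}
    return mean_steps, mean_passes


# Hypothetical star network: hub H with three leaves.
star = {"H": ["a", "b", "c"], "a": ["H"], "b": ["H"], "c": ["H"]}
mean_steps, mean_passes = simulate_pairs(star, trials=200, seed=1)
print(mean_steps[("a", "H")], mean_steps[("H", "a")])
print(mean_passes)
```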
Alternative Methods • Can use non-deterministic procedure on all processes, for comparability to Freeman betweenness • numerical results quite different • but larger conclusions are the same • But, logically, not sensible • Freeman’s dyadic method presupposes source & target • i.e., non-deterministic process
Empirical Results • Compare realized closeness & betweenness with Freeman measures across different flow processes • Dataset is known ties among terrorists compiled by Valdis Krebs • Start with betweenness
Betweenness in Postal Proc. (all the rest are zeros on both measures)
Betweenness / Gossip Process • Sequential duplication across trails: rumors • Scores standardized to μ=0, σ=1 [Figure: plots of ranks and scores.]
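Standardizing to mean 0 and standard deviation 1 puts realized and Freeman scores on a common scale before they are compared. A minimal sketch follows; the use of the population standard deviation is an assumption (the slides do not say which was used), and the score lists are made-up numbers, not results from the Krebs data.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale scores to mean 0 and (population) standard deviation 1."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def pearson(xs, ys):
    """Pearson correlation as the mean product of standardized scores."""
    zx, zy = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(zx, zy)])


# Hypothetical scores for five nodes (illustration only).
freeman = [0.0, 2.0, 8.0, 3.0, 1.0]
realized = [0.5, 1.5, 6.0, 4.0, 1.0]
print(standardize(freeman))
print(pearson(freeman, realized))
```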
Betweenness in Gossip Proc. [Figure: network with nodes marked as under-estimated or over-estimated by betweenness centrality.] • Freeman measure is zero when contacts are connected • Token rarely gets to 46, so its realized betweenness cannot be as high as the Freeman measure estimates • Data courtesy of Valdis Krebs
Blind vs Guided Flows [Diagram labels: path redundancy, individual performance, type of flow.] • Nodes embedded in dense regions are more important in blind processes than in nondeterministic processes. • It is in blind processes that we see the bottling-up phenomenon that Granovetter alludes to
Betweenness in Gift Process • Physical transfer along trails: used paperback • Scores standardized to μ=0, σ=1
Closeness in Gossip Process • Sequential duplication across trails: rumors • Scores* standardized to μ=0, σ=1 • Correlation is high -- much better than betweenness corr [Figure: plots of ranks and scores.]
Closeness in Gossip Process [Figure: network colored by average arrival time, with nodes marked as under-estimated or over-estimated by closeness centrality.] • In gossip process, token gets bottled up by dense regions, takes long time to escape to other groups. Hard for blind process to find way out. • Data courtesy of Valdis Krebs
Lack of Symmetry • In many processes, avg distance to node does not equal distance from the node • even though network is symmetrical • People who can reach others in few steps are NOT the same as people who can be reached by others in few steps • Freeman closeness uncorrelated w/ former
Asymmetry Due to Degree Variance [Figure: “Distance” matrix with From and To axes.]
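The asymmetry can be reproduced on a tiny symmetric network. The sketch below assumes a blind token that moves to a uniformly random neighbour at each step, and approximates the expected number of steps from every node to a target by repeatedly applying the relation h(v) = 1 + average of h over v's neighbours, with h(target) = 0. The function names, the sweep count, and the star network are hypothetical; the graph is assumed to be connected.

```python
def expected_steps_to(graph, target, sweeps=2000):
    """Approximate expected random-walk steps from every node to target."""
    h = {v: 0.0 for v in graph}
    for _ in range(sweeps):
        for v in graph:
            if v != target:
                h[v] = 1.0 + sum(h[u] for u in graph[v]) / len(graph[v])
    return h


def distance_matrix(graph):
    """Nested dict: matrix[source][target] = expected steps source -> target."""
    to = {t: expected_steps_to(graph, t) for t in graph}
    return {s: {t: to[t][s] for t in graph} for s in graph}


# Hypothetical star network: hub H (degree 3) and three leaves (degree 1).
star = {"H": ["a", "b", "c"], "a": ["H"], "b": ["H"], "c": ["H"]}
matrix = distance_matrix(star)
print(matrix["a"]["H"])   # leaf to hub
print(matrix["H"]["a"])   # hub to one particular leaf
```

Every tie in the star is symmetric, yet a leaf reaches the hub in one step while the hub needs several steps on average to reach a particular leaf, so the From and To margins of the matrix differ.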
Lack of Computability • Closeness in Gift Process • Gift gets stuck in cul-de-sac, resulting in infinite time/distance • Can’t compute expected time until arrival
Summary • Variety of flow processes • Distinguished by a system of properties • Key properties include • blind / guided • copy / move • serial / parallel • path / trail / walk