An Introduction to Social Network Analysis James Moody Department of Sociology The Ohio State University
Introduction The world we live in is connected. (Figure labels: Jim Moody, Craig Calhoun, Isaias Afworki)
Introduction These patterns of connection form a social space. Social network analysis maps and analyzes this social space.
Introduction Yet standard social science analysis methods do not take this space into account. Moreover, the complexity of the relational world makes it impossible (in most cases) to understand this connectivity using only our intuitive understanding of a setting.
Introduction • Why networks matter: • Intuitive: information travels through contacts between actors, which can reflect a power distribution or influence attitudes and behaviors. Our understanding of social life improves if we account for this social space. • Less intuitive: patterns of inter-actor contact can have effects on the spread of “goods” or power dynamics that could not be seen focusing only on individual behavior.
Introduction • Social network analysis is: • a set of relational methods for systematically understanding and identifying connections among actors • a body of theory relating to types of observable social spaces and their relation to individual and group behavior.
Introduction • Network analysis assumes that: • How actors behave depends in large part on how they are linked together • Example: Adolescents with peers who smoke are more likely to smoke themselves. • The success or failure of organizations may depend on the pattern of relations within the organization • Example: The ability of companies to survive strikes depends on how product flows through factories and storehouses.
Introduction Network analysis assumes that: • Patterns of relations reflect the power structure of a given setting, and clustering may reflect coalitions within the group • Example: Overlapping voting patterns in a coalition government
Introduction An information network: Email exchanges within the Reagan White House, early 1980s (source: Blanton, 1995)
Introduction Power positions and potential influence
Overview • Introduction • Basic Concepts • Flows within Networks • Structure of Social Space • Tools, Models & Methods For Flows and Structures • Conclusions
Basic Concepts • Actors are nodes • Ideas, Papers, Events, Individuals, Organizations, Nations • Relations are lines between pairs of nodes • Symmetric (shares a room with) • Asymmetric (gives an order to) • Valued (number of times seen together)
Basic Concepts • Network data are familiar to you • For example: • - Personal, face-to-face contact • - Telephone contact • - Email contact • - Contact through faxes or wires • - Snail-mail contact • - Membership in the same organization • - Attendance at the same meetings • - Graduates of the same university
Basic Concepts For example, you might be tracking the activities of a number of people in related, but not identical cases, including meetings they attended. You may know little of the content of the event, or what they may have said to each other, only whether particular people were at the event. Your data might look like:
Basic Concepts • 11.19.2001. Meeting at Brussels. Attending: Smith, Johnson, Davis, James, Jackson • 12.22.2001. Meeting at Paris. Attending: Johnson, James, Jones, Wilson • 1.12.2001. Meeting in New York. Attending: Jones, Carter, Burns • 2.14.2001. Meeting in Denver. Attending: Wilson, Burns, Wilf, Newman (Red bold indicates people who are the focus of an investigation)
Basic Concepts While perhaps not immediately apparent when looking at the list of names, a simple algorithm reveals connections among these actors. (Figure: network linking Newman, Wilson, Johnson, Smith, Jackson, Wilf, Jones, James, Burns, Davis, and Carter)
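One way such a "simple algorithm" might look is sketched below: link every pair of people who attended the same meeting and count how often each pair co-occurs. The attendance lists are copied from the meeting records above; the variable names and the choice to count shared meetings as the tie value are illustrative assumptions, not the method used to draw the original figure.

```python
from collections import defaultdict
from itertools import combinations

# Attendance lists copied from the meeting records above.
meetings = {
    "Brussels": ["Smith", "Johnson", "Davis", "James", "Jackson"],
    "Paris": ["Johnson", "James", "Jones", "Wilson"],
    "New York": ["Jones", "Carter", "Burns"],
    "Denver": ["Wilson", "Burns", "Wilf", "Newman"],
}

# Valued, symmetric tie: the number of meetings two people both attended.
co_attendance = defaultdict(int)
for attendees in meetings.values():
    for a, b in combinations(sorted(attendees), 2):
        co_attendance[(a, b)] += 1

# Neighbor sets make it easy to ask who is linked to whom.
neighbors = defaultdict(set)
for a, b in co_attendance:
    neighbors[a].add(b)
    neighbors[b].add(a)

for person in sorted(neighbors):
    print(person, "->", sorted(neighbors[person]))
```

Pairs that share more than one meeting (here James and Johnson) receive a larger tie value, which corresponds to the "valued" relations described earlier.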
Basic concepts • Types of network data: • 1) Ego-network • - Have data on a respondent (ego) and the people they are connected to (alters) • - May include estimates of connections among alters
Basic concepts • Types of network data: • 2) Partial network • - Ego networks plus some amount of tracing to reach contacts of contacts • - Something less than full account of connections among all pairs of actors in the relevant population • - Example: CDC Contact tracing data for STDs
Basic concepts • Types of network data: • 3) Complete • - Data on all actors within a particular (relevant) boundary • - Never exactly complete, but boundaries are set • - Example: Coauthorship data among all writers in the social sciences
Examples: linked levels of data (Figure labels: Actor, Key contact, Contact’s contact; Primary Relation, Alter Relation, Trace Relation)
Why networks matter: Consider the following (much simplified) scenario: • Probability that actor i passes information to actor j (pij) is a constant over all relations = 0.6 • S & T are connected through the following structure: (Figure: network linking S and T) • The probability that S passes the information to T through either path would be: 0.09
Why networks matter: Now consider the following (similar?) scenario: (Figure: second network linking S and T) • Every actor but one has the exact same number of contacts • The category-to-category mixing is identical • The distance from S to T is the same (7 steps) • S and T have not changed their behavior • Their contacts’ contacts have the same behavior • But the probability of the information passing from S to T is 0.148 • Different outcomes & different potentials for intervention
Overview • Introduction • Basic Concepts • Flows within Networks • Structure of Social Space • Tools, Models & Methods For Flows and Structures • Conclusions
Network Flow • In addition to the simple probability that one actor passes information on to another (pij), two factors affect flow through a network: • Topology • - the shape, or form, of the network • - Example: one actor cannot pass information to another unless they are either directly or indirectly connected • Time • - the timing of contact matters • - Example: an actor cannot pass information he has not yet received
Topology Two features of the network’s shape are known to be important: connectivity and centrality • Connectivity refers to how actors in one part of the network are connected to actors in another part of the network. • Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another. • Distance: Given they can be reached, how many steps are they from each other? • Number of paths: How many different paths connect each pair?
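Reachability can be checked mechanically with a breadth-first search that follows chains of contact outward from one actor. The sketch below is a minimal illustration; the function name `reachable_from` and the small two-group network are hypothetical and do not come from the original data.

```python
from collections import deque

def reachable_from(graph, source):
    """Return the set of actors that source can reach through chains of contact."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    seen.discard(source)  # report only the other actors
    return seen

# Hypothetical undirected network made of two separate groups.
graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "x": {"y"}, "y": {"x"},
}

print(sorted(reachable_from(graph, "a")))  # actors in the same group as "a"
print("x" in reachable_from(graph, "a"))   # "x" sits in the other group
```

Actors in different groups have no chain of contact between them, so no information can pass from one group to the other regardless of pij.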
Network topology: reachability Without full network data, you can’t distinguish actors with limited information from those more deeply embedded in a setting. (Figure labels: a, b, c)
Network topology: distance & number of paths • Given that ego can reach alter, distance determines the likelihood of information passing from one end of the chain to another. • Because information spread is never certain, the probability of transfer decreases over distance. • However, the probability of transfer increases with each alternative path connecting pairs of people in the network.
Network topology: distance & number of paths Distance is measured by the (weighted) number of relations separating a pair. Actor “a” is: • 1 step from 4 actors • 2 steps from 5 actors • 3 steps from 4 actors • 4 steps from 3 actors • 5 steps from 1 actor
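A distance tally like the one for actor “a” can be produced with a breadth-first search that records how many steps away each other actor is. This is a minimal sketch for the unweighted case only; the five-actor network below is hypothetical and is not the network in the figure.

```python
from collections import Counter, deque

def distances_from(graph, source):
    """Shortest number of steps from source to every actor it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Hypothetical undirected network (each tie listed in both directions).
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

dist = distances_from(graph, "a")
# Count how many actors sit at each distance from "a".
steps = Counter(d for node, d in dist.items() if node != "a")
for d in sorted(steps):
    print(f'Actor "a" is {d} step(s) from {steps[d]} actor(s)')
```

Handling weighted relations would require a shortest-path method that takes tie values into account, which this sketch does not attempt.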
Network topology: distance & number of paths Paths are the different routes one can take. Node-independent paths are particularly important. There are 2 independent paths connecting a and b. There are many non-independent paths. (Figure labels: a, b)
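Counting node-independent paths by eye gets hard quickly. One standard approach, sketched here as an assumption rather than as the method behind the figure, treats the count as a maximum-flow problem in which every intermediate actor may be used only once. The function name and both example networks are hypothetical.

```python
from collections import defaultdict, deque

def node_independent_paths(graph, s, t):
    """Count paths from s to t that share no intermediate actors.

    graph maps each actor to the set of its neighbors (undirected ties
    listed in both directions). Each actor v is split into (v, "in") and
    (v, "out"); the unit capacity between them lets v carry one path only.
    """
    cap = defaultdict(lambda: defaultdict(int))
    for u, nbrs in graph.items():
        if u not in (s, t):
            cap[(u, "in")][(u, "out")] = 1
        for v in nbrs:
            cap[(u, "out")][(v, "in")] = 1
    source, sink = (s, "out"), (t, "in")
    flow = 0
    while True:
        # Breadth-first search for one more augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            x = queue.popleft()
            for y, c in list(cap[x].items()):
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if sink not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        y = sink
        while parent[y] is not None:
            x = parent[y]
            cap[x][y] -= 1
            cap[y][x] += 1
            y = x
        flow += 1

# Hypothetical network: a reaches b through c or through d.
two_routes = {
    "a": {"c", "d"}, "c": {"a", "b", "d"},
    "d": {"a", "b", "c"}, "b": {"c", "d"},
}
# Hypothetical network: every route from a to b passes through e.
bottleneck = {
    "a": {"c", "d"}, "c": {"a", "e"}, "d": {"a", "e"},
    "e": {"c", "d", "b"}, "b": {"e"},
}
print(node_independent_paths(two_routes, "a", "b"))
print(node_independent_paths(bottleneck, "a", "b"))
```

In the second network several routes exist, but they all share actor e, so removing e alone would cut a off from b.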
Probability of information transfer by distance and number of paths, assuming a constant pij of 0.6 (Figure: probability of transfer plotted against path distance, 2 to 6, with separate curves for 1, 2, 5, and 10 paths)
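Curves of this kind can be approximated with a simple calculation. The sketch below assumes (this is not stated on the slide) that the k paths are node-independent, that all have the same length d, and that each step succeeds independently with probability pij. Under those assumptions a single path delivers the information with probability pij^d, and at least one of k paths succeeds with probability 1 - (1 - pij^d)^k. The function name is hypothetical.

```python
def transfer_probability(p, distance, n_paths):
    """Chance that information crosses at least one of n_paths independent
    paths, each `distance` steps long, with per-step probability p."""
    single_path = p ** distance
    return 1 - (1 - single_path) ** n_paths

# Tabulate the same ranges shown in the figure: distances 2-6; 1, 2, 5, 10 paths.
for k in (1, 2, 5, 10):
    row = [round(transfer_probability(0.6, d, k), 3) for d in range(2, 7)]
    print(f"{k:>2} path(s):", row)
```

The table reproduces the two qualitative points made earlier: probability falls as distance grows and rises as alternative paths are added.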
Reachability in Colorado Springs (Sexual contact only) • High-risk actors over 4 years • 695 people represented • Longest path is 17 steps • Average distance is about 5 steps • Average person is within 3 steps of 75 other people • 137 people connected through 2 independent paths, core of 30 people connected through 4 independent paths (Node size = log of degree)
Network topology: centrality • Centrality refers to (one dimension of) location, identifying where an actor resides in a network. • For example, we can compare actors at the edge of the network to actors at the center. • In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders.
Centrality example: At the local level, we expect people like NSJMP and NSOLN to have greater access to information than others in the network. Network analysis gives us a set of tools to quantify this difference.
Centrality example: Actors that appear very different when seen individually are comparable in the global network. (Node size proportional to betweenness centrality)
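Betweenness centrality counts how often an actor lies on the shortest paths between other pairs of actors. The sketch below uses Brandes' algorithm for an unweighted, undirected network; it is an illustration on a hypothetical six-actor network (two triangles joined by a single tie), not the computation used for the figure.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected network.

    graph maps each actor to an iterable of neighbors (ties listed in
    both directions).
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                       # actors in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest actors back toward s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each pair was counted once from each end, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical network: triangles a-b-c and d-e-f joined by the tie c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
for actor, value in sorted(betweenness(graph).items()):
    print(actor, value)
```

Actors c and d have only one more tie than their neighbors, yet every shortest path between the two triangles runs through them, which is the kind of difference that local counts alone can miss.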
Information flows • Two factors that affect network flows: • Topology • - the shape, or form, of the network • - simple example: one actor cannot pass information to another unless they are either directly or indirectly connected • Time • - the timing of contacts matters • - simple example: an actor cannot pass information he has not yet received
Timing in networks • A focus on contact structure often slights the importance of network dynamics • Time affects networks in two important ways: • 1) The structure itself goes through phases that are correlated with information spread • 2) The timing of contact constrains information flow
Sexual Relations among A syphilis outbreak Changes in Network Structure Rothenberg et al. map the pattern of sexual contact among youth involved in a syphilis outbreak in Atlanta over a one-year period. (Syphilis cases in red) Jan - June, 1995
Sexual Relations among A syphilis outbreak July-Dec, 1995
Drug Relations, Colorado Springs, Year 1 Data on drug users in Colorado Springs, over 5 years
Drug Relations, Colorado Springs, Year 2 Current year in red, past relations in gray
Drug Relations, Colorado Springs, Year 3 Current year in red, past relations in gray
Drug Relations, Colorado Springs, Year 4 Current year in red, past relations in gray
Drug Relations, Colorado Springs, Year 5 Current year in red, past relations in gray
What impact does timing have on flow through the network? In addition to changes in the shape over time, contact timing constrains how information can flow through the network. Consider the following example:
A hypothetical contact network (Figure: network of actors A, B, C, D, E, and F; lines are labeled with the contact periods 0 - 1, 2 - 5, 3 - 5, 3 - 7, and 8 - 9) Numbers above lines indicate contact periods
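The effect of contact timing can be explored by computing the earliest time at which information could arrive at each actor. The sketch below reuses the five contact periods from the figure, but which pair of actors each period belongs to cannot be recovered from the text, so the assignment of periods to ties is an assumption made purely for illustration. It also assumes that information passes instantly at any moment during a contact and that a contact can be used if the sender already holds the information before the contact ends.

```python
import heapq

def earliest_arrival(contacts, source, start_time=0):
    """Earliest time each actor could receive information that `source`
    holds at start_time. contacts is a list of (i, j, begin, end) ties,
    each usable in either direction during [begin, end]."""
    adj = {}
    for i, j, begin, end in contacts:
        adj.setdefault(i, []).append((j, begin, end))
        adj.setdefault(j, []).append((i, begin, end))
    arrival = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > arrival[node]:
            continue  # a better arrival time was already processed
        for nbr, begin, end in adj.get(node, []):
            if t <= end:                  # contact has not ended yet
                t_nbr = max(t, begin)     # wait until the contact starts
                if t_nbr < arrival.get(nbr, float("inf")):
                    arrival[nbr] = t_nbr
                    heapq.heappush(heap, (t_nbr, nbr))
    return arrival

# ASSUMED assignment of the figure's contact periods to ties.
contacts = [
    ("A", "B", 2, 5),
    ("B", "C", 3, 7),
    ("C", "E", 8, 9),
    ("B", "D", 0, 1),
    ("D", "F", 3, 5),
]
print(earliest_arrival(contacts, "A"))
print(earliest_arrival(contacts, "D"))
```

Under this assumed layout, information starting at D can reach A, but information starting at A never reaches D, because the B-D contact ends before B has anything to pass on. Actors absent from the returned dictionary are unreachable in time even though they are connected in the static network.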