Fundamental Building Blocks of Social Structure

Honoring Peter Killworth’s contribution to social network theory • Southampton, Sept. 28, 2006

The network scale-up team • Peter D. Killworth (SOC) • Christopher McCarty (U Florida) • Gene A. Shelley (Georgia State U) • Eugene Johnsen (UC-Santa Barbara) • H. Russell Bernard (U Florida)

Some background: “I’ll have a go at that” (Scripps, 1972). • I asked everyone on a ship to rank order their interactions with all the others. • I came to the physics department coffee break and asked "anybody here want to know the social structure of a vessel that gets all your data?" • The ocean-going physicists in the room knew they weren't supposed to talk to people like me and didn't even look up.

Peter hadn’t gotten the memo about social scientists and said he thought it might be fun. • And that’s what it’s been, 34 years and 40-odd papers later …

How to get at the structure of these data? “Let’s try this …” • Peter applied an algorithm from F.S. Acton’s (then) recent book “Numerical Methods that (Usually) Work” ... • The algorithm had been developed to solve a traffic problem: How to get from point A to point B fastest, irrespective of the number of red lights on the path. • Visualizing the messy result.

The prison studies • We combined numerical methods with ethnography. • The cliques always made sense, until one day … • Three numerically tied inmates whose connections made no apparent sense: different crimes, North and South, rural and urban, Black and White. • Finally, finally an artifact ….

Peter: “This is too easy.” • We discovered that physicists don’t apply their models to social structure and anthropologists don’t test the error bounds of their instruments. • We were half-way on this one, so we started the accuracy studies.

How to study accuracy? • We studied groups whose real communication could be unobtrusively monitored and whose members we could ask questions like: "So, in the last [day], [week], [month], who did you talk to in this group?" • Deaf people on TTYs • Ham radio operators in a local network • An early e-mail group • An office • A fraternity

Half of what people tell you is incorrect • People don’t recall behaviors that did occur and recall behaviors that didn't occur. • People aren’t lying. They’re just terrible behaviorscopes.

Extending (or redefining) the problem • We asked: are the instruments for gathering data about human behavior producing accurate measurements of human behavior? • Others used our data and asked: what do those instruments produce a valid measurement of? • Answer: If you ask people who they interact with, people retrieve who they usually interact with and report who they ought to interact with, given everything they already know about their place in the social structure.

Next, the small world… • Milgram’s famous small-world experiment told us that there are 5.5 links between any two white people in the U.S. and exactly one more link between any white and any black person in the U.S. • But these numbers do not tell us anything about the structure of the society.

Peter: “Let’s find out how the SW actually operates” • Show people a list of SW targets, complete with the information about location, occupation, hobbies, and organizations. • Ask people to tell us their first link in a small-world experiment. • Repeat 500 times and analyze the information needed by people to make their choice of a first link.

The reverse small world experiments • We ran six of these experiments in the U.S., in Micronesia and in Mexico. • Things that people in the US find useful to the task (name, location, occupation, hobbies, organizations) are the same things that people in other cultures need to know to place someone in their network. • For both of us, the cross-cultural regularity discovered in this series of experiments is among the most exciting results of our work.

We created a similarity matrix between targets: how many people used the same choice for a given pair of targets? • A 2-d MDS shows the enduring influence of Gerhard Mercator on schooling.
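The similarity count described on this slide can be sketched as follows. This is only an illustration under assumptions: the data layout (a dictionary mapping each respondent to their chosen first link per target), the function name `target_similarity`, and the respondent, target, and contact names are all hypothetical, not taken from the original studies.

```python
from itertools import combinations


def target_similarity(choices, targets):
    """Count, for every pair of targets, how many respondents
    named the same first link for both targets.

    choices: dict mapping respondent -> dict mapping target -> chosen contact
             (hypothetical layout, assumed for this sketch)
    targets: list of target identifiers
    """
    similarity = {pair: 0 for pair in combinations(targets, 2)}
    for per_target in choices.values():
        for a, b in similarity:
            choice_a = per_target.get(a)
            choice_b = per_target.get(b)
            # A missing answer never counts as a match.
            if choice_a is not None and choice_a == choice_b:
                similarity[(a, b)] += 1
    return similarity


# Made-up example data: two respondents, three targets.
example_choices = {
    "r1": {"T1": "alice", "T2": "alice", "T3": "bob"},
    "r2": {"T1": "carol", "T2": "carol", "T3": "carol"},
}
example_similarity = target_similarity(example_choices, ["T1", "T2", "T3"])
```

The resulting pair counts form the upper triangle of a symmetric matrix, which is the kind of input a multidimensional scaling routine would take after conversion to dissimilarities.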

Finding the distribution c • Our real objective, though, is to understand the basic components of social structure. • One quantity that seems important is the number of people whom people know. • We call this c

Network size … “It’s just one number” • From the first, Peter pushed us all to learn more about the basic quanta: • How does network size vary, within and across cultures? • What’s the distribution look like? • Our first estimate, in 1978, for average network size in the U.S. was 250.

Peter: “You have to start somewhere.” • And what was that 250? • It was the number of people whom the people of Morgantown, West Virginia, who sat through this grueling, 8-hour experiment, could call on to be first links if Milgram had shown up and asked them to participate in a small world experiment.

Deriving c from an assumption • Let t be the size of a population, and let e be the size of some subpopulation within it. • We assume that the fractional size p = e/t of that subpopulation also applies to any individual’s network, other things being equal. That is, everyone’s network in a society reflects the distribution of subpopulations in that society.
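For a single subpopulation, the assumption can be written out as a worked equation. This is a sketch: m (the number of subpopulation members a respondent reports knowing) is a symbol introduced here for illustration, and the numbers are hypothetical.

```latex
\[
\frac{m}{c} \approx \frac{e}{t}
\quad\Longrightarrow\quad
c \approx \frac{m\,t}{e},
\qquad\text{e.g. } m = 2,\; e = 1{,}000{,}000,\; t = 300{,}000{,}000
\;\Rightarrow\; c \approx 600 .
\]
```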

The scale-up method to estimate c • To test this, we ask a representative sample of people to tell us how many people they know in many subpopulations whose sizes are known: • e.g., diabetics, gun dealers, postal workers, women named Nicole, men named Michael

People answer accurately • Now, assuming that people can and do answer our question accurately

A maximum likelihood estimate of an individual’s network size: $\hat{c}_i = t \cdot \frac{\sum_{j=1}^{L} m_{ij}}{\sum_{j=1}^{L} e_j}$, where there are L known subpopulations. (Here i is the individual, who knows m_ij in subpopulation j, and e_j is the size of subpopulation j.) Network size is (the sum of all the people you say you know in some subpopulations of known size, divided by the total size of those subpopulations) times the population within which the subpopulations are embedded.
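The verbal description of the estimate translates directly into a few lines of code. This is a minimal sketch, not the team's own implementation: the function name `estimate_network_size` and the example numbers are assumptions made for illustration.

```python
def estimate_network_size(known_counts, subpop_sizes, total_population):
    """Scale-up estimate of one respondent's network size.

    known_counts: people the respondent reports knowing in each subpopulation
    subpop_sizes: known size of each of those subpopulations (same order)
    total_population: size of the population containing the subpopulations
    """
    if len(known_counts) != len(subpop_sizes):
        raise ValueError("need one count per subpopulation")
    total_size = sum(subpop_sizes)
    if total_size <= 0:
        raise ValueError("subpopulation sizes must sum to a positive number")
    # (sum of people known) / (sum of subpopulation sizes) * population
    return total_population * sum(known_counts) / total_size


# Hypothetical respondent: knows 2, 1 and 1 people in three subpopulations
# of 10,000, 5,000 and 5,000 within a population of 1,000,000.
example_c = estimate_network_size([2, 1, 1], [10_000, 5_000, 5_000], 1_000_000)
```

With these made-up inputs the respondent knows 4 people out of 20,000, so the estimate scales to 200 in a population of one million.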

The estimates of c are reliable • This doesn’t deal with the big IF, but across 7 surveys in the U.S., average network size = 290 (sd 232, median 231). • The 290 is not an average of averages. It’s a repeated finding. • And it’s almost certainly not an artifact of the method.

Reliability I: • In one survey, we estimated c by asking people how many people they know in each of 17 relation categories – people who are in their immediate family, people who are co-workers, people who provide a service – and summing. • The summation method (due to Chris McCarty) produced a mean for c of 290.

Reliability II: Change the data • We changed reported values at or above 5 to a value of 5 precisely. The mean dropped to 206, a change of 29%. • We set values of at least 5 to a uniformly distributed random value between 5 and 15. We repeated this random change only for large subpopulations (with > 1 million). • The mean increased to 402, a change of 38% -- in the opposite direction.
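The two perturbations can be sketched as below. Several details are assumptions, since the slide does not specify them: whether the random replacement values are integers (integers are used here), how the "large subpopulation" restriction is applied (here as an optional size threshold), and all function names and example numbers.

```python
import random


def scale_up(counts, sizes, total_population):
    """Scale-up estimate: population * (people known) / (subpopulation sizes)."""
    return total_population * sum(counts) / sum(sizes)


def cap_counts(counts, cap=5):
    """Replace every reported value at or above `cap` with exactly `cap`."""
    return [min(m, cap) for m in counts]


def randomize_high_counts(counts, sizes, rng, low=5, high=15, min_size=0):
    """Replace reported values >= `low` with a uniform random integer in
    [low, high], but only for subpopulations larger than `min_size`."""
    perturbed = []
    for m, e in zip(counts, sizes):
        if m >= low and e > min_size:
            perturbed.append(rng.randint(low, high))
        else:
            perturbed.append(m)
    return perturbed


# Hypothetical respondent and subpopulation sizes.
counts = [0, 3, 5, 12]
sizes = [50_000, 200_000, 2_000_000, 3_000_000]
total_population = 300_000_000

rng = random.Random(0)  # fixed seed so the sketch is repeatable
capped = cap_counts(counts)
randomized = randomize_high_counts(counts, sizes, rng, min_size=1_000_000)

baseline_c = scale_up(counts, sizes, total_population)
capped_c = scale_up(capped, sizes, total_population)
randomized_c = scale_up(randomized, sizes, total_population)
```

Capping can only lower or preserve an estimate, while the random replacement can move it in either direction for a given respondent; the slide reports the effect on the survey mean.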

Reliability III: Survey clergy • We surveyed a national sample of 159 members of the clergy – people who are widely thought to have large networks. • Mean c = 598 for the scale-up method • Mean c = 948 for the summation method

290 is not a coincidence • 1. Two different methods of counting produce the same result. • 2. Changing the data produces large changes in the results. • 3. People who are widely thought to have large networks do have large networks.

Something is going on • This next slide shows the probability, for two of our surveys, of knowing no one in each of 29 populations of known size, by the actual size of those populations. • The two distributions track, except for the expected offset.

The distribution of c • Here is the graph of the distribution of network size:

Reliability vs. validity • Ok, it’s reliable. But if the model works, we ought to be able to use it to estimate the size of populations whose sizes are not known. • Create a maximum likelihood estimate for the size of an unknown subpopulation based on what all respondents tell us and our estimates of their network sizes. • “Roughly speaking, inverting the previous formula.”
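One plausible reading of "inverting the previous formula" is sketched here: pool what all respondents report about the unknown subpopulation, divide by their pooled estimated network sizes, and scale to the population. The function name `estimate_subpopulation_size` and the numbers are hypothetical, and the exact form of the team's estimator is an assumption.

```python
def estimate_subpopulation_size(reported_counts, network_sizes, total_population):
    """Estimate the size of a subpopulation of unknown size.

    reported_counts: how many members of the subpopulation each respondent knows
    network_sizes: each respondent's estimated network size (same order)
    total_population: size of the population containing the subpopulation
    """
    if len(reported_counts) != len(network_sizes):
        raise ValueError("need one network size per respondent")
    total_network = sum(network_sizes)
    if total_network <= 0:
        raise ValueError("network sizes must sum to a positive number")
    # Share of all reported network ties that reach the subpopulation,
    # scaled up to the whole population.
    return total_population * sum(reported_counts) / total_network


# Hypothetical survey of three respondents in a population of 1,000,000.
example_e = estimate_subpopulation_size([1, 0, 2], [300, 200, 500], 1_000_000)
```

In this made-up case, 3 of 1,000 pooled network members belong to the subpopulation, which scales to an estimated 3,000 people.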

Can we predict what we know? • Test this by ‘predicting’ the size of 29 populations of known size. • The overall result is encouraging:

r = .79 … but note the outliers

Over- and under-estimation • The two largest populations are people who have a twin brother or sister and diabetics. • These are highly underestimated. • Without these two outliers, the correlation rises from r = .79 to r = .94 • “No cheating …”

Stigma vs. not newsworthy • Being a twin or a diabetic is neither stigmatizing, nor newsworthy. • From Gene Shelley’s work, we know that personal information about close co-workers or business associates can take a decade or more to be transmitted ... and in the case of being a twin or a diabetic, may never be transmitted.

Another encouraging result • Charles Kadushin ran a national survey to estimate the prevalence of crimes in 14 cities, large and small, across the U.S. • He asked 17,000 people to report the number of people they knew who had been victims of six kinds of crime and the number of people they knew who used heroin regularly.

Here are the estimates for the number of heroin users in each of the 14 cities, along with the estimates from the UCR.

The fact that we track well with official estimates means only that we have a much, much less expensive way to get at these estimates – not that the estimates are correct. • And estimates of other crimes in those 14 cities did not track so well.

Reliability, validity, and accuracy • So, while definitely reliable and perhaps valid, our estimate of network size (and its distribution) is not sufficiently accurate.

Compromising assumptions • 1. Transmission effects: Everyone knows everything about everyone they know. • 2. Barrier effects: Everyone in the population has an equal chance of knowing someone in any subpopulation.

Correlation between the mean number of Native Americans known and the percent of the state population that is Native American is 0.58, p = 0.0001.

Network social barriers • Race (Blacks may know more diabetics than Whites do.) • Gender (Men may know more gun dealers than women do.) • Even first names are associated with the barrier effect. • We address the barrier effect by using a random, nationally representative sample of respondents. • However, using the method on specific populations may still lead to incorrect estimates.

The transmission effect • We asked people things about people they knew … and then called up those people to see how much people really do know about their network members.
