
Generate country-scale networks of interaction from scattered statistics

Generate country-scale networks of interaction from scattered statistics. Samuel Thiriot Computer Science Laboratory – University Paris 6 Orange Labs – France Télécom R&D Jean-Daniel Kant Computer Science Laboratory – University Paris 6. Social networks for agent-based modeling.



  1. Generate country-scale networks of interaction from scattered statistics • Samuel Thiriot, Computer Science Laboratory – University Paris 6, and Orange Labs – France Télécom R&D • Jean-Daniel Kant, Computer Science Laboratory – University Paris 6

  2. Social networks for agent-based modeling • "social networks" in agent-based models are really interaction networks; they define who interacts with whom within the population • when the population is small, the network may be collected from the field. However, data collection becomes intractable at the scale of a whole population. In that case it is common to use a generator to produce the network • a network generator is an algorithm which, given parameters, generates networks constrained by properties observed in real networks • many agent-based models have been shown to be highly sensitive to this structure (opinion dynamics, diffusion of innovations…) • => the descriptive power of the structure determines the relevance of simulation results

  3. Case study: interactions in rural Kenya • example: model the diffusion of contraceptive use in rural Kenya from field studies [Watkins et al. 2005]: • women mainly discuss contraceptive solutions with other women (sisters-in-law, co-wives). Discussions often take place during daily activities: when they fetch water, walk together to the market, or at the start of the workday. They rarely discuss the problem with their husbands, but speak more often with their brothers-in-law • stronger normative influence comes from mothers and husbands • => the structure of the family is decisive, as are affiliations (workplace, daily activities) • how can we generate a plausible network of interactions for rural Kenya that complies with these observations? • more generally, how can a social scientist constrain networks using field observations?

  4. Requirements • R1: generate models of large populations • many models aim to reproduce dynamics in large populations (e.g. opinion dynamics, consumer behavior, diffusion of innovations [Thiriot & Kant, 2005]) • R2: represent different kinds of relationships linking two agents • the nature of the relationship changes the influence between these agents (finding a job [Granovetter 1973], conversations about products [Carl 2006], recommendations inside families [Engel et al. 1996]) • R3: detail the attributes of agents in the network of interactions • attributes change the frequency and nature of interaction (e.g., the content of word-of-mouth changes with distance [Carl 2006]) • attributes often condition the creation of relationships (homophily principle, as shown later) • many individual processes require attributes (e.g. adoption of contraceptive use depends on a woman's age and number of children) • R4: a relevant network generator should comply with processes of social selection

  5. Evidence on social networks • social selection processes [Wasserman & Faust, 1994] • homophily: individuals exhibit a strong tendency to create relationships with people who share similar characteristics [McPherson et al. 2001] • affiliation: two individuals sharing a common affiliation (project, workplace, event) are more likely to bond and interact frequently • transitivity: two individuals are more likely to bond if they share a common friend ("friends of my friends are also my friends") • large-scale statistics • low density: few links compared to the number of possible links between nodes • high clustering: many strongly interconnected groups • short average path length (Milgram's experiment) • power-law distribution of degrees: a few individuals have a high degree of connectivity, while most have a lower degree

  6. Existing models compliant with (part of) the evidence • random graphs with attributes • L(a1,a2)=1 is the random variable representing the existence of a link between a1 and a2 • the probability of a link depends on the agents' attributes Att(a) • p( L(a1,a2)=1 | Att(a1),Att(a2) ) • Markov random graphs [Frank & Strauss, 1986] • also comply with transitivity: two links may be dependent if they have a node in common • p( L(a1,a2)=1 | L(a1,a3)=1, L(a3,a2)=1) • recently unified with random graphs with attributes [Robins et al. 2001] • small-world graphs or scale-free networks generate graphs with a high clustering rate, short paths and a power-law distribution of degree • agent-based models were proposed to build networks, but their purpose is to test hypotheses; they do not aim to be descriptive (to date) • => none of these models satisfies all our requirements
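
To make the first family of models concrete, here is a minimal sketch of a random graph with attributes, in which each possible link is drawn independently with probability p( L(a1,a2)=1 | Att(a1),Att(a2) ). The attribute `town`, the probability values and the function names are illustrative assumptions, not taken from the cited models.

```python
import random

def link_probability(att1, att2):
    """Hypothetical p(L(a1,a2)=1 | Att(a1), Att(a2)): links are likelier within a town."""
    return 0.30 if att1["town"] == att2["town"] else 0.02

def random_graph_with_attributes(agents, rng):
    """Draw every possible undirected link independently from its own probability."""
    links = set()
    ids = sorted(agents)
    for i, a1 in enumerate(ids):
        for a2 in ids[i + 1:]:
            if rng.random() < link_probability(agents[a1], agents[a2]):
                links.add((a1, a2))
    return links

rng = random.Random(0)
# Illustrative population: 50 agents with a single attribute.
agents = {i: {"town": rng.choice(["A", "B"])} for i in range(50)}
links = random_graph_with_attributes(agents, rng)
same_town = sum(agents[a]["town"] == agents[b]["town"] for a, b in links)
print(len(links), "links,", same_town, "of them within the same town")
```

Because every link is drawn independently, such a model reproduces homophily on attributes but cannot express transitivity, which is what Markov random graphs add.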

  7. Evidence: scattered statistics • plenty of knowledge about populations is available from national censuses, sociological studies and other field studies • who individuals are: gender, ethnicity, socioeconomic class, income, family structure (number of children, marital status, etc.) • affiliations (employment, participation in associative life, sport, etc.), at a more detailed level than network statistics • in the case of Kenya: • sociological studies on the structure of the family [Mburugu & Adams 2004] • Kenya Demographic and Health Survey • specific studies on the modeled phenomenon [Watkins et al. 1995a,b] [Rutenberg & Watkins 1997] • these "scattered statistics" are already collected and published at a country scale. They constitute large-scale, detailed knowledge on the structure of society • surprisingly, no network generator relies on this information to constrain networks • R5: take scattered statistics into account during generation

  8. Approach • a methodology to piece together scattered statistics (R5) in the form of Bayesian networks • an algorithm to generate the network given these parameters, based on known social selection processes (R4) • a method to measure the compliance of the generated network with the parameters • the resulting network of interaction describes relationships at a country scale (R1) and includes different kinds of relationships (R2) and agents' attributes (R3)

  9. Methodology to codify evidence

  10. Steps 1 & 2: define links & attributes • step 1: define the types of social links T that should be represented in the relationships network • identify links leading to different interactions in the model, or created by different processes • in Kenya, we know that contraceptive adoption by women depends on social interactions with parents, siblings, husband, brothers of the husband, colleagues • as usual in social network analysis, we distinguish: • links created given agents' attributes, TAtt • for our illustration, TAtt = {spouses, motherOf, colleagues, friends} • links created by transitivity, TTrans • here TTrans = {fatherOf, siblings} • step 2: define attributes • select the agents' attributes that are expected to influence the probability of creating a link, or that are useful for individual behavior (given data availability) • in our application, we retain marital status, age, gender, work (which determines colleague links) and spatial location

  11. Step 3: represent attributes with BN • characteristics of individuals are often interdependent: • number of children given woman's age • employment given educational level • age given spatial location • type of work given gender • one can consider these attributes as random variables • Bayesian networks enable an intuitive representation of attribute dependencies • (figure: agent BN, annotated "number of links to create for each kind of link t ∈ TAtt")

  12. Step 4: represent links with BN • the probability of creating a link of type t ∈ TAtt given the agents' attributes may also be represented using a Bayesian network, named here the matching BN • example: matching BN for link type "spouses" • (figure labels: attributes of agent 1; attributes of agent 2; conditions on linking; link as spouses (yes/no))

  13. Step 4: represent links with BN • Bayesian networks make it easy to express constraints on matching: • equality: spouses must live in the same location • Boolean operators: two agents can be linked as spouses if they live in the same location and their ages are compatible and they are of different gender […] • mathematical difference: wives are on average ten years younger than their husbands • qualitative knowledge, by using approximate probabilities (e.g.: most colleagues live in the same town) • one matching BN per type in TAtt
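
The three kinds of constraint listed above can be illustrated with a plain function standing in for the conditional table of the "spouses" link node. This is only a sketch: the attribute names, the age bands and the probability values are assumptions chosen for illustration, not the parameters used in the study.

```python
def p_spouses(a1, a2):
    """Hypothetical p(link_spouses=yes | attributes of a1, attributes of a2)."""
    if a1["location"] != a2["location"]:       # equality constraint
        return 0.0
    if a1["gender"] == a2["gender"]:           # Boolean condition on gender
        return 0.0
    husband, wife = (a1, a2) if a1["gender"] == "male" else (a2, a1)
    gap = husband["age"] - wife["age"]         # mathematical difference
    # Illustrative values only: gaps close to ten years are the most probable.
    if 5 <= gap <= 15:
        return 0.8
    if 0 <= gap < 5 or 15 < gap <= 25:
        return 0.2
    return 0.0

husband = {"location": "X", "gender": "male", "age": 40}
wife = {"location": "X", "gender": "female", "age": 30}
print(p_spouses(husband, wife))
```

In an actual matching BN the intermediate tests (same location, age gap) would be nodes of their own, which is what makes the conditions readable as a graph.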

  14. Generation of networks

  15. Generation of the population • all variables in the agent BN represent agents' attributes • the Bayesian network defines the order in which variables are processed • algorithm: • for each individual to create • for each attribute • use Monte Carlo sampling to randomly choose the value of the attribute • at the end of this process, all agents in the population have their attributes fully determined
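
The algorithm above amounts to sampling each attribute in an order where parents come before children. A minimal sketch follows; the three variables, their dependencies and all probabilities in `AGENT_BN` are made up for illustration.

```python
import random

# Hypothetical agent BN: (variable, parents, conditional probability table).
# Variables are listed in topological order, so parents are sampled first.
AGENT_BN = [
    ("gender", (), {(): {"female": 0.5, "male": 0.5}}),
    ("age", (), {(): {"young": 0.6, "old": 0.4}}),
    ("marital_status", ("age",), {
        ("young",): {"single": 0.7, "married": 0.3},
        ("old",): {"single": 0.2, "married": 0.8},
    }),
]

def sample_value(distribution, rng):
    """Monte Carlo draw of one value from a {value: probability} table."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def generate_population(size, rng):
    population = []
    for _ in range(size):                      # for each individual to create
        agent = {}
        for name, parents, table in AGENT_BN:  # for each attribute, in BN order
            key = tuple(agent[p] for p in parents)
            agent[name] = sample_value(table[key], rng)
        population.append(agent)
    return population

population = generate_population(1000, random.Random(42))
print(population[0])
```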

  16. creation of links TAtt • the matching BN describes the probability that two agents drawn at random from the population are tied together • (figure labels: attributes of agent 1; attributes of agent 2)

  17. creation of links TAtt • we "force" evidence for the creation of the link, p(link_spouses=yes)=1, and update the probabilities of the agents' attributes • the BN now describes two population subsets of candidate agents for linking, C1,t and C2,t
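
Forcing the evidence is an application of Bayes' rule. As a sketch, and assuming that in the matching BN the attributes of the two agents are independent before the evidence is set (an assumption made here for readability, not stated on the slide), the updated distribution over attribute pairs is:

```latex
p\big(Att(a_1), Att(a_2) \mid L(a_1,a_2) = 1\big)
  = \frac{p\big(L(a_1,a_2) = 1 \mid Att(a_1), Att(a_2)\big)\; p\big(Att(a_1)\big)\; p\big(Att(a_2)\big)}
         {\sum_{x_1, x_2} p\big(L(a_1,a_2) = 1 \mid x_1, x_2\big)\; p(x_1)\; p(x_2)}
```

Marginalizing this distribution over the attributes of one agent gives the profile of the other candidate subset.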

  18. creation of links TAtt • for each kind of link t in TAtt • for each agent a1 in the set of candidates C1,t • use the attribute values of a1 as evidence in the matching BN • set evidence for linking, p(create_link=yes)=1 • the attributes of a2 in the matching BN then describe the characteristics of potential candidates for a link of type t with a1, given the attributes of a1 • search for probable candidates in C2,t: same process as generation, with Monte Carlo sampling (see paper) • creation of links by transitivity is then trivial
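
The loop above can be sketched for a single link type as follows. Note the simplification: instead of running inference in a matching BN, this sketch weights every still-available candidate of C2,t directly with a hypothetical `match_weight` function and draws one at random. All names and values are assumptions; the paper's candidate search may differ.

```python
import random

def match_weight(a1, a2):
    """Hypothetical stand-in for the matching BN once link evidence is set to yes."""
    if a1["location"] != a2["location"]:
        return 0.0
    return 1.0 if a1["age"] >= a2["age"] else 0.1

def create_links(candidates1, candidates2, rng):
    links = []
    available = dict(candidates2)               # candidates of C2,t not yet linked
    for id1, a1 in candidates1.items():         # for each agent a1 in C1,t
        ids = [i for i in available if match_weight(a1, available[i]) > 0]
        if not ids:
            continue                            # no suitable candidate was found
        weights = [match_weight(a1, available[i]) for i in ids]
        id2 = rng.choices(ids, weights=weights, k=1)[0]
        links.append((id1, id2))
        del available[id2]                      # here each candidate is linked once
    return links

men = {"m1": {"location": "X", "age": 40},
       "m2": {"location": "Y", "age": 30},
       "m3": {"location": "Z", "age": 25}}
women = {"w1": {"location": "X", "age": 30},
         "w2": {"location": "Y", "age": 28}}
links = create_links(men, women, random.Random(0))
print(links)
```

Agents for whom no candidate is found stay unlinked, which is one source of the gap between theoretical and measured probabilities discussed later in the talk.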

  19. creation of links TTrans • parameters for links created by transitivity TTrans are in the form: • p( L(a1,a2,t1)=1 | L(a1,a3,t2)=1, L(a3,a2,t3)=1) where t1, t2, t3 are relationship types • example: one creates "fatherOf" links from "motherOf" and "spouses" links: p( L(a1,a2,fatherOf)=1 | L(a1,a3,motherOf)=1, L(a3,a2,spouses)=1) • creation of these transitive links is simple: • for each pair of agents (a1,a3) linked with a relationship of type t2 • for each pair of agents (a3,a2) linked with a relationship of type t3 • create a link of type t1 between a1 and a2 with probability p( L(a1,a2,t1)=1 | L(a1,a3,t2)=1, L(a3,a2,t3)=1)
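
The two nested loops can be sketched as below. Links are stored as ordered pairs following the slide's argument order, (a1,a3) for type t2 and (a3,a2) for type t3; the agent identifiers and the probability value 1.0 are illustrative assumptions.

```python
import random
from collections import defaultdict

def create_transitive_links(links_t2, links_t3, p, rng):
    """Create t1 links (a1, a2) from t2 links (a1, a3) and t3 links (a3, a2)."""
    t3_by_source = defaultdict(list)
    for a3, a2 in links_t3:
        t3_by_source[a3].append(a2)
    created = set()
    for a1, a3 in links_t2:                  # each pair linked with type t2
        for a2 in t3_by_source[a3]:          # each pair linked with type t3
            if a1 != a2 and rng.random() < p:
                created.add((a1, a2))
    return created

mother_of = [("child1", "mother1"), ("child2", "mother1")]   # pairs (a1, a3)
spouses = [("mother1", "father1")]                           # pairs (a3, a2)
father_of = create_transitive_links(mother_of, spouses, 1.0, random.Random(0))
print(sorted(father_of))
```

Indexing the t3 links by their first agent avoids scanning every t3 link for each t2 link, which matters at the population sizes targeted here.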

  20. Generated network

  21. resulting graph • the resulting graph includes links of different kinds T, and includes attributes' values for each agent • each agent is positioned in its social environment (family, colleagues, friends) • this structure is replicated at a large scale (here, 10,000 agents)

  22. measure errors: bias in statistical distribution • to check the compliance of the generated population with the parameters, we learn BNs from the generated population and quantify the difference between theoretical and measured probabilities (average difference) • bias in statistical distribution • while BNs describe a theoretical population with continuous probabilities, we generate a discrete population and link agents only when a suitable candidate is found • a minimum population size is required to reach statistical representativeness • this bias depends on the complexity of the parameter BNs and on the population size • => here, given our BNs, the minimal population required to generate a representative population is 10,000
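
A simplified version of this check can be sketched on a single variable: estimate frequencies from the generated values and average the absolute differences with the theoretical probabilities. The slide's method learns whole BNs from the population; the single-table comparison, the variable and its probabilities are assumptions made to keep the example short.

```python
import random
from collections import Counter

def average_difference(theoretical, samples):
    """Mean absolute gap between theoretical and measured probabilities."""
    counts = Counter(samples)
    n = len(samples)
    gaps = [abs(p - counts[value] / n) for value, p in theoretical.items()]
    return sum(gaps) / len(gaps)

# Hypothetical theoretical distribution of one attribute.
theoretical = {"single": 0.3, "married": 0.6, "widowed": 0.1}
values, weights = zip(*theoretical.items())
rng = random.Random(1)
errors = {n: average_difference(theoretical, rng.choices(values, weights=weights, k=n))
          for n in (100, 10_000)}
print(errors)
```

Running the measure for increasing population sizes is one way to look for the size above which the error stops shrinking noticeably.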

  23. measure errors: errors in parameters • errors in parameters • a stable error rate, independent of the population size, reveals a discrepancy in the parameters • illustrated in our example by the "spouses" link • => we can detect major problems in parameters • here, the discrepancy is: • number of men × number of wives per man > number of women having marital status = "married" • as a consequence, the theoretical probabilities to link men with women with the link "spouses" cannot be satisfied • once detected, the problem is easily corrected
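
The inequality above is a simple counting argument, sketched here with invented numbers (the real figures are not given on the slide): if the spouse links requested on the men's side exceed the number of married women, some requested links can never be created, whatever the population size.

```python
def spouse_link_shortfall(n_men, wives_per_man, n_married_women):
    """Requested spouse links that cannot be satisfied (0 if parameters agree)."""
    requested = n_men * wives_per_man
    return max(0.0, requested - n_married_women)

# Hypothetical parameter values, for illustration only.
shortfall = spouse_link_shortfall(n_men=2000, wives_per_man=1.5, n_married_women=2500)
print(shortfall)
```

Because both sides of the inequality scale with the population, the resulting error rate does not shrink as the population grows, which is the signature described on the slide.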

  24. statistical properties of the network • statistical properties of the generated network • the average path length increases very slowly with the population size, exhibiting a small-world effect. The average path length is about 5 • average degree, density and transitivity indicators are stable above the minimum population size • all these values are compliant with statistics of real networks
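
For reference, three of these indicators can be computed on an undirected graph as sketched below. The four-node path graph is a toy example, and the average path length here is taken over reachable pairs only, which is an assumption about how the indicator is defined.

```python
from collections import deque

def build_adjacency(nodes, edges):
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def density(adjacency):
    """Number of links divided by the number of possible links."""
    n = len(adjacency)
    m = sum(len(nb) for nb in adjacency.values()) / 2
    return 2 * m / (n * (n - 1))

def average_degree(adjacency):
    return sum(len(nb) for nb in adjacency.values()) / len(adjacency)

def average_path_length(adjacency):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

path = build_adjacency(range(4), [(0, 1), (1, 2), (2, 3)])
print(density(path), average_degree(path), average_path_length(path))
```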

  25. usage for social simulation • we generated a network of relationships ("who knows whom") from knowledge about the social system • for agent-based simulation, we need a network of interactions, that is "who interacts with whom" • eliminate relationships which don't lead to interaction • e.g.: mothers don't discuss contraceptive solutions with their young children • if necessary, tag networks with probabilities • in the case of Kenya, we know that long-distance relationships don't lead to communication (except for mother-daughter normative influence)
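
Turning relationships into interactions is a filtering step, sketched below with the two rules mentioned on the slide. The link representation (mother, child, "motherOf"), the age threshold of 15 and the use of `location` to detect long-distance links are assumptions introduced for the example; the sketch also does not check the child's gender.

```python
def leads_to_interaction(link, agents):
    """Hypothetical rules deciding whether a relationship carries interaction."""
    a, b, link_type = link
    # Assumed convention: in a "motherOf" link, a is the mother and b the child.
    if link_type == "motherOf" and agents[b]["age"] < 15:   # assumed threshold
        return False
    if agents[a]["location"] != agents[b]["location"]:
        # Long-distance links are dropped, except mother-child influence.
        return link_type == "motherOf"
    return True

agents = {
    "m": {"age": 40, "location": "X"},
    "d": {"age": 20, "location": "Y"},
    "k": {"age": 5, "location": "X"},
    "f": {"age": 38, "location": "Y"},
}
relationships = [("m", "d", "motherOf"), ("m", "k", "motherOf"), ("m", "f", "friends")]
interactions = [link for link in relationships if leads_to_interaction(link, agents)]
print(interactions)
```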

  26. Summary • methodology: • the social scientist describes attribute interdependencies and link conditions using graphical models • generation of the population and network is automated • result: • a population of heterogeneous agents with attribute values compliant with complex interdependencies • a network of relationships constrained by properties of the target population • kinds of relationships explicitly represented in the network, enabling modelers to describe more precisely the probabilities of interacting across links • a large population compliant with family structure, affiliations, etc. • indicators to check the coherence of parameters and to determine the minimum population required

  27. Discussion • George Box: "All models are wrong, but some are useful" • This model (by definition) is wrong: • some phenomena cannot be represented easily • representation with Bayesian networks is more restrictive than ad-hoc graph generation • However, it aims to be useful: • Bayesian networks provide an intuitive tool to piece together scattered statistics and qualitative knowledge • generation of graphs is simplified for social scientists, and requires neither development time nor expert knowledge in programming • the network of interactions is not the real one, but is constrained given available data (best effort)

  28. Future work • formal analysis of the statistical properties of generated networks given the input BNs • improve generation efficiency • sensitivity analysis, in order to evaluate the risk when translating qualitative knowledge into probabilities • application to larger and more complex populations • the case of Kenya was chosen as a "relatively simple" case (few ethnic groups, few socio-economic differences [Watkins 1995]) • applying such a method to a larger and more complex population, like those of "industrialized" countries, will require a very large number of parameters • such a generation will enable more realistic simulations of social dynamics, including innovation diffusion

  29. Thanks for your attention! Feel free to contact us with any question, remark or criticism: samuel.thiriot@lip6.fr
