Constructing Individual Level Population Data for Social Simulation Models PowerPoint Presentation
Presentation Transcript

  1. Constructing Individual Level Population Data for Social Simulation Models Andy Turner http://www.geog.leeds.ac.uk/people/a.turner/ Presentation as part of the Social Simulation Tutorial at the International Symposium on Grid Computing in Taipei, Taiwan 2010-03-07

  2. Outline • Introduction • Contemporary population data • Developing Population Data Expertise • Confidentiality and Disclosure Control • Population Reconstruction is an Art • 3 Population Data Integrations • Integrating survey data • Other Data for Social Simulation • What next?

  3. Introduction • Individual Agents representing people can be generated for a region • Using entirely made-up data • Based on existing aggregate data measured by a census or survey • Agents' attributes may be enriched using data from other sources
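The second route on this slide, generating agents from existing aggregate data, can be sketched in Python as below. It is a minimal illustration under stated assumptions, not the method used in the tutorial: the `Agent` fields, the `generate_agents` function and the `(region, sex, age_band)` count table are hypothetical stand-ins for whatever classification a real census table uses.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    agent_id: int
    region: str
    sex: str
    age_band: str


def generate_agents(counts, seed=None):
    """Create one Agent per person counted in an aggregate table.

    counts maps a hypothetical (region, sex, age_band) key to a head count.
    """
    rng = random.Random(seed)
    agents = []
    for (region, sex, age_band), n in counts.items():
        for _ in range(n):
            agents.append(Agent(len(agents), region, sex, age_band))
    # Shuffle so that later processing is not ordered by table cell.
    rng.shuffle(agents)
    return agents


# Hypothetical aggregate counts for two sub-regions, "A" and "B".
counts = {("A", "F", "0-15"): 3, ("A", "M", "16-64"): 2, ("B", "F", "65+"): 1}
agents = generate_agents(counts, seed=1)
print(len(agents), "agents generated")
```

Each agent starts with only the variables in the aggregate table; any further attributes would have to come from the enrichment step the slide mentions.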

  4. Agents can be collected into groups sharing common characteristics • Agents can be geographically located in sub-regions and initialised with various attributes • As a Social Simulation Model (SSM) is run, Agents may become more complex as they interact, and their histories become more detailed

  5. Agents output from an SSM can be input into another SSM • This can be viewed as an enrichment process • As a simulation proceeds, it is hoped that the model becomes more realistic and more representative of a population • The “Garden of Eden” configuration we started with today is very unrealistic

  6. After several generations the population settles down into something more normal that changes gradually • Initially, the age distribution of the population is quite odd and no females are pregnant, but after a number of generations things balance out • Without any intervention, randomness should even out the distribution of birthdays over time • However, this could take a very long time (a very large number of iterations) if fertility were high, miscarriage were not modelled and gestation were of a fixed duration

  7. Contemporary population data • Most countries provide some form of aggregate statistics about population to the research community • In many cases this is publicly available • It tends to be derived from censuses and/or registration data • Most countries have a system for registering births, deaths and marriages • Some also have mandatory systems for recording changes of people's residential address

  8. Data at a very disaggregate or individual level is available for some countries • In most cases this is a sample of records • These records are sometimes anonymised, in that identifying variables such as a person's name, and sometimes also their residential location, are removed • In some cases these variables are removed but replaced with a unique identifier that is otherwise meaningless but can be used to link back to other data • Pseudo-anonymisation • In addition, many countries have large- and small-scale social surveys • There are also very detailed lifestyle data collected by businesses that observe their customers and also survey the population directly • Population data is very useful and very valuable if it is good!
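The difference between anonymisation and pseudo-anonymisation described on this slide can be sketched as follows. This is an illustrative sketch only: the field names in `DIRECT_IDENTIFIERS`, the `pid` field and the made-up records are hypothetical, and a random token stored in a lookup table is just one way of producing a "meaningless" identifier.

```python
import secrets

# Hypothetical names of directly identifying fields.
DIRECT_IDENTIFIERS = {"person_ref", "name", "address"}


def anonymise(record):
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def pseudo_anonymise(records, key_field="person_ref"):
    """Anonymise records but add a meaningless, consistent pseudo-identifier.

    Returns the released records and the lookup table from the original
    identifier to the pseudo-identifier. Only the data holder should keep
    the lookup, as it is what allows linking back to other data.
    """
    lookup = {}
    released = []
    for rec in records:
        original = rec[key_field]
        if original not in lookup:
            lookup[original] = secrets.token_hex(8)  # random, carries no information
        out = anonymise(rec)
        out["pid"] = lookup[original]
        released.append(out)
    return released, lookup


# Made-up records for illustration.
records = [
    {"person_ref": "P001", "name": "A. Smith", "address": "1 High St", "age": 34, "sex": "F"},
    {"person_ref": "P002", "name": "B. Jones", "address": "2 Low Rd", "age": 71, "sex": "M"},
    {"person_ref": "P001", "name": "A. Smith", "address": "1 High St", "age": 34, "sex": "F"},
]
released, lookup = pseudo_anonymise(records)
print(released[0]["pid"] == released[2]["pid"])  # True: same person, same pid
```

Removing direct identifiers does not by itself guarantee that a person cannot be re-identified from the variables that remain.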

  9. Additionally, data is collected by public service authorities • Health • Education • Utilities • All these data can be integrated and used to create and enhance individual level population data • The process of creating these data is sometimes referred to as population reconstruction

  10. Each country probably has a unique set of available population data • The people represented are different, so the data captured about them is often different • Even when the same attribute is captured, it can be measured or stored in different ways • E.g. Age versus DateOfBirth
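One way to reconcile the Age versus DateOfBirth example is to derive age in completed years at a common reference date and then map both sources onto shared age bands. The sketch below assumes a hypothetical reference date and hypothetical band boundaries; real integrations would use whatever dates and classifications the sources define.

```python
from datetime import date


def age_on(date_of_birth, reference):
    """Age in completed years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (date_of_birth.month, date_of_birth.day)
    return reference.year - date_of_birth.year - (0 if had_birthday else 1)


def to_age_band(age):
    """Map a single year of age onto hypothetical common bands."""
    if age < 16:
        return "0-15"
    if age < 65:
        return "16-64"
    return "65+"


# Source 1 records DateOfBirth; source 2 records Age. Harmonise both to bands.
reference_date = date(2010, 3, 7)  # hypothetical reference (survey) date
band_from_dob = to_age_band(age_on(date(1946, 5, 20), reference_date))
band_from_age = to_age_band(63)
print(band_from_dob, band_from_age)  # 16-64 16-64
```

The conversion only runs one way: a DateOfBirth yields an Age, but an Age cannot be turned back into an exact DateOfBirth, so coarser common categories are usually the meeting point.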

  11. Developing Population Data Expertise • Most countries have population data experts • Becoming one of these and getting to grips with the data is a considerable effort • It is key to learn the details of what is available and what are the restrictions on its use • This is getting easier as metadata improves

  12. There are generally useful ways to combine and enhance population data whilst preserving confidentiality • How best to do this depends on which variables are available and how detailed they are

  13. Confidentiality and Disclosure Control • Confidentiality is a big issue in many countries • It is such a big issue in some countries that people have voted to get rid of their data! • There is no population census in Germany or in the Netherlands

  14. Disclosure control • E.g. anonymisation by removal of names and addresses • Helps to keep people comfortable with the data existing • Security is a big issue with population data • We need to be trusted with the data if we are to put it to good use • People rightly worry, as the data can also be put to bad use

  15. Population Reconstruction is an Art • Because the types of available population data can be so different • There is little point in detailing specific data integrations now • However, it can be argued that most population data, from whatever type of survey, can be integrated • Whether it is useful to do so depends on many things

  16. 3 Population Data Integrations • Using a representative sample survey and integrating this with aggregate data from a comprehensive census to produce individual level census data estimates • Linking survey data with additional variables to individual level census data • Linking two different sets of survey data using common variables to form an integrated survey with a set of desired variables
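For the first integration listed on this slide, one widely used technique (not necessarily the one used in this tutorial) is iterative proportional fitting (IPF). Survey records are reweighted so that their weighted totals match census counts for an area, and the weights are then used to draw individual records. The sketch assumes hypothetical constraint variables `sex` and `age_band` and uses plain weighted sampling in place of a more careful integerisation method.

```python
import random
from collections import defaultdict


def ipf_weights(sample, constraints, iterations=20):
    """Reweight survey records so weighted totals match census marginals.

    sample: list of dicts, one per survey respondent.
    constraints: {variable: {category: census_count}} for one area.
    """
    weights = [1.0] * len(sample)
    for _ in range(iterations):
        for var, targets in constraints.items():
            # Current weighted total for each category of this variable.
            current = defaultdict(float)
            for w, rec in zip(weights, sample):
                current[rec[var]] += w
            # Scale each record's weight so its category total hits the target.
            for i, rec in enumerate(sample):
                total = current[rec[var]]
                if total > 0:
                    weights[i] *= targets.get(rec[var], 0) / total
    return weights


def synthesise(sample, weights, size, seed=None):
    """Draw `size` individuals, with replacement, in proportion to the weights."""
    rng = random.Random(seed)
    return rng.choices(sample, weights=weights, k=size)


# Hypothetical survey sample and census marginals for one area of 100 people.
sample = [
    {"id": 1, "sex": "F", "age_band": "16-64"},
    {"id": 2, "sex": "M", "age_band": "16-64"},
    {"id": 3, "sex": "F", "age_band": "65+"},
    {"id": 4, "sex": "M", "age_band": "65+"},
]
constraints = {"sex": {"F": 60, "M": 40}, "age_band": {"16-64": 70, "65+": 30}}
weights = ipf_weights(sample, constraints)
population = synthesise(sample, weights, size=100, seed=0)
print([round(w, 1) for w in weights])  # [42.0, 28.0, 18.0, 12.0]
```

Random sampling from the weights reproduces the census totals only on average; deterministic integerisation schemes exist for when exact totals matter.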

  17. Integrating survey data • Sometimes the term data fusion is used • Most survey data is biased, but some surveys attempt to be, or can be adjusted to be, generally representative, in that the proportions of each type of person reflect those of the overall population

  18. Representative survey data can be linked using probabilities and random assignments of characteristics • Surveys known to be biased require data fusion, and rely on the hope that most types of people who exist in the total population are represented in the sample survey
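Linking by probabilities and random assignment can be sketched as a simple random hot-deck: each census individual (the recipient) receives the extra survey variable from a survey respondent (the donor) chosen at random among those sharing the same common variables. Picking a donor uniformly within a matching group amounts to sampling from the survey's conditional distribution of the extra variable given the common variables. The function name `fuse`, the variables `sex`, `age_band` and `smoker`, and the choice of hot-deck matching are illustrative assumptions. When no donor matches, the sketch leaves the value as `None`, which is the gap a biased survey can leave.

```python
import random
from collections import defaultdict


def fuse(recipients, donors, common, extra, seed=None):
    """Copy `extra` from a randomly chosen donor with matching `common` values."""
    rng = random.Random(seed)
    # Group donors by their values on the common variables.
    pools = defaultdict(list)
    for d in donors:
        pools[tuple(d[v] for v in common)].append(d)
    fused = []
    for r in recipients:
        pool = pools.get(tuple(r[v] for v in common))
        new = dict(r)  # leave the recipient record itself unchanged
        new[extra] = rng.choice(pool)[extra] if pool else None
        fused.append(new)
    return fused


# Hypothetical census individuals and survey respondents.
recipients = [
    {"id": 1, "sex": "F", "age_band": "16-64"},
    {"id": 2, "sex": "M", "age_band": "65+"},
    {"id": 3, "sex": "M", "age_band": "0-15"},
]
donors = [
    {"sex": "F", "age_band": "16-64", "smoker": True},
    {"sex": "F", "age_band": "16-64", "smoker": False},
    {"sex": "M", "age_band": "65+", "smoker": False},
]
fused = fuse(recipients, donors, ["sex", "age_band"], "smoker", seed=42)
for person in fused:
    print(person)
```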

  19. Usually it is partial survey data that has interesting extra variables that might be of interest for a simulation or for comparison with another variable • It might be that, after fusing the data, two variables can be tested for correlation for the first time

  20. Other Data for Social Simulation • As SSMs are part of this, we should also consider the other data used to drive the models • Especially the probabilities for the major processes being modelled • Mortality (death) • Fertility (birth)
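The sketch below shows how such probabilities might drive one annual time step. Each living agent faces an age-specific probability of death, and each surviving female an age-specific probability of giving birth. The rate functions, the one-year step and the 50:50 sex ratio at birth are hypothetical simplifications, not rates from any real data source. The model has no pregnancy state, so births are instantaneous and gestation and miscarriage are ignored.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    female: bool


def step_year(people, death_prob, birth_prob, rng):
    """Advance the population one year and return the new list of people.

    death_prob(age) and birth_prob(age) return annual probabilities.
    Survivors are aged in place, so the input Person objects are modified.
    """
    survivors, newborns = [], []
    for p in people:
        if rng.random() < death_prob(p.age):
            continue  # this person dies and is dropped
        if p.female and rng.random() < birth_prob(p.age):
            newborns.append(Person(age=0, female=rng.random() < 0.5))
        p.age += 1
        survivors.append(p)
    return survivors + newborns


# Hypothetical rates, for illustration only.
def death_prob(age):
    return 0.005 if age < 70 else 0.08


def birth_prob(age):
    return 0.1 if 20 <= age < 40 else 0.0


rng = random.Random(42)
population = [Person(age=rng.randrange(0, 90), female=rng.random() < 0.5) for _ in range(1000)]
for year in range(10):
    population = step_year(population, death_prob, birth_prob, rng)
print(len(population), "people after 10 years")
```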

  21. What next? • 12:30  Lunch • 14:00  Infrastructures for Social Simulation (Rob Procter) • 14:30  Introduction to Grids and Cloud Computing • 15:00  Coffee break