
IPUMS: How we make it, how you can get it, and how you can use it

Trent Alexander, Minnesota Population Center, University of Minnesota


Presentation Transcript


  1. IPUMS: How we make it, how you can get it, and how you can use it. Trent Alexander, Minnesota Population Center, University of Minnesota

  2. Introduction to the IPUMS Project 1. What is the IPUMS? 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  3. Datasets in IPUMS-USA

     Census year   Sample density (%)   Number of persons in dataset
     1850          1.0                  198,000
     1860          1.0                  354,000
     1870          1.0                  428,000
     1880          1.0                  503,000
     1900          1.0                  846,000
     1910          1.4                  1,503,000
     1920          1.0                  1,050,000
     1930          0.5                  606,000
     1940          1.0                  1,351,000
     1950          1.0                  1,922,000
     1960          1.0                  1,800,000
     1970          6.0                  12,180,000
     1980          9.0                  20,403,000
     1990          6.0                  15,000,000
     2000          6.0                  16,884,000
     2001-2005     0.4-1.0              5,700,000

  4. Datasets in IPUMS-USA, Planned 2007-2010

     Census year   Sample density (%)   Number of persons in dataset
     1850          10.0                 1,980,000
     1860          1.0                  354,000
     1870          1.0                  428,000
     1880          10.0                 5,030,000
     1900          6.0                  4,230,000
     1910          1.4                  1,503,000
     1920          1.0                  1,050,000
     1930          5.0                  6,060,000
     1940          1.0                  1,351,000
     1950          1.0                  1,922,000
     1960          6.0                  10,800,000
     1970          6.0                  12,180,000
     1980          9.0                  20,403,000
     1990          6.0                  15,000,000
     2000          6.0                  16,884,000
     2001-2005     0.4-1.0              5,700,000
     2006-         1.0/year             ??

  5. Status of Countries in IPUMS-International

     Currently in IPUMS-Intl: Argentina, Belarus, Brazil, Cambodia, Chile, China, Costa Rica, Ecuador, France, Greece, Hungary, Israel, Kenya, Mexico, Palestinian Territories, Philippines, Portugal, Romania, Rwanda, South Africa, Spain, Uganda, United States, Venezuela, Vietnam

  6. Status of Countries in IPUMS-International

     Currently in IPUMS-Intl: Argentina, Belarus, Brazil, Cambodia, Chile, China, Costa Rica, Ecuador, France, Greece, Hungary, Israel, Kenya, Mexico, Palestinian Territories, Philippines, Portugal, Romania, Rwanda, South Africa, Spain, Uganda, United States, Venezuela, Vietnam

     Data Received or Agreement Signed:
     • Latin America: Bolivia, El Salvador, Dominican Republic, Guatemala, Honduras, Nicaragua, Panama, Paraguay, Peru, Uruguay
     • Europe: Austria, Bulgaria, Czech Republic, Germany, Ireland, Netherlands, Slovenia, United Kingdom
     • Asia, Africa, Other: Armenia, Canada, Egypt, Fiji, Indonesia, Iraq, Malaysia, Mongolia, Pakistan, Tajikistan, Turkmenistan

     Current funding for 44 countries by 2009. Next data release late Winter 2007.

  7. Datasets in IPUMS-CPS

     Year   Households   Persons
     1962   31,106       71,741
     1963   24,649       55,882
     1964   23,438       54,543
     1965   23,600       54,502
     1966   48,095       110,055
     1967   28,924       68,676
     1968   46,069       150,913
     1969   47,028       151,848
     1970   44,982       145,023
     1971   45,952       146,822
     1972   44,906       140,432
     1973   44,467       136,221
     1974   44,427       133,282
     1975   43,714       130,124
     1976   46,368       135,351
     1977   68,291       160,799
     1978   67,900       155,706
     1979   68,375       154,452
     1980   80,468       181,488
     1981   81,451       181,358
     1982   73,368       162,703
     1983   73,195       162,635
     1984   73,632       161,167
     1985   74,568       161,362
     1986   74,145       157,661
     1987   73,843       155,468
     1988   74,806       155,980
     1989   70,454       144,687
     1990   75,269       158,079
     1991   75,076       158,477
     1992   74,236       155,796
     1993   73,878       155,197
     1994   73,126       150,943
     1995   72,152       149,642
     1996   63,339       130,476
     1997   64,046       131,854
     1998   64,659       131,617
     1999   65,377       132,324
     2000   64,944       133,710
     2001   64,362       128,821
     2002   98,848       217,219
     2003   99,986       216,424
     2004   98,979       213,241
     2005   98,664       210,648
     2006   98,069       209,542

  8. Data in North Atlantic Population Project (NAPP)

  9. NAPP Plans, 2007-2011: link data across borders and time

  10. What Are Microdata?
      Individual-level data
      • every record represents a separate person
      • all of their individual characteristics are recorded
      • users must manipulate the data themselves
      Different from aggregate/summary/tabular data
      • a disability table from www.factfinder.census.gov
      • an occupation table from a published census volume from the library

  11. 1930 Census Population Schedule

  12. Raw Census Microdata from IPUMS

  13. IPUMS Data Structure
      • Household record (shaded) followed by a person record for each member of the household
      • For each type of record, columns correspond to specific variables
      Labeled columns: Age, Birthplace, Mother’s birthplace, Sex, Relationship, Race, Occupation
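
A minimal sketch of reading a file laid out this way, with each household record followed by its person records. The record-type flag in the first character ("H" or "P") and the sample lines are assumptions made for illustration; real extracts define their own column positions in a codebook.

```python
from typing import Dict, List


def group_households(lines: List[str]) -> List[Dict]:
    """Attach each person record to the household record that precedes it."""
    households: List[Dict] = []
    for line in lines:
        rectype = line[:1]  # assumption: first character flags the record type
        if rectype == "H":
            households.append({"household": line, "persons": []})
        elif rectype == "P":
            if not households:
                raise ValueError("person record appears before any household record")
            households[-1]["persons"].append(line)
    return households


# Hypothetical records, not taken from any real file.
sample_lines = [
    "H0001",
    "P0001 34 M",
    "P0001 31 F",
    "H0002",
    "P0002 67 F",
]

grouped = group_households(sample_lines)
household_sizes = [len(h["persons"]) for h in grouped]
print(household_sizes)
```

Keeping the person records nested under their household is what later makes it possible to look up the characteristics of everyone a person lived with.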

  14. The Advantages of Microdata
      • Combination of all of a person’s characteristics
      • Characteristics of everyone with whom a person lived
      • Freedom to make any table you need
      • Freedom to make models examining multivariate relationships
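
To illustrate "freedom to make any table you need", here is a small sketch that cross-tabulates two characteristics from person-level records. The records and field names are invented for the example, and the counts are unweighted.

```python
from collections import Counter

# Hypothetical person records; each dict is one case.
people = [
    {"sex": "F", "occupation": "farmer"},
    {"sex": "M", "occupation": "farmer"},
    {"sex": "M", "occupation": "farmer"},
    {"sex": "F", "occupation": "teacher"},
    {"sex": "M", "occupation": "clerk"},
]

# Any pair of characteristics can be crossed, because every case carries
# all of a person's characteristics together.
crosstab = Counter((p["sex"], p["occupation"]) for p in people)

for (sex, occupation), count in sorted(crosstab.items()):
    print(f"{sex:<3}{occupation:<10}{count}")
```

A published table fixes its rows and columns in advance; with microdata the same few lines work for any combination of variables.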

  15. Introduction to the IPUMS Project 1. What is the IPUMS 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  16. How a case gets from the manuscript census into the IPUMS
      An example from the 1860 census: John C. Breckinridge of Kentucky
      • Vice President of the U.S., 1857-1861
      • Secretary of War, C.S.A., 1865
      • Later charged with treason, fled to Cuba

  17. Original enumeration form from the 1860 U.S. Census

  18. Data entry screen in Minnesota (ca. 1997)

  19. Household and person record ready for checking (ca. 1999)

  20. Coding dictionary for the occupation variable (ca. 2000)

  21. Checked and coded data, ready for harmonization (ca. 2001). Labeled fields: Wealth, Occupation, Page, Year, Age, Relationship, Industry

  22. Introduction to the IPUMS Project 1. What is the IPUMS 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  23. Translation Matrix – Marital Status: how we integrate variables across time (and countries)

  24. Translation Matrix – Marital Status: location of data in the original samples

  25. Translation Matrix – Marital Status: location of data in the 1960 U.S. Census Bureau file

  26. Translation Matrix – Marital Status: different original codes for “widowed” across the censuses

  27. Translation Matrix – Marital Status: final IPUMS coding scheme for marital status
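
The translation-matrix idea can be sketched as a lookup from (sample, original code) to one harmonized code. Every sample name and code below is hypothetical and chosen only to show the mechanism; none is taken from the actual matrix on the slides.

```python
# Hypothetical harmonized scheme, for illustration only.
HARMONIZED_LABELS = {1: "married", 2: "widowed", 3: "divorced", 4: "never married"}

# Hypothetical translation table: (sample, original code) -> harmonized code.
# Each sample may use different original codes for the same status.
TRANSLATION = {
    ("sample_a", "M"): 1,
    ("sample_a", "W"): 2,
    ("sample_a", "D"): 3,
    ("sample_a", "S"): 4,
    ("sample_b", "2"): 1,
    ("sample_b", "4"): 2,
    ("sample_b", "3"): 3,
    ("sample_b", "1"): 4,
}


def harmonize(sample: str, original_code: str) -> int:
    """Translate one sample-specific code into the shared coding scheme."""
    try:
        return TRANSLATION[(sample, original_code)]
    except KeyError:
        raise ValueError(
            f"no translation for code {original_code!r} in {sample}"
        ) from None


widowed_a = harmonize("sample_a", "W")
widowed_b = harmonize("sample_b", "4")
print(HARMONIZED_LABELS[widowed_a], widowed_a == widowed_b)
```

Raising an error on an unmapped code, rather than silently passing it through, is one way to make gaps in the table visible during processing.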

  28. Variable Description: Farm Status (IPUMS-USA)

  29. Codes: Farm Status (IPUMS-USA)

  30. Variable Description: Literacy (IPUMS-Intl)

  31. Variable Description: Union (IPUMS-CPS)

  32. Introduction to the IPUMS Project 1. What is the IPUMS 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  33. Additional Improvements to the U.S. PUMS
      • Additional documentation, including all enumeration forms and instructions
      • Consistent occupation/industry classifications
      • Consistent metropolitan classifications
      • Missing data allocation
      • Constructed family variables

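  34. IPUMS “Pointer” Variables (Simple household) [Table: each person’s spouse’s, mother’s, and father’s location within the household; the first two persons point to each other as spouses (2 and 1), and 0 is used where there is no such person]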
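
A rough sketch of how pointer variables can be used: each person carries the within-household position of their spouse, mother, and father, with 0 meaning no such person in the household. The household below is hypothetical, and the field names `sploc`, `momloc`, and `poploc` are labels chosen for this sketch (the slide names only the spouse's, mother's, and father's pointers).

```python
from typing import Dict, List, Optional

# Hypothetical simple household: two spouses and two children.
household: List[Dict] = [
    {"pernum": 1, "age": 40, "sploc": 2, "momloc": 0, "poploc": 0},
    {"pernum": 2, "age": 38, "sploc": 1, "momloc": 0, "poploc": 0},
    {"pernum": 3, "age": 12, "sploc": 0, "momloc": 2, "poploc": 1},
    {"pernum": 4, "age": 9, "sploc": 0, "momloc": 2, "poploc": 1},
]


def follow(members: List[Dict], pointer: int) -> Optional[Dict]:
    """Return the household member a pointer refers to, or None if it is 0."""
    if pointer == 0:
        return None
    return members[pointer - 1]  # positions are 1-based


def attach_age(members: List[Dict], pointer_name: str) -> List[Optional[int]]:
    """For each person, look up the age of the relative the pointer names."""
    ages = []
    for person in members:
        relative = follow(members, person[pointer_name])
        ages.append(relative["age"] if relative else None)
    return ages


mothers_age = attach_age(household, "momloc")
spouses_age = attach_age(household, "sploc")
print(mothers_age, spouses_age)
```

The same lookup works for any characteristic of the relative, which is how a person's record can be combined with the characteristics of the people they lived with.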

  35. Introduction to the IPUMS Project 1. What is the IPUMS 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  36. Extract requests per month, 2002-2007. 15,000 users have made 85,000 data extracts.

  37. IPUMS Users’ Disciplines
      • Economics (36%)
      • Sociology (16%)
      • Demography (12%)
      • Other Academic (19%)
      • Historians: only 3%!!!
      • Other Non-academic (15%)

  38. IPUMS Users’ Status
      • Student (46%)
      • Faculty (23%)
      • Academic researcher (12%)
      • Non-academic researcher (16%)
      • Support staff (3%)

  39. Number of Countries Selected for Research, IPUMS-International
      • 1 country (39%)
      • 2 countries (24%)
      • 3 countries (10%)
      • 4 countries (6%)
      • 5 countries (3%)
      • 6-8 countries (17%)

  40. Other IPUMS Data Sources: PDQ (www.pdq.com), Fathom (www.keypress.com/fathom)

  41. Introduction to the IPUMS Project 1. What is the IPUMS 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  42. 4 Key Strengths of the Census Microdata Samples
      • Large: More cases than any comparable datasets. Enable study of relatively small populations.
      • National in scope: Results not subject to local peculiarities. Provide context for local studies.
      • Long-term: Provide historical depth.
      • Microdata: Can make your own tabulations. Apply multivariate techniques.

  43. Limitations of the Microdata Samples
      • Geographic detail
      • Samples: Confidentiality restrictions. Too small to answer some questions (especially ACS/CPS).
      • Not annual: Any historical analysis will have gaps (not if using ACS/CPS!)
      • Cross-sectional data: Not longitudinal (but we’re working on it!)
      • Need knowledge of a statistical package

  44. Limitations of the Different IPUMS Data Series
      • IPUMS-USA: Geography 1940-present
      • IPUMS-International: User burden (documentation, information overload)
      • IPUMS-CPS: Sample size (60 to 200K)

  45. Introduction to the IPUMS Project 1. What is the IPUMS 2. Data entry and coding 3. Harmonization 4. Additional Data Enhancements 5. Users and Access 6. Strengths and Limitations 7. Dissemination

  46. Lab: Using weights and making tables
      1. Register for extract system
      2. Weights:
         • why you have to use them
         • which ones you should use
      3. Exercises

  47. What is a weight?
      • It’s a variable, just like age, sex, race, etc.
      • Every case in every sample has a weight value
      • The main weighting variable in IPUMS is called the person weight (variable name is PERWT)
      • The person weight variable tells you how many people nationwide are represented by any given case
      • If you forget to use it, your analysis could be wrong!!!

  48. How do weights look in the data?

  49. Sample of pets in my neighborhood
      Cases in my pet sample: dog, cat, cat, cat, rabbit, rabbit, rabbit, rabbit
      8 cases in sample; 50% are rabbits

  50. New estimates that take weights into account

      Cases in my pet sample   Number of pets in my neighborhood that each sample pet represents (PERWT)
      dog                      200
      cat                      100
      cat                      100
      cat                      100
      rabbit                   25
      rabbit                   25
      rabbit                   25
      rabbit                   25

      8 cases in sample; 50% are rabbits
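
The pet example can be worked through in a few lines. This sketch uses the eight cases and weights listed on the slide and compares the unweighted share of each kind of pet with the weighted estimate; the function and variable names are made up for the illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The eight sample cases from the slide: (kind of pet, weight).
pet_sample: List[Tuple[str, int]] = [
    ("dog", 200),
    ("cat", 100), ("cat", 100), ("cat", 100),
    ("rabbit", 25), ("rabbit", 25), ("rabbit", 25), ("rabbit", 25),
]


def totals(sample: List[Tuple[str, int]], use_weights: bool) -> Dict[str, int]:
    """Count cases per kind, or sum their weights when use_weights is True."""
    result: Dict[str, int] = defaultdict(int)
    for kind, weight in sample:
        result[kind] += weight if use_weights else 1
    return dict(result)


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert totals into proportions of the overall total."""
    grand_total = sum(counts.values())
    return {kind: n / grand_total for kind, n in counts.items()}


unweighted = shares(totals(pet_sample, use_weights=False))
weighted_totals = totals(pet_sample, use_weights=True)
weighted = shares(weighted_totals)

print(f"rabbits, unweighted: {unweighted['rabbit']:.1%}")
print(f"rabbits, weighted:   {weighted['rabbit']:.1%}")
print(f"estimated pets in neighborhood: {sum(weighted_totals.values())}")
```

Rabbits are half of the sample cases, but because each sampled rabbit stands for only 25 pets, they make up 100 of an estimated 600 pets once the weights are applied.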
