1 / 27

External Data Sources

External Data Sources. 2008 CAS Ratemaking Seminar March 17-18, 2008 John Stenmark Consulting Actuary Actuarial Data Management Services. External Data Sources Overview. Why use (Zip Code based) Demographic Data Zip Code vs. Zip Code Tabulation Area Issues

robinbrown
Télécharger la présentation

External Data Sources

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. External Data Sources 2008 CAS Ratemaking Seminar March 17-18, 2008 John Stenmark Consulting Actuary Actuarial Data Management Services

  2. External Data SourcesOverview • Why use (Zip Code based) Demographic Data • Zip Code vs. Zip Code Tabulation Area Issues • Some Possible Methodologies to Address those Issues • Census.gov Data Guide • Cartographic Boundary File Guide

  3. External Data SourcesWhy use (Zip Code based) Demographic Data • Predictive Modeling allows/encourages the use of data outside of the rating variables and, in fact, outside of the company. • The first external data that a company is likely to use is Demographic Data, usually by Zip Code. • Predictive Modeling has two phases: the Modeling itself (usually frequency and severity based) and derivation of rates and relativities using the modeled data (frequency and severity combined into modeled pure premium) • Demographic data is used in the Modeling Phase • Especially useful in a multi-state database

  4. External Data SourcesZip Code vs. Zip Code Tabulation Area IssuesThe Problem • Postal Zip Codes are not defined by Boundaries, but by postal routes. • ZCTAs (as used in this presentation) were defined during the 2000 census as boundaries by the U.S. Census Bureau based, at least partially, on the U. S. Postal Zip Codes at that time and have not been changed. • The Postal Zip Codes in your data do change quite often. • Over time your insured postal zip codes (and the territory boundaries defined from those codes) will increasingly diverge from the ZCTAs. • Therefore the Zip Code for a particular policy or claim may not have demographic data associated with it.

  5. External Data SourcesZip Code vs. Zip Code Tabulation Area IssuesThe Solution(s) • Assign Derived Demographic Data elements by County (filling entire database) • Then assign data elements by ZCTA where there is a match • Disadvantage: Slight inaccuracy problem where Postal Zip is in a different geographic area from ZCTA • Disadvantage: Precision is inconsistent (county demographics for some insureds and Zip for others) • Advantage: Easy to apply

  6. External Data SourcesZip Code vs. Zip Code Tabulation Area IssuesThe Solution(s) • Geocode company data • Assign each policy/claim to ZCTA using U. S. Census Bureau boundary files. • Assign ZCTA demographic data to policy/claim • Disadvantage: More complex and time consuming (resource intensive) • Advantage: Far more accurate

  7. Average Education Years • Population Density • Mean Age • Percent Rural • Percent Farm • Travel Time • Median Income • Average Education Years • Population Density • Mean Age • Percent Rural • Percent Farm • Travel Time • Median Income • Median year Owner occupied structure built • Median year householder moved into unit • Median value for all owner-occupied housing units • Median price asked • Median selected monthly owner costs External Data SourcesCensus Data • There are numerous sources of demographic data but… • The source for most of these data is the US Census Bureau at http://www.census.gov/ . • Many variables can be derived from this data. • Some possible variables appear below:

  8. External Data SourcesCensus Data

  9. External Data SourcesCensus Data

  10. External Data SourcesCensus Data

  11. External Data SourcesCensus Data

  12. External Data SourcesCensus Data

  13. External Data SourcesCensus Data So how do you use the data from these Zip Files? Very Cryptic Text Files An Access database is available (referenced in the Readme document.) The text below is from: ftp://ftp2.census.gov/census_2000/datasets/Summary_File_3/Arkansas/0README_SF3.doc

  14. External Data SourcesCensus Data To download SF3.mdb Click here:

  15. External Data SourcesCensus Data After Downloading SF3.mdb open the Access database Seventy six tables corresponding to the seventy-six zipped ftp files In addition a SF3GEO Table, SF3GEO Dictionary Table and a Tables Table These define the structure of the database

  16. External Data SourcesCensus Data • Data is imported into the tables using the File – Get External Data – Import Command. • You will need to change the file extensions from .uf3 to .txt for this to work. • The geo files are fixed width the others are comma delimited • The database has specs for each table and these can (should) be accessed using the Advanced button on the import wizard.

  17. External Data SourcesCensus Data • Use the “Tables” Table to select the columns that you want, then determine which files you need to import. • Remember that the tables contain all geographic areas: State, County, Zip, Block, County/Zip, etc. • You will need to work with one of those at a time. • Summing the entire file will scale up your results somewhat

  18. External Data SourcesCensus Data The Tables Table The TEXT column provides the description of each data element The TABLE column provides the Table name (remember the data must be loaded into each needed table) The FIELDNUM column provides the Column Name that will contain the data element. In this case to get one stat (e.g. Average Education) a weighted average is required So to get Average Education the table (P037) tells us we must load Table SF30003 from file named la00003 and Select P037001 thru P037035

  19. External Data SourcesCensus Data The SF3GEO Table Indexed on LOGRECNO Column The NAME column provides the Description for each row The ZCTA5 Column provides the five digit Zip for the row Notice that there are partial Zips (split between Parishes/Counties)

  20. External Data SourcesCensus Data • By joining SF3GEO and the selected table on LOGRECNO the demographic data is by columns and the geographic data is by rows. • Note: Make sure the query selects only the geographic data desired. I. e. give it the smell test • The following query: SELECT SF3GEO.ZCTA5, SF3GEO.AREALAND, SF3GEO.NAME, SF30003.P037001 FROM SF3GEO INNER JOIN SF30003 ON SF3GEO.LOGRECNO = SF30003.LOGRECNO WHERE (((SF3GEO.COUNTY) Is Null) AND ((SF3GEO.ZCTA5) Not Like "###XX" And (SF3GEO.ZCTA5) Not Like "###HH")) ORDER BY SF3GEO.ZCTA5; Yields:

  21. External Data SourcesCartographic Boundary Files • So how do you use the data from these Zip Files? • Very Cryptic Text Files • An Access database is available (referenced in the Readme document). • The text below is from: ftp://ftp2.census.gov/census_2000/datasets/Summary_File_3/Arkansas/0README_SF3.doc (but any of the 0readme.doc will do)

  22. External Data SourcesCartographic Boundary Files • In addition to demographic data the Census Bureau publishes boundary files for each of its boundaries • Remember - since Postal Zip Codes and Postal Zip Code definitions change over time and the Census Bureau redefined ZCTAs somewhat for the census there will be a mismatch between the boundary files and the Zip Codes in your experience database • First go to: http://www.census.gov/geo/www/cob/bdy_files.html

  23. External Data SourcesCartographic Boundary Files From there: • For 5-Digit ZIP Code Tabulation Areas (ZCTAs) go to: http://www.census.gov/geo/www/cob/z52000.html • For County and County Equivalent Areas go to:http://www.census.gov/geo/www/cob/co2000.html

  24. External Data SourcesCartographic Boundary Files • Three types of files on each page. For ZCTAs they are: • Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ARC/INFO Export (.e00) format • Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ArcView Shapefile (.shp) format • Census 2000 5-Digit ZIP Code Tabulation Areas (ZCTAs) in ARC/INFO Ungenerate (ASCII) format • Most mapping software will read the Shapefile format.

  25. External Data Sources 2008 CAS Ratemaking Seminar March 17-18, 2008 John Stenmark Consulting Actuary Actuarial Data Management Services (601) 955-3022jstenmark@comcast.net

  26. External Data Sources 2008 CAS Ratemaking Seminar March 17-18, 2008 John Stenmark Consulting Actuary Actuarial Data Management Services (601) 955-3022 jstenmark@comcast.net

More Related