Examining the use of administrative data for annual business statistics

Joanna Woods, Ria Sanderson, Tracy Jones, Daniel Lewis. Overview: background, motivation, admin data, variables of interest, methods tested (discontinuing the survey, cut-off sampling), results, conclusions.

Presentation Transcript


  1. Examining the use of administrative data for annual business statistics Joanna Woods, Ria Sanderson, Tracy Jones, Daniel Lewis

  2. Overview • Background • Motivation • Admin data • Variables of interest • Methods tested – Discontinuing the survey – Cut-off sampling • Results • Conclusions

  3. Motivation • Drive to increase the use of admin data for business statistics, in order to: – reduce survey costs – decrease the burden on survey respondents • One possibility: replace survey data with admin data – some variables have admin data directly available – other variables have no direct source of admin data

  4. Annual Business Survey • The Annual Business Survey (ABS) collects financial variables • Target population = UK economy • Stratified simple random sample by industry, region & employment • Sample of approximately 60,000 businesses • Businesses with 250 or more employees are completely enumerated • Estimates are produced using ratio estimation
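The slide does not say which form of ratio estimator the ABS uses. As a hedged illustration, the sketch below assumes a separate ratio estimator within each stratum. It uses an auxiliary variable (for example register turnover) whose population total is assumed known for each stratum. The `Stratum` class and `ratio_estimate` function are hypothetical names. In a completely enumerated stratum the sample is the whole stratum, so the estimate reduces to the sum of returned values.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    # Population total of the auxiliary variable X_h (assumed known, e.g. from a register)
    aux_population_total: float
    # Sampled units as (y, x) pairs: survey value and auxiliary value
    sample: list[tuple[float, float]]


def ratio_estimate(strata: list[Stratum]) -> float:
    """Separate ratio estimator: sum over strata of X_h * (sum y_hi / sum x_hi)."""
    total = 0.0
    for h in strata:
        y_sum = sum(y for y, _ in h.sample)
        x_sum = sum(x for _, x in h.sample)
        total += h.aux_population_total * y_sum / x_sum
    return total


# A sampled stratum and a completely enumerated one (sample = whole stratum,
# so X_h equals the sampled x total and the estimate is just the sum of y).
strata = [
    Stratum(1000.0, [(20.0, 100.0), (30.0, 200.0)]),
    Stratum(500.0, [(40.0, 200.0), (60.0, 300.0)]),
]
print(ratio_estimate(strata))
```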

  5. Available administrative data • Two main sources available: – VAT turnover data – Company accounts data (balance sheet variables) • These overlap with, but do not fully cover, the target population • The properties of these data sources are different

  6. Survey population and admin data [Figure: the survey population]

  7. Survey population and admin data [Figure: administrative data overlaid on the survey population]

  8. Survey population and admin data [Figure: administrative data and survey population, with their overlap labelled "matched part"]
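The matched part can be illustrated by linking survey units to admin records on a common business reference. This sketch assumes that both sources share such an identifier. The slides do not describe how the linking was done, and the identifiers below are hypothetical.

```python
def match_rate(survey_ids: set[str], admin_ids: set[str]) -> tuple[set[str], float]:
    """Return the matched part (units in both sources) and the share of survey units matched."""
    matched = survey_ids & admin_ids
    rate = len(matched) / len(survey_ids) if survey_ids else 0.0
    return matched, rate


# Hypothetical business references
survey_population = {"A", "B", "C", "D"}
admin_records = {"B", "D", "E"}  # "E" is in the admin data but outside the survey population
matched, rate = match_rate(survey_population, admin_records)
print(sorted(matched), rate)  # ['B', 'D'] 0.5
```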

  9. Administrative data sources

  10. ABS variables • ABS variables without a direct source of admin data include: – Total Acquisitions: investment in land, existing buildings and computers – Total Disposals: sales of land and existing buildings • The proportion of zero returns varies by sizeband: – Total Acquisitions: 71% for 0-9 employment, 9% for 250+ employment – Total Disposals: 93% for 0-9 employment, 43% for 250+ employment

  11. Acquisitions & Disposals

  12. Methods Tested • Aim: to see whether admin data sources can serve as auxiliary variables in estimating these totals, so that the sample size can be reduced • Discontinuing the survey – predict values for the investment variables from models derived from past survey data • Cut-off sampling – stop sampling some businesses – use admin data to estimate for these units – consider a simple ratio adjustment

  13. Methods Tested: Considerations

  14. Methods tested: Discontinuing the survey • Build models from past survey & admin data and use them to produce estimates – Linear model: predict values for positive returns – Logistic model: predict the probability of a positive return • Build each model using data from the last survey • Model covariates can be admin data variables • Apply the models to future years & evaluate the results

  15. Methods tested: Discontinuing the survey – Linear model • Aim: predict values for acquisitions/disposals • The data are skewed, so use a log transformation • Use positive returns from year t to fit the model • Apply the model to years t+1, t+2, ... to get a predicted value for each business • Back-transform the predictions to the original scale
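Below is a minimal sketch of the linear-model step under simplifying assumptions. A single covariate (log turnover) stands in for the full covariate set, and the model is fitted by ordinary least squares on the log scale using positive returns only. The function names are hypothetical. Predictions are back-transformed with `exp()`. When log-scale errors are normal, this gives the median rather than the mean on the original scale. The slides do not say whether any bias correction was applied.

```python
import math


def fit_log_linear(turnover: list[float], y: list[float]) -> tuple[float, float]:
    """OLS fit of log(y) = b0 + b1 * log(turnover), using positive returns only."""
    pairs = [(math.log(t), math.log(v)) for t, v in zip(turnover, y) if v > 0]
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(v for _, v in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in pairs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def predict_positive(b0: float, b1: float, turnover: list[float]) -> list[float]:
    """Predict on the log scale, then back-transform with exp() to the original scale."""
    return [math.exp(b0 + b1 * math.log(t)) for t in turnover]


# Year t: the zero return (last unit) is excluded from the fit
b0, b1 = fit_log_linear([100.0, 400.0, 900.0, 50.0], [20.0, 40.0, 60.0, 0.0])
# Year t+1: apply the year-t model to new businesses
print(predict_positive(b0, b1, [1600.0, 2500.0]))
```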

  16. Methods tested: Discontinuing the survey – Logistic model • Aim: predict the probability of a company returning a positive value • Use all returned data from year t to model the probability of a business returning a positive value • Apply the model to businesses in year t+1 to obtain predicted probabilities • Multiply the linear model prediction by the logistic model probability to produce a predicted value for every unit
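This is a hedged sketch of the logistic step and of how it combines with the linear model. A single covariate stands in for the full covariate set, and the model is fitted by maximum likelihood using Newton-Raphson. Each unit's predicted value is its probability of a positive return multiplied by its positive-value prediction. `fit_logistic`, `predict_probability` and `expected_values` are illustrative names, not part of the original work, and the data are made up.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x: list[float], positive: list[bool], iters: int = 25) -> tuple[float, float]:
    """Maximum likelihood fit of P(positive) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, positive):
            p = _sigmoid(b0 + b1 * xi)
            r = (1.0 if yi else 0.0) - p  # score contribution
            w = p * (1.0 - p)             # information weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, with I the 2x2 information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_probability(b0: float, b1: float, x: list[float]) -> list[float]:
    return [_sigmoid(b0 + b1 * xi) for xi in x]


def expected_values(prob_positive: list[float], positive_prediction: list[float]) -> list[float]:
    """Predicted value per unit = P(positive return) * predicted value if positive."""
    return [p * v for p, v in zip(prob_positive, positive_prediction)]


# Year t: covariate (e.g. log turnover) and whether each business returned a positive value
x_t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
positive_t = [False, False, True, False, True, True]
b0, b1 = fit_logistic(x_t, positive_t)

# Year t+1: hypothetical covariates and positive-value predictions from the linear model
probs = predict_probability(b0, b1, [1.5, 4.5])
print(expected_values(probs, [100.0, 400.0]))
```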

  17. Results: Discontinuing the survey • Acquisitions • Best linear model for predicting log(total acquisitions): – Intercept – Standard Industrial Classification (SIC) at three-digit level – Region – Employment band – Log turnover – Log turnover × SIC section interaction • R-squared = 0.66
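The covariate "log turnover × SIC section" allows the slope on log turnover to differ by SIC section. The slides do not say how the categorical variables were coded. The sketch below assumes dummy coding with the first level of each variable as the reference and builds one row of the design matrix. The level lists and SIC codes are hypothetical.

```python
import math


def one_hot(value: str, levels: list[str]) -> list[float]:
    """Dummy coding, with the first level as the reference category (assumption)."""
    return [1.0 if value == lvl else 0.0 for lvl in levels[1:]]


def design_row(sic3: str, region: str, empband: str, section: str,
               turnover: float, levels: dict[str, list[str]]) -> list[float]:
    log_t = math.log(turnover)
    section_dummies = one_hot(section, levels["section"])
    return (
        [1.0]                                     # intercept
        + one_hot(sic3, levels["sic3"])           # SIC three-digit effects
        + one_hot(region, levels["region"])       # region effects
        + one_hot(empband, levels["empband"])     # employment band effects
        + [log_t]                                 # log turnover slope (reference section)
        + [log_t * d for d in section_dummies]    # slope shift for each other SIC section
    )


# Hypothetical levels, for illustration only
levels = {
    "sic3": ["011", "012", "013"],
    "region": ["North", "South"],
    "empband": ["0-9", "10-49"],
    "section": ["A", "B"],
}
print(design_row("012", "South", "0-9", "B", 1000.0, levels))
```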

  18. Results: Discontinuing the survey • Acquisitions • Best logistic model for predicting the probability of a positive return: – Intercept – SIC division level – Region – Employment band – Log turnover • Had one of the lowest AIC values of the models tested
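AIC trades off fit against complexity: AIC = 2k - 2 ln L, where k is the number of estimated parameters and L is the maximised likelihood, and lower values are better. Below is a minimal sketch for comparing candidate logistic models from their fitted probabilities, assuming a Bernoulli likelihood. The function names, candidate models and fitted probabilities are all hypothetical.

```python
import math


def bernoulli_log_likelihood(probs: list[float], positive: list[bool]) -> float:
    """Log-likelihood of observed positive/zero returns under fitted probabilities."""
    return sum(math.log(p) if y else math.log(1.0 - p) for p, y in zip(probs, positive))


def aic(probs: list[float], positive: list[bool], n_params: int) -> float:
    """AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * bernoulli_log_likelihood(probs, positive)


observed = [True, False, True, False]
# Hypothetical candidates: name -> (fitted probabilities, number of parameters)
candidates = {
    "intercept only": ([0.5, 0.5, 0.5, 0.5], 1),
    "with log turnover": ([0.9, 0.1, 0.9, 0.1], 2),
}
scores = {name: aic(p, observed, k) for name, (p, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # with log turnover
```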

  19. Results: Discontinuing the survey

  20. Methods tested: Cut-off sampling • Reduces burden but introduces bias • Set a cut-off based on employment • Stop sampling below the cut-off • Use sample information above the cut-off to estimate for units below the cut-off, in an effort to reduce bias • Main difficulty: missing admin data and incomplete match rates, so the method can't be applied to the full survey population and a sample is still needed

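  21. Simple ratio adjustment • Estimate for units below the cut-off:

$$\hat{Y}_{\text{below}} = X_{\text{below}} \times \frac{\hat{Y}_{\text{above}}}{\hat{X}_{\text{above}}}$$

where $X_{\text{below}}$ is the total of the auxiliary variable below the cut-off, $\hat{Y}_{\text{above}}$ is the estimate of the variable of interest above the cut-off, and $\hat{X}_{\text{above}}$ is the estimate of the auxiliary variable above the cut-off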
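Below is a minimal sketch of the simple ratio adjustment. It assumes the above-cut-off estimates have already been produced from the sample. Units below the cut-off that could not be matched to admin data have no auxiliary value. Here they are marked `None` and skipped, which shows why match rates limit the method. The function names are hypothetical.

```python
from typing import Optional


def ratio_adjustment(y_hat_above: float, x_hat_above: float,
                     x_below: list[Optional[float]]) -> float:
    """Estimate of Y below the cut-off: X_below * (Y_hat_above / X_hat_above).

    x_below holds admin auxiliary values for units below the cut-off; None marks a
    unit with no matched admin record, which is simply skipped here, so the result
    only covers the matched units below the cut-off.
    """
    x_total_below = sum(x for x in x_below if x is not None)
    return x_total_below * y_hat_above / x_hat_above


def total_with_cutoff(y_hat_above: float, x_hat_above: float,
                      x_below: list[Optional[float]]) -> float:
    """Sample-based estimate above the cut-off plus the ratio adjustment below it."""
    return y_hat_above + ratio_adjustment(y_hat_above, x_hat_above, x_below)


print(total_with_cutoff(500.0, 2000.0, [100.0, None, 300.0]))  # 600.0
```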

  22. Results: Simple ratio adjustment

  23. Conclusions • Discontinuing the survey: not an option for this variable – the model under-predicts – growth rates differ • Cut-off sampling with simple ratio adjustment: – can give reasonable results in some divisions but not all – sample size savings can be made where the method works well, but depend on the match rate – multiple auxiliary variables are required

  24. Any questions? Joanna.Woods@ons.gsi.gov.uk
