
Investigation of Treatment of Influential Values

Investigation of Treatment of Influential Values. Mary H. Mulry, Roxanne M. Feldpausch. Outline: current practices, methods investigated, results, next steps.


Presentation Transcript


  1. Investigation of Treatment of Influential Values Mary H. Mulry Roxanne M. Feldpausch

  2. Outline • Current practices • Methods investigated • Results • Next steps

  3. Influential Observation • An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000)

  4. The Data - U.S. Monthly Retail Trade Survey • Collects sales and inventories • Monthly survey of about 12,500 retail businesses with paid employees • Sample selected every 5 years • Sample is stratified based on industry and sales • Quarterly sample of births • Deaths are removed

  5. The Data • Analysis done at published NAICS level • Hidiroglou-Berthelot algorithm was run on the data before looking for influential values • Horvitz-Thompson estimator

  6. Causes of Influential Units • One time or rare event • Erroneous measure of size • Change in the make-up of the unit • Seasonal Businesses

  7. Current Practices • Analysts review an effect listing of micro-level data and investigate units that may be influential • When the analyst determines that a correctly reporting unit may be influential, the case is referred to a statistician

  8. Current Practices • One-time influential value: imputation • Recurring influential value: weight adjustment based on the principles of representativeness, or moving the unit to a different industry when the nature of the business changes

  9. Goals • To improve upon current methodology by making it more objective and rigorous • To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total

  10. Assumptions • Influential observations occur infrequently, but are problematic when they appear. • The influential observation is true, although unusual. It is not the result of a reporting or coding error.

  11. Strategy • Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value

  12. Evaluation Criteria • Number of influential observations detected, including the number of true and false detections made • Estimate of bias • Impact on month-to-month change

  13. Notation • Yi is the sales for the i-th business in a survey sample of size n • wi is the sample weight for the i-th unit • Xi is the previous month's sales for the i-th business
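The notation above, together with the Horvitz-Thompson estimator named on slide 5, can be illustrated with a short sketch. The data values and the function names `ht_total` and `contribution_shares` are hypothetical and are not taken from the presentation; the sketch only shows how one weighted observation can dominate an estimated total.

```python
def ht_total(values, weights):
    """Weighted estimate of a total: the sum of w_i * Y_i over the sample."""
    return sum(w * y for w, y in zip(weights, values))


def contribution_shares(values, weights):
    """Share of the estimated total that each weighted observation contributes."""
    total = ht_total(values, weights)
    return [w * y / total for w, y in zip(weights, values)]


# Hypothetical sample: the last unit has both a large weight and a large value.
sales = [10.0, 12.0, 11.0, 200.0]
weights = [5.0, 5.0, 5.0, 20.0]

total = ht_total(sales, weights)
shares = contribution_shares(sales, weights)
```

In this made-up sample the last unit accounts for most of the estimated total, which is the situation the definition on slide 3 describes.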

  14. Methods Examined • Weight trimming • Reverse calibration • Winsorization • Generalized M-estimation

  15. Weight Trimming • Does not identify influential units • Adjusts the weight of the observation

  16. Weight Trimming • Truncate the weight of the influential observation • Adjust the weights of the non-influential observations to account for the remainder of the truncated weight • Sum of the new weights is the same as the sum of the original weights • (Potter 1990)

  17. Weight Trimming Notes • Calculations were done within sample stratum • Choice of correction factor could be investigated; we arbitrarily chose ci = wi/3
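The weight trimming steps on slides 16 and 17 might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it reads ci = wi/3 as the truncated weight of the influential unit, and it spreads the removed weight over the other units in the stratum in proportion to their original weights. The slides do not say how the remainder is allocated, and the function name `trim_weight` is hypothetical.

```python
def trim_weight(weights, idx, factor=3.0):
    """Truncate one weight within a stratum and redistribute the excess.

    Assumptions (not from the slides): the truncated weight is w_i / factor,
    and the excess is spread proportionally over the remaining units.
    """
    trimmed = weights[idx] / factor
    excess = weights[idx] - trimmed
    others_sum = sum(w for j, w in enumerate(weights) if j != idx)
    scale = 1.0 + excess / others_sum  # keeps the stratum weight sum unchanged
    return [trimmed if j == idx else w * scale for j, w in enumerate(weights)]


# Hypothetical stratum of three units; unit 0 was specified as influential.
stratum_weights = [30.0, 10.0, 10.0]
new_weights = trim_weight(stratum_weights, idx=0)
```

The sum of the new weights equals the sum of the original weights, as slide 16 requires.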

  18. Reverse Calibration • Does not identify influential units • Adjusts the value of the observation

  19. Reverse Calibration • Use a robust estimation method to estimate the total • Modify the influential observations to achieve that total • (Chambers and Ren 2004)
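The second step of reverse calibration, modifying the influential observation so that the weighted total matches a robust estimate, can be sketched for the simplest case of a single specified unit. The robust total is taken as given here; how it is estimated is left to Chambers and Ren (2004) and is not reproduced. The function name `reverse_calibrate` and all numbers are hypothetical.

```python
def reverse_calibrate(values, weights, idx, robust_total):
    """Return a replacement value for unit idx so that the weighted total
    equals robust_total. Simplified to one influential unit (an assumption)."""
    rest = sum(w * y for j, (w, y) in enumerate(zip(weights, values)) if j != idx)
    return (robust_total - rest) / weights[idx]


# Hypothetical data: unit 2 is the specified influential observation.
sales = [10.0, 12.0, 200.0]
weights = [5.0, 5.0, 20.0]
robust_total = 1110.0  # assumed output of some robust estimation method

replacement = reverse_calibrate(sales, weights, idx=2, robust_total=robust_total)
adjusted = sales[:2] + [replacement]
adjusted_total = sum(w * y for w, y in zip(weights, adjusted))
```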

  20. Winsorization • Identifies influential units • Adjusts the value of the observation

  21. Winsorization • Type I • Type II

  22. Winsorization – Defining K • Define a separate Kh for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994) • Define a separate Ki for each observation in a manner that minimizes the MSE (Clarke 1995)

  23. Winsorization – Defining K • Use unweighted data to define Kh for each stratum, where Kh = mh + 2sh • Use weighted data to define Kh for each stratum, where Kh = mh + 2sh and mh and sh are based on the weighted data
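The Type I and Type II formulas from slide 21 did not survive in this transcript, so the sketch below uses the forms commonly given in the winsorization literature: Type I replaces a value above the cutoff K with K, and Type II replaces it with Y/w + (1 - 1/w)K. The cutoff follows the unweighted Kh = mh + 2sh rule, assuming mh is the stratum mean and sh the sample standard deviation, which the slides do not state. All function names are hypothetical.

```python
import statistics


def cutoff(values):
    """K = m + 2s, assuming m is the mean and s the sample standard deviation."""
    return statistics.mean(values) + 2 * statistics.stdev(values)


def winsorize_type1(y, k):
    """Type I (commonly cited form): values above K are replaced by K."""
    return min(y, k)


def winsorize_type2(y, w, k):
    """Type II (commonly cited form): a value above K becomes Y/w + (1 - 1/w) * K,
    so the unit represents itself once and K for the rest of its weight."""
    if y <= k:
        return y
    return y / w + (1.0 - 1.0 / w) * k


# Hypothetical observation with weight 4 and a cutoff of 100.
y1 = winsorize_type1(200.0, 100.0)
y2 = winsorize_type2(200.0, 4.0, 100.0)
```

Under Type II, the weighted contribution of the adjusted value is Y + (w - 1)K, which for the numbers above is 200 + 3 × 100 = 500.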

  24. Winsorization-Our Implementation • Used a robust regression in SAS to estimate the parameters needed in the calculations

  25. M-estimation • M-estimators are robust estimators that come from a generalization of maximum likelihood estimation

  26. M-estimation • Identifies influential units • Adjusts either the weight or the value of the influential observation

  27. M-estimation • Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)
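The general idea of weighted M-estimation can be sketched with a Huber-type estimator fitted by iteratively reweighted least squares. This is not the Beaumont and Alavi (2004) procedure. It assumes a through-the-origin model of current sales Yi on previous month's sales Xi, a Huber tuning constant of 1.5, and a residual scale from the median absolute residual, none of which come from the slides. The function names are hypothetical.

```python
import statistics


def huber_weight(r, c):
    """Huber weight: 1 for |r| <= c, and c / |r| beyond that."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def weighted_m_slope(x, y, w, c=1.5, iters=50):
    """Survey-weighted Huber M-estimate of beta in y = beta * x, fitted by
    iteratively reweighted least squares. Returns (beta, robustness weights)."""
    beta = sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * xi for wi, xi in zip(w, x))
    u = [1.0] * len(x)
    for _ in range(iters):
        resid = [yi - beta * xi for xi, yi in zip(x, y)]
        # Robust scale from the median absolute residual; guard against zero.
        scale = statistics.median(abs(r) for r in resid) / 0.6745 or 1.0
        u = [huber_weight(r / scale, c) for r in resid]
        num = sum(wi * ui * xi * yi for wi, ui, xi, yi in zip(w, u, x, y))
        den = sum(wi * ui * xi * xi for wi, ui, xi in zip(w, u, x))
        beta = num / den
    return beta, u


# Hypothetical stratum: the last unit's sales jump far above its previous month.
prev_sales = [10.0, 12.0, 11.0, 9.0, 10.0]
curr_sales = [11.0, 13.0, 12.0, 10.0, 60.0]
weights = [5.0] * 5

beta, u = weighted_m_slope(prev_sales, curr_sales, weights)
```

A robustness weight well below 1 marks a unit as a candidate influential observation. The same factor could then be used to shrink either the unit's survey weight or its distance from the fitted value, loosely mirroring the "by weight" and "by observation" variants named on slide 32.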

  28. Results

  29. Number of Outliers Detected • *Method does not detect outliers; one outlier was specified

  30. Replacement Values (in Millions) *Weight trimming adjusts the other 18 weights in the stratum **Winsor wgt +2s identified 3 other values

  31. Total Sales for the Industry

  32. Chosen for Further Study • Winsorization by each observation • M-estimation by observation • M-estimation by weight

  33. Contact Information Mary.H.Mulry@census.gov Roxanne.Feldpausch@census.gov
