

Improving the Quality of Tax Statistics: Recent Innovations in Editing and Imputation Techniques at the Statistics of Income Division of the U.S. Internal Revenue Service. Scott Hollenbeck – Scott.M.Hollenbeck@irs.gov Barry Johnson – Barry.W.Johnson@irs.gov





Presentation Transcript


  1. Improving the Quality of Tax Statistics: Recent Innovations in Editing and Imputation Techniques at the Statistics of Income Division of the U.S. Internal Revenue Service Scott Hollenbeck – Scott.M.Hollenbeck@irs.gov Barry Johnson – Barry.W.Johnson@irs.gov Melissa Ludlum – Melissa.R.Ludlum@irs.gov

  2. Today’s Presentation • Overview of Statistics of Income (SOI) • Dealing with Missing Data • Recent Innovations • Future Plans

  3. What Does SOI Do? • Primary source of U.S. tax data • Data from 110 tax returns and information documents • Test and correct data collected during administrative processing (IRS Masterfile) • Collect extensive additional data from forms, schedules and attachments • Most projects collect data from samples • Products • Micro data files for U.S. Treasury Department & Congress • Public-use files • Tables and analysis (www.irs.gov/taxstats)

  4. SOI Data Collection Systems • Maintains computer network separate from main IRS processing • Data collection takes place in IRS Submissions Processing Centers • Graphical User Interface (GUI) systems based in ORACLE • Data tested for internal consistency • Post-edit processing overseen by headquarters’ staff

  5. Three Major SOI Programs • Individual Income Tax • Filed by individuals and married couples to report most forms of personal income • 133 million returns filed in 2006 • Corporation Income Tax • Filed by incorporated businesses to report income from parent corporation and subsidiaries • 2.5 million returns filed in 2006 • Tax-exempt Organizations • Annual information returns report assets, income, expenses • 833,000 returns filed in 2006

  6. Missing Data – Unit Nonresponse • Causes • Extensions/late-filed returns • Tax evasion • Strategies • Update values from prior year using survey responses • Utilize records for recent prior years filed during the selection period

  7. Missing Data – Item Nonresponse • Causes • Taxpayer neglects to provide attachments • Paper return is being used by another IRS function • Strategies • Use IRS Masterfile data for key values • Impute values based on existing data and information provided on prior and/or subsequent return • Surveys and direct contact with preparers
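The item-nonresponse strategies above amount to trying data sources in a fixed order of preference. A minimal Python sketch of that idea follows; the function name, the field names, and the exact ordering (reported value, then IRS Masterfile, then prior-year return) are assumptions made for illustration and are not taken from SOI's actual routines.

```python
def impute_item(field, reported, masterfile=None, prior_year=None):
    """Return (value, source) for one field, trying sources in order.

    Hypothetical order: the value reported on the return, then the
    Masterfile value, then the value carried forward from a prior-year
    return. Missing values are represented as None.
    """
    sources = (
        ("reported", reported),
        ("masterfile", masterfile),
        ("prior_year", prior_year),
    )
    for source_name, record in sources:
        value = record.get(field) if record else None
        if value is not None:
            return value, source_name
    # Nothing available: leave for a survey or direct contact with the preparer.
    return None, "unresolved"


if __name__ == "__main__":
    current = {"wages": 52000, "charitable_gifts": None}
    prior = {"wages": 50000, "charitable_gifts": 1200}
    print(impute_item("charitable_gifts", current, prior_year=prior))
```

Returning the source alongside the value keeps imputed items distinguishable from reported ones in later processing.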

  8. What’s New? • Digital images of tax returns • Electronic filing • Automated error correction/imputation routines

  9. Digital Return Images • In 1998 SOI began scanning operations • Images stored in Tagged Image File Format (TIFF) • In 2006, imaged more than 71.5 million pages from 30 different tax and information returns • Many users: • SOI headquarters staff • SOI edit operations • IRS Functions • General Public (tax-exempt organizations only)

  10. Split-Screen Edit Systems • Combines scanned image and GUI edit system on a single 24-inch wide-aspect monitor • Image displayed using Adobe Acrobat or specially adapted ORACLE programs • Image and edit systems are synchronized • Online access to instructions, dictionaries, other tools

  11. Split-Screen Edit Systems • Positive feedback from editors • Slight overall improvement in productivity and quality • Images available to geographically dispersed workforce • Reduced storage of paper documents • Reduced impact on other IRS functions

  12. Electronic Filing of Tax Returns • 2004 – Modernized electronic filing (MeF) began • Uses Extensible Markup Language (XML) to capture: • Numeric and character strings supplied by taxpayer • Information tags • 2005 – Mandatory e-file for large business and tax-exempt organizations • 20.5% SOI sample of corporate income taxes • 13.5% SOI sample of tax-exempt organizations

  13. SOI Use of MeF Data • In 2006, SOI developed programs to render digital images from XML data • Edit returns using split-screen applications • In 2007, SOI will populate ORACLE data tables directly with XML data • Editors will validate data, supply codes and allocate certain data items
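To illustrate how tagged XML can feed data tables directly, here is a minimal Python sketch that flattens a small XML return into a single row keyed by tag name. The element names, the sample document, and the `NUMERIC_TAGS` set are invented for this example; the real MeF schemas are considerably larger and are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified return; not an actual MeF document.
SAMPLE_RETURN = """<Return>
  <Filer><Name>Example Corp</Name><EIN>000000000</EIN></Filer>
  <TotalIncome>125000</TotalIncome>
  <TotalDeductions>90000</TotalDeductions>
</Return>"""

# Tags assumed to hold whole-dollar amounts; everything else stays a string.
NUMERIC_TAGS = {"TotalIncome", "TotalDeductions"}


def xml_to_row(xml_text):
    """Flatten leaf elements into a {tag: value} dict ready for a table insert."""
    root = ET.fromstring(xml_text)
    row = {}
    for elem in root.iter():
        text = (elem.text or "").strip()
        is_leaf = len(elem) == 0
        if is_leaf and text:
            row[elem.tag] = int(text) if elem.tag in NUMERIC_TAGS else text
    return row


if __name__ == "__main__":
    print(xml_to_row(SAMPLE_RETURN))
```

Because each value arrives with its tag, no transcription step is needed; an editor's remaining work is validation, coding, and allocation, as the slide describes.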

  14. Electronic Filing of Tax Returns • Individual income tax returns • 1986 – E-file through paid preparers • 1992 – E-file from home computers allowed • 1994 – 98% of all filers eligible to e-file • 2006 – 73 million returns, or 54%, e-filed • Data stored in Tax Return Database (TRDB) • ASCII data, not tagged XML • 2010 – Scheduled for conversion to MeF

  15. SOI Individual Income Tax Program • Sample of returns processed differently depending on certain criteria • Edited returns • “Missing returns” • Forced closed returns

  16. Individual Processing Programs • Online editing system – editors transcribe, code and review any potential data discrepancies • Post Edit Reconciliation Process (PERP) – automated computer program which validates and adjusts data

  17. Edited Returns • Edited returns are processed through the online editing system by an editor, then reviewed using the PERP program • Prior to Tax Year 2004, all sampled returns which were not “missing” were manually edited • Currently, only paper returns and electronically filed returns with specific characteristics are edited through the online system

  18. “Missing Returns” • Each year, approximately 250 paper returns selected for the sample are not located • Limited IRS Masterfile data available • PERP program used to impute missing details of forms and schedules

  19. Forced Closed Returns • Automated processing of certain e-filed returns in the SOI sample • These returns bypass the online editing system and are processed through the PERP program • Returns with possible discrepancies are reviewed by a National Office analyst • Returns that pass all tests are considered “forced closed” and added to the final data file
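The routing described on the last few slides can be sketched as a small decision function. This is only an illustration in Python: the record fields, the `needs_manual_edit` flag, and the example consistency test are hypothetical stand-ins, since the slides do not list the actual eligibility criteria or PERP tests.

```python
def route_return(ret, checks):
    """Decide how a sampled return is processed.

    ret    -- dict describing one return (hypothetical fields)
    checks -- {name: function(ret) -> bool} automated consistency tests
    """
    # Paper returns, and e-filed returns flagged for manual work,
    # go through the online editing system.
    if not ret["e_filed"] or ret.get("needs_manual_edit", False):
        return "online_edit"
    # Remaining e-filed returns are tested automatically.
    failed = [name for name, check in checks.items() if not check(ret)]
    # Any possible discrepancy sends the return to an analyst.
    return "analyst_review" if failed else "forced_closed"


# Example test, assumed for illustration: components must sum to the total.
CHECKS = {
    "income_adds_up": lambda r: r["total_income"] == r["wages"] + r["interest"],
}

if __name__ == "__main__":
    ok = {"e_filed": True, "wages": 40000, "interest": 500, "total_income": 40500}
    print(route_return(ok, CHECKS))
```

Keeping the tests in a dictionary makes it simple to widen or narrow the eligibility criteria from one tax year to the next without changing the routing logic.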

  20. Results from Forced Closing Returns • Tax Year 2004 – First year using automated closing of selected electronically filed returns • Total sample size – 200,295 returns • Electronically filed – 64,670 returns • “Forced Closed” – 18,193 returns • Editing hours saved – 1,400 hours

  21. Results from Forced Closing Returns • Tax Year 2005 – Second year of program, expanded criteria for returns eligible to be “forced closed” • Total sample size – 292,837 returns • Electronically filed – 114,897 returns • “Forced Closed” – 47,753 returns • Editing hours saved – 4,100 hours
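The two years of results are easier to compare as shares than as raw counts. The short Python sketch below computes them from the figures on the two results slides; the dictionary layout and function name are choices made for this example only.

```python
# Counts and hours as reported on the two results slides.
RESULTS = {
    2004: {"sample": 200_295, "e_filed": 64_670, "forced_closed": 18_193, "hours_saved": 1_400},
    2005: {"sample": 292_837, "e_filed": 114_897, "forced_closed": 47_753, "hours_saved": 4_100},
}


def shares(r):
    """Return the e-file and forced-closed proportions for one tax year."""
    return {
        "e_filed_share_of_sample": r["e_filed"] / r["sample"],
        "forced_closed_share_of_e_filed": r["forced_closed"] / r["e_filed"],
        "forced_closed_share_of_sample": r["forced_closed"] / r["sample"],
    }


if __name__ == "__main__":
    for year, r in RESULTS.items():
        for name, value in shares(r).items():
            print(f"{year} {name}: {value:.1%}")
```

On these figures, forced-closed returns rose from about 28% of the e-filed sample returns in Tax Year 2004 to about 42% in Tax Year 2005, while the e-filed share of the whole sample rose from about 32% to about 39%.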

  22. The Future – Data • More returns and information documents will be filed electronically • Optical Character Recognition or Intelligent Character Recognition will be used to capture data from paper-filed returns • Data will be available in real time • Enable larger sample sizes and increased use of population files

  23. The Future – Field Operations • Increased resources dedicated to resolving data inconsistencies as opposed to data transcription • Paperless environment – use of electronic data or digital images created from paper returns • Increased use of prior year data to identify and correct data anomalies

  24. The Future – Products • Improvements in technology and increased use of electronic filing will allow SOI to produce more data, more quickly and more efficiently • Increased sample sizes will allow small area estimates • Population files will allow for creation of ad hoc panels, linkage of data items across tax form types and research on infrequent data items
