
Methodology of Allocating Generic Field to its Details


Presentation Transcript


  1. Methodology of Allocating Generic Field to its Details Jessica Andrews Nathalie Hamel François Brisebois ICESIII - June 19, 2007

  2. Outline • Background Information on Tax Data • Objective • Current Methodology • Other Methodologies Considered • Comparison of the Methodologies • Future Work and Conclusions

  3. Tax Data • Statistics Canada receives annual data from Canada Revenue Agency (CRA) on incorporated (T2) businesses • Tax data: • Balance Sheet • Income Statement • 88 different Schedules

  4. Tax Data • About 700 different fields to report • Most companies provide only 30-40 fields • Only 8 fields are actually required by CRA (section totals) • Non-farm revenue • Non-farm expenses • Farm revenue • Farm expenses • Assets • Liabilities • Shareholder Equity • Net Income/Loss

  5. Objective • To impute the missing detail variables • Why? • Tax data users need detailed data (tax replacement project (TRP)) • Different concepts and definitions between tax and survey data • A subset of details linked to the same generic can be mapped to different survey variables (Chart of Accounts)

  6. Challenges to meet • Methodology must • Work well for a large number of details • Be capable of dealing with details which are rarely reported and those which are frequently reported • Give good micro results for tax replacement, but also give good macro results when examined at the NAICS or full database level

  7. First attempt to complete Tax Data • Edit rules • Outlier detection within a record • Deterministic edits (to ensure the record balances within each section) • Review and manual corrections • Overlap between fiscal periods • Negative values • Consistency edits between tax variables • Outlier detection between records (Hidiroglou-Berthelot) • CORTAX balancing edits • Deterministic imputation of key variables • Inventories • Depreciation • Salaries and wages
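
The Hidiroglou-Berthelot outlier detection mentioned above compares year-over-year ratios between records. As a hedged illustration only (not the production edit system), a minimal sketch of the standard procedure, with illustrative parameter values, could look like this:

```python
import numpy as np

def hb_outliers(x_curr, x_prev, U=0.5, A=0.05, C=4.0):
    """Flag outlying period-to-period ratios (Hidiroglou-Berthelot).

    x_curr, x_prev: positive values for the same businesses in two periods.
    U, A, C: tuning constants; the defaults here are illustrative only.
    Returns a boolean array, True where the ratio is flagged as an outlier.
    """
    r = np.asarray(x_curr, dtype=float) / np.asarray(x_prev, dtype=float)
    r_med = np.median(r)

    # Centre the ratios symmetrically around the median
    s = np.where(r < r_med, 1.0 - r_med / r, r / r_med - 1.0)

    # Size effect: deviations by large businesses matter more
    E = s * np.maximum(x_curr, x_prev) ** U

    e_q1, e_med, e_q3 = np.percentile(E, [25, 50, 75])
    d_q1 = max(e_med - e_q1, abs(A * e_med))
    d_q3 = max(e_q3 - e_med, abs(A * e_med))

    return (E < e_med - C * d_q1) | (E > e_med + C * d_q3)
```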

  8. GDA Concepts • Corporations can use either generic or detail fields to report their results

  9. GDA Concepts • A block is defined by a generic and its details • The generic field is not a total • The goal is to impute the most significant detail variables when a generic amount has been reported • GDA: Generic to Detail Allocation

  10. Current method • Uses imputation classes based on industry codes and size of company • First 2 digits of NAICS (about 25 industries) • Three sizes of revenue (boundaries of 5 and 25 million) • Calculates ratios within imputation classes for each block • Uses all non-zero and non-missing details • Uses only details reported at least 10% of the time (5% for block General Farm Expense) • Assigns ratios to businesses with a generic
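
A rough pandas sketch of this ratio allocation (column names such as naics2, size_class and generic are hypothetical; this is not the production system):

```python
import pandas as pd

def allocate_generic(df, details, min_report_rate=0.10):
    """Compute detail ratios within NAICS-2 x revenue-size imputation
    classes and use them to split each reported generic amount."""
    # Businesses that reported at least one non-zero detail in the block
    reporters = df[df[details].abs().sum(axis=1) > 0]
    grouped = reporters.groupby(["naics2", "size_class"])

    # Drop details reported less than min_report_rate of the time in the class
    rates = grouped[details].agg(lambda s: (s != 0).mean())
    totals = grouped[details].sum().where(rates >= min_report_rate, 0.0)
    ratios = totals.div(totals.sum(axis=1), axis=0)   # ratios sum to 1 per class

    out = df.merge(ratios.reset_index(), on=["naics2", "size_class"],
                   how="left", suffixes=("", "_ratio"))
    for d in details:
        out[d + "_imputed"] = out["generic"] * out[d + "_ratio"]
    return out
```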

  11. Current method • Originally proposed as a solution with good macro (aggregate) results • Now need good micro (business) level results for TRP • Problems • Imputation classes are frequently not homogeneous in terms of distribution • A large number of small imputation classes

  12. Other methods considered • Historic imputation method • Scores method • Cluster method

  13. Historic imputation method • Assumes distributions of details are the same from one year to the next • Problems • A change in business strategies/properties will not be captured this way • Most businesses which report details in the previous year also report them in the current year, leaving few businesses which could be imputed with this method (~5% on all blocks tested) • Requires use of another method for the remaining businesses
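
A minimal sketch of the idea, assuming the same business identifier and detail columns exist on both years' files (names hypothetical):

```python
def historic_allocate(curr, prev, details, id_col="business_id"):
    """Allocate the current-year generic using each business's own
    previous-year detail distribution, where one exists."""
    shares = prev.set_index(id_col)[details]
    shares = shares.div(shares.sum(axis=1), axis=0)   # previous-year proportions

    out = curr.merge(shares, left_on=id_col, right_index=True,
                     how="left", suffixes=("", "_share"))
    for d in details:
        out[d + "_imputed"] = out["generic"] * out[d + "_share"]
    # Businesses without usable history still need a fallback method
    return out
```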

  14. Scores method • Uses response/non-response models for each detail • Groups businesses into imputation classes on the basis of percentiles of response probability • Calculates ratios within imputation classes • Assigns ratios to businesses with a generic
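
A hedged sketch of the scores approach for one detail, using a logistic response model and quintile classes purely for illustration (column names such as block_total are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def score_ratios(df, detail, predictors, n_classes=5):
    """Group businesses by their modelled probability of reporting `detail`,
    then compute the detail's share of the block total within each group."""
    reported = (df[detail].fillna(0) != 0).astype(int)     # 1 = detail reported
    model = LogisticRegression(max_iter=1000).fit(df[predictors], reported)
    df = df.assign(p_report=model.predict_proba(df[predictors])[:, 1])

    # Imputation classes = percentile bands of the response probability
    df["score_class"] = pd.qcut(df["p_report"], q=n_classes,
                                labels=False, duplicates="drop")

    reporters = df[reported == 1]
    ratios = (reporters.groupby("score_class")[detail].sum()
              / reporters.groupby("score_class")["block_total"].sum())
    return df, ratios    # ratios are later applied to businesses with a generic
```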

  15. Scores method • Problems • Need to create a model for each detail • Difficult to decide what to do in the case of blocks with many details (5 or more) which are frequently reported • This method was excluded due to its difficulty in coping with blocks with a moderate to large number of details

  16. Cluster method • Divides businesses into imputation classes on the basis of response patterns to details • Uses clustering or dominant detail method • Uses discriminatory models (parametric or not) to assign businesses with generic to imputation classes • Calculates ratios within imputation classes • Assigns ratios to businesses with a generic

  17. Cluster method • Problems • For certain blocks it can be difficult to find good variables on which to discriminate • Issue of how often clustering method and models should be reviewed

  18. Comparing the methods • Estimate distributions of known data for year n from ratios calculated for year n-1 • Create a benchmark file • Reported details in years n-1 and n • Put all details into generic fields in year n • Calculate ratios from businesses in year n-1 for all methods • Assign ratios to businesses in year n • Compare the results to the reported fields
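
A simplified sketch of the benchmark loop for one method and a single imputation class (the actual study stratifies by each method's own classes):

```python
import pandas as pd

def benchmark(details_prev, details_curr):
    """details_prev / details_curr: detail values (same columns) for the
    businesses that reported details in both year n-1 and year n."""
    # Ratios learned from the year n-1 reporters
    ratios = details_prev.sum() / details_prev.sum().sum()

    # Collapse year-n details into a pseudo-generic, then re-allocate it
    generic_curr = details_curr.sum(axis=1)
    imputed = pd.DataFrame(generic_curr.values[:, None] * ratios.values[None, :],
                           index=details_curr.index,
                           columns=details_curr.columns)

    # One macro check: relative difference of estimated vs. reported totals
    rel_diff = (imputed.sum() - details_curr.sum()) / details_curr.sum()
    return imputed, rel_diff
```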

  19. Comparing the methods • Compare the results at the micro (businesses) and the macro (aggregate) levels • Compare true and estimated distributions

  20. Comparing the methods • Macro statistics for the jth detail in the block
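
As a hedged illustration of this kind of macro measure, one natural choice compares the estimated and reported totals of the jth detail over the benchmark businesses:

\[
\text{RelDiff}_j \;=\; \frac{\sum_i \hat{y}_{ij} - \sum_i y_{ij}}{\sum_i y_{ij}},
\]

where \(y_{ij}\) is the reported value and \(\hat{y}_{ij}\) the value imputed from the generic.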

  21. Comparing the methods • Micro Statistics • Median Pseudo CV for the jth detail and ith business in the block
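
As a hedged illustration, a pseudo CV of this kind can be taken as the relative deviation of the imputed value from the reported one, summarised by its median over businesses:

\[
\text{PseudoCV}_{ij} \;=\; \frac{\lvert \hat{y}_{ij} - y_{ij} \rvert}{y_{ij}},
\qquad
\text{MedPseudoCV}_{j} \;=\; \operatorname*{median}_{i}\,\text{PseudoCV}_{ij}.
\]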

  22. Comparing the methods • Micro Statistics • Median Pearson Contingency Coefficient for the jth detail and ith business in the block • f values represent the marginal distributions • d² represents the degree of dependency (depends on n, r and c)
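
For reference, the standard Pearson contingency coefficient for an r x c table of frequencies \(f_{rc}\) takes this form, which matches the description above (a hedged reconstruction):

\[
C \;=\; \sqrt{\frac{d^{2}}{d^{2} + n}},
\qquad
d^{2} \;=\; \sum_{r}\sum_{c}\frac{\bigl(f_{rc} - f_{r\cdot} f_{\cdot c}/n\bigr)^{2}}{f_{r\cdot} f_{\cdot c}/n},
\]

where \(f_{r\cdot}\) and \(f_{\cdot c}\) are the marginal totals and \(n\) is the grand total.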

  23. Comparing the methods • We show results for Block 8230: Other Revenue • This block has 20 details covering revenue distribution • Important for clients as it is used in many surveys • The scores method is not shown as it is difficult to implement with this many details

  24. Comparing the methods

  25. Results

  26. Cluster methodology • Most blocks use dominant detail (attractor) x clusters to define the imputation classes • A business i belongs to cluster j of attractor x (where x > 50) if \(y_{ij} \geq (x/100)\sum_{k} y_{ik}\), where \(y_{ij}\) is the value reported by business i in detail j. If this condition holds for no detail, the business is assigned to the residual cluster.
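
A minimal sketch of this dominant-detail assignment, assuming an illustrative attractor level of x = 60 and hypothetical detail columns:

```python
def dominant_detail_cluster(df, details, x=60):
    """Assign each detail-reporting business to the cluster of its dominant
    detail, i.e. the detail holding more than x% of its block total.
    Businesses with no dominant detail go to a residual cluster."""
    values = df[details].fillna(0.0)
    shares = values.div(values.sum(axis=1), axis=0)   # y_ij / sum_k y_ik

    dominant = shares.idxmax(axis=1)                  # detail with the largest share
    has_attractor = shares.max(axis=1) > x / 100.0
    return dominant.where(has_attractor, other="residual")
```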

  27. Cluster methodology • Distribution ratios to details are calculated for each cluster • Discriminatory models are then created (nonparametric for most blocks) to assign businesses with a generic • Use variables on industry (NAICS), location (province), size (revenue, log revenue), details and totals of details in other blocks
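
A hedged sketch of such a nonparametric discriminator (a decision tree is shown purely for illustration; the features are assumed to be numerically encoded):

```python
from sklearn.tree import DecisionTreeClassifier

def fit_cluster_model(reporters, features, cluster_col="cluster"):
    """Learn to predict a business's cluster from auxiliary variables
    (e.g. NAICS, province, log revenue, totals of other blocks)."""
    clf = DecisionTreeClassifier(max_depth=6, min_samples_leaf=50)
    clf.fit(reporters[features], reporters[cluster_col])
    return clf

def assign_clusters(generic_only, features, clf):
    """Place businesses that reported only a generic into a cluster."""
    return clf.predict(generic_only[features])
```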

  28. Cluster methodology • Generic amounts are assigned to details in the following 3 ways • If generic amount and no details reported then ratios are assigned as calculated • If generic amount and all details with ratio greater than 0% are reported then ratios are assigned as calculated • If generic amount and some details but not all are reported, then ratios are pro-rated and generic is assigned only to details which were not reported
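
A sketch of these three cases for a single business, with hypothetical inputs (cluster ratios summing to 1, and NaN marking details that were not reported):

```python
def allocate_to_details(generic, reported, ratios):
    """generic: reported generic amount; reported: Series of detail values
    with NaN for details not reported; ratios: cluster ratios summing to 1."""
    missing = reported.isna() & (ratios > 0)

    if reported.isna().all():
        # Case 1: no details reported -> assign ratios as calculated
        return generic * ratios
    if not missing.any():
        # Case 2: every detail with a positive ratio is already reported
        return generic * ratios
    # Case 3: some details reported -> pro-rate the ratios over the
    # unreported details so the generic goes only to those
    prorated = ratios.where(missing, 0.0)
    return generic * prorated / prorated.sum()
```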

  29. Cluster methodology • Gives better micro results • Improved data for tax replacement • Macro results remain similar to current methodology • Micro results are consistent year to year

  30. Future work and conclusions • The cluster methodology will be implemented for reference year 2006 for the Income Statement • Model fitting and implementation for Balance Sheet will follow • Review of models and clustering methods as deemed appropriate

  31. Contact Information / Coordonnées Jessica.andrews@statcan.ca Francois.brisebois@statcan.ca Nathalie.hamel@statcan.ca
