
Italian ITGS validation practices for Intrastat data

Italian ITGS validation practices for Intrastat data. ADVANCED ISSUES IN INTERNATIONAL TRADE IN GOODS STATISTICS ESTP training course 3 April 2014 Maria Serena Causo, Luisa Ciardelli, Fernanda Panizon, Veronica Rondinelli, Maria Lidia Travaglia.



Presentation Transcript


  1. Italian ITGS validation practices for Intrastat data ADVANCED ISSUES IN INTERNATIONAL TRADE IN GOODS STATISTICS ESTP training course 3 April 2014 Maria Serena Causo, Luisa Ciardelli, Fernanda Panizon, Veronica Rondinelli, Maria Lidia Travaglia

  2. Taxonomy for data editing procedures in the framework of an EU harmonization of the ITGS validation practices
  • Data validity editing procedures: detection and correction of formal errors (incorrect or missing codes or values, leading to failure in data processing) – high severity level;
  • Data completeness editing procedures: detection of under-coverage or over-coverage problems (cumulative traded values appear to be much lower or higher than expected) – the severity level depends on quantitative indicators (observed deviation from the expected value);
  • Data credibility editing procedures: detection of internal inconsistencies at record level (e.g. value and net mass inconsistent, with unit values out of acceptance ranges) – the severity level depends on the robustness of the test used and on quantitative indicators of deviation from the expected value.

  3. IT Intrastat validation practices. 1. Data validity editing procedures
  Checks for data validity are performed when data received from the Customs Agency are loaded into the production database:
  • Records failing logical checks are automatically corrected only when the transformation of the «wrong» code into a «correct» one is unique (e.g. invalid CN codes are transformed into valid ones when the CN(y−1)–CN(y) correspondence tables allow a unique valid code to be identified).
  • All other cases of formally incorrect data are investigated and inserted into the production database only after correction.
  • Some missing elements are imputed (net mass from value/(unit value); partner country from the ISO code of the partner's VAT number).
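The automatic-correction rule above can be sketched as follows. This is a minimal illustration, not the production code: the correspondence table, its codes, and the function name are hypothetical.

```python
# Hypothetical CN(y-1) -> CN(y) correspondence table: old code -> set of
# candidate new codes. Codes below are made up for illustration.
cn_correspondence = {
    "11111111": {"22222222"},              # maps to exactly one new code
    "33333333": {"44444444", "55555555"},  # ambiguous: two candidates
}

def correct_cn_code(code):
    """Return the corrected CN code only when the mapping is unique;
    otherwise return None, and the record goes to manual investigation."""
    candidates = cn_correspondence.get(code)
    if candidates is not None and len(candidates) == 1:
        return next(iter(candidates))
    return None
```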

  4. 2. Data completeness editing procedures
  Monitoring trade coverage for the most important PSIs (task performed immediately after the first data delivery). A list of the most relevant PSIs to be monitored is built and maintained as follows. At the beginning of the new reference year the list is renewed/refreshed, selecting:
  • the top 300 PSIs according to the total traded value of the previous year;
  • the top 300 PSIs according to the average traded value in the months of activity (to detect very important PSIs that did not trade in all 12 months, e.g. new VAT codes);
  • the top 300 PSIs according to the total traded value of the fourth quarter of the previous year (to detect PSIs that have recently increased their traded volumes);
  • the top 300 PSIs according to the maximum monthly traded value observed in the previous year (to detect PSIs that can account for very localized trading peaks).
  In 2014 this procedure led to about 360 PSIs for each flow (most of them already monitored in 2013), covering 36% of total 2013 dispatches and 41% of arrivals. The initial list is investigated by an Intrastat expert (Maria Lidia Travaglia) for further fine-tuning. During the year the list is maintained, keeping track of changes in VAT codes and of merging or splitting events.
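The selection above is a union of four top-N rankings, which explains why four lists of 300 yield only about 360 distinct PSIs (heavy overlap). A sketch, with hypothetical field names for the PSI records:

```python
def top_n(psis, key, n=300):
    """VAT codes of the n PSIs ranking highest on the given criterion."""
    return {p["vat"] for p in sorted(psis, key=key, reverse=True)[:n]}

def build_monitoring_list(psis, n=300):
    """Union of the four top-n rankings described on the slide.
    Field names (total_value_prev_year, ...) are illustrative."""
    return (
        top_n(psis, lambda p: p["total_value_prev_year"], n)
        | top_n(psis, lambda p: p["avg_value_active_months"], n)
        | top_n(psis, lambda p: p["total_value_q4"], n)
        | top_n(psis, lambda p: p["max_monthly_value"], n)
    )
```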

  5. Monitoring starts immediately after the first data delivery (around T+27).
  THE RULE: for each monitored PSI, the total value traded in the reference month is checked to be in line with its time series (the full series starts at y−2). What is checked?
  • That the monitored PSIs have reported Intrastat (under-coverage).
  • That there is no under-coverage or over-coverage, i.e. the declared value is not much below the minimum recorded along the time series and not too far above the observed maximum (the observed deviation gives an assessment of the severity of the under-coverage or over-coverage).
  • That the MAD test indicator (a newly proposed indicator) is between −2 and 2:
  t_MAD = (value(ref. month) − median(value)) / MAD, where MAD = median(|value − median(value)|)

  6. THE ACTIONS TAKEN WHEN A RULE IS FOUND TO BE BROKEN:
  • Check that no VAT event occurred (merging or splitting of economic activities, changes in VAT number) – data source: Chambers of Commerce.
  • Any event that occurred is registered and taken into account in the next monitoring actions.
  • If no event occurred and no Intrastat report was received, the PSI is contacted (late report? is a manual data integration necessary?).
  • If no event occurred and over-coverage is detected, the largest-impact records submitted by the PSI for the reference month are examined and corrected if wrong (e.g. unit values far out of range).

  7. Editing of «HIGH VALUES» for the first press release
  All records with
  value > 700,000 EUR (covering about 30% of dispatches and 43% of arrivals), or
  net mass > 10,000,000 kg, or
  suppl. units > 10,000,000 units
  are manually inspected, looking for failures in credibility checks:
  • incoherence with the PSI's typical trade;
  • large deviations between statistical value and invoiced value;
  • outlier unit values.
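The selection rule is a simple threshold filter; a sketch with the slide's thresholds and hypothetical record field names:

```python
# Thresholds from the slide; field names are illustrative.
def needs_manual_inspection(record):
    """True when a record crosses any of the "high values" thresholds
    for the first press release and must be manually inspected."""
    return (
        record.get("value_eur", 0) > 700_000
        or record.get("net_mass_kg", 0) > 10_000_000
        or record.get("suppl_unit", 0) > 10_000_000
    )
```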

  8. Further credibility checks based on unit value acceptance intervals (methodology developed and implemented by Alessandra Nuccitelli, see Istat Working Papers n. 15, 2012)
  Acceptance intervals are computed on data of the 24 previous months, stratified:
  a) by flow, eight-digit commodity code, trade operator;
  b) by flow, eight-digit commodity code, country;
  c) by flow, eight-digit commodity code.
  Robust editing parameters based on quartiles and medians of the log-transformed (symmetrized) distribution are computed for each stratum (Tukey's resistant fences method), when at least 20 observations are available:
  Fences: (Q_1 − k·IQR, Q_3 + k·IQR)
  The fences computed at the most detailed stratification level with at least 20 observations in the stratum are used.
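A minimal sketch of Tukey-style fences on log-transformed unit values, under the assumptions stated on the slide (log transform, quartile-based fences, 20-observation minimum); the exact quartile estimator and k used by Istat are not given on the slide, so k=1.5 and the "inclusive" quantile method here are assumptions:

```python
import math
from statistics import quantiles

def resistant_fences(unit_values, k=1.5, min_obs=20):
    """Acceptance interval for unit values in one stratum.
    Fences are computed on the log scale and back-transformed;
    returns None when the stratum has fewer than min_obs observations."""
    if len(unit_values) < min_obs:
        return None
    logs = sorted(math.log(v) for v in unit_values)
    q1, _, q3 = quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    # back-transform to the original unit-value scale
    return math.exp(q1 - k * iqr), math.exp(q3 + k * iqr)
```

A record whose unit value falls outside the returned interval fails the check at that stratification level.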

  9. Further credibility checks based on unit value acceptance intervals (2) (methodology developed and implemented by Alessandra Nuccitelli, see Istat Working Papers n. 15, 2012)
  A record is flagged for manual revision when its unit value is outside the acceptance range AND the potential error is above a given threshold:
  Potential error = |value − quantity × median(unit value)|
  (selection of the most influential errors)

  10. Credibility checks based on unit weight
  Using «technical» unit weight ranges for automatic correction of quantities. In 2014, 2587 CN codes require supplementary units. While unit values (a proxy for prices) can grow or decrease on relatively small time scales, unit weights = (net mass)/(supplementary units) change only with changes in product characteristics (on a longer time scale). ISTAT maintains the list of (minimum, average, maximum) unit weights; the list was deeply reviewed in 2013.

  11. In data revision the focus is often on traded values, but records with small values can lead to large biases at the dissemination level.
  Chapter 22 example:

              value (EUR)  net mass (kg)  suppl. unit (lt)  unit price (EUR/lt)  unit weight (g/lt)
  Record 1        500,000        125,000           125,000                 4.00               1,000
  Record 2        500,000        125,000           125,000                 4.00               1,000
  Record 3            500            125        12,500,000              0.00004                0.01
  TOTAL         1,000,500        250,125        12,750,000                0.078               19.62

  Technical unit weight parameters are used to estimate net mass when not reported (simplification thresholds) and to automatically correct net mass or supplementary units when unit weights are out of acceptance ranges.
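The table's totals can be recomputed to see how a 500 EUR record dominates the aggregate quantities; this snippet just reproduces the slide's arithmetic:

```python
# The slide's Chapter 22 example: one record worth only 500 EUR carries a
# huge (wrong) supplementary-unit quantity and wrecks the aggregates.
records = [  # (value_eur, net_mass_kg, suppl_unit_lt)
    (500_000, 125_000, 125_000),
    (500_000, 125_000, 125_000),
    (500,     125,     12_500_000),  # tiny value, implausible quantity
]
value = sum(r[0] for r in records)
mass = sum(r[1] for r in records)
litres = sum(r[2] for r in records)
unit_price = value / litres          # collapses from 4.00 to ~0.078 EUR/lt
unit_weight = 1000 * mass / litres   # collapses from 1000 to ~19.6 g/lt
```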

  12. Tools for detecting which variable is wrong (net mass or suppl. unit?) FOCUS on HS2 22 (s.u.: litres or litres of pure alcohol)
  An indicator for errors in net mass from Extrastat data (DAU, i.e. the SAD):
  • Typical systematic error: «semi-net» mass (including the bottle) is reported.
  • The rule violation is spotted by an error indicator (computable from SAD data):
  100 × (gross weight − net mass) / net mass
  ≈ 50–60% when net mass is correct, ≈ 5–10% when net mass includes bottles.
  The indicator, computed on Extrastat (SAD) data, helped to assess the presence of similar systematic errors for the same trader in Intrastat data.
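The indicator is a one-line ratio; a sketch (function name is mine, reference bands from the slide):

```python
def packaging_share(gross_weight, net_mass):
    """Error indicator from SAD data: 100 * (gross - net) / net.
    Per the slide, roughly 50-60% when net mass is correct (bottle
    excluded), roughly 5-10% when net mass wrongly includes the bottle."""
    return 100 * (gross_weight - net_mass) / net_mass
```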

  13. Tools for detecting which variable is wrong (net mass or suppl. unit?) FOCUS on HS2 22 (s.u.: litres or litres of pure alcohol)
  An indicator for errors in supplementary units (litre case):
  • Typical systematic error: the number of bottles is reported instead of litres.
  • The rule violation is spotted by an error indicator: the histogram of the ratio grams/(suppl. unit) shows additional peaks around 250, 500, 750, 1500, 3000, 4500 and 5000 (there should be a single peak at 1000!). Such peaks mean that, instead of litres, numbers of bottles of size ¼ lt, ½ lt, ¾ lt, 1.5 lt, 3 lt, 4.5 lt or 5 lt are reported.
  • Action: correct the supplementary unit (forced to be equal to the net mass).
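A record-level sketch of this check and correction, assuming (as the slide's single-peak-at-1000 rule does) a density of about 1 g per ml; the peak list comes from the slide, while the function names and the 5% tolerance are assumptions:

```python
# Grams-per-unit peaks that betray bottle counts reported instead of litres.
BOTTLE_PEAKS = (250, 500, 750, 1500, 3000, 4500, 5000)

def looks_like_bottle_count(net_mass_g, suppl_unit, tol=0.05):
    """True when grams per reported unit sits near a bottle-size peak
    rather than near the expected 1000 g/lt."""
    ratio = net_mass_g / suppl_unit
    return any(abs(ratio - p) <= tol * p for p in BOTTLE_PEAKS)

def correct_suppl_unit(net_mass_kg, suppl_unit):
    """The slide's correction: when a bottle count is detected, force the
    supplementary unit (litres) equal to the net mass in kg."""
    if looks_like_bottle_count(1000 * net_mass_kg, suppl_unit):
        return net_mass_kg
    return suppl_unit
```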

  14. HS2 22: Supplementary unit error indicator

  15. HS2 22: Supplementary unit error indicator
