
Disclosure scenario and risk assessment: Structure of Earnings Survey

Disclosure scenario and risk assessment: Structure of Earnings Survey. Daniela Ichim, Luisa Franconi Istat – DCMT – Methodology ichim@istat.it , franconi@istat.it. 1. Objectives of the anonymisation 2. Disclosure scenarios 3. Risk assessment 4. Confidentiality protection


Presentation Transcript


  1. Disclosure scenario and risk assessment: Structure of Earnings Survey Daniela Ichim, Luisa Franconi Istat – DCMT – Methodology ichim@istat.it, franconi@istat.it

  2. Outline: 1. Objectives of the anonymisation; 2. Disclosure scenarios; 3. Risk assessment; 4. Confidentiality protection; 5. Information content analysis.

  3. Objectives (microdata file for research). Requirements of the Member States: dissemination policy (NACE, citizenship, number of employees, etc.) and coherence. Requirements of the users: high-priority variables (NACE, NUTS, ISCO); a minimum level of detail (NACE 2 digits, NUTS1, ISCO 2 digits, …); kinds of analysis, such as estimating the difference in annual earnings between two categories of the regional detail (estimating differences between regional policies) and the variation of weighted totals.

  4. Disclosure scenarios (microdata file for research). Mimic the intruder's knowledge and interest. Possible intruder = researcher. No external register scenario. No nosy colleague scenario. Only spontaneous identification.

  5. Enterprise spontaneous identification. Key variables are the structural variables: NACE, NUTS, SIZE. A sampled enterprise is considered at risk when both its population and sample frequencies are simultaneously below the given threshold.
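The rule on this slide can be illustrated with a minimal sketch. The record layout (`nace`, `nuts`, `size` fields), the toy data and the threshold value of 3 are assumptions made for illustration; the slide does not state the threshold actually used.

```python
from collections import Counter

KEYS = ("nace", "nuts", "size")  # structural key variables from the slide

def cell(record):
    """Combination of key-variable values that defines a frequency cell."""
    return tuple(record[k] for k in KEYS)

def enterprises_at_risk(sample, population, threshold):
    """Indices of sampled enterprises whose cell is rare in both files."""
    pop_freq = Counter(cell(r) for r in population)
    sam_freq = Counter(cell(r) for r in sample)
    return [
        i for i, r in enumerate(sample)
        if sam_freq[cell(r)] < threshold and pop_freq[cell(r)] < threshold
    ]

# Hypothetical toy data: five enterprises in one cell, one in another.
common = {"nace": "47", "nuts": "ITC", "size": "10-49"}
rare = {"nace": "24", "nuts": "ITF", "size": "250+"}
population = [dict(common) for _ in range(5)] + [dict(rare)]
sample = [dict(common), dict(common), dict(rare)]

at_risk = enterprises_at_risk(sample, population, threshold=3)
print(at_risk)  # [2]
```

The first two sampled enterprises are rare in the sample but not in the population, so only the third one is flagged.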

  6. Enterprise protection. Structural key variables are all categorical. Protection is achieved by recoding classes of the categorical key variable with the lowest priority: 1. NACE 2-digits 2. NUTS1 3. SIZE. a) Recoding with respect to the population frequencies generates a lower information loss. b) If needed, recode another variable.
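As a rough illustration of recoding, the sketch below collapses two size classes and compares the smallest population cell frequency before and after. It assumes SIZE is the lowest-priority variable (it is listed last on the slide); the class labels, the merge mapping and the data are hypothetical.

```python
from collections import Counter

KEYS = ("nace", "nuts", "size")

def recode(records, variable, mapping):
    """Return copies of the records with `variable` collapsed via `mapping`."""
    return [{**r, variable: mapping.get(r[variable], r[variable])} for r in records]

def min_cell_frequency(records):
    """Smallest frequency over all observed NACE x NUTS x SIZE cells."""
    return min(Counter(tuple(r[k] for k in KEYS) for r in records).values())

# Hypothetical population: one very large enterprise stands alone in its cell.
population = (
    [{"nace": "24", "nuts": "ITF", "size": "250-499"} for _ in range(2)]
    + [{"nace": "24", "nuts": "ITF", "size": "500+"}]
)
merged = recode(population, "size", {"250-499": "250+", "500+": "250+"})

before = min_cell_frequency(population)
after = min_cell_frequency(merged)
print(before, after)  # 1 3
```

If the smallest cell were still below the threshold after this step, the same function could be applied to another key variable, in line with point b) on the slide.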

  7. Employees: spontaneous identification (microdata file for research). Information considered: information on the enterprise (NACE x NUTS x Size); social variables (Gender x Age); extremely high earnings related to large enterprises.

  8. Employees at risk (use the scenario!). High annual earnings: greater than the 99% quantile (T). For each combination of NACE, NUTS, Size, Gender, Age and AnnEarn (annual earnings), the number of sampled employees with earnings greater than T was counted. If there was a single employee with such characteristics, that employee was considered at risk of identification.
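The counting rule can be sketched as follows. The slide does not say whether T is computed over the whole sample or within strata; this sketch assumes a single T over all sampled employees and uses the "inclusive" quantile method of Python's `statistics` module. The field names and the data are hypothetical.

```python
from collections import Counter
from statistics import quantiles

KEYS = ("nace", "nuts", "size", "gender", "age")

def employees_at_risk(sample):
    """Indices of employees who are alone in their cell above the 99% quantile."""
    earnings = [r["earnings"] for r in sample]
    # 99 cut points for n=100; the last one is the 99% quantile (assumed global).
    t = quantiles(earnings, n=100, method="inclusive")[98]
    high = [(i, tuple(r[k] for k in KEYS))
            for i, r in enumerate(sample) if r["earnings"] > t]
    counts = Counter(c for _, c in high)
    return [i for i, c in high if counts[c] == 1]

# Hypothetical sample: nine ordinary earners and one extreme earner.
base = {"nace": "24", "nuts": "ITF", "size": "250+", "gender": "F", "age": "40-49"}
sample = [{**base, "earnings": 20000 + 1000 * i} for i in range(9)]
sample.append({**base, "earnings": 200000})

at_risk = employees_at_risk(sample)
print(at_risk)  # [9]
```

Had two employees in the same cell exceeded T, neither would be flagged under this rule, since the count in that cell would be 2.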

  9. Employees: selective protection (microdata file for research). Only records of employees at risk of identification ought to be perturbed. Only numerical key variables are perturbed.

  10. Controlled perturbation. Constrained regression. Weighted total variation below 0.5%. Can be easily adapted to any stratification.
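The slide gives no detail on the constrained regression itself, so the sketch below does not attempt to reproduce it. It only shows how the 0.5% constraint on the weighted total could be checked after any perturbation; the earnings, sampling weights and perturbed values are hypothetical.

```python
def weighted_total(values, weights):
    """Weighted total: sum of value times sampling weight."""
    return sum(v * w for v, w in zip(values, weights))

def relative_variation(original, perturbed, weights):
    """Absolute relative change of the weighted total after perturbation."""
    t0 = weighted_total(original, weights)
    return abs(weighted_total(perturbed, weights) - t0) / t0

def within_tolerance(original, perturbed, weights, tol=0.005):
    """True when the weighted total moved by less than `tol` (0.5% by default)."""
    return relative_variation(original, perturbed, weights) < tol

# Hypothetical stratum: only the last (at-risk) record is perturbed.
original = [20000, 25000, 200000]
weights = [50, 40, 2]
mild = [20000, 25000, 195000]
strong = [20000, 25000, 190000]

ok_mild = within_tolerance(original, mild, weights)
ok_strong = within_tolerance(original, strong, weights)
print(ok_mild, ok_strong)  # True False
```

Applying the same check separately to the records of each stratum is one way to read the remark that the approach adapts to any stratification.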

  11. Information content. User requirements (information preservation): weighted totals; sampling weights; only key and confidential variables are modified. Information loss: statistical indicators (correlations, summary statistics); order relationships.
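Two of the information-loss indicators named here, a summary statistic and order relationships, can be compared before and after perturbation along these lines. This is only a sketch with hypothetical data; the slide does not specify which indicators were actually computed or how.

```python
from statistics import mean

def mean_relative_change(original, perturbed):
    """Relative change of the mean, a simple summary-statistic indicator."""
    m0 = mean(original)
    return abs(mean(perturbed) - m0) / abs(m0)

def order_preserved(original, perturbed):
    """True when sorting by either variable puts the records in the same order."""
    def ranking(xs):
        return sorted(range(len(xs)), key=xs.__getitem__)
    return ranking(original) == ranking(perturbed)

# Hypothetical earnings before and after perturbing the last record.
original = [20000, 25000, 200000]
perturbed = [20000, 25000, 195000]

change = mean_relative_change(original, perturbed)
same_order = order_preserved(original, perturbed)
print(round(change, 4), same_order)  # 0.0204 True
```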

  12. Conclusions. Confidentiality ensured; minimize the information loss. Consider the dissemination features. Consider the data features.
