Confidentiality and the SARs. Update on SAR progress, and discussion of the disclosure work done for Scotland. Sam Smith s.smith@man.ac.uk.
Confidentiality and the SARs Update on SAR progress, and discussion of the disclosure work done for Scotland. Sam Smith s.smith@man.ac.uk
Update 2001 SARs • Newsletter published very recently: • More delays • Disclosure control work by CAPRI is ongoing • Current estimate is for the Individual data to be with the SARs team in June • In-house access at ONS for users with urgent need.
England, Wales and Northern Ireland • For the release of the 100% tables, England and Wales and Northern Ireland rounded small cell counts. • As a result, it is not possible to match between the SAR and the tables for England, Wales and NI.
Scotland • Scotland did not round their 100% tables. • As a result, there are counts of 1 in the tables. • If any of these individuals is present in the SAR, the record is disclosive.
Background • The following work has been carried out in collaboration with the General Register Office for Scotland, by the SARs team at CCSR. • At time of writing, I have had no access to disclosive data. • There is no geography below Scotland level.
Population Uniques • Population Uniques are people whose values on one or more characteristics are unique in the Population. • Sample Uniques are people whose values on one or more characteristics are unique in the Sample.
Scale • There are 62 variables in both the SAR and the 100% tables. • GROS are interested in trivariate tables, and are concerned only with uniques (cells with a count of 1). • We obtained 37,820 tables, covering every combination of three variables.
Requesting the tables • An example request for input to their system was provided by GROS. • We then replicated and modified it, making one request for each table. • The tables arrived on 4 CDs, a month later.
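The figure of 37,820 is what you get by choosing 3 variables out of 62 without regard to order. A quick check, as a sketch only: the placeholder variable names (var01, var02, ...) are invented for illustration and are not the real census variable names.

```python
from itertools import combinations, islice
from math import comb

N_VARIABLES = 62  # variables common to the SAR and the 100% tables

# Number of distinct trivariate tables: 62 choose 3.
n_tables = comb(N_VARIABLES, 3)
print(n_tables)  # 37820

# Placeholder names; the real variable list is not given in the slides.
variables = [f"var{i:02d}" for i in range(1, N_VARIABLES + 1)]

# Each table request corresponds to one unordered triple of variables.
first_requests = list(islice(combinations(variables, 3), 3))
print(first_requests)
```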
An example table
Space-Time Research, 2001 ED Based OSD - Test 1, Table 1
Cars - Number of by Ever worked Indicator and Number of Rooms for Person
Columns are Number of Rooms; Ever worked Indicator is "No code required" in every column.

Cars - Number of        Not applicable     01-02     03-04     05-06        7+
None                                 -    53,323   421,443   232,335    18,719
One                                  -    33,839   577,499   759,187   188,235
Two                                  -     6,104   174,884   499,420   368,657
Three                                -       772    20,029    83,915    84,619
Four or more                         -       222     4,622    20,353    29,984
Communal establishment          50,485         -         -         -         -

• Cars - Number of by Ever worked Indicator and Number of Rooms • Only “No Code Required” shown for Ever Worked.
A Bigger Example Table: Age, Industry, Occupation • Add table here
Analysis • Custom software was written to parse each table and record, for every unique, the file, the variables and the location of the values. • List the Uniques. • There are 2.4 million of them.
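The custom software itself is not shown in the slides. The following is a minimal sketch of the idea only, assuming each table has already been read into row labels, column labels and a grid of text cells, and assuming that "-" marks an empty cell. The function names and the small table are invented for illustration.

```python
def parse_count(cell):
    """Convert a cell such as '53,323' to an int; '-' or blank is treated as 0 (an assumption)."""
    cell = cell.strip()
    return 0 if cell in ("-", "") else int(cell.replace(",", ""))


def find_uniques(table_name, row_labels, col_labels, cells):
    """Return (table, row label, column label) for every cell whose count is exactly 1."""
    uniques = []
    for row_label, row in zip(row_labels, cells):
        for col_label, cell in zip(col_labels, row):
            if parse_count(cell) == 1:
                uniques.append((table_name, row_label, col_label))
    return uniques


# Invented counts, not real census data.
rows = ["None", "One", "Two"]
cols = ["01-02", "03-04", "05-06"]
cells = [
    ["1,204", "1", "-"],
    ["87", "2,310", "1"],
    ["-", "15", "642"],
]

found = find_uniques("demo_table", rows, cols, cells)
for entry in found:
    print(entry)
```

Recording the table name alongside the cell labels keeps each unique traceable back to its file, which matters once the lists from 37,820 tables are merged.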
Implementation • Step by Step process. • Keep intermediate steps. • Keep It Simple.
Target • The Scotland specification should be as compatible as possible with the England and Wales specification. • Use recodes to reduce the unique count to a level where the uniques can be dealt with on a record-by-record basis.
Simple Suppression of Uniques • All records with uniques must be perturbed. • Approximately 96% of Uniques will be immediately suppressed by virtue of the sample being 4%. • There are also reductions because of differences in the specifications.
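As rough arithmetic only: if the 2.4 million uniques found in the tables are treated as independent and the sample as a simple random 4% draw (both assumptions, not statements from the slides), the expected split looks like this. Note that these are unique cells, not distinct people, since one person can be unique in many tables.

```python
POPULATION_UNIQUES = 2_400_000  # unique cells found across the tables
SAMPLE_PERCENT = 4              # sampling fraction quoted on the slide

# Integer arithmetic avoids floating-point rounding in the result.
expected_in_sample = POPULATION_UNIQUES * SAMPLE_PERCENT // 100
expected_not_in_sample = POPULATION_UNIQUES - expected_in_sample

print(expected_in_sample)      # 96000
print(expected_not_in_sample)  # 2304000
```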
Recodes • Variables were recoded to coarser categories. • Some of the recodes were also used to aid the E&W disclosure work, including Age, Hours of Work, Industry and others. • At time of writing, Occupation is the only additional recode for Scotland.
Running the recodes • The previous slide represents 6 weeks of iterative work. • After each recode, the uniques analysis was rerun, producing a new list of uniques.
Moving forwards • We now have a slightly more restrictive specification for Scotland. • Age recoded into bands of between 2 and 5 years (for age 16+) (possibly also for EWNI) • Occupation in ?? categories • Industry in 15 categories (applied to EWNI) • Hours of Work banded (applied to EWNI)
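One iteration of that loop can be sketched as follows: recode a variable to coarser categories, then recount the uniques. This is an illustration under stated assumptions, not the actual specification. The band width, the helper names and the toy records are all invented.

```python
from collections import Counter


def band_age(age, width=5, start=16):
    """Map a single year of age (>= start) to a 'lo-hi' band; younger ages are left unbanded."""
    if age < start:
        return str(age)
    lo = start + ((age - start) // width) * width
    return f"{lo}-{lo + width - 1}"


def count_uniques(records):
    """Count the value combinations that occur exactly once."""
    counts = Counter(records)
    return sum(1 for n in counts.values() if n == 1)


# Invented (age, category) records, not real data.
records = [(16, "A"), (17, "A"), (18, "A"), (19, "B"),
           (20, "B"), (21, "A"), (24, "A"), (30, "C")]

before = count_uniques(records)
recoded = [(band_age(age), cat) for age, cat in records]
after = count_uniques(recoded)

print(before, after)  # 8 1
```

Coarser bands merge cells, so the count of uniques can only fall or stay the same; the remaining uniques are the ones that need a further recode or record-level treatment.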
So far… • Everything has been done on publicly accessible data. • The above process needs to be rerun on the SAR to find Sample Uniques • This requires access to the disclosive microdata.
Future Work • The 37,820 tables will be recreated for the records in the sample. • The lists of Population Uniques and Sample Uniques will be compared. • Where there is a Population Unique in the Sample, it will be flagged.
Applying this to the Microdata • All the Population Uniques in the Sample will be perturbed by ONS. • The method of perturbation will be the same as that used for the England, Wales and NI records. • This method is likely to involve PRAM (the Post-Randomisation Method). Discussion paper available from the SARs website?
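The comparison step amounts to an intersection of two lists. In this sketch each unique is keyed by its table and cell labels, which is an assumed representation, and the entries are invented. A cell that is unique in the population and also appears as a unique in the same cell of the sample table is a Population Unique in the Sample.

```python
def flag_population_uniques_in_sample(population_uniques, sample_uniques):
    """Return the cells that are unique in both the population tables and the sample tables."""
    return set(population_uniques) & set(sample_uniques)


# Invented keys of the form (table id, row label, column label).
population_uniques = [
    ("t00001", "None", "03-04"),
    ("t00001", "One", "05-06"),
    ("t00002", "Two", "7+"),
]
sample_uniques = [
    ("t00001", "One", "05-06"),   # unique in both: flag it
    ("t00003", "Three", "01-02"), # unique only in the sample: not a population unique
]

flagged = flag_population_uniques_in_sample(population_uniques, sample_uniques)
print(sorted(flagged))
```

Sample Uniques that are not Population Uniques drop out of the result, which is the point of doing the comparison against the 100% tables.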
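The slides do not specify how the perturbation will be carried out. For orientation only, here is a generic sketch of post-randomisation: each categorical value is replaced by a random draw from a row of transition probabilities. The categories and probabilities below are invented, and this is not the ONS method.

```python
import random


def pram(values, transition, rng):
    """Replace each value by a draw from its row of the transition matrix."""
    perturbed = []
    for value in values:
        row = transition[value]
        categories = list(row)
        weights = [row[c] for c in categories]
        perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed


# Invented transition probabilities; each row sums to 1.
transition = {
    "A": {"A": 0.8, "B": 0.1, "C": 0.1},
    "B": {"A": 0.1, "B": 0.8, "C": 0.1},
    "C": {"A": 0.1, "B": 0.1, "C": 0.8},
}

rng = random.Random(0)  # seeded only so the sketch is repeatable
flagged_values = ["A", "B", "C", "A"]
print(pram(flagged_values, transition, rng))
```

Because a value can be left unchanged by chance, an intruder who finds a match cannot be sure whether the record has been altered.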
The 100% Tables • The 37,820 tables requested cost £2,000 - paid for by the SARs project. • They will be made available to registered SARs/Census users for use in research.
And Finally…. • Slides will be available on the seminars webpage tomorrow. • Any questions?