DIMACS Working Group on Privacy / Confidentiality of Health Data Rutgers University Center Piscataway, New Jersey December 10-12, 2003. Health Care Databases under HIPAA: Statistical Approaches to De-identification of Protected Health Information. Judith E. Beach, Ph.D., Esq.


Presentation Transcript


  1. DIMACS Working Group on Privacy / Confidentiality of Health Data Rutgers University Center Piscataway, New Jersey December 10-12, 2003

  2. Health Care Databases under HIPAA: Statistical Approaches to De-identification of Protected Health Information Judith E. Beach, Ph.D., Esq. Associate General Counsel, Regulatory Affairs Chief Privacy Officer Chair, Council on Data Protection and Council on Research Ethics

  3. Outline • 1. Evolution of De-identification Standards – HIPAA Privacy Regulation • 2. De-identification Standards for Health Information in Research • a. Safe Harbor • b. Statistician Method • (i) HIPAA Provisions • (ii) Quintiles Experience and Methodology • c. Limited Data Set • 3. Preemption of State Laws on De-identification Standards for Health Information • 4. Health Information Privacy - Cases and Controversies

  4. Evolution of De-Identification Standards in HIPAA Privacy Regulation

  5. Federal Policy: De-Identification of Health Information • Government’s intent: to balance stringent standards with enough flexibility that they do not become a disincentive to use or disclose de-identified health information, wherever possible. • De-identified health data is one of the best mechanisms for avoiding wrongful disclosure of Protected Health Information (PHI). See Draft (05/27/03) DHHS Policy and Procedure Manual “De-Identification Policy d11” (effective date 6/1/03) - applies to DHHS agencies: HIPAA covered health care components and Internal Business Associates

  6. Federal Policy: Use of De-identified Health Data Rather than PHI for Research • “We [HHS] expressed the hope that covered entities, their business [associates] and others would make greater use of de-identified health information . . . when it is sufficient for the [research] purpose and that such practice would reduce the burden and the confidentiality concerns that result from the use of individually identifiable health information for some of these purposes.” [HHS, in final privacy rule, 65 Fed. Reg. at 82543 (Dec. 28, 2000), citing proposed privacy rule of Nov. 3, 1999]

  7. HIPAA’s Jurisdiction • Individually Identifiable Health Information (IIHI): • A subset of health information, including demographic information, that identifies the individual or with respect to which there is a reasonable basis to believe the information can be used to identify the individual • Protected health information (PHI): • Means individually identifiable health information (IIHI = Health Information + Identifier) that is transmitted or maintained electronically, or transmitted or maintained in any other form or medium • An investigator who submits health claims would be a HIPAA covered entity (CE) • CE + Health Information + Identifier = PHI • CE + Identifier - Health Information = NOT PHI • Health Information + Identifier - CE = NOT PHI
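The three equations on this slide amount to a simple conjunction, which can be sketched as below. This is only an illustration of the slide's logic; the `Record` class and its field names are hypothetical and are not drawn from the regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical description of a data holding (names are assumptions)."""
    held_by_covered_entity: bool
    has_health_information: bool
    has_identifier: bool


def is_phi(record: Record) -> bool:
    # Per the slide: CE + Health Information + Identifier = PHI.
    # Dropping any one of the three elements yields NOT PHI.
    return (
        record.held_by_covered_entity
        and record.has_health_information
        and record.has_identifier
    )
```

For example, `is_phi(Record(True, False, True))` models the slide's "CE + Identifier - Health Information" case.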

  8. De-identification Standards for Health Information in Research

  9. De-identified Health Information • Definition: health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual. [45 CFR § 164.514(a)] • The Privacy Rule permits de-identification of PHI so that such information may be used and disclosed freely, without being subject to the Privacy Rule’s requirements. • Once de-identified, the data is out of the Privacy Rule.

  10. HIPAA De-identification Standards • Two methods for the de-identification of health information: • “Safe Harbor” -- remove 18 specified identifiers - intended to provide a simple, definitive method for de-identifying health information with protection from litigation • “Statistician Method” -- retain some of the safe harbor’s 18 specified identifiers; the standard is met if a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods, e.g., a biostatistician, determines and documents that the risk of re-identification is very small. [45 CFR § 164.514]

  11. Limited Data Set • Final rule: added another method requiring removal of facial identifiers -- “Limited Data Set” • Under confidentiality agreements - for research, public health, and health care operations • Regarded as PHI - NOT de-identified • therefore, still subject to Privacy Rule requirements such as minimum necessary rule.

  12. Safe Harbor Method

  13. Safe Harbor • Covered entities must remove all of a list of 18 enumerated identifiers and have no actual knowledge that the information remaining could be used alone or in combination to identify a subject of the information. • The identifiers to be removed include • direct identifiers such as name, address, SSN • indirect identifiers such as birth date, admission and discharge dates, and five-digit zip code • [45 CFR § 164.514(b)(2)]

  14. Safe Harbor • The safe harbor does allow for the disclosure of • All geographic subdivisions no smaller than a State, as well as the initial three digits of a zip code • IF the geographic unit formed by combining all zip codes with the same initial three digits contains more than 20,000 people • AGE, if less than 90, gender, ethnicity and other demographic information not listed.

  15. Safe Harbor’s 18 Identifiers • (1) Names • (2) All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the currently available data from the Bureau of the Census: the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000 • (3) All elements of dates (except year) for dates directly relating to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older • (4) Telephone numbers • (5) Fax numbers • (6) Electronic mail addresses • (7) Social security numbers • (8) Medical record numbers • (9) Health plan beneficiary numbers • (10) Account numbers • (11) Certificate/license numbers • (12) Vehicle identifiers and serial numbers, including license plate numbers • (13) Device identifiers and serial numbers • (14) Web Universal Resource Locators (URLs) • (15) Internet Protocol (IP) address numbers • (16) Biometric identifiers, including finger and voice prints • (17) Full face photographic images and any comparable images • (18) Any other unique identifying number, characteristic, or code
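The zip code, date, and age rules in this list are mechanical enough to sketch in code. The following is a minimal illustration, not a compliance tool: the function names and the sample population table are hypothetical, and real population counts would have to come from current Census Bureau data.

```python
from datetime import date
from typing import Dict


def safe_harbor_zip(zip_code: str, zip3_population: Dict[str, int]) -> str:
    """Keep the first three digits only if that area holds more than 20,000 people."""
    prefix = zip_code[:3]
    # Assumption: an unknown prefix is treated conservatively as too small.
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"


def safe_harbor_date(d: date) -> int:
    """Reduce a date to its year, dropping month and day.

    Limitation: a year that would reveal an age over 89 (for example a
    birth year) must also be suppressed; that case is not handled here.
    """
    return d.year


def safe_harbor_age(age: int) -> str:
    """Aggregate all ages over 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)


# Hypothetical population counts per three-digit prefix, for illustration only.
SAMPLE_ZIP3_POPULATION = {"088": 350_000, "036": 12_000}
```

With the sample table, `safe_harbor_zip("08854", SAMPLE_ZIP3_POPULATION)` keeps the prefix, while a zip code in the sparsely populated `036` area is replaced by `000`.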

  16. Sources of Authority • In Privacy Rule Preamble, HHS recognizes two sources of authority as to what constitutes such principles and methods for de-identification adequate for posting a de-identified database on the Internet [65 Fed. Reg. at 82,709-82,710 (Dec. 28, 2000)] • “Paper 22”: Statistical Policy Working Paper 22—Report on Statistical Disclosure Limitation Methodology • “The Checklist”: The Checklist on Disclosure Potential of Proposed Data Releases - “intended primarily for use in the development of public-use data products.”

  17. Safe Harbor • BUT many researchers and other groups have complained that the Safe Harbor renders de-identified data virtually useless for research, so that the result will be MORE research using PHI. • No dates of service, no patient initials, no date of birth • Can have “deltas” such as number of patient visits over time • However, the safe harbor was NOT designed for research, but to provide an approved method of de-identification for any purpose by any covered entity, regardless of sophistication. • For instance, such de-identified data would be deemed safe to post on the Internet.

  18. Statistician Method

  19. Statistician Method • For this method, the covered entity • must remove all direct identifiers • reduce the number of variables on which a match might be made • should limit the distribution of records through a “data use agreement” or “restricted access agreement” [65 Fed. Reg. at 82,709-710 (Dec. 28, 2000)]

  20. Opinion of Statistician • Statistician must • determine that there is a “very small risk” of re-identification • after applying “generally accepted statistical and scientific principles and methods for rendering information not individually identifiable” • document the methods and results of the analysis that justify such determination. [45 CFR § 164.514(b)(1)]

  21. Statistician Method • This method has been generally ignored by covered entities. • They prefer a safe harbor approach, with “safe” being the operative word. • They consider the Statistician alternative too complicated.

  22. Statistician Method: Quintiles Experience • An expert statistician calculated the statistical likelihood of re-identification IF all 18 safe harbor identifiers were removed, that is, the “de-identification probability.” • Then, the statistician calculated the likelihood of re-identification if certain dates of service of medical or pharmacy claims were retained • And rather than age or year of birth, which is allowed in the safe harbor, the month and year of birth were included.

  23. Statistician’s Opinion • This calculated number, the “de-identification probability,” served as a benchmark of a “very small risk of re-identification” against which the statistician method would be compared.
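The slides do not say how the statistician computed the benchmark, so the following is not the Quintiles methodology. It is a sketch of one simple, commonly used proxy: group records by their quasi-identifier values and average 1 / (group size) over all records. Comparing that figure for a safe-harbor-style field set against a candidate field set mirrors the benchmark comparison described above. All names and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def average_reid_risk(records: List[Dict], quasi_identifiers: Sequence[str]) -> float:
    """Average of 1/(equivalence class size) across records.

    A record that is unique on the chosen fields contributes 1.0; a record
    sharing its values with k-1 others contributes 1/k.
    """
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return sum(1 / class_sizes[k] for k in keys) / len(keys)


# Hypothetical sample: four records in the same three-digit zip area.
SAMPLE = [
    {"zip3": "088", "birth_year": 1950, "birth_month": 1},
    {"zip3": "088", "birth_year": 1950, "birth_month": 2},
    {"zip3": "088", "birth_year": 1951, "birth_month": 3},
    {"zip3": "088", "birth_year": 1951, "birth_month": 3},
]

benchmark = average_reid_risk(SAMPLE, ["zip3", "birth_year"])
candidate = average_reid_risk(SAMPLE, ["zip3", "birth_year", "birth_month"])
```

On this sample, adding birth month raises the proxy risk above the benchmark, which is the kind of gap that would then be closed by coarsening other fields.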

  24. Analysis: Comparison of Both Methods • To ensure the statistical likelihood of re-identification was comparable to that of the calculated safe harbor benchmark, the following data fields were made stricter than permitted by the safe harbor: • For all patients older than 85 years of age (rather than 90), the year of birth was modified to make them all 85 years old. • All five-digit patient zip codes were truncated to the first 3 digits and further merged so that no resulting 3 digit code has a total population of less than 200,000.
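The two stricter rules can be sketched as follows. The slide does not describe how three-digit codes were merged, so the greedy merge of numerically adjacent prefixes below is purely an assumption made for illustration, as are the function names and the year-difference approximation of age.

```python
from typing import Dict, List


def top_code_birth_year(birth_year: int, reference_year: int, cap: int = 85) -> int:
    """Shift the birth year of anyone older than `cap` so they appear exactly `cap`.

    Age is approximated as a difference of years (an assumption).
    """
    age = reference_year - birth_year
    return reference_year - cap if age > cap else birth_year


def merge_zip3(populations: Dict[str, int], minimum: int = 200_000) -> Dict[str, str]:
    """Map each three-digit prefix to a merged label covering >= `minimum` people.

    Assumed strategy: walk the prefixes in sorted order and close a group as
    soon as its population reaches the minimum; leftovers join the last group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    total = 0
    for prefix in sorted(populations):
        current.append(prefix)
        total += populations[prefix]
        if total >= minimum:
            groups.append(current)
            current, total = [], 0
    if current:
        if groups:
            groups[-1].extend(current)
        else:
            # The whole data set is below the minimum; nothing more can be merged.
            groups.append(current)
    mapping: Dict[str, str] = {}
    for group in groups:
        label = group[0] if len(group) == 1 else f"{group[0]}-{group[-1]}"
        for prefix in group:
            mapping[prefix] = label
    return mapping
```

A real merge would more plausibly follow geographic adjacency than numeric order; the point here is only that every released code must clear the 200,000 threshold.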

  25. Factors Considered by Statistician • In the analysis, the statistician pointed out the obvious: • The de-identified data received is conveyed under a confidentiality agreement, which specifically prohibits re-identification or further disclosure of the data except in statistically aggregated form. • The database is maintained on a physically and technically secure, password-protected server.

  26. Statistician’s Opinion • “Applying generally accepted statistical and scientific principles and methods for rendering information not individually identifiable, . . . I conclude that the risk is very small that the information . . . could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information. . . . In practice the actual reidentification probabilities are much, much lower . . . arguably de minimis.”

  27. Statistician Method • It is clear that most persons who have reviewed the Privacy Rule have failed to appreciate the significance of the statistician opinion to de-identification, and, instead, have focused almost exclusively on the "safe harbor." • In particular, many have failed to understand the importance of the "restricted access" as it relates to the statistician opinion approach to de-identification.

  28. Ensuring HIPAA Compliance • All data handled is de-identified using a unique patient identifier that is irreversibly encrypted. • Data flow: patient-identifiable electronic healthcare claims (standard health claims data fields) -> Data Encryption Process -> Data Warehouse (de-identified data) • * zip = 3 digit • ** DOB = modified • Upon completion of the de-identification process a unique patient identifier is created, which is irreversibly encrypted.
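The slide does not say which algorithm produces the irreversibly encrypted identifier. One common way to obtain a stable, non-reversible patient key is a keyed hash, sketched here with HMAC-SHA256 from the Python standard library. The function name, the form of the input key, and the secret are all assumptions for illustration.

```python
import hashlib
import hmac


def anonymous_patient_id(patient_key: str, secret: bytes) -> str:
    """Derive a stable pseudonymous ID from a patient key using a keyed hash.

    The same input always maps to the same ID, so claims for one patient can
    be linked, but the ID cannot be inverted without guessing the input and
    knowing the secret.
    """
    digest = hmac.new(secret, patient_key.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Keeping the secret away from data recipients matters: with an unkeyed hash, anyone holding a list of candidate patient keys could recompute the IDs and match them.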

  29. Core Data Elements (Pharmacy Data, Medical Data) • RX Pharmacy Data (NCPDP): Anonymous Patient ID; Patient Age & Gender; Date Written; Date Filled; NDC Code; Quantity Dispensed; Days Supply; Refill Flag; Prescribing Physician; Pharmacy; Payor Type^ • HX Facility Data (UB-92): Anonymous Patient ID; Patient Age & Gender; Diagnosis Codes (ICD9); Procedure Codes (CPT); DRG; Admit Date; Discharge Date; Physician/Provider ID; Location of Care; Payor Type • MX Provider Data (HCFA 1500): Anonymous Patient ID; Patient Age & Gender; Diagnosis Codes (ICD9); Procedure Codes (CPT); Service Dates; Physician/Provider ID; Location of Care; Payor Type • Data periods shown on the slide: July ‘98 - to date; Jan ‘98 - to date • ^Note: Payor Type not available on all records

  30. Physician Demographics • Specialty • Region • Number of years in practice • Prescribing volume • Type of practice • Number of HMO / PPO / IPA affiliations • % patient volume by insurance type • Physician race • Physician age

  31. Patient Characteristics • Location of contact • Height and weight • Age • Gender • Race • Blood pressure • Cholesterol levels (total, HDL, LDL, triglycerides) • Insurance type • Physician reimbursement method (fee-for-service vs. capitation) • Smoker or non-smoker

  32. Disease Entities • Visits (with and without drugs) • Visits per physician per year • Total patients seeking treatment • Newly diagnosed patients • Visit type (first vs. subsequent) • Referrals and referring specialty • Severity of condition • Tests ordered or completed during visit • Existing medical conditions not treated • Number of times seen and days since last visit • Number of patient drug requests for condition

  33. Treatment Regimens • Dosage form, strength and signa • Formulary impact • Quantity prescribed and number of refills (mean and frequency) • Weighted diagnosis value • Dispensing instructions • Occurrences per physician per year • Therapy type: • New • First-line versus adjunct therapy • Drug replacement and reason • Continued

  34. Treatment Regimens • Desired action • Concomitant drugs (to treat same diagnosis) • Concurrent drugs (regardless of diagnosis) • Drug issuance • Sample days of therapy (mean and frequency) • Prescribed days of therapy (mean and frequency) • Daily average consumption (DACON) • Non-drug therapy

  35. Limited Data Set (LDS)

  36. HHS’ Solution: Limited Data Set • For research, public health, or health care operations purposes • Authorization not required • A limited data use agreement must be in place between the covered entity and the recipient of limited data set (LDS) [45 CFR §164.514(e)] • “Data Use Agreements would only be needed for those public health, research, or health care operation uses and disclosures that are not otherwise permitted by federal or state laws.” [See Draft (05/27/03) DHHS Policy and Procedure Manual “De-Identification Policy d11”]

  37. LDS = Still PHI • Regarded as PHI, that is, not de-identified data, and therefore subject to requirements for the protection of PHI, such as: • Re-identification, or any attempt to contact individuals, by the recipient is prohibited • BUT a re-identification code is permitted for the covered entity • Subject to minimum necessary standards • BUT no accounting of disclosures or IRB approval is required

  38. Limited Data Set Specifications • May be useful for records-based research such as epidemiological and other population research • But may NOT be useful for patient recruitment • Because a third party, even a researcher, is prohibited from re-identifying or attempting to contact individuals (without IRB or internal privacy board approval) unless the contact is made by the Covered Entity or the Covered Entity’s Workforce.

  39. LDS: Remove 16 Identifiers [45 CFR §164.514(e)(2)] • (1) Name • (2) Postal address information (other than city, state, zip code) • (3) Telephone number • (4) Fax number • (5) E-mail address • (6) Social Security Number • (7) Medical record / prescription numbers • (8) Health plan beneficiary numbers • (9) Account numbers • (10) Certificate / license numbers • (11) Vehicle identity / serial numbers • (12) Device numbers • (13) Web URL • (14) IP address • (15) Biometric identifiers (e.g., fingerprints, retinal scans) • (16) Full-face photographic images and any comparable images

  40. LDS: Retain Indirect Identifiers • Five-digit zip code • Dates of service (e.g., admission / discharge) • Dates of birth and death • Geographic subdivision (e.g., state, county, city, precinct), but not street address
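Taken together, the remove and retain lists for a limited data set describe a field-level filter, which can be sketched as below. The field names are hypothetical column names chosen for illustration; a real data set would need its own mapping from columns to the 16 identifier categories.

```python
from typing import Dict

# Hypothetical column names standing in for the 16 direct identifiers.
LDS_DIRECT_IDENTIFIERS = frozenset({
    "name", "street_address", "telephone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "certificate_license_number", "vehicle_identifier",
    "device_identifier", "web_url", "ip_address", "biometric_identifier",
    "full_face_photo",
})


def to_limited_data_set(record: Dict[str, object]) -> Dict[str, object]:
    """Drop direct identifiers; indirect ones (zip, dates, city) pass through."""
    return {k: v for k, v in record.items() if k not in LDS_DIRECT_IDENTIFIERS}
```

The output is still PHI under the rule, so a filter like this would only be one step before disclosure under a data use agreement.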

  41. Statistical Method for Dummies • “Limited Data Set” . . . the Statistician Method made easy.

  42. Preemption of State Laws on De-identification Standards for Health Information

  43. Preemption of De-identification Standards - A View • HIPAA Statute and privacy regulation • Preemption of state law only if • The provision of state law relates to the privacy of individually identifiable health information • HIPAA Statute § 1178 AND 45 CFR §§ 160.202 - .204

  44. Preemption of State Law: HIPAA Statute • Health information considered identifiable and, therefore, subject to all requirements of rule ONLY if “reasonable basis to believe that the information can be used to identify the individual.” • Exception to preemption - when states can assert contrary and more stringent definition of “individually identifiable health information” • But exception analysis does not apply to de-identified data

  45. Preemption: Deidentification Standards • Thus, states would be preempted from enforcing a standard for deidentification that exceeds the “reasonable basis” definition of individually identifiable health information as established in HIPAA statute. • Note: in response to Quintiles’ written request, HHS responded by revising preemption section of the Rule to refer to “individually identifiable” health information rather than merely health information.

  46. Privacy Cases & Controversies: De-identified Health Databases

  47. U.S. Controversy • Quintiles Transnational Corp. v. WebMD • No demonstrable violation of HIPAA or other privacy law by transmission and aggregation of deidentified health data • Inhibits additional state regulation of national electronic data system • Order of Judge Terrence Boyle. • Re de-identified data: “the Dormant Commerce Clause prevents the individual states from regulating the interstate transmission of data.” • [No. 5:01-CV-180-BO(3), U.S. EDNC Western Division]

  48. UK Controversy • Regina v. Department of Health, Ex Parte Source Informatics Ltd. [Judge Latham, 4 All ER 185, May 29, 1999; Case No. CO\4490\97, Queen’s Bench Division] • Judge Latham dismissed the applicants' application for a declaration concerning a policy document issued in March 1996 by the Department of Health, “The Protection [and] Use of Health Information.”

  49. UK: Source Informatics: Overturned on Appeal • Court of Appeal: Simon Brown, Aldous and Schiemann LJJ: 21 December 1999 • Where a patient's identity was protected, it would not be a breach of confidence for general practitioners and pharmacists to disclose to a third party, without the patient's consent, the information contained in the patient's prescription form for marketing research purposes.

  50. UK Health and Social Care Bill: Clause 65 • Department of Health included language in the Health and Social Care Bill that would have essentially reinstated the lower court’s opinion (Judge Latham’s) • After heavy lobbying in the House of Lords against Clause 65, the language was defeated.
