Developing i2b2 Ontologies for the Long Haul

Lori Phillips, MS
Partners HealthCare Systems, Inc.
April 25, 2012

National Centers for Biomedical Computing

What is i2b2? • Software for explicitly organizing and transforming person-oriented clinical data in a way that is optimized for research • Allows integration of clinical data, trials data, and genotypic data • A portable and extensible application framework • Modular software architecture allows additions without disturbing core parts • Available as open source at https://www.i2b2.org

Where is it used?

CTSAs: • Boston University • Case Western Reserve University (including Cleveland Clinic) • Children's National Medical Center (GWU), Washington D.C. • Duke University • Emory University (including Morehouse School of Medicine and Georgia Tech) • Harvard University (including Beth Israel Deaconess Medical Center, Brigham and Women's Hospital, Children's Hospital Boston, Dana Farber Cancer Center, Joslin Diabetes Center, Massachusetts General Hospital) • Medical University of South Carolina • Medical College of Wisconsin • Oregon Health & Science University • Penn State Milton S. Hershey Medical Center • Tufts University • University of Alabama at Birmingham • University of Arkansas for Medical Sciences • University of California Davis • University of California, Irvine • University of California, Los Angeles* • University of California, San Diego* • University of California San Francisco • University of Chicago • University of Cincinnati (including Cincinnati Children's Hospital Medical Center) • University of Colorado Denver (including Children's Hospital Colorado) • University of Florida • University of Kansas Medical Center • University of Kentucky Research Foundation • University of Massachusetts Medical School, Worcester • University of Michigan • University of Pennsylvania (including Children's Hospital of Philadelphia) • University of Pittsburgh (including their Cancer Institute) • University of Rochester School of Medicine and Dentistry • University of Texas Health Sciences Center at Houston • University of Texas Health Sciences Center at San Antonio • University of Texas Medical Branch (Galveston) • University of Texas Southwestern Medical Center at Dallas • University of Utah • University of Washington • University of Wisconsin - Madison (including Marshfield Clinic) • Virginia Commonwealth University • Weill Cornell Medical College

Academic Health Centers (does not include AHCs that are part of a CTSA): • Arizona State University • City of Hope, Los Angeles • Georgia Health Sciences University, Augusta • Hartford Hospital, CT • HealthShare Montana • Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston • Nemours • Phoenix Children's Hospital • Regenstrief Institute • Thomas Jefferson University • University of Connecticut Health Center • University of Missouri School of Medicine • University of Tennessee Health Sciences Center • Wake Forest University Baptist Medical Center

HMOs: • Group Health Cooperative • Kaiser Permanente

International: • Georges Pompidou Hospital, Paris, France • Hospital of the Free University of Brussels, Belgium • Inserm U936, Rennes, France • Institute for Data Technology and Informatics (IDI), NTNU, Norway • Institute for Molecular Medicine Finland (FIMM) • Karolinska Institute, Sweden • Landspitali University Hospital, Reykjavik, Iceland • Tokyo Medical and Dental University, Japan • University of Bordeaux Segalen, France • University of Erlangen-Nuremberg, Germany • University of Goettingen, Goettingen, Germany • University of Leicester and Hospitals, England (Biomed. Res. Informatics Ctr. for Clin. Sci) • University of Pavia, Pavia, Italy • University of Seoul, Seoul, Korea

Companies: • Johnson and Johnson (tranSMART) • GE Healthcare Clinical Data Services

Why use i2b2? • Cohort discovery • Enables and simplifies research cohort discovery across an institution’s large, heterogeneous clinical datasets • Hypothesis generation • Enables and simplifies analysis of data to support a hypothesis • Retrospective data analysis • Enables the retrospective analysis of data to support/refute claims.

i2b2 Workbench

Data Model • FACTS • The quantitative or factual data being queried • DIMENSIONS • Groups of hierarchies and descriptors that define the facts. • STAR SCHEMA • A single fact table surrounded by numerous dimension tables.

i2b2 Star Schema

Observation (fact table)
Primary keys:
Patient_num      Distinct number for every patient
Encounter_num    Distinct number for every visit
Concept_cd       Distinct code for every concept
Observer_cd      Distinct code for every observer
Start_date       Date-time the observation began
Modifier_cd      Code to modify concept_cd
Instance_num     Mechanism to group concept modifiers

i2b2 Fact Table • In i2b2, an atomic fact is an observation on a patient. • Examples of facts • Diagnoses • Procedures • Lab data • Medications • Genetic data

i2b2 Dimension Tables • Dimension tables contain descriptive information about the facts. • Examples • Concept dimension describes the concepts stored in the concept_cd field. • Provider dimension contains information about the observer_cd field • Patient dimension contains information about the patient_num field • Visit dimension contains information about the encounter_num field • Modifier dimension contains information about the modifier_cd field

How does i2b2 use Ontologies? • By and large, the concepts stored in the fact table come from clinical coding systems or ontologies. • Largely dependent on the data available to the institution • Diagnoses: ICD9/ICD10/SNOMED • Procedures: CPT/ICD9 • Medications: NDC/RxNorm • Lab results: LOINC • Molecular/genomic data • Custom or project-specific data • Ontologies are used to organize query terms (and concepts) hierarchically.

Metadata table • Query terms are stored in a separate metadata table. • There is a one-to-one mapping of terms in the metadata to concepts in the dimension table. • The structure of the metadata table is integral to both the visualization of the query terms (tree) and the query mechanism itself.

Structure of Metadata Table

i2b2 Metadata Root Level Categories • Terms with c_hlevel = 1 • Display name is c_name • Icon (folder or container) is determined by c_visualattributes • Example c_fullname: • \Diagnoses\

Query terms are visualized hierarchically in a tree
\Diagnoses\ (level 1)
  Respiratory system\ (level 2)
    Chronic obstructive diseases\ (level 3)
      Emphysema\ (level 4)

Why are hierarchies so important for i2b2? • Hierarchies form the basis of both the visualization of the terms and the query mechanism itself.
select * from metadata
where c_fullname like '\Diagnoses\Respiratory system\Chronic obstructive diseases\Emphysema\%'
  and c_hlevel = 5
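The lookup above can be tried against a toy table. The following is a minimal sketch using Python's built-in sqlite3 module, assuming a cut-down metadata table with only three columns; the terms below level 4 are invented placeholders, and the helper names `path` and `children` are hypothetical.

```python
import sqlite3

SEP = "\\"  # i2b2 paths separate levels with a backslash


def path(*parts):
    """Join path segments into a c_fullname with leading and trailing separators."""
    return SEP + SEP.join(parts) + SEP


BRANCH = ("Diagnoses", "Respiratory system",
          "Chronic obstructive diseases", "Emphysema")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (c_hlevel INTEGER, c_fullname TEXT, c_name TEXT)"
)

# Levels 1-4 follow the slide; the deeper terms are invented placeholders.
terms = [BRANCH[:n] for n in range(1, 5)]
terms += [
    BRANCH + ("Child term A",),
    BRANCH + ("Child term B",),
    BRANCH + ("Child term A", "Grandchild term"),
]
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [(len(t), path(*t), t[-1]) for t in terms],
)


def children(parent_parts):
    """Names of the terms exactly one level below the given parent."""
    cur = conn.execute(
        "SELECT c_name FROM metadata "
        "WHERE c_fullname LIKE ? AND c_hlevel = ? ORDER BY c_name",
        (path(*parent_parts) + "%", len(parent_parts) + 1),
    )
    return [name for (name,) in cur]


print(children(BRANCH))  # ['Child term A', 'Child term B']
```

The LIKE prefix selects the whole subtree, and the c_hlevel filter keeps only the immediate children, which is what a tree view needs when a folder is expanded.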

Structure of Metadata Table

Hierarchies in queries
select patient_num from observation_fact
where concept_cd IN
  (select concept_cd from concept_dimension
   where concept_path LIKE '\Diagnoses\Respiratory system\Chronic obstructive diseases\Emphysema\%')
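To see how one path prefix pulls in every concept beneath a folder, here is a small sketch with sqlite3. The two tables are reduced to the columns the query touches, the concept codes (DEMO:A and so on) and patient numbers are made up, and DISTINCT and ORDER BY are added only so the output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT);
""")

EMPHYSEMA = ("\\Diagnoses\\Respiratory system"
             "\\Chronic obstructive diseases\\Emphysema\\")

# Placeholder concepts: two under Emphysema, one in an unrelated branch.
conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)", [
    (EMPHYSEMA + "Placeholder term A\\", "DEMO:A"),
    (EMPHYSEMA + "Placeholder term B\\", "DEMO:B"),
    ("\\Diagnoses\\Other branch\\Placeholder term C\\", "DEMO:C"),
])

# Placeholder facts: one row per observation on a patient.
conn.executemany("INSERT INTO observation_fact VALUES (?, ?)", [
    (1, "DEMO:A"), (2, "DEMO:C"), (3, "DEMO:B"), (3, "DEMO:A"),
])


def patients_under(concept_path):
    """Patients with at least one fact coded anywhere below concept_path."""
    cur = conn.execute(
        "SELECT DISTINCT patient_num FROM observation_fact "
        "WHERE concept_cd IN (SELECT concept_cd FROM concept_dimension "
        "                     WHERE concept_path LIKE ?) "
        "ORDER BY patient_num",
        (concept_path + "%",),
    )
    return [num for (num,) in cur]


print(patients_under(EMPHYSEMA))  # [1, 3]
```

Patient 2 is excluded because its only fact is coded in a different branch of the hierarchy.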

i2b2 Ontologies for the Long Haul • How do I create i2b2 metadata for a known ontology? • ICD-10 • What happens to my legacy clinical data when I have to move to ICD-10? • Merging ICD-9 with ICD-10 • How do I handle genomic metadata? • …. Custom metadata?

NCBO BioPortal ICD-10

Building an ICD-10 Ontology with NCBO services • Pull data from NCBO via REST services. • Reorganize information into i2b2 Metadata format
bioportal/concepts/46302/all
<data>
  <pageNum>1</pageNum>
  <numPages>1832</numPages>
  <pageSize>50</pageSize>
  <numResultsPage>50</numResultsPage>
  <numResultsTotal>91590</numResultsTotal>
  <contents class="org.ncbo.stanford.bean.concept.ClassBeanResultListBean">
    <classBeanResultList>
      <classBean>
        <id>0-ICD10CM</id>
        <fullId>http://purl.bioontology.org/ontology/ICD10CM/0-ICD10CM</fullId>
        <label>ICD-10-CM TABULAR LIST of DISEASES and INJURIES</label>
        <type>class</type>
        <relations>
          <entry>
            <string>ChildCount</string>
            <int>0</int>
          </entry>
          ……
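Because the response is paged, an extractor has to read the paging fields before it knows how many requests remain. The sketch below parses a shortened, closed-up copy of the response shown above with the standard library; no network call is made, and the function name `read_page` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's response, closed up so that it parses.
PAGE = """
<data>
  <pageNum>1</pageNum>
  <numPages>1832</numPages>
  <pageSize>50</pageSize>
  <numResultsPage>50</numResultsPage>
  <numResultsTotal>91590</numResultsTotal>
  <contents class="org.ncbo.stanford.bean.concept.ClassBeanResultListBean">
    <classBeanResultList>
      <classBean>
        <id>0-ICD10CM</id>
        <label>ICD-10-CM TABULAR LIST of DISEASES and INJURIES</label>
        <type>class</type>
      </classBean>
    </classBeanResultList>
  </contents>
</data>
"""

PAGING_TAGS = ("pageNum", "numPages", "pageSize", "numResultsTotal")


def read_page(xml_text):
    """Return (paging info as ints, list of concept ids on this page)."""
    root = ET.fromstring(xml_text.strip())
    info = {tag: int(root.findtext(tag)) for tag in PAGING_TAGS}
    beans = root.findall("contents/classBeanResultList/classBean")
    return info, [bean.findtext("id") for bean in beans]


info, ids = read_page(PAGE)
# Pages still to be requested after this one.
remaining = range(info["pageNum"] + 1, info["numPages"] + 1)
print(ids, len(remaining))  # ['0-ICD10CM'] 1831
```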

Primary challenges • i2b2 Metadata depends upon hierarchical information • c_fullname, c_tooltip maintain the hierarchy from root to leaves Diseases of the respiratory system \ Chronic lower respiratory diseases \ Emphysema

Challenges.. • NCBO REST service that enables pull of concepts includes immediate parent/child info only • Hierarchy must be computed
<data>
  <classBean>
    <id>J43</id>
    <label>Emphysema</label>
    <relations>
      <entry>
        <string>SuperClass</string>
        <list>
          <classBean>
            <id>J40-J47</id>
            <label>Chronic lower respiratory diseases</label>
          </classBean>
        </list>
      </entry>
    </relations>
  </classBean>
</data>
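One way to compute the hierarchy from parent-only information is to record each concept's immediate parent and then walk upward to the root. This is a sketch of that idea, not the released extraction tool: `parse_parent` and `full_path` are hypothetical helpers, and the entries for J40-J47 and J00-J99 are typed in by hand where a real run would fetch them from the service.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<data><classBean>
  <id>J43</id><label>Emphysema</label>
  <relations><entry>
    <string>SuperClass</string>
    <list><classBean>
      <id>J40-J47</id><label>Chronic lower respiratory diseases</label>
    </classBean></list>
  </entry></relations>
</classBean></data>
"""


def parse_parent(xml_text):
    """Return (id, label, parent id or None) for the concept in the response."""
    bean = ET.fromstring(xml_text.strip()).find("classBean")
    parent_id = None
    for entry in bean.findall("relations/entry"):
        if entry.findtext("string") == "SuperClass":
            parent_id = entry.findtext("list/classBean/id")
    return bean.findtext("id"), bean.findtext("label"), parent_id


def full_path(concept_id, concepts):
    """Walk parent links up to the root and build a root-to-leaf path."""
    labels, seen = [], set()
    current = concept_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in parent links at " + current)
        seen.add(current)
        label, current = concepts[current]
        labels.append(label)
    return "\\" + "\\".join(reversed(labels)) + "\\"


cid, label, parent = parse_parent(SAMPLE)
# concept id -> (label, parent id); the last two entries are hand-entered.
concepts = {
    cid: (label, parent),
    "J40-J47": ("Chronic lower respiratory diseases", "J00-J99"),
    "J00-J99": ("Diseases of the respiratory system", None),
}
print(full_path("J43", concepts))
```

The resulting string has the shape needed for c_fullname and c_tooltip: every ancestor label in order, from the root down to the term itself.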

NCBO Extraction Workflow • Request to extract ontology → NCBO REST → XML (ICD-10) → Process extracted data → i2b2 Metadata

Extracted ICD-10 terms

Released deliverables https://community.i2b2.org/wiki/display/NCBO

What about my legacy ICD-9 data? • Ideally we would like an i2b2 ontology that integrates ICD-9 into ICD-10.

Mapping Tool • Tool to verify/(re)assign ontology mappings.

Navigating the Mapping Tool Tree • Displays terms mapped from one ontology within hierarchy of another • Mapped terms are displayed adjacent to terms they are mapped to and appear in bold

Adding a new mapping • ICD9:269.3, Mineral deficiency should appear for ICD10:E63 Other nutritional deficiencies • Copy term ICD9:269.3

Adding a new mapping • Paste onto ICD10:E63 Other nutritional deficiencies

Move a mapping • Ascorbic acid deficiency (ICD9:267) can be moved down one level to Ascorbic acid deficiency (ICD10:E54) • Drag and drop the term down one level.

Unmap a mapping ICD9:416.8 Other chronic pulmonary heart diseases appears in two places: the one attached to ICD10:I27.2 appears incorrect and can be unmapped.

The Unmapped Terms List • Free form list of terms to be mapped • Locate term you wish to map to in the hierarchy tree. Drag from table to term in the tree. • If you make a mistake you can either reassign the mapped term within the tree or unmap it from tree. • Unmap will cause it to reappear in the unmapped terms list if the term has no other mappings.

Assigning an unmapped term • Drag from unmapped terms list • Drop onto term we are mapping to

Unmapping a term • Drag term from tree • Drop onto unmapped terms list

Search Unmapped Terms By Name

Search Unmapped Terms by Code

Mapped Terms Viewer

Search Mapped Terms By Code

Search Mapped Terms By Name

Merging Ontologies • Mapping tool provides a visualization of what the merged ontologies would look like • What if we could extract a single metadata table from this?

Integration Workflow • Request to integrate ICD-9 into ICD-10 → MapperCell → Mapped ICD-9 terms → Integration tool: for each mapped ICD-9 term, compute the ICD-10 hierarchy → ICD-10 merged with ICD-9 terms
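The integration step can be pictured as placing each mapped ICD-9 term under the path of the ICD-10 term it maps to. The sketch below is one reading of that step, not the integration tool itself: the ICD-10 paths are shortened placeholders, the two mappings are taken from the Mapping Tool examples, and the row layout (including the c_basecode column and the level convention of root = 1) is an assumption.

```python
SEP = "\\"

# Placeholder ICD-10 paths keyed by code; a real run would read them from
# the extracted ICD-10 metadata table.
ICD10_PATHS = {
    "ICD10:E54": SEP + SEP.join(
        ["ICD-10", "Placeholder chapter", "Ascorbic acid deficiency"]) + SEP,
    "ICD10:E63": SEP + SEP.join(
        ["ICD-10", "Placeholder chapter", "Other nutritional deficiencies"]) + SEP,
}

# (ICD-9 code, ICD-9 name, ICD-10 code it is mapped to)
MAPPINGS = [
    ("ICD9:267", "Ascorbic acid deficiency", "ICD10:E54"),
    ("ICD9:269.3", "Mineral deficiency", "ICD10:E63"),
]


def merged_rows(mappings, icd10_paths):
    """Build one metadata row per mapped ICD-9 term, nested under its ICD-10 parent."""
    rows = []
    for icd9_code, icd9_name, icd10_code in mappings:
        parent_path = icd10_paths[icd10_code]
        fullname = parent_path + icd9_name + " (" + icd9_code + ")" + SEP
        rows.append({
            # Level = number of path segments, so the root folder is level 1.
            "c_hlevel": len([part for part in fullname.split(SEP) if part]),
            "c_fullname": fullname,
            "c_name": icd9_name,
            "c_basecode": icd9_code,
        })
    return rows


for row in merged_rows(MAPPINGS, ICD10_PATHS):
    print(row["c_hlevel"], row["c_fullname"])
```

Because each ICD-9 row sits below an ICD-10 path, a query on the ICD-10 folder with a trailing % also picks up the legacy ICD-9 codes.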

How to handle genomic data • Ability to organize the variants for ease of navigation • Needs may differ between geneticist, physician, research scientist • Ability to query for the variant in the workbench • Genomic labs may report data differently • Define the variant so it may be reliably identified over time • Implication is that the identifier for the variant does not change over time or is maintainable.

How to (reliably) identify a genomic variant? • Chr location, nucleotide subst? • HGVS name? • Gene name + flanking sequences? • RS #? • All of them??

RS number • Uniquely identifies a variant over time ….but…. • Novel variants may not have an rs number • User may not want to submit to dbSNP

Gene name + flanking sequences • Not guaranteed if the gene has several isoforms • Example: EGFR

HGVS Name • Uniquely identifies a variant within a referenced and versioned accession and details the nucleotide substitution. • Example: NM_005228.3:c.2155G>T • NM_005228.3: RefSeq accession • c.: coding DNA • 2155: position • G>T: nucleotide substitution
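The parts of the example name can be pulled apart mechanically. The sketch below uses a regular expression that only covers simple single-nucleotide substitutions written like the example; it is not a full HGVS parser, and the function name `parse_hgvs_substitution` is hypothetical.

```python
import re

# Matches names shaped like NM_005228.3:c.2155G>T and nothing more complex.
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+)"  # versioned RefSeq accession
    r":(?P<coord_type>[cgmnr])\."                      # c = coding DNA, g = genomic, ...
    r"(?P<position>\d+)"                               # position in that sequence
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"                # nucleotide substitution
)


def parse_hgvs_substitution(name):
    """Split a simple HGVS substitution into its parts, or raise ValueError."""
    match = HGVS_SUBSTITUTION.match(name)
    if match is None:
        raise ValueError("not a simple HGVS substitution: " + name)
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    parts["position"] = int(parts["position"])
    return parts


print(parse_hgvs_substitution("NM_005228.3:c.2155G>T"))
```

Keeping the accession and its version as separate fields makes it possible to notice when two reports describe the same substitution against different versions of the reference sequence.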

Is there a common denominator in all of this? • Yes … all ultimately describe variant location on a chromosome. • Nucleotide substitution defines the physical manifestation of the variant. • WE PROPOSE: • HGVS name (n/t subst, positional info) • Flanking sequences (a way to verify positional info) • AS A WAY TO UNEQUIVOCALLY EQUATE TWO VARIANTS ACROSS DOMAINS, ACROSS VERSIONS
