Using ontologies to make sense of unstructured medical data Nigam Shah, MBBS, PhD nigam@stanford.edu
NCBO: Key activities • We create and maintain a library of biomedical ontologies. • We build tools and Web services to enable the use of ontologies and their derivatives. • We collaborate with scientific communities that develop and use ontologies.
Ontology Services: Download • Traverse • Search • Comment. Views. Mapping Services: Create • Download • Upload. Widgets: Tree-view • Auto-complete • Graph-view. Annotation: Term recognition. Data Access: Fetch “data” annotated with a given term. http://rest.bioontology.org http://bioportal.bioontology.org
Annotation service Process textual metadata to automatically tag text with as many ontology terms as possible. 90 million calls, ~700 GB of data
Resource index. Won 1st prize at the 2010 Semantic Web Challenge @ ISWC. Resources: PubMed Abstracts • Adverse Events (AERS) • GEO • … • Clinical Trials • Drug Bank
Creating Lexicons. Sentences in clinical notes (1 … m) → Frequency counter → Terms (1 … n), each with its syntactic types and frequency
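The frequency-counter step can be sketched in a few lines of Python. This is a minimal illustration, not the NCBO pipeline: the function names `term_frequencies` and `clean_lexicon`, the whole-word matching rule, and the `min_count` threshold are all assumptions made for the example, and the syntactic-type filtering mentioned on the slide is left out.

```python
import re
from collections import Counter


def term_frequencies(sentences, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in terms
    }
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(lowered))
    return counts


def clean_lexicon(counts, min_count=1):
    """Keep terms seen at least min_count times (hypothetical filter)."""
    return [term for term, count in counts.items() if count >= min_count]


sentences = ["Chest pain and fever.", "No fever today."]
counts = term_frequencies(sentences, ["fever", "chest pain", "rash"])
lexicon = clean_lexicon(counts, min_count=1)
```

A real lexicon would be filtered on more than raw counts; the threshold here only shows where such a rule would plug in.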
Annotation Analytics Analyzing tagged data for hypothesis generation in bioinformatics
Generic GO-based analysis routine (diagram: a study set and a reference set drawn from the genome) • Get annotations for each gene in a set • Count the occurrence of each annotation term in the study set • Count the occurrence of that term in some reference set (whole genome?) • Compute a P-value for how surprising their overlap is.
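The four steps of this routine can be sketched as follows. The slide does not name a statistical test, so the one-sided hypergeometric test used here is an assumption (it is a common choice for this kind of over-representation analysis), and the function names and the `annotations` dictionary layout are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / comb(N, n)


def enrichment(study_genes, annotations):
    """Return {term: p-value} for every term seen in the study set.

    annotations maps each reference-set gene to its set of terms.
    """
    reference = set(annotations)
    study = set(study_genes) & reference
    ref_counts = Counter(t for g in reference for t in annotations[g])
    study_counts = Counter(t for g in study for t in annotations[g])
    N, n = len(reference), len(study)
    return {
        term: hypergeom_upper_tail(k, n, ref_counts[term], N)
        for term, k in study_counts.items()
    }


# Toy reference set: 10 genes, 3 of them annotated with term "t".
annotations = {f"g{i}": set() for i in range(10)}
for gene in ("g0", "g1", "g2"):
    annotations[gene] = {"t"}
p_values = enrichment(["g0", "g1", "g5", "g6"], annotations)
```

In practice the p-values would also need a multiple-testing correction, since one test is run per term.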
Annotation Analytics Landscape SNOMED-CT ? NCIT ICD-9 MeSH : Drugs, Chemicals Cell Type Human Disease Gene Ontology Health Indicator Warehouse datasets Drug Sets Grant Sets Patient Sets Gene Sets Paper Sets
Open questions • Can we use something other than the GO? • Lack of annotations: even today, roughly 20% of genes lack any GO annotation. • Annotation bias: annotations with certain ontology terms are not independent of one another. • Lack of a systematic mechanism to define a level of abstraction.
Profiling a set of Aging genes 261 Age-related genes Genome Disease Ontology ~ 30% of genome
Using ontologies other than GO. ERCC6 → nucleoplasm • PARP1 → protein N-terminus binding • ERCC6 → <disease term?> • PARP1 → <disease term?>
Enrichment Analysis with the DO. GO annotations (http://www.geneontology.org/GO.downloads.annotations.shtml): ERCC6 GO:0005654 PMID:16107709 • ERCC6 GO:0008094 PMID:16107709 • PARP1 GO:0047485 PMID:16107709 • ERCC6 GO:0005730 PMID:16107709 • PARP1 GO:0003950 PMID:16107709. Genes per paper: PMID:16107709 (www.ncbi.nlm.nih.gov/pubmed/16107709) → {ERCC6, PARP1}. NCBO Annotator (http://bioportal.bioontology.org) on that paper → {Cockayne syndrome, DNA damage}. Result: {ERCC6, PARP1} → {Cockayne syndrome, DNA damage}
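The idea on this slide, carrying disease terms over to genes through the papers cited in their GO annotations, can be sketched with the rows shown. This is an illustrative reading of the diagram, not the authors' code: `disease_annotations` is a hypothetical name, and the `paper_terms` dictionary stands in for the output of running the NCBO Annotator on each paper.

```python
from collections import defaultdict

# (gene, GO term, supporting paper) rows as listed on the slide.
go_rows = [
    ("ERCC6", "GO:0005654", "PMID:16107709"),
    ("ERCC6", "GO:0008094", "PMID:16107709"),
    ("PARP1", "GO:0047485", "PMID:16107709"),
    ("ERCC6", "GO:0005730", "PMID:16107709"),
    ("PARP1", "GO:0003950", "PMID:16107709"),
]

# Stand-in for annotator output: disease terms recognized per paper.
paper_terms = {"PMID:16107709": {"Cockayne syndrome", "DNA damage"}}


def disease_annotations(go_rows, paper_terms):
    """Give each gene the disease terms found in its supporting papers."""
    gene_to_terms = defaultdict(set)
    for gene, _go_id, pmid in go_rows:
        gene_to_terms[gene] |= paper_terms.get(pmid, set())
    return dict(gene_to_terms)


gene_diseases = disease_annotations(go_rows, paper_terms)
```

The resulting gene-to-disease table has the same shape as a GO annotation table, so it can feed the same enrichment routine.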
Annotation Analytics on EMR data Analysis of tagged data from electronic health records
Profiling patient sets. ICD9 789.00 (Abdominal pain, unspecified site). 86k patient reports. Patient records processed from U. Pittsburgh NLP Repository with IRB approval.
Generation of tagged data. Annotation Workflow: Text clinical note → Term recognition tool (NCBO Annotator), using clean lexicons created from the BioPortal knowledge graph (Term 1 … Term n, with frequency and syntactic types; diseases, drugs, procedures) → Terms recognized → Negation detection (NegEx patterns and rules) → Terms form a temporal series of tags → Cohort of interest → Further analysis
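The term-recognition and negation-detection steps of this workflow can be illustrated with a toy tagger. It is only loosely modelled on NegEx: the trigger and terminator lists below are a tiny hypothetical subset, the five-token lookback window is an assumption, and `tag_sentence` is not a function of the NCBO Annotator or of any NegEx implementation.

```python
import re

NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")  # toy subset
SCOPE_TERMINATORS = {"but", "however"}  # words that end a negation scope
WINDOW = 5  # tokens to look back from the term (assumed value)


def tag_sentence(sentence, lexicon):
    """Return (term, negated) pairs for lexicon terms found in sentence."""
    text = sentence.lower()
    tags = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        for match in re.finditer(pattern, text):
            tokens = re.findall(r"[a-z0-9']+", text[: match.start()])
            # Drop everything up to the last scope terminator, if any.
            for i in range(len(tokens) - 1, -1, -1):
                if tokens[i] in SCOPE_TERMINATORS:
                    tokens = tokens[i + 1:]
                    break
            scope = " ".join(tokens[-WINDOW:])
            negated = any(
                re.search(r"\b" + re.escape(trigger) + r"\b", scope)
                for trigger in NEGATION_TRIGGERS
            )
            tags.append((term, negated))
    return tags


tags = tag_sentence(
    "Patient denies chest pain but reports nausea.",
    ["chest pain", "nausea"],
)
```

Attaching a note date to each non-negated tag would then give the temporal series of tags from which a cohort is selected.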
Detecting the Vioxx Risk Signal. Vioxx Patients (1,560) • Vioxx and MI (339) • MI Patients (1,827) • RA Patients (14,079). ROR of 2.058, CI of [1.804, 2.349]. The χ² statistic has p-value < 10⁻⁷. ROR = 1.524, CI = [0.872, 2.666]; χ² p-value = 0.06816. p-value < 1.3 × 10⁻²⁴
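A reporting odds ratio (ROR) and its confidence interval can be computed from a 2×2 table with the standard log-odds formula. The sketch below assumes that the four counts on the slide form such a table within the 14,079 RA patients, and that the interval is a 95% Wald interval; neither assumption is stated on the slide, and `reporting_odds_ratio` is a hypothetical name. Under those assumptions the counts give values that round to the first ROR and interval quoted above.

```python
from math import exp, log, sqrt


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    ror = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(ROR)
    lower = exp(log(ror) - z * se)
    upper = exp(log(ror) + z * se)
    return ror, lower, upper


# Counts from the slide; the 2x2 layout is an assumption.
cohort, exposed, outcome, both = 14079, 1560, 1827, 339
a = both
b = exposed - both
c = outcome - both
d = cohort - exposed - outcome + both

ror, lower, upper = reporting_odds_ratio(a, b, c, d)
print(f"ROR={ror:.3f}, CI=[{lower:.3f}, {upper:.3f}]")
```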
Annotation Analytics Landscape SNOMED-CT What questions can we ask? NCIT ICD-9 MeSH : Drugs, Chemicals Cell Type Aging EMRs Human Disease Gene Ontology Health Indicator Warehouse datasets Drug Sets Gene Sets Paper Sets Grant Sets Patient Sets
Associations and outcomes Enrichment What questions can we ask? Off-label Indications Side effects
Acknowledgements • Paea LePendu • Yi Liu • Srinivasan Iyer • Steve Racunas • Anna Bauer-Mehren • Clement Jonquet • Rong Xu • Mark Musen • NIH – NCBO funding • Mayo Team • Hongfang Liu • Stephen Wu • Sylvia Holland • Alex Skrenchuk
Mining Annotations of Grants, Publications • Publications from Medline • Only “Journal articles” • Grants from 1972 to 2007 • 30 funding agencies