Explore the vast biomedical literature to uncover novel gene relationships. Use NLP to parse text and find matches in unstructured publications. Automate relationship discovery for genes and diseases to create a global ontology.
Mining the Web: Discovering New Biomedical Knowledge Aly Khan
The Human Genome Project • Goal: Sequence the human genome • Completed in 2003 • Public effort led by the National Institutes of Health and the U.S. Department of Energy with international partners; Celera Genomics ran a parallel private effort • ~25,000 genes
25,000 Genes • What do they do? • How do they interact?
Finding context • Use vast amounts of published works to find novel relationships between genes • 17,000,000 records from more than 5,000 biomedical journals
On searching • Biomedical literature unbounded • Unstructured text in biomedical publications
Applications • NLP: parse text for matches using POS tags: • [Query noun phrase term] “is a” [noun phrase class] • hiv is a virus • [Noun phrase class] “such as” [Query noun phrase term] • genes such as 4fgf
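The two lexical patterns on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes single-token noun phrases and plain regular expressions instead of POS tags, and the names NP, IS_A, SUCH_AS, and extract_is_a are hypothetical.

```python
import re

# Assumption: a "noun phrase" is approximated by one alphanumeric token.
# A real system would use POS tags or a chunker, as the slide suggests.
NP = r"[A-Za-z0-9][\w-]*"

# [term] "is a" [class]
IS_A = re.compile(rf"\b({NP}) is an? ({NP})", re.IGNORECASE)
# [class] "such as" [term]
SUCH_AS = re.compile(rf"\b({NP}) such as ({NP})", re.IGNORECASE)

def extract_is_a(text):
    """Return (term, class) pairs found by either pattern."""
    pairs = [(m.group(1), m.group(2)) for m in IS_A.finditer(text)]
    # In the "such as" pattern the class comes first, so swap the groups.
    pairs += [(m.group(2), m.group(1)) for m in SUCH_AS.finditer(text)]
    return pairs

if __name__ == "__main__":
    print(extract_is_a("hiv is a virus"))
    print(extract_is_a("genes such as 4fgf"))
```

Both patterns yield pairs in the same (term, class) order, so their output can feed one table of candidate class memberships.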
Applications • “The results demonstrated that KaiC interacts rhythmically with KaiA, KaiB, and SasA.” (Ozgur et al.) • Path 1: KaiC – nsubj – interacts – obj – SasA • Path 2: KaiC – nsubj – interacts – obj – SasA – conj_and – KaiA • Path 3: KaiC – nsubj – interacts – obj – SasA – conj_and – KaiB • Path 4: SasA – conj_and – KaiA • Path 5: SasA – conj_and – KaiB • Path 6: KaiA – prep_with – SasA – conj_and – KaiB
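Dependency paths like these can be read off a parse by walking the dependency graph between two protein names. The sketch below is an illustration under stated assumptions: the edges in EDGES are hand-coded from Paths 1 to 5 on the slide, not produced by a parser, and build_graph and shortest_path are hypothetical helper names. A real dependency parser may label the edges differently.

```python
from collections import deque

# Assumption: undirected dependency edges copied by hand from Paths 1-5.
EDGES = [
    ("KaiC", "nsubj", "interacts"),
    ("interacts", "obj", "SasA"),
    ("SasA", "conj_and", "KaiA"),
    ("SasA", "conj_and", "KaiB"),
]

def build_graph(edges):
    """Map each node to its (label, neighbor) pairs, in both directions."""
    graph = {}
    for head, label, tail in edges:
        graph.setdefault(head, []).append((label, tail))
        graph.setdefault(tail, []).append((label, head))
    return graph

def shortest_path(graph, source, target):
    """Breadth-first search; returns 'node - label - node ...' or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths alternate node, label, node, ...
        if node == target:
            return " - ".join(path)
        for label, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [label, neighbor])
    return None

if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(shortest_path(graph, "KaiC", "SasA"))
    print(shortest_path(graph, "KaiC", "KaiA"))
```

Running it on the pairs (KaiC, SasA) and (KaiC, KaiA) reproduces Paths 1 and 2, written with plain hyphens.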
Contextual representation • PTEN is transcriptionally regulated by transcription factors such as p53 and Egr-1. • In response to DNA damage, the cell-cycle checkpoint kinase CHEK2 can be activated by ATM kinase to phosphorylate p53 and BRCA1, which are involved in cell-cycle control and apoptosis.
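One simple way to turn sentences like these into candidate gene relationships is sentence-level co-occurrence. The sketch below is an assumption about how such context might be captured, not the method of the talk: the GENES list is a hand-made dictionary limited to the names in the two example sentences, and genes_in and cooccurrences are hypothetical helpers.

```python
import re
from itertools import combinations

# Assumption: a tiny gene dictionary built from the example sentences.
GENES = ["PTEN", "p53", "Egr-1", "CHEK2", "ATM", "BRCA1"]

SENTENCES = [
    "PTEN is transcriptionally regulated by transcription factors "
    "such as p53 and Egr-1.",
    "In response to DNA damage, the cell-cycle checkpoint kinase CHEK2 "
    "can be activated by ATM kinase to phosphorylate p53 and BRCA1, "
    "which are involved in cell-cycle control and apoptosis.",
]

def genes_in(sentence):
    """Return dictionary genes mentioned as whole tokens in the sentence."""
    found = []
    for gene in GENES:
        # Lookarounds stop matches inside longer words or hyphenated names.
        pattern = r"(?<![\w-])" + re.escape(gene) + r"(?![\w-])"
        if re.search(pattern, sentence):
            found.append(gene)
    return found

def cooccurrences(sentences):
    """Return the set of sorted gene pairs that share a sentence."""
    pairs = set()
    for sentence in sentences:
        pairs.update(combinations(sorted(genes_in(sentence)), 2))
    return pairs

if __name__ == "__main__":
    for pair in sorted(cooccurrences(SENTENCES)):
        print(pair)
```

Co-occurrence only says that two genes are discussed together; the relation itself (regulates, activates, phosphorylates) still has to come from patterns or dependency paths.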
Goals • Creating a global ontology for genes, diseases, etc. • Automated discovery of relationships.