Retrieve protein and gene mentions.

Mutation Grounding Algorithm
Jonas B. Laurila (1), Rajaraman Kanagasabai (2) and Christopher J. O. Baker (1)
(1) University of New Brunswick, Saint John. (2) Institute for Infocomm Research, Singapore.
April 13th, 2010


Presentation Transcript


  1. Mutation Grounding Algorithm

Motivation

Protein mutations are derived from in vitro experimental analysis, and their impacts are described in detail in scientific papers. Reuse of mutation impact annotations is an important subfield of bioinformatics for which mutation grounding is a critical step. We present a method for grounding textual mentions, taken from scientific papers, that describe mutational changes made to proteins. We distinguish between grounding mutation entities to database entries and positionally correct grounding on amino acid sequences extracted from protein databases.

[Figure: Mutation Annotation Example (from within GATE)]

Conclusion

Automated reuse of mutation impact information from documents is now an achievable milestone, given the respectable performance of our grounding algorithm. In combination with mutation impact extraction from sentences, the mutation grounding algorithm will facilitate the construction of unique datasets suitable as training material for predicting the impacts of genomic variations and for extracting genotype-phenotype relations.

[Figure: Entity Recognition & Grounding Framework]

Grounding Workflow

• Retrieve protein and gene mentions.
• Retrieve all related accession numbers from MGDB and discard all but the most frequently occurring.
• Retrieve all organism mentions and discard accession numbers not related to the retrieved organisms.
• Retrieve all unique mutation mentions, normalize them with MutationFinder, and try to fit as many as possible onto the sequences corresponding to the remaining accession numbers.
• The accession number and corresponding sequence onto which the most mutations are grounded are then considered correct for the entire document.
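The workflow steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory dictionaries stand in for MGDB lookups, a simple regular expression stands in for MutationFinder normalization, the `keep` cutoff is one possible reading of "discard all but the most occurring", and all names, accession identifiers and sequences are hypothetical toy values.

```python
import re
from collections import Counter

# Stand-in for MutationFinder output: normalized point mutations such as "E545K".
POINT_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def parse_mutation(mention):
    """Parse 'E545K' into (wild_type, position, mutant); return None if it does not match."""
    m = POINT_MUTATION.match(mention)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)

def fits(sequence, mutation):
    """A mutation fits if the wild-type residue is found at its 1-based position."""
    wild_type, position, _ = mutation
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type

def ground_document(protein_mentions, organism_mentions, mutation_mentions,
                    name_to_accessions, accession_info, keep=3):
    """Return (accession, grounded_mutations) for one document.

    name_to_accessions and accession_info are hypothetical stand-ins for MGDB;
    keep is an assumed cutoff for the most frequently retrieved accessions.
    """
    # Steps 1-2: count the accessions linked to each mention, keep the most frequent.
    counts = Counter(acc for name in protein_mentions
                     for acc in name_to_accessions.get(name, []))
    candidates = [acc for acc, _ in counts.most_common(keep)]

    # Step 3: drop accessions whose organism is not mentioned in the document.
    # Assumption: the filter is skipped when no organism is mentioned at all.
    organisms = set(organism_mentions)
    if organisms:
        candidates = [acc for acc in candidates
                      if accession_info[acc]["organism"] in organisms]

    # Step 4: unique, normalized mutation mentions.
    mutations = {parse_mutation(m) for m in mutation_mentions}
    mutations.discard(None)

    # Step 5: choose the accession onto whose sequence most mutations fit.
    # Ties go to the accession that was retrieved more often.
    best_acc, best_grounded = None, set()
    for acc in candidates:
        sequence = accession_info[acc]["sequence"]
        grounded = {mut for mut in mutations if fits(sequence, mut)}
        if len(grounded) > len(best_grounded):
            best_acc, best_grounded = acc, grounded
    return best_acc, best_grounded

# Toy data (hypothetical names, identifiers and sequences).
NAME_TO_ACCESSIONS = {"toyA": ["TOY_HUMAN", "TOY_MOUSE"], "toy-alpha": ["TOY_HUMAN"]}
ACCESSION_INFO = {
    "TOY_HUMAN": {"organism": "human", "sequence": "MAEKLG"},
    "TOY_MOUSE": {"organism": "mouse", "sequence": "MADKLG"},
}

accession, grounded = ground_document(
    ["toyA", "toyA", "toy-alpha"], ["human", "mouse"], ["E3K", "L5P", "E3K"],
    NAME_TO_ACCESSIONS, ACCESSION_INFO)
print(accession, sorted(grounded))
```

In this toy run both mutations fit the `TOY_HUMAN` sequence while only `L5P` fits `TOY_MOUSE`, so the whole document is grounded to `TOY_HUMAN`. Checking the wild-type residue at the stated position is what makes the grounding positionally correct rather than a match on the database entry alone.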
Grounding Algorithm Evaluation

To evaluate the mutation grounding method, a gold standard corpus was built using the COSMIC database. Three target proteins/genes were considered: PIK3CA, FGFR3 and MEN1. Full-text papers containing more than one single point mutation and discussing only a single gene were chosen, for a total of 63 documents.

Acknowledgements

- New Brunswick Innovation Foundation
- NSERC Discovery Grant awards to Christopher J. O. Baker

[Figure: Performance]
