
Semantic Similarity Measures Across The Gene Ontology.



Presentation Transcript


  1. Semantic Similarity Measures Across The Gene Ontology. Relating Sequence to Annotation.

P.W. Lord, R.D. Stevens, A. Brass, and C. Goble
Department of Computer Science, The University of Manchester, M13 9PL, UK. p.lord@russet.org.uk

Abstract
• Bioinformatics resources are rich in knowledge, but that knowledge is often held as free text.
• Ontologies provide a way of representing knowledge in a form that is computationally accessible.
• The Gene Ontology (GO) represents knowledge about:
  • The molecular function of a gene product
  • The biological process it is involved in
  • The cellular compartment of which it is a part
• Can we ask a database for proteins with "semantically similar" annotation to a query protein?
• We present and validate a measure of semantic similarity, and show several uses for this measure.

Information Content Measures
• Originally developed by Resnik (1995) for WordNet (Fellbaum, 1998), but adaptable to GO.
• Less frequently occurring terms are "more informative".
• To calculate:
  • For each term, count the number of occurrences of that term or any of its children.
  • Divide by the total number of term occurrences to give a probability.
  • Take the negative logarithm of that probability to give the information content for each node.
• The similarity between two terms c1 and c2 is then given by
  sim(c1, c2) = -ln pms(c1, c2)
• Here pms is the probability of the most informative parent shared by the two terms, that is, the lowest probability among their shared parents.

Validation
• Two proteins that have similar sequences should probably also have semantically similar annotation.
• We tested this by BLAST searching all SWISS-PROT proteins, taking the top bit scores, and comparing them to semantic similarity.
• Semantic similarity over the molecular function aspect is most strongly correlated with sequence similarity.
• A similar experiment shows that "Traceable Author Statement" associations are most tightly correlated with sequence similarity.
• These results fit well with biological expectations, and therefore serve to validate the semantic similarity measure.

Applications
• We have developed two prototype applications:
  • A simple search tool, which uses the similarity score for ranking.
  • An annotation checker, which looks for high semantic similarity and low sequence similarity.
• The annotation checker has identified several "misannotations", and errors in GO.
• The search tool, while primitive, appears to produce results that intuitively appear "correct".

Future Work
• We are currently investigating several other information content based measures, and their behaviour over the GO dataset.
• We plan to offer a web based portal, to enable us to seek user feedback on our search tool.

Acknowledgements
• The GO curators and SWISS-PROT annotators, for helpful comments.
• The GO database and API, and bioperl, were used during this work.
• This work was funded under the EPSRC/BBSRC Bioinformatics Programme (Grant number BIF/10507).

References
• C. Fellbaum (1998) WordNet: an electronic lexical database. MIT Press.
• P. Resnik (1995) Using information content to evaluate semantic similarity in a taxonomy. Proc. 14th Intl Joint Conf. on Artificial Intelligence, pp. 448-453. Morgan Kaufmann.
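The information content calculation and the similarity-based ranking described in the poster can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation (which used the GO database, its API, and bioperl): the toy ontology `PARENTS`, the usage counts `DIRECT_COUNTS`, and all function names are hypothetical, "children" is read as all descendants, and the probability is taken as a term's count divided by the total number of term occurrences.

```python
import math
from collections import Counter

# Hypothetical toy ontology (not real GO data): term -> direct parents.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "catalytic_activity": ["molecular_function"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase_activity": ["catalytic_activity"],
}

# Hypothetical counts of how often each term is used directly in annotations.
DIRECT_COUNTS = Counter(
    {"dna_binding": 4, "rna_binding": 2, "kinase_activity": 3, "binding": 1}
)


def ancestors(term):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def term_probabilities():
    """Count each term's own uses plus those of its descendants, then normalise."""
    totals = Counter()
    for term, count in DIRECT_COUNTS.items():
        # A use of a term also counts as a use of each of its ancestors.
        for ancestor in ancestors(term):
            totals[ancestor] += count
    total_occurrences = sum(DIRECT_COUNTS.values())
    return {term: totals[term] / total_occurrences for term in PARENTS}


def resnik_similarity(term_a, term_b, prob):
    """Negative log of the lowest probability among the shared ancestors."""
    shared = ancestors(term_a) & ancestors(term_b)
    p_ms = min(prob[term] for term in shared)
    return -math.log(p_ms)


def rank_by_similarity(query, candidates, prob):
    """Order candidate terms from most to least similar to the query term."""
    return sorted(
        candidates,
        key=lambda term: resnik_similarity(query, term, prob),
        reverse=True,
    )


if __name__ == "__main__":
    prob = term_probabilities()
    print(resnik_similarity("dna_binding", "rna_binding", prob))
    print(rank_by_similarity("dna_binding", ["kinase_activity", "rna_binding"], prob))
```

In this toy data the two binding terms share the parent "binding", which is rarer than the root, so they score higher than a pair whose only shared parent is the root (probability 1, similarity 0). The sketch assumes every shared parent has a non-zero count; a real data set would need to handle unused terms before taking the logarithm.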
