Unified Ontology for Protein-Protein Interaction Data Integration and Analysis

An Ontology for Protein-Protein Interaction Data • Karen Jantz • CIS Honors Project • December 7, 2006

Overview • Problem Statement • Objectives • Approach • Background • Methodology • Evaluation • Demonstration • Conclusion

Problem Statement • Several sources for protein-protein interaction data • Different schemata • Different purposes • Different strengths/weaknesses

Objectives • Unify the data • Enable data mining • Evaluate reliability of data across data sources • Gain new information about the entire data set • Enable others to easily add other data sources to the set

Approach: Ontology • ontology (n.) • That which exists (philosophy) • That which is represented (artificial intelligence) • A descriptive data model • Defines the entities and relationships within a domain • Based upon data • Human-readable

Approach: Ontology • Data integration • Enables simultaneous querying across multiple databases • Data transformation • Enables interchange between database formats • Data mining • Enables reasoning and learning over the entire data set

Background: Data Sources • DIP (Jing Xia) • Database of Interacting Proteins • Most reliable data set • BIND (Abhijit Erande, Aaron Schoenhofer) • Biomolecular Interactions Network Databank • Very large data set • Contains interactions, molecular complexes, and pathways

Background: Data Sources • MINT • Molecular INTeraction database • Experimentally verified protein interactions • Evaluates confidence level • IntAct • Not limited to binary interactions • Allows user submissions • MIPS CYGD • Munich Information Center for Protein Sequences: Comprehensive Yeast Genome Database • Limited to yeast • Focuses on sequencing

Background: Tools • Protégé • Open-Source Project • Graphical ontology editor • Interacts with OWL Reasoner • Detailed API for modifying ontologies programmatically

Background: Tools • Prompt • A Protégé Plugin • Enables ontology mapping • Enables ontology comparison

Background: Related Work • PSI-MI • Controlled vocabulary for PPI data • Not a proposed database structure • Decreases the strength of information • Helpful in defining relationships and keys

Methodology: Overview • Web Interface • Q: What interactions have been observed with protein A? • Q: What experiments give evidence for a given interaction? • Unified Ontology • Unified Data Set • Transformation • Source databases: DIP, BIND, MIPS, MINT, IntAct

Methodology: Design • Review the singular database schemata and determine strengths/weaknesses • View data files • Native formats • PSI-MI formats • Create a unified schema of the data sources • Create the unified ontology in Protégé • Create each singular database as a subset of the unified ontology

Protégé Screenshot

Methodology: Data Import • DOMParser • Load data from XML • Protégé-OWL API • Insert entities into singular databases
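
The DOM-based loading step can be illustrated with a minimal sketch. The project itself names DOMParser and the Protégé-OWL API; the Python below is only a stand-in using the standard library's DOM implementation. The XML snippet is a simplified, hypothetical record loosely modelled on a PSI-MI style file, and the function name `load_interactions` is an assumption made for illustration.

```python
from xml.dom.minidom import parseString

# Simplified, hypothetical XML; real PSI-MI files use a richer schema
# (namespaces, experiment lists, cross-references) than shown here.
SAMPLE_XML = """<entry>
  <interactorList>
    <interactor id="1"><shortLabel>ProteinA</shortLabel></interactor>
    <interactor id="2"><shortLabel>ProteinB</shortLabel></interactor>
  </interactorList>
  <interactionList>
    <interaction id="10">
      <participant interactorRef="1"/>
      <participant interactorRef="2"/>
    </interaction>
  </interactionList>
</entry>"""


def load_interactions(xml_text):
    """Return a list of (interaction_id, [protein labels]) tuples."""
    doc = parseString(xml_text)

    # First pass: map each interactor id to its human-readable label.
    labels = {}
    for node in doc.getElementsByTagName("interactor"):
        label_node = node.getElementsByTagName("shortLabel")[0]
        labels[node.getAttribute("id")] = label_node.firstChild.data.strip()

    # Second pass: resolve the participant references of each interaction.
    interactions = []
    for node in doc.getElementsByTagName("interaction"):
        members = [
            labels[p.getAttribute("interactorRef")]
            for p in node.getElementsByTagName("participant")
        ]
        interactions.append((node.getAttribute("id"), members))
    return interactions


if __name__ == "__main__":
    print(load_interactions(SAMPLE_XML))
```

In the actual pipeline, the tuples returned here would be turned into individuals in the corresponding singular ontology rather than printed.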

Methodology: Transformation • Use Prompt to create a mapping for each specific data source to the unified ontology • Use Prompt mappings to insert individuals from each singular ontology into the unified model
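
The idea of a per-source mapping onto the unified ontology can be sketched as a simple slot-renaming table. This is not how Prompt stores or applies its mappings; it is only an illustration of the concept. All slot names (`node_a`, `molA`, `protein_a`, and so on) and the function `to_unified` are hypothetical and do not come from the real DIP or BIND schemata.

```python
# Hypothetical slot names: each source maps its own slots onto the
# slots of the unified model.
MAPPINGS = {
    "DIP": {"node_a": "protein_a", "node_b": "protein_b", "method": "experiment"},
    "BIND": {"molA": "protein_a", "molB": "protein_b", "expt": "experiment"},
}


def to_unified(source, record):
    """Rename the slots of one source record to the unified slot names."""
    mapping = MAPPINGS[source]
    unified = {
        target: record[slot]
        for slot, target in mapping.items()
        if slot in record
    }
    # Remember where the record came from so reliability can be compared later.
    unified["sources"] = [source]
    return unified


if __name__ == "__main__":
    dip_record = {"node_a": "ProteinA", "node_b": "ProteinB", "method": "two-hybrid"}
    print(to_unified("DIP", dip_record))
```

Keeping the mapping as data rather than code is one way to let a new source be added by supplying a new table, which matches the stated objective of making it easy to extend the set.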

Methodology: Transformation • Duplicate Data • Need to fill in attributes on existing records • Write ‘Algorithm Plugin’ for Prompt to determine when individuals are the same
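
One possible rule for deciding when two individuals are the same is sketched below. It assumes that an interaction is identified by its unordered pair of proteins, which is an assumption made for illustration and not necessarily the criterion used in the planned Prompt plugin. The record layout and the names `interaction_key` and `merge_duplicates` are hypothetical.

```python
def interaction_key(record):
    """Identify an interaction by its unordered pair of proteins (assumed rule)."""
    return frozenset((record["protein_a"], record["protein_b"]))


def merge_duplicates(records):
    """Merge records describing the same interaction, filling in missing attributes."""
    merged = {}
    for record in records:
        key = interaction_key(record)
        if key not in merged:
            # Copy the record so the caller's data is not modified.
            merged[key] = {**record, "sources": list(record.get("sources", []))}
            continue
        existing = merged[key]
        for field, value in record.items():
            if field == "sources":
                # Accumulate the list of databases that report this interaction.
                for source in value:
                    if source not in existing["sources"]:
                        existing["sources"].append(source)
            elif existing.get(field) is None:
                # Fill in an attribute the existing record lacks.
                existing[field] = value
    return list(merged.values())


if __name__ == "__main__":
    records = [
        {"protein_a": "ProteinA", "protein_b": "ProteinB",
         "experiment": None, "sources": ["DIP"]},
        {"protein_a": "ProteinB", "protein_b": "ProteinA",
         "experiment": "two-hybrid", "sources": ["BIND"]},
    ]
    print(merge_duplicates(records))
```

Existing values are never overwritten in this sketch; conflicting attribute values from different sources would need a separate policy.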

Prompt Screenshot - Mapping

Methodology: Query Interface • Export Protégé data into MySQL • Web interface for collecting data • Working with domain experts to determine useful views, queries
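
The two example questions from the overview (interactions observed with a given protein, and experiments giving evidence for an interaction) could be answered with queries along the following lines. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. The two-table schema, the function names, and the demo rows are all hypothetical placeholders, not the project's exported schema or real interaction data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE interaction (
            id INTEGER PRIMARY KEY,
            protein_a TEXT NOT NULL,
            protein_b TEXT NOT NULL
        );
        CREATE TABLE evidence (
            interaction_id INTEGER REFERENCES interaction(id),
            experiment TEXT NOT NULL,
            source TEXT NOT NULL
        );
    """)
    # Placeholder rows for illustration only.
    conn.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?)",
        [(1, "ProteinA", "ProteinB"), (2, "ProteinC", "ProteinA"),
         (3, "ProteinB", "ProteinC")],
    )
    conn.executemany(
        "INSERT INTO evidence VALUES (?, ?, ?)",
        [(1, "two-hybrid", "DIP"), (1, "coimmunoprecipitation", "BIND"),
         (2, "two-hybrid", "MINT")],
    )
    return conn


def partners_of(conn, protein):
    """Q: What interactions have been observed with the given protein?"""
    rows = conn.execute(
        "SELECT protein_a, protein_b FROM interaction "
        "WHERE protein_a = ? OR protein_b = ?",
        (protein, protein),
    )
    # The protein may appear on either side of the pair.
    return sorted({b if a == protein else a for a, b in rows})


def evidence_for(conn, interaction_id):
    """Q: What experiments give evidence for the given interaction?"""
    rows = conn.execute(
        "SELECT experiment, source FROM evidence "
        "WHERE interaction_id = ? ORDER BY source",
        (interaction_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(partners_of(db, "ProteinA"))
    print(evidence_for(db, 1))
```

Parameterized queries are used throughout, which matters for a web interface that passes user input into SQL.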

Evaluation • Performance • Transformation Time in Protégé • Query Time for Web Interface • Size • Minimize redundancy in data model • Minimize duplicate data

Evaluation • Correctness • Domain Experts • Dr. Brown, Dr. Wang • Maintain proper data relationships • Utility • Enrich data

Evaluation

Demonstration

Future Work • Complete transformations • Import data • Evaluate ontology • Add other databases to model

Conclusions • Adequate start • Needs improvement, evolution, more data sources • As the project matures, the ontology will be ready for use in the biological domain • Will be able to more easily gain information about protein-protein interactions

References • AAAI.org - AITopics: “Ontology” • http://www.aaai.org/AITopics/html/ontol.html • Protégé • http://protege.stanford.edu/overview/protege-owl.html • Prompt • http://protege.cim3.net/cgi-bin/wiki.pl?Prompt • PSI-MI • http://psidev.sourceforge.net/mi/xml/doc/user

References • BIND • http://www.bind.ca • DIP • http://www.dip.doe-mbi.ucla.edu • IntAct • http://www.ebi.ac.uk/intact/site/ • MINT • http://mint.bio.uniroma2.it/mint/Welcome.do • MIPS • http://mips.gsf.de/genre/proj/yeast

Q & A
