Presentation Transcript

  1. Interoperability With BioMoby 1.0: It's Better Than Sharing Your Toothbrush! (Photo taken by http://flickr.com/people/mfsarwar/)

  2. A brief history of BioMoby
     • Model Organism Bring Your Own Database Interface Conference (MOBY-DIC), Sept. 2001
     • May 21, 2002 – Genome Canada Platform Award
     • May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML
     • July 18, 2002 – First Moby client (Gbrowse Moby)
     • June 9, 2003 – API Version 0.5 deployed
     • 2006 – Genome Canada Platform Award
     • 2007 – Version 1.0 API submitted for publication

  3. MOBY-DIC Chapter VII: the 7th Model Organism Bring Your Own Database Interface Conference, Vancouver, BC, June 2007.

  4. The Core Ahab’s

  5. Wendy Richard Martin Mylah Eddie

  6. Andreas Paul Ivan Mark’s Screen…

  7. The BioMoby Plan
     • Create an ontology of bioinformatics data-types
     • Define a serialization of this ontology (data syntax)
     • Create an open API over this ontology
     • Define Web Service inputs and outputs vis-à-vis the ontology
     • Register services in an ontology-aware registry
     • Machines can find an appropriate service
     • Machines can execute that service unattended
     • The ontology is community-extensible

  8. Overview of BioMoby Transactions
     [Diagram: MOBY hosts & services (Sequence, Expression, Protein, Alleles, …) and MOBY Central (Align, Phylogeny, Primers); data labels: Sequence, Alignment, Gene names]

  9. Overview of BioMoby Transactions
     [Diagram: MOBY Central (Align, Phylogeny, Primers, Sequence) and the Object ontology, with the exchange "What is a sequence?" / "A sequence is a ___ that has these features ___"]
     • Discovery of services that consume things LIKE sequences!
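A minimal sketch of what "services that consume things LIKE sequences" means in practice, assuming a toy registry. The `ISA` table, the service names, `REGISTRY` and `find_services` are all hypothetical and are not the MOBY Central API. A service registered for a type also accepts any ISA descendant of that type, so a query made with a DNASequence finds services declared for GenericSequence or VirtualSequence as well.

```python
# Hypothetical ISA edges (child -> parent), modelled on the slides' sequence examples.
ISA = {
    "VirtualSequence": "Object",
    "GenericSequence": "VirtualSequence",
    "DNASequence": "GenericSequence",
    "Integer": "Object",
    "String": "Object",
}

# Hypothetical registry: service name -> input data-type it declares.
REGISTRY = {
    "getSequenceLength": "VirtualSequence",
    "designPrimers": "DNASequence",
    "alignSequences": "GenericSequence",
    "countCharacters": "String",
}


def ancestors(data_type):
    """Return data_type followed by all of its ISA ancestors up to Object."""
    chain = [data_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def find_services(data_type):
    """Services whose declared input is data_type or one of its ancestors."""
    acceptable = set(ancestors(data_type))
    return sorted(name for name, needs in REGISTRY.items() if needs in acceptable)


print(find_services("DNASequence"))      # ['alignSequences', 'designPrimers', 'getSequenceLength']
print(find_services("VirtualSequence"))  # ['getSequenceLength']
```

The lookup walks upward only: a DNASequence has every property of its parents, so any service written for a parent can consume it, but not the other way round.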

  10. This is SCUFL (Simple Conceptual Unified Flow Language). It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application, which we will look at later…

  11. Pipeline discovery “on the fly”
     • No explicit coordination between providers
     • Dynamic discovery of (approximately) appropriate services
     • Automated execution of services
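One way to picture pipeline discovery "on the fly" is a breadth-first search over declared service inputs and outputs. This is a hedged sketch, not BioMoby's or Taverna's algorithm: the types, the three services and `plan_pipeline` are hypothetical, and the code only plans a chain; automated execution would then call each service in order. No provider coordinates with any other, because the chain emerges purely from the types each service declared.

```python
from collections import deque

# Hypothetical ISA edges (child -> parent).
ISA = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "Alignment": "Object",
    "Phylogeny": "Object",
}

# Hypothetical services: name -> (input type, output type).
SERVICES = {
    "fetchSequence": ("Object", "DNASequence"),
    "alignSequences": ("GenericSequence", "Alignment"),
    "buildTree": ("Alignment", "Phylogeny"),
}


def is_a(data_type, target):
    # True if data_type equals target or inherits from it via ISA edges.
    while data_type is not None:
        if data_type == target:
            return True
        data_type = ISA.get(data_type)
    return False


def plan_pipeline(start, goal):
    """Breadth-first search for the shortest service chain from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if is_a(current, goal):
            return path
        for name, (needs, gives) in sorted(SERVICES.items()):
            # A service applies whenever the data in hand ISA its declared input.
            if is_a(current, needs) and gives not in seen:
                seen.add(gives)
                queue.append((gives, path + [name]))
    return None  # no chain of registered services reaches the goal


print(plan_pipeline("DNASequence", "Phylogeny"))  # ['alignSequences', 'buildTree']
```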

  12. Some BioMoby statistics

  13. Moby: Breadth
     • Namespaces (data types): 418
     • Objects (data syntaxes): >561
     • Service Types (analytical categories): 112
     • Providers: ~50 active
     • Service Instances: ~1200 currently “alive”
       • in the main Moby Central server in Canada
       • others in “boutique” Moby registries serving specialized communities worldwide

  14. Moby: Clients
     • Gbrowse_moby (M Wilkinson)
     • PlaNet Locus_View (H Schoof, R Ernst)
     • Blue-Jay (P Gordon)
     • Taverna (T Oinn, M Senger, E Kawas)
     • MOWserv (INB, Spain)
     • Remora (S Carrere, J Gouzy, INRA)
     • MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
     • SeaHawk (P Gordon)

  15. BioMoby in detail
     • MOBY Data typing system: Semantic Type
     • MOBY Data typing system: Syntactic Type
     • Moby Registry Queries

  17. Moby Namespaces
     • A “Namespace” is a category of identifiers
       • NCBI has gi numbers (gi Namespace)
       • GO Terms have accession numbers (GO Namespace)
     • Namespaces indicate data’s semantic type
       • GO:0003476 → a Gene Ontology Term
       • gi|163483 → a GenBank record
     • Though we are using the word “Namespace” correctly, it causes confusion!
       • “Namespace” in XML is tightly associated with an XML document and/or its syntax
       • In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX

  18. BioMoby in detail
     • MOBY Data typing system: Semantic Type
     • MOBY Data typing system: Syntactic Type
     • Moby Registry Queries

  20. The MOBY Object Ontology
     • Syntactic types are defined by a GO-like ontology
       • Class name at each node
       • Edges define the relationships between Classes
       • GO used as a model because of its familiarity in the community
     • Edges define one of three relationships
       • ISA: inheritance relationship; all properties of the parent are present in the child
       • HASA: container relationship of ‘exactly 1’
       • HAS: container relationship of ‘1 or more’
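The three edge types can be modelled as a small in-memory ontology. This is a minimal sketch under stated assumptions: `OntologyClass`, `ONTOLOGY` and `all_slots` are hypothetical names, and the fragment only mirrors the sequence classes shown on the following slides. It shows the ISA rule in action: a child's slots are its own HASA/HAS slots plus everything inherited from its ancestors.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OntologyClass:
    name: str
    isa: Optional[str] = None                            # ISA: single parent class
    hasa: Dict[str, str] = field(default_factory=dict)   # HASA: articleName -> class, exactly 1
    has: Dict[str, str] = field(default_factory=dict)    # HAS: articleName -> class, 1 or more


# Hypothetical fragment built from the slides' sequence examples.
ONTOLOGY = {c.name: c for c in [
    OntologyClass("Object"),
    OntologyClass("Integer", isa="Object"),
    OntologyClass("String", isa="Object"),
    OntologyClass("VirtualSequence", isa="Object", hasa={"length": "Integer"}),
    OntologyClass("GenericSequence", isa="VirtualSequence", hasa={"SequenceString": "String"}),
    OntologyClass("DNASequence", isa="GenericSequence"),
]}


def all_slots(name):
    """Own plus inherited container slots: every parent property is present in the child."""
    lineage = []
    cls: Optional[OntologyClass] = ONTOLOGY[name]
    while cls is not None:
        lineage.append(cls)
        cls = ONTOLOGY[cls.isa] if cls.isa else None
    slots = {}
    for c in reversed(lineage):  # root first, so slots appear in inheritance order
        slots.update({art: ("exactly 1", t) for art, t in c.hasa.items()})
        slots.update({art: ("1 or more", t) for art, t in c.has.items()})
    return slots


print(all_slots("DNASequence"))
# {'length': ('exactly 1', 'Integer'), 'SequenceString': ('exactly 1', 'String')}
```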

  21. The Simplest Moby Data-Type: Object
       <Object namespace='NCBI_gi' id='111076'/>
     • The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s), nor its representation
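Because identity is just the (namespace, id) pair, it maps naturally onto an immutable value type. A minimal sketch, assuming the hypothetical class name `MobyObject` and using Python's standard `xml.etree.ElementTree`: two objects written with different quoting compare equal, because nothing but the two attributes counts toward identity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class MobyObject:
    """Hypothetical value type: a data entity identified by namespace + id."""
    namespace: str
    id: str

    def to_xml(self) -> str:
        # Serialize as the simplest Moby data-type, an empty <Object/> element.
        elem = ET.Element("Object", namespace=self.namespace, id=self.id)
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "MobyObject":
        # Identity depends only on the two attributes, not on quoting or whitespace.
        elem = ET.fromstring(text)
        return cls(elem.get("namespace", ""), elem.get("id", ""))


a = MobyObject("NCBI_gi", "111076")
b = MobyObject.from_xml("<Object namespace='NCBI_gi' id='111076'/>")
print(a == b)       # True: same namespace and id, so the same entity
print(len({a, b}))  # 1: frozen dataclasses are hashable, so duplicates collapse
```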

  22. Moby Primitives
     [Diagram: DateTime, Float, Integer and String each ISA Object]
       <Integer namespace='' id=''>38</Integer>

  23. A Derived Data-Type: Virtual Sequence
     [Diagram: Integer and String ISA Object; Virtual Sequence ISA Object and HASA Integer]
       <VirtualSequence namespace='NCBI_gi' id='111076'>
         <Integer namespace='' id='' articleName='length'>38</Integer>
       </VirtualSequence>
     • The articleName (“length”) describes the semantic relationship between the Integer and the Virtual Sequence

  24. A Derived Data-Type: Generic Sequence
     [Diagram: Generic Sequence ISA Virtual Sequence and HASA String; Virtual Sequence ISA Object and HASA Integer; Integer and String ISA Object]
       <GenericSequence namespace='NCBI_gi' id='111076'>
         <Integer namespace='' id='' articleName='length'>38</Integer>
         <String namespace='' id='' articleName='SequenceString'>
           ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
         </String>
       </GenericSequence>

  25. A Derived Data-Type: DNA Sequence
     [Diagram: DNA Sequence ISA Generic Sequence ISA Virtual Sequence ISA Object; Generic Sequence HASA String; Virtual Sequence HASA Integer]
       <DNASequence namespace='NCBI_gi' id='111076'>
         <Integer namespace='' id='' articleName='length'>38</Integer>
         <String namespace='' id='' articleName='SequenceString'>
           ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
         </String>
       </DNASequence>
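The serialization pattern of slides 23 to 25 (a typed root element carrying namespace and id, with one child per inherited HASA slot named by articleName) can be sketched with the standard library. The helper names `primitive`, `dna_sequence_xml` and `read_article` are hypothetical; this sketch computes the length from the sequence it is given and reads slots back by articleName rather than by position.

```python
import xml.etree.ElementTree as ET


def primitive(tag, article_name, value):
    # Primitive children carry empty namespace/id plus the articleName naming the HASA slot.
    elem = ET.Element(tag, namespace="", id="", articleName=article_name)
    elem.text = str(value)
    return elem


def dna_sequence_xml(namespace, identifier, sequence):
    # DNASequence ISA GenericSequence ISA VirtualSequence, so it carries both
    # the inherited 'length' Integer and the inherited 'SequenceString' String.
    root = ET.Element("DNASequence", namespace=namespace, id=identifier)
    root.append(primitive("Integer", "length", len(sequence)))
    root.append(primitive("String", "SequenceString", sequence))
    return ET.tostring(root, encoding="unicode")


def read_article(xml_text, article_name):
    # Look a slot up by its articleName rather than by position.
    for child in ET.fromstring(xml_text):
        if child.get("articleName") == article_name:
            return child.text
    return None


doc = dna_sequence_xml("NCBI_gi", "111076", "ATGATGATAG")
print(read_article(doc, "length"))          # 10
print(read_article(doc, "SequenceString"))  # ATGATGATAG
```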

  26. Legacy file formats
     • Containing “String” allows ontological classes to represent legacy data types
       <NCBI_Blast_Report namespace='NCBI_gi' id='115325'>
         <String namespace='' id='' articleName='content'>
       TBLASTN 2.0.4 [Feb-24-1998]
       Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
       Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
       (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search
       programs", Nucleic Acids Res. 25:3389-3402.
       Query= gi|1401126
                (504 letters)
       Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
                  336,723 sequences; 677,679,054 total letters
       Searching...done
                                                                            Score     E
       Sequences producing significant alignments:                          (bits)  Value
       gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...   1009    0.0
       emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...     58    4e-07
       emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                    53    1e-05
         </String>
       </NCBI_Blast_Report>
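A hedged sketch of wrapping a legacy flatfile, assuming hypothetical helper names `wrap_legacy` and `unwrap_legacy`. The point it illustrates is that the flatfile needs no new syntax: an XML library escapes characters such as `<` and `&` on the way out and restores them on the way in, so the report text round-trips unchanged. The last line of the sample report is a hypothetical addition included only to exercise escaping.

```python
import xml.etree.ElementTree as ET


def wrap_legacy(type_name, namespace, identifier, content):
    # The legacy flatfile travels untouched inside a String slot named 'content'.
    root = ET.Element(type_name, namespace=namespace, id=identifier)
    slot = ET.SubElement(root, "String", namespace="", id="", articleName="content")
    slot.text = content  # ElementTree escapes &, < and > when serializing
    return ET.tostring(root, encoding="unicode")


def unwrap_legacy(xml_text):
    # Recover the original flatfile text from the 'content' slot.
    slot = ET.fromstring(xml_text).find("String[@articleName='content']")
    return slot.text if slot is not None else None


report = "TBLASTN 2.0.4 [Feb-24-1998]\nQuery= gi|1401126\nhypothetical line: a < b & c > d"
doc = wrap_legacy("NCBI_Blast_Report", "NCBI_gi", "115325", report)
print(unwrap_legacy(doc) == report)  # True
```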

  27. Binaries: pictures, movies
     • text-base64 is a class that contains String
     • Binaries are base64-encoded and passed in classes that inherit from text-base64
     • base64_encoded_jpeg ISA text-base64 ISA text-plain HASA String
       <base64_encoded_jpeg namespace='TAIR_image' id='3343532'>
         <String namespace='' id='' articleName='content'>
       MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
       Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
       MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
       Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
       BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
       HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
       bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
       MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
         </String>
       </base64_encoded_jpeg>
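The binary case follows the same pattern with one extra encoding step. A minimal sketch using Python's `base64` module; `wrap_binary`, `unwrap_binary` and the sample bytes are hypothetical (the bytes merely begin like a JPEG header). Decoding relies on `base64.b64decode` discarding non-alphabet characters by default, so line-wrapped base64 like the slide's still decodes.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(type_name, namespace, identifier, data):
    # Binary payloads are base64-encoded into the String 'content' slot that
    # classes inheriting from text-base64 carry.
    root = ET.Element(type_name, namespace=namespace, id=identifier)
    slot = ET.SubElement(root, "String", namespace="", id="", articleName="content")
    slot.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_binary(xml_text):
    slot = ET.fromstring(xml_text).find("String[@articleName='content']")
    # With the default validate=False, b64decode discards non-alphabet characters,
    # so line breaks inside the encoded text are ignored.
    return base64.b64decode(slot.text)


image_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"hypothetical image payload"
doc = wrap_binary("base64_encoded_jpeg", "TAIR_image", "3343532", image_bytes)
print(unwrap_binary(doc) == image_bytes)  # True
```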

  28. Extending legacy datatypes
     • With legacy data-types defined, we can extend them as we see fit
       • annotated_jpeg ISA base64_encoded_jpeg
       • annotated_jpeg HASA 2D_Coordinate_set
       • annotated_jpeg HASA Description
       <annotated_jpeg namespace='TAIR_Image' id='3343532'>
         <2D_Coordinate_set namespace='' id='' articleName='pixelCoordinates'>
           <Integer namespace='' id='' articleName='x_coordinate'>3554</Integer>
           <Integer namespace='' id='' articleName='y_coordinate'>663</Integer>
         </2D_Coordinate_set>
         <String namespace='' id='' articleName='Description'>
           This is the phenotype of a ufo-1 mutant under long daylength, 16°C
         </String>
         <String namespace='' id='' articleName='content'>
       MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
       Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
         </String>
       </annotated_jpeg>

  29. The same object… annotated_jpeg ISA base64_encoded_jpeg, HASA 2D_Coordinate_set, HASA Description (XML as in slide 28)

  30. The same object, with cross-references (annotated_jpeg XML as in slide 28):
       <CrossReference>
         <Object namespace='TAIR_Allele' id='ufo-1'/>
       </CrossReference>
       <CrossReference>
         <Object namespace='TAIR_Tissue' id='122'/>
       </CrossReference>

  31. Cross reference types
     • Simple: a MOBY Object
       <Object namespace='foo' id='12345'/>
     • Rich: takes the form
       <Xref namespace='' id='' authURI='' serviceName='' evidenceCode='' xrefType=''>
         ... Textual Description ...
       </Xref>
     • …Incidentally, this avoids the problem of reification that is experienced in RDF
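Both cross-reference forms can be built with the standard library. This is a sketch, not an official BioMoby toolkit: the function names are hypothetical, and the rich example's attribute values (`example.org`, `hypotheticalService`, `IEA`, `hypothetical`) are placeholders, since the slide leaves every attribute empty. Each reference is wrapped in its own `<CrossReference>` block, as on slide 30.

```python
import xml.etree.ElementTree as ET


def simple_xref(namespace, identifier):
    # A simple cross-reference is just a MOBY Object: namespace + id.
    return ET.Element("Object", namespace=namespace, id=identifier)


def rich_xref(namespace, identifier, auth_uri, service_name,
              evidence_code, xref_type, description):
    # A rich cross-reference adds the authURI, serviceName, evidenceCode and
    # xrefType attributes plus a free-text description as element content.
    xref = ET.Element("Xref", namespace=namespace, id=identifier, authURI=auth_uri,
                      serviceName=service_name, evidenceCode=evidence_code,
                      xrefType=xref_type)
    xref.text = description
    return xref


def cross_reference(xref):
    # Wrap one reference in a <CrossReference> block.
    block = ET.Element("CrossReference")
    block.append(xref)
    return block


allele = cross_reference(simple_xref("TAIR_Allele", "ufo-1"))
rich = cross_reference(rich_xref("TAIR_Allele", "ufo-1", "example.org",
                                 "hypotheticalService", "IEA", "hypothetical",
                                 "Allele shown in the image"))
print(ET.tostring(allele, encoding="unicode"))
```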

  32. XML Schema?
     • The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand, e.g., XML Schema
     • Minimize future heterogeneity
     • Improve interoperability without requiring schema-to-schema mapping

  33. XML Schema?
     • Object Ontology terms have semantically rich names, but this is primarily for human intuition
       • e.g., DNA Sequence, Annotated_GIF
     • The Object Ontology does not define the meaning of an object to the machine
       • No machine-readable semantics
     • It does define the representation
       • SYNTAX
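To make "representation, not meaning" concrete, here is a hedged sketch of a purely structural check against a hypothetical ontology fragment (`ONTOLOGY`, `required_slots` and `syntax_errors` are illustrative names, not part of any BioMoby library). It verifies slot names, counts and ISA-compatible types, and it happily accepts a DNASequence whose SequenceString is "HELLO WORLD", because nothing in the ontology says what DNA is.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology fragment: class -> (ISA parent, {articleName: (class, cardinality)}).
ONTOLOGY = {
    "Object": (None, {}),
    "Integer": ("Object", {}),
    "String": ("Object", {}),
    "VirtualSequence": ("Object", {"length": ("Integer", "exactly 1")}),
    "GenericSequence": ("VirtualSequence", {"SequenceString": ("String", "exactly 1")}),
    "DNASequence": ("GenericSequence", {}),
}


def is_a(cls, target):
    # Walk ISA edges upward; unknown classes have no parent.
    while cls is not None:
        if cls == target:
            return True
        cls = ONTOLOGY.get(cls, (None, {}))[0]
    return False


def required_slots(cls):
    # A child inherits every HASA/HAS slot of its ancestors.
    slots = {}
    while cls is not None:
        parent, own = ONTOLOGY[cls]
        for article, spec in own.items():
            slots.setdefault(article, spec)
        cls = parent
    return slots


def syntax_errors(xml_text):
    """Check structure only: slot names, counts and types. Meaning is never checked."""
    root = ET.fromstring(xml_text)
    if root.tag not in ONTOLOGY:
        return [f"unknown class {root.tag}"]
    errors = []
    for article, (cls, cardinality) in required_slots(root.tag).items():
        found = [child for child in root if child.get("articleName") == article]
        if cardinality == "exactly 1" and len(found) != 1:
            errors.append(f"{article}: expected exactly 1, found {len(found)}")
        elif cardinality == "1 or more" and not found:
            errors.append(f"{article}: expected at least 1, found 0")
        errors += [f"{article}: expected {cls}, got {c.tag}" for c in found if not is_a(c.tag, cls)]
    return errors


# Syntactically valid, even though "HELLO WORLD" is not DNA.
nonsense = ("<DNASequence namespace='NCBI_gi' id='111076'>"
            "<Integer namespace='' id='' articleName='length'>11</Integer>"
            "<String namespace='' id='' articleName='SequenceString'>HELLO WORLD</String>"
            "</DNASequence>")
print(syntax_errors(nonsense))  # []
```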