C-BRASS Canadian Bioinformatics Resources as Semantic Services

C-BRASS: Canadian Bioinformatics Resources as Semantic Services • Mark Wilkinson, UBC (Lead PI) • Michel Dumontier, Carleton (Co-PI) • Christopher J. O. Baker, UNBSJ (Co-PI)

Mandate • Expose Canadian bioinformatics Web resources in a unified and automatable manner using a Semantic Web Services framework. • Make bioinformatics data and tools easier to discover, use, and integrate, to hasten discovery. • First widespread deployment of a grid framework in which the messages are “meaningful” to the machine and can be interpreted and re-interpreted under a wide range of scenarios.

Goals • Utilize novel SWS technologies to expose Canadian informatics resources on the emergent Semantic Web • Create toolkits for semantically “lifting” legacy resources into a SWS framework • Create prototype applications demonstrating a variety of ways of constructing, utilizing, visualizing, and interpreting the services, analytical pipelines, and resulting semantically-enriched datasets.

Web Service Adoption The low uptake of modern Web integration frameworks by the bioinformatics community stems from two main causes: • Challenges in implementing these solutions • A gap between the abilities of existing technologies and the needs and skills of the target end-user.

SOAP • Simple Object Access Protocol (SOAP) messaging has been successful only within well-defined, often project-specific situations. • A lack of semantics in the Web Service interface descriptions precludes the automated discovery of appropriate services and the automated pipelining of data between those services.

Semantic Web Services (SWS) • SWS have achieved only a modest level of automated interoperability because of limitations in the way the semantics of Web Services are modeled: • SWS frameworks are implemented to support legacy data representation frameworks, in particular XML and XML Schema. • SWS annotate XML Schema components, describing services based on the "meaning" of their various input and output fields.

Semantic Web Services (SWS) • Automating workflow construction and semantically validating the "sensibility" of the connections between services (often referred to as schema mapping). • XML Schema is semantically opaque; applying semantics to it through annotation is extremely limited. • A semantically annotated XML tag can have only one interpretation.

SWS Frameworks describe: • Input and output data structures • Operations of a Web Service • BioMoby Service Type ontology: a vocabulary describing analytical operations • OWL-S and WSMO/WSML Process Model: the states before and after, and the transformations during that state change • Single-term semantics: too simplistic • Process Models: too complex, no adoption

In transition • Data on the Semantic Web is encoded in RDF, while data in most Web Service frameworks is encoded in XML. • The shift is from XML/Schema-based to OWL/RDF-based data representation. • SAWSDL became a W3C Recommendation in 2007. • Inputs and outputs of Web Services can be described in terms of ontological models.
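The move from XML to RDF described above can be pictured with a small Python sketch. It "lifts" a legacy XML record into subject-predicate-object triples. The XML layout, the ex: predicate name, and the tag-to-predicate mapping are all hypothetical and chosen only for illustration; they are not part of any C-BRASS or SADI toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy record; the element and attribute names are assumptions.
LEGACY_XML = '<protein id="P04637"><organism>Homo sapiens</organism></protein>'

# Hypothetical mapping from XML tags to ontology predicates.
TAG_TO_PREDICATE = {"organism": "ex:fromOrganism"}


def lift(xml_text):
    """Turn one XML record into a set of (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = "uniprot:" + root.attrib["id"]
    return {
        (subject, TAG_TO_PREDICATE[child.tag], child.text)
        for child in root
        if child.tag in TAG_TO_PREDICATE  # tags without a mapping are skipped
    }


triples = lift(LEGACY_XML)
print(triples)
```

In the XML form, the meaning of the organism tag lives only in the schema author's head; in the triple form, the predicate is an explicit term that an ontology can define.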

User Communities (I) • The end-user community does not usually have a "process model" or "business model" in mind when searching for a Service. • Biologists execute a BLAST alignment • NOT because they wish to run a sequence similarity matrix over their input data, • BUT because they are interested in finding sequences that are related to their input sequence by homology. • The key is the relationship between the input and output data.

Bioinformatics Community Needs: • New metadata, i.e. Bioinformatics Web Service annotations that describe the biological relationships between the input and output data that the Web Service generates.

SADI facilitates novel data discovery, interoperability, and integrative behaviours that closely mirror the needs and expectations of our end-user community, simply by indexing services based on the predicate that relates their input data to their output data. • Semantic Web data vs data derived from Web Services.

SADI simply comprises a set of standards-compliant conventions and suggested best practices for data representation and exchange between Web Services that fully utilize Semantic Web technologies. • SADI mandates the inclusion of a single required annotation in the Web Service metadata that describes the biological relationship ("predicate") that is created between the input and output data of that Service.
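A minimal sketch of what indexing services by predicate could look like, in Python. The class names, service names, and predicate strings are hypothetical and stand in for whatever a real SADI registry stores; this is not the SADI registry API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    name: str         # hypothetical service name
    input_class: str  # OWL class the input data must classify into
    predicate: str    # relationship the service creates between input and output


class PredicateRegistry:
    """Toy registry that indexes service descriptions by their predicate."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def register(self, service):
        self._by_predicate[service.predicate].append(service)

    def find(self, predicate):
        # Return a copy so callers cannot mutate the index.
        return list(self._by_predicate.get(predicate, []))


registry = PredicateRegistry()
registry.register(
    ServiceDescription("getProteinSequence", "UniProtRecord", "hasProteinSequence")
)
registry.register(ServiceDescription("getGOTerms", "UniProtRecord", "hasGOTerm"))

matches = [s.name for s in registry.find("hasProteinSequence")]
print(matches)
```

A client that needs a particular relationship asks the registry for that predicate rather than searching by service name or operation type.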

SADI Web Service Discovery

hasProteinSequence • Predicate-based Web Service invocation: using the hasProteinSequence predicate in a query automatically invokes a Web Service capable of obtaining the amino acid sequence for UniProt entry P04637.

SADI: Standards-compliant recommendations for implementation • SADI consists of several bioinformatics services. • SADI Services are stateless and atomic. • SADI Services consume and provide data via HTTP POST and GET. • SADI Services consume and produce data in RDF format. • SADI Service interfaces are defined in terms of OWL-DL classes; the property restrictions on these OWL classes define which data elements the Service requires and which data it will provide, respectively. • Input RDF data complies with (classifies into) the Input OWL Class and is "decorated" or "annotated" by the service provider with new properties reflecting the activities performed by the Web Service. • Output RDF data is an instance of the OWL Class that defines the output of the Service.
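The "decoration" pattern above can be sketched in Python with plain tuples standing in for RDF triples. Everything here is an assumption made for illustration: the ex: names, the placeholder sequence value, and the explicit rdf:type check, which replaces the OWL-DL classification a real service would rely on. HTTP transport and RDF serialization are left out.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
RDF_TYPE = "rdf:type"
INPUT_CLASS = "ex:UniProtRecord"            # hypothetical input OWL class
OUTPUT_PREDICATE = "ex:hasProteinSequence"  # hypothetical predicate

# Stand-in for the lookup a real service would perform; the value is a placeholder.
FAKE_SEQUENCES = {"uniprot:P04637": "PLACEHOLDER_SEQUENCE"}


def sequence_service(input_triples):
    """Stateless service: return triples that decorate each valid input node."""
    output = set()
    for subject, predicate, obj in input_triples:
        is_valid_input = predicate == RDF_TYPE and obj == INPUT_CLASS
        if is_valid_input and subject in FAKE_SEQUENCES:
            # The output is about the same subject as the input, with a new property.
            output.add((subject, OUTPUT_PREDICATE, FAKE_SEQUENCES[subject]))
    return output


request = {("uniprot:P04637", RDF_TYPE, INPUT_CLASS)}
response = sequence_service(request)
print(response)
```

Because the output reuses the input's subject, a client can merge the response into its own data without any schema mapping.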

SADI Registry Predicate Map

What can it do? • SADI provides the functionality to automatically and dynamically discover, access, and integrate relevant data from distributed, non-uniform data sources using disparate ontologies. These are key promises of the Semantic Web! • The SHARE implementation allows users to query over data that might not exist at the time they pose their query. A query-specific database is dynamically generated as a query is being processed; effectively, the database required to answer the question is automatically generated as a result of the question being posed.

Find Gene Ontology terms (biological process, cellular component, and molecular function annotations) for proteins associated with Parkinson's disease:

PREFIX pred: <http://es-01.chibi.ubc.ca/~benv/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX keyword: <http://biordf.net/moby/Global_Keyword/>
SELECT ?term ?name
WHERE {
  ?protein ont:hasTag keyword:parkinson .
  ?protein pred:hasGOTerm ?term .
  ?term pred:hasTermName ?name
}

Semantic Health And Research Environment (SHARE) prototype. SHARE connects the SADI middleware to the Pellet SPARQL query engine and DL reasoner.
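The idea of a database that is generated while the query is being answered can be sketched in Python as follows. This is a simplified illustration, not the SHARE implementation: there is no reasoner, patterns must have a known or already-bound subject, and the service table, ex: predicates, and toy term data are hypothetical.

```python
def resolve(patterns, services):
    """Answer triple patterns in order, calling a service for each predicate."""
    store = set()     # the query-specific database, filled on demand
    solutions = [{}]  # variable bindings found so far
    for subj, pred, obj in patterns:
        next_solutions = []
        for binding in solutions:
            s = binding.get(subj, subj)  # substitute a bound variable, if any
            if s.startswith("?"):
                continue  # this sketch needs a known subject to call a service
            for value in services[pred](s):
                store.add((s, pred, value))
                o = binding.get(obj, obj)
                if o.startswith("?"):
                    next_solutions.append({**binding, obj: value})
                elif o == value:
                    next_solutions.append(binding)
        solutions = next_solutions
    return solutions, store


# Hypothetical service back-ends with toy data.
GO_TERMS = {"uniprot:P04637": ["ex:term1", "ex:term2"]}
TERM_NAMES = {"ex:term1": ["toy name one"], "ex:term2": ["toy name two"]}
services = {
    "ex:hasGOTerm": lambda s: GO_TERMS.get(s, []),
    "ex:hasTermName": lambda s: TERM_NAMES.get(s, []),
}

patterns = [
    ("uniprot:P04637", "ex:hasGOTerm", "?term"),
    ("?term", "ex:hasTermName", "?name"),
]
solutions, store = resolve(patterns, services)
print(solutions)
```

The store starts empty and ends up holding only the triples that the query needed, which is the sense in which the database is a by-product of the question.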

SADI Toolkit • "RDFizing": Virtuoso Sponger; Bio2RDF • Native Service Provision and "Wrapping" of legacy CGI and WSDL: Seahawk; Dashboard • Core SADI Service Codebase: SADI::Service::Core; jSADI • Quality of Service Testing: myGrid/Moby unit test and the Testing Agent • Ontology Development Tools: Protege 4 and TopBraid Composer • Client Applications: Taverna; SHARE; IO Informatics Sentient Knowledge Explorer plug-in

SADI Training Course Curriculum • Target Audience: primary or secondary data and service providers, as well as the full spectrum of bioinformatics students and professionals from academia and industry. • Web Service registries and service discovery • Service ontologies • Workflow composition • SAWSDL • myGrid • SADI 101 • Bioinformatics Web Service requirements • SADI-enabled services • SADI toolkit • Syntactic Web vs. Semantic Web • Interoperability • Knowledge representation standards • RDF 101 • OWL 101 • Ontology editors and ontology design • Inference and reasoning • Reasoning engines • Web Service description languages

Action Plan • Tier 1 involves active, hands-on migration of native resources to a Semantically-enabled Service. • Tier 2 involves “wrapping” resources from non-participating providers via Services hosted on C-BRASS servers. • Tier 3 involves on-site training in Semantic Web Service technologies, and support for their self-directed resource migration.

Success Criteria • Number of Services created/migrated, and their use by consumers worldwide; (Minimum 400 in Canada) • Number of software tools created, and their use by third-parties; • Number of Canadian HQP trained in construction of Semantic Web Services.

Deliverables • A fully-documented definition of the SADI Semantic Web Service framework, including submission of this to an appropriate standards body (e.g. OASIS or OMG) • A set of core ontologies describing properties and relationships for entities in the biomedical domain • A costing-model, for use by future Semantic Web Service providers, outlining the establishment and maintenance costs for the migration from legacy Web or Web Service resources to a Semantic Web Service framework.
