The Generation Challenge Programme (GCP) Platform for Crop Research Richard Bruskiewich and the rest of …
…The GCP SP4 team and Contributors Theo van Hintum (WUR), GCP Subprogramme 4 Leader IRRI-CIMMYT Crop Research Informatics Laboratory Graham McLaren Thomas Metz Martin Senger Ramil Mauleon Mylah Anacleto Michael Jonathan Mendoza Victor Jun Ulat Arllet Portugal Ryan Alamban Lord Hendrix Barboza Jeffrey Detras Kevin Manansala Jeffrey Morales Barry Peralta Rowena Valerio Nelzo Ereful CIP: Reinhard Simon Edwin Rojas ICRISAT: Jayashree Balaji ICARDA: Akinnola Akintunde NCGR: Andrew Farmer Gary Schiltz SCRI: Jennifer Lee David Marshall Cornell University: Terry Casstevens Pankaj Jaiswal Dave Matthews ACGT: Ayton Meintjes Jane Morris CIRAD: Manuel Ruiz Alexis Dereeper Matthieu Conte Brigitte Courtois Bioversity: Mathieu Rouard Tom Hazekamp Milko Skofic Raj Sood NIAS: Masaru Takeya Koji Doi Kouji Satoh Shoshi Kikuchi EMBRAPA: Marcos Costa Natalia Martins Georgios Pappas University of British Columbia: Mark Wilkinson GSC Bioinformatics Graduate Program, BC Cancer Agency: Benjamin Good James Wagner Guy Davenport Trushar Shah Kyle Braak Sebastian Ritter Yi Zhang Sergio Gregorio Joseph Hermocilla Michael Echavez Roque Almodiel Samart Wanchana Supat Thongjuea
Overview • Generation Challenge Programme crop informatics research and development • GCP platform architecture: • Domain model & ontology • Application development framework
Challenge Programme “I challenge the next generation to use new scientific tools and techniques to address the problems that plague the world’s poor” Dr. Norman Borlaug http://www.generationcp.org
What is it? • An international research programme established in 2003, projected to last 10 years, and hosted by the CGIAR with global partners from ARI and NARES • Research Themes Directed to Crop Improvement: • Genomics and comparative biology across species • Characterization of genetic diversity for allele mining • Gene transfer technologies • Five research subprogrammes, one of which is crop information systems development.
Challenge Programme Wageningen University Netherlands ICARDA Syrian Arab Rep. John Innes Centre UK Bioversity Italy Agropolis France CAAS China Cornell University USA NIAS Japan IRRI Philippines BioTec Thailand CIMMYT Mexico WARDA Côte d’Ivoire ICAR India ICRISAT India CIAT Colombia EMBRAPA Brazil IITA Nigeria ACGT South Africa CIP Peru
GCP Research: from Genotype to Phenotype SP1: Allelic Mining SP3: Trait Synthesis SP2: Functional Assignment Genebank Advanced breeding lines as vehicles NILs, RILs Mapping pop. Mutants Genetic Resources Genomic annotation, Forward and Reverse Genetics, Gene arrays/gels Germplasm Genotyping & Phenotyping Marker-aided Selection/ Transformation Process Candidate genes Beneficial alleles Linked to Traits Value-added varieties Product
Integration across Diverse Crop Data Genetic Analysis • Inventory • Identification (passport) • Genealogy has Genotype has Phenotype Germplasm determines • Anatomical • Developmental • Field Performance • Stress Response determines • Genetic Maps • Physical Maps • DNA Sequence • Functional Annotation • Molecular Variation (Natural or Induced) Molecular Expression • Transcriptome • Proteome • Metabolome • Physiology • Location (GIS) • Climate • Day Length • Ecosystem • Agronomy • Stresses affects Environment
Crop Information Systems: the Next • Large, globally distributed consortium • Diverse research requiring a diversity of tools • Large data sets with diverse data types • Many legacy informatics systems and tools • Global data integration required… Key Issue: Interoperability
Some Basic GCP Research Objectives • Compile a list of germplasm meeting specific passport data criteria • Compile a list of genetic markers of interest from genetic and QTL maps • Retrieve genotypes of specified markers, for specified germplasm • Align gene expression data against QTL positional evidence to identify candidate gene loci for specified traits
Analyse source environment of germplasm Find germplasm genotyped with mapped markers Get candidate genes in map interval Select “interesting” candidate genes; get alleles Plot germplasm, genotype and phenotype on geographical maps Get genotype & phenotype of germplasm Get functional information about genes Get/analyse a genetic map Select adapted germplasm with favorable phenotype & alleles for further evaluation A Generalized GCP Crop Research Integration Work Flow Comparative Map & Trait Viewer (NCGR/ISYS) Germplasm Passport/ Phenotype/ Genotype Querybuilder DIVA-GIS Comparative (Functional) Genomics Tools Generation Challenge Programme Domain Model & Middleware Genetic Map Data Source(s) Germplasm Data Source(s) Genomics Data Source(s) GIS Data Source(s)
integrated databases and tools GCP Information Platform: User Perspective An environment that provides improved access to data and analysis tools applications
Data Registry internet Tapir MOBY, etc. middleware local database layer GCP Information Platform – Developers’ Perspective application layer
Generation CP Platform http://pantheon.generationcp.org
GCP Platform - General Architecture • “Model Driven Architecture” based on “platform independent” GCP scientific domain models, parameterized with controlled vocabulary (“ontology”) • GCP domain models mapped onto platform specific implementations. • Reference (Java) GCP platform application programming interface (API)
Semantics of the GCP Model Driven Architecture • GCP is trying to model the meaning (“semantics”) of the crop research world. • Semantics is found in the domain model at three distinct but interconnected levels: • System architectural level: general scientific semantics in terms of high-level object concepts (“object types”) and their global inter-relationships. • Entity level: attributes and behaviors internal to high-level object types. • Attribute level: attribute values of objects that range over data types: simple (e.g. identifiers, numbers), complex (other classes of entities) or ontology (such as Gene Ontology (GO) terms, for a gene product).
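The three levels above can be pictured with a minimal Java sketch. Every class and attribute name here is hypothetical and chosen only for illustration; none of them is taken from the GCP domain model. The sketch shows object types (architectural level), attributes held by an entity (entity level), and attribute values that are simple, complex, or ontology terms (attribute level).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: these are NOT the GCP domain model classes.

/** Attribute level: a value is simple, complex (another entity), or an ontology term. */
interface AttributeValue {}

/** Simple value, e.g. an identifier or a number. */
final class SimpleValue implements AttributeValue {
    final Object value;
    SimpleValue(Object value) { this.value = value; }
}

/** Ontology value: a term drawn from a controlled vocabulary such as GO. */
final class OntologyValue implements AttributeValue {
    final String ontology;
    final String termId;
    OntologyValue(String ontology, String termId) {
        this.ontology = ontology;
        this.termId = termId;
    }
}

/** Entity level: a high-level object type with its own named attributes. */
final class Entity implements AttributeValue {
    final String objectType; // architectural level concept, e.g. "Germplasm"
    final Map<String, AttributeValue> attributes = new LinkedHashMap<>();

    Entity(String objectType) { this.objectType = objectType; }

    /** Adds an attribute and returns this entity so calls can be chained. */
    Entity with(String name, AttributeValue value) {
        attributes.put(name, value);
        return this;
    }
}

final class SemanticsSketch {
    /** Builds a germplasm entity whose attribute refers to a gene product entity. */
    static Entity example() {
        Entity geneProduct = new Entity("GeneProduct")
            .with("identifier", new SimpleValue("gene-product-1"))             // simple value
            .with("function", new OntologyValue("GO", "GO:0003674"));          // GO root term molecular_function
        return new Entity("Germplasm")
            .with("identifier", new SimpleValue("accession-1"))                // placeholder accession
            .with("candidateGeneProduct", geneProduct);                        // complex value
    }
}
```

Because an `Entity` is itself an `AttributeValue`, inter-relationships between object types and the attributes inside each type use the same mechanism in this sketch, which is one possible reading of "distinct but interconnected levels".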
Layers of Semantics Object Model of the Scientific Domain… …Parameterized with Ontology 1 Phenotype 2 ranges over Plant Ontology 3 Observable Germplasm has an has a Attribute with a Value
GCP Domain Model Specification • High-level object types are specified with Unified Modeling Language (UML) and associated text narratives. • Major object classes are represented in the object model. More specialized object types are specified by subclassing major object types using ontology. • Reference model is coded in the Eclipse Modeling Framework (EMF), managed under source code version control, and automatically compiled into other representations. http://pantheon.generationcp.org/demeter
Scope of GCP Domain Model & Ontology • Core models: generic concepts – identification, entities, features, organization, data management • Models heavily parameterized by ontology (e.g. entity and feature “type” attributes) • Scientific models: extends core model into specific scientific scopes relevant to GCP: • Germplasm data (including genetic resources passport) • Genomics including genotypes, maps, sequences and functional annotation. • Phenotype data • Environmental data (including geographical location)
GCP Ontology • Every attribute in the GCP domain model with data type SimpleOntologyTerm or subclass thereof, is an integration point for an external ontology. • External public ontology (e.g. GO, PO, SO) reused when available, and new ontology developed within GCP to fill gaps. • Ontology consolidated into GCP database based on GMOD Chado CV tables, indexed within platform using a GCP formatted identifier (that retains the source’s identifier).
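One way to picture an identifier that "retains the source's identifier" is a reversible wrapper around the original accession. The slide does not give the actual GCP identifier format, so the `GCP:` prefix and the class name `GcpTermId` below are assumptions made purely for illustration.

```java
// Hypothetical sketch: the real GCP identifier format is not stated in the slides.
final class GcpTermId {
    // Assumed prefix, for illustration only.
    static final String PREFIX = "GCP:";

    private final String sourceId;

    private GcpTermId(String sourceId) { this.sourceId = sourceId; }

    /** Wraps an identifier from an external ontology, e.g. "GO:0003674". */
    static GcpTermId fromSource(String sourceId) {
        if (sourceId == null || sourceId.isEmpty()) {
            throw new IllegalArgumentException("source identifier must not be empty");
        }
        return new GcpTermId(sourceId);
    }

    /** Parses a platform-formatted identifier back into its wrapper. */
    static GcpTermId parse(String gcpId) {
        if (gcpId == null || !gcpId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a platform-formatted identifier: " + gcpId);
        }
        return fromSource(gcpId.substring(PREFIX.length()));
    }

    /** Identifier used to index the term inside the platform. */
    String gcpId() { return PREFIX + sourceId; }

    /** Original identifier, recoverable unchanged. */
    String sourceId() { return sourceId; }
}
```

The point of the design is that wrapping is lossless: a term indexed under the platform identifier can always be traced back to its entry in the public ontology.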
GCP Domain Model Mappings onto Platform Specific Implementations GCP Platform Java Middleware & Applications GCP Domain Model (UML/EMF) SOAP Web Services (BioMOBY, SoapLab, GDPC) XML Schemata: GCP Data Templates, BioCASE/Tapir GCP Ontology Database OWL/RDF Ontology: VPIN/SSWAP.info http://pantheon.generationcp.org/demeter
Reference GCP Platform API • PantheonBase: a relatively simple core Java Application Programming Interface (API) for software integration: • DataSource: query data resources, using simple, ontology-driven SearchFilter specifications • DataTransformer: computational input/output • DataConsumer: communicate data to viewers http://pantheon.generationcp.org
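The three roles named above can be sketched as a small pipeline. The interface names `DataSource`, `DataTransformer`, `DataConsumer` and `SearchFilter` come from the slide, but every method signature, the `where` builder, and the in-memory implementation below are assumptions for illustration and should not be read as the actual PantheonBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical signatures: only the four type names appear in the slide.

/** Assumed filter shape: attribute name mapped to the value it must equal. */
final class SearchFilter {
    final Map<String, String> criteria = new LinkedHashMap<>();

    SearchFilter where(String attribute, String value) {
        criteria.put(attribute, value);
        return this;
    }
}

interface DataSource<T> { List<T> find(SearchFilter filter); }
interface DataTransformer<I, O> { O transform(I input); }
interface DataConsumer<T> { void consume(T data); }

/** Toy data source over in-memory attribute maps, standing in for a real database. */
final class InMemorySource implements DataSource<Map<String, String>> {
    private final List<Map<String, String>> rows;

    InMemorySource(List<Map<String, String>> rows) { this.rows = rows; }

    @Override
    public List<Map<String, String>> find(SearchFilter filter) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean matches = true;
            for (Map.Entry<String, String> c : filter.criteria.entrySet()) {
                if (!c.getValue().equals(row.get(c.getKey()))) { matches = false; break; }
            }
            if (matches) hits.add(row);
        }
        return hits;
    }
}

final class PipelineSketch {
    private static Map<String, String> row(String id, String origin) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("origin", origin);
        return r;
    }

    /** Source -> transformer -> consumer, using placeholder records. */
    static List<String> run() {
        DataSource<Map<String, String>> source = new InMemorySource(Arrays.asList(
            row("accession-1", "site-A"),
            row("accession-2", "site-B"),
            row("accession-3", "site-A")));

        // Transformer: reduce each matching record to its identifier.
        DataTransformer<List<Map<String, String>>, List<String>> toIds = hits -> {
            List<String> ids = new ArrayList<>();
            for (Map<String, String> hit : hits) ids.add(hit.get("id"));
            return ids;
        };

        // Consumer: a stand-in for a viewer; it just collects what it is given.
        List<String> shown = new ArrayList<>();
        DataConsumer<List<String>> viewer = shown::addAll;

        viewer.consume(toIds.transform(source.find(new SearchFilter().where("origin", "site-A"))));
        return shown;
    }
}
```

Keeping the three roles as separate interfaces means a tool only has to implement the role it plays, which fits the slide's description of a relatively simple core API for software integration.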
GCP Data Source Implementations • Direct Integration of relational databases (Spring HttpInvoker, Hibernate, JPA): • Developed for ICIS, GMOD Chado (beta) • Protocols: • Generalized Java Client to connect to BioMoby web services; Java support for GCP-compliant BioMoby web service provider development (beta) • Support for BioCase/Tapir data source integration (prototyped) • GCP-compliant GDPC data source (prototyped) • SSWAP/VPIN wrapper (under discussion) • Some other direct custom data source wrappers
Some GCP BioMOBY docs… http://moby.generationcp.org http://pantheon.generationcp.org/moby http://cropwiki.irri.org/gcp/index.php/MOBY_Rice_Network
GCP BioMoby Support – a Synopsis • MoSES + Dashboard developed (M. Senger). • GCP model specific BioMoby datatypes specified. • Java libraries partly developed for interconversion of GCP BioMoby data types to/from GCP domain model Java objects (Barboza). • GCP DataSource Java implementation developed for client side of BioMoby that maps GCP DataSource find() use cases onto BioMoby web services using XML configuration files (no coding). • Java design pattern for modular implementation of BioMoby web services that get their data from any GCP-compliant DataSource that supports a given find() use case.
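The idea of mapping find() use cases onto web services through configuration rather than code can be illustrated with a small lookup table loaded from XML. The element and attribute names (`useCase`, `name`, `service`) and the class `UseCaseMapping` are invented for this sketch; the real GCP configuration schema is not shown in the slides. Only standard JDK XML parsing calls are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical configuration format, for illustration only.
final class UseCaseMapping {

    /** Reads <useCase name="..." service="..."/> elements into a lookup table. */
    static Map<String, String> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("useCase");
            Map<String, String> mapping = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                mapping.put(e.getAttribute("name"), e.getAttribute("service"));
            }
            return mapping;
        } catch (Exception ex) {
            throw new IllegalStateException("could not read use case configuration", ex);
        }
    }

    /** Resolves the service configured for a use case, failing loudly if none is set. */
    static String serviceFor(Map<String, String> mapping, String useCase) {
        String service = mapping.get(useCase);
        if (service == null) {
            throw new IllegalArgumentException("no service configured for use case: " + useCase);
        }
        return service;
    }
}
```

With a table like this, supporting a new use case means adding one line of configuration; the client code that calls `serviceFor` does not change.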
(Partial) Inventory of 3rd Party Data Resources targeted for wrapping as GCP Data Sources
GCP Platform Implementations • Standalone workbench (“GenoMedium”) • Eclipse Rich Client Platform (RCP) • Web-based workbench (“Koios”) • AJAX, PHP, Java (server side), Java Web Start • NCGR Integrated SYStem (ISYS) • Direct tool integration (e.g. GCP MaxdLoad)
GCP Web-Based Search Engine Summary of query hits GCP semantics defined query List of items matched View details at 3rd party web site or in locally invoked 3rd party data viewer http://koios.generationcp.org
(Partial) Inventory of 3rd Party Analysis/Viewer Software being targeted for GCP Integration
GCP “Pantheon” Project in CropForge http://cropforge.org/projects/pantheon/
Closing Perspective • The GCP is a global consortium of 22++ crop research partners who need to share diverse large data sets and tools, in a globally distributed manner. • Given the scope and duration of the GCP, developers within the consortium embraced the task of developing public global informatics standards for interoperability and integration. • The effort is an open source, global community building exercise. • We welcome the participation of any and all interested scientists and developers who might wish to use and/or contribute to the further evolution and application of these standards.