Introduction to the BioMart API


  1. Introduction to the BioMart API

  2. BioMart APIs
     • Biomart_plib - Object-oriented Perl interface

  3. Biomart_plib Architecture
     • Object-oriented, Perl-based API to BioMart datasets
     • Uses XML configuration shared by all BioMart software

  4. Query logic

  5. Configuration logic

  6. Initializing API script

         use BioMart::Initializer;

         my $confFile    = "/home/user/martRegistryFile";
         my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
         my $registry    = $initializer->getRegistry;

     Optional Initializer parameters:
     • 'action' => 'clean' - replace the dataset configurations stored on the local file system with those from the database and build a new, clean registry object
     • 'action' => 'update' - replace any file-system dataset configurations modified since the last retrieval with the database copies and build a new registry object

     Default behaviour with no action specified is to generate the registry object from the cached file-system configurations if they exist, and otherwise to retrieve them from the database.

  7. Initializing API script

     Optional Initializer parameters (cont):
     • 'mode' => 'lazyload' - keep only a certain number of dataset configurations in memory at once, for low-memory machines and future scalability

     Default behaviour with no mode specified is to keep all configurations in memory.

  8. Building Query

         use BioMart::Query;

         my $query = BioMart::Query->new('registry'          => $registry,
                                         'virtualSchemaName' => 'default');
         $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');

     or with optional virtualSchema and interface settings:

         $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id',
                              'default', 'default');
         $query->addFilter('hsapiens_gene_ensembl', 'chromosome_name', ['1']);
         $query->addFilter('hsapiens_gene_ensembl', 'hgnc_symbol',
                           ['FGFR1', 'IL2', 'DERL3']);

  9. Executing query and printing results

         use BioMart::QueryRunner;

         my $query_runner = BioMart::QueryRunner->new();
         $query_runner->execute($query);
         $query_runner->printResults;

  10. Executing query and printing results

      Print formatted header:

          $query_runner->printHeader;

      Print just the first 20 results:

          $query_runner->printResults(20);

      Change the formatter from the tab-separated default before executing the query:

          $query->formatter('FASTA');

      The formatter must have a corresponding module in lib/BioMart/Formatter that implements the FormatterI.pm interface, for example CSV, TXT, GTF or XLS.

  11. Multi-dataset queries

          my $query = BioMart::Query->new('registry'          => $registry,
                                          'virtualSchemaName' => 'default');
          $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');
          $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_transcript_id');
          $query->addAttribute('mmusculus_gene_ensembl', 'ensembl_gene_id');
          $query->addAttribute('mmusculus_gene_ensembl', 'ensembl_transcript_id');

      This is the equivalent of picking human as the main dataset in the web interface and mouse as the optional second dataset; that is, the human attributes appear first in the result table, followed by the mouse attributes.

      Note that BioMart queries are currently restricted to a maximum of two datasets, for performance reasons and because of technical difficulties in query planning.

  12. Web services type access
      • Supports GRID projects such as Taverna, and other third-party users who want to federate mart data without leaving a port to the database server openly accessible.

  13. Web services type access

          http://test.biomart.org/cgi-bin/martservice?query=

          <?xml version="1.0" encoding="UTF-8"?>
          <!DOCTYPE Query>
          <Query virtualSchemaName = "defaultSchema">
              <Dataset name = "hsapiens_gene_ensembl">
                  <Attribute name = "ensembl_gene_id" />
                  <Attribute name = "chromosome_name" />
                  <ValueFilter name = "chromosome_name" value = "1"/>
              </Dataset>
          </Query>
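The XML document has to be URL-encoded before it can travel as the value of the `query` parameter. A minimal Java sketch of that step, assuming only the service URL and parameter name shown on the slide; the class and method names are hypothetical, and the sketch only builds the URL without sending a request:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of BioMart): builds a martservice request URL.
class MartServiceUrl {
    // Service endpoint taken from the slide.
    static final String SERVICE = "http://test.biomart.org/cgi-bin/martservice";

    // URL-encode the XML so that '<', '"', '=' and spaces survive as a parameter value.
    static String requestUrl(String queryXml) {
        try {
            return SERVICE + "?query=" + URLEncoder.encode(queryXml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE Query>"
                + "<Query virtualSchemaName = \"defaultSchema\">"
                + "<Dataset name = \"hsapiens_gene_ensembl\">"
                + "<Attribute name = \"ensembl_gene_id\" />"
                + "<Attribute name = \"chromosome_name\" />"
                + "<ValueFilter name = \"chromosome_name\" value = \"1\"/>"
                + "</Dataset>"
                + "</Query>";
        System.out.println(requestUrl(xml));
    }
}
```

The resulting string can be pasted into a browser or passed to any HTTP client.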

  14. Web services type access

      Change format from the default tab-separated format:

          <Query virtualSchemaName = "defaultSchema" formatter = "CSV">
              <Dataset name = "hsapiens_gene_ensembl">
                  <Attribute name = "ensembl_gene_id" />
                  <Attribute name = "chromosome_name" />
                  <ValueFilter name = "chromosome_name" value = "1"/>
              </Dataset>
          </Query>

  15. Web services type access

      Get count instead:

          <Query virtualSchemaName = "defaultSchema" count = "1">
              <Dataset name = "hsapiens_gene_ensembl">
                  <Attribute name = "ensembl_gene_id" />
                  <Attribute name = "chromosome_name" />
                  <ValueFilter name = "chromosome_name" value = "1"/>
              </Dataset>
          </Query>

  16. Web services type access

      Multi-dataset query:

          <Query virtualSchemaName = "defaultSchema">
              <Dataset name = "mmusculus_gene_ensembl">
                  <ValueFilter name = "chromosome_name" value = "1"/>
              </Dataset>
              <Dataset name = "hsapiens_gene_ensembl">
                  <Attribute name = "ensembl_gene_id" />
                  <Attribute name = "chromosome_name" />
                  <ValueFilter name = "chromosome_name" value = "1"/>
              </Dataset>
          </Query>
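The query documents on slides 13 to 16 differ only in the optional attributes of the Query element (formatter, count) and in the number of Dataset elements. A minimal Java sketch that assembles such a document and enforces the two-dataset limit stated on slide 11; every class and method name here is hypothetical (not part of BioMart or MartJ), and the sketch does no XML escaping of names or values:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder (not part of BioMart) for the query XML shown on the slides.
class MartQueryXml {
    private final String virtualSchema;
    private final List<String> datasets = new ArrayList<>();
    private String formatter;   // optional, for example "CSV"
    private boolean count;      // optional, emits count="1"

    MartQueryXml(String virtualSchema) { this.virtualSchema = virtualSchema; }

    MartQueryXml formatter(String name) { this.formatter = name; return this; }

    MartQueryXml count(boolean on) { this.count = on; return this; }

    // Adds one Dataset element; filters are {name, value} pairs.
    MartQueryXml dataset(String name, String[] attributes, String[][] filters) {
        if (datasets.size() == 2) {
            throw new IllegalStateException("queries are limited to two datasets");
        }
        StringBuilder sb = new StringBuilder("<Dataset name=\"" + name + "\">");
        for (String a : attributes) {
            sb.append("<Attribute name=\"").append(a).append("\"/>");
        }
        for (String[] f : filters) {
            sb.append("<ValueFilter name=\"").append(f[0])
              .append("\" value=\"").append(f[1]).append("\"/>");
        }
        datasets.add(sb.append("</Dataset>").toString());
        return this;
    }

    String build() {
        StringBuilder sb = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE Query>");
        sb.append("<Query virtualSchemaName=\"").append(virtualSchema).append("\"");
        if (formatter != null) sb.append(" formatter=\"").append(formatter).append("\"");
        if (count) sb.append(" count=\"1\"");
        sb.append(">");
        for (String d : datasets) sb.append(d);
        return sb.append("</Query>").toString();
    }

    public static void main(String[] args) {
        String[][] chr1 = { { "chromosome_name", "1" } };
        String xml = new MartQueryXml("defaultSchema")
                .dataset("mmusculus_gene_ensembl", new String[0], chr1)
                .dataset("hsapiens_gene_ensembl",
                         new String[] { "ensembl_gene_id", "chromosome_name" }, chr1)
                .build();
        System.out.println(xml);
    }
}
```

Rejecting a third dataset on the client side is a design choice of this sketch; it simply fails early instead of sending a query the server is documented not to support.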

  17. Web services type access

      (1) Recover the registry file:
          http://test.biomart.org/cgi-bin/martservice?type=registry

      (2) Recover the datasets available for a mart:
          http://test.biomart.org/cgi-bin/martservice?type=datasets&virtualSchema=default&mart=ensembl

      (3) Recover the filters available for a dataset:
          http://test.biomart.org/cgi-bin/martservice?type=filters&virtualSchema=default&dataset=hsapiens_gene_ensembl

      (4) Recover the attributes available for a dataset:
          http://test.biomart.org/cgi-bin/martservice?type=attributes&virtualSchema=default&dataset=hsapiens_gene_ensembl
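The four metadata requests share one pattern: a `type` parameter followed by a few name/value pairs. A minimal Java sketch of that pattern, assuming the endpoint and parameter names from the slide; the class and method are hypothetical, and values are appended unencoded because the identifiers on the slide contain no reserved characters:

```java
// Hypothetical helper (not part of BioMart): builds martservice metadata URLs.
class MartMetadataUrl {
    // Service endpoint taken from the slide.
    static final String SERVICE = "http://test.biomart.org/cgi-bin/martservice";

    // type is "registry", "datasets", "filters" or "attributes";
    // keyValuePairs alternates parameter names and values.
    static String url(String type, String... keyValuePairs) {
        if (keyValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("parameters must come in name/value pairs");
        }
        StringBuilder sb = new StringBuilder(SERVICE).append("?type=").append(type);
        for (int i = 0; i < keyValuePairs.length; i += 2) {
            sb.append('&').append(keyValuePairs[i])
              .append('=').append(keyValuePairs[i + 1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(url("registry"));
        System.out.println(url("datasets", "virtualSchema", "default", "mart", "ensembl"));
        System.out.println(url("filters", "virtualSchema", "default",
                               "dataset", "hsapiens_gene_ensembl"));
        System.out.println(url("attributes", "virtualSchema", "default",
                               "dataset", "hsapiens_gene_ensembl"));
    }
}
```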

  18. MartJ
      • Java interface to BioMart datasets
      • Uses XML configuration shared by all BioMart software

  19. RegistryDSConfigAdaptor

          import java.net.MalformedURLException;
          import java.net.URL;
          import org.ensembl.mart.lib.config.RegistryDSConfigAdaptor;

          URL confURL = null;
          try {
              confURL = InputSourceUtil.getURLForString("data/defaultMartRegistry.xml");
          } catch (MalformedURLException e) {
              throw new ConfigurationException(
                  "Warning, could not load " + "data/defaultMartRegistry.xml" + " file\n");
          }
          RegistryDSConfigAdaptor adaptor =
              new RegistryDSConfigAdaptor(confURL, false, false, false);

  20. DatasetConfig

          import org.ensembl.mart.lib.config.DatasetConfig;

          DatasetConfig config = adaptor.getDatasetConfigByDatasetInternalName(
              "hsapiens_gene_ensembl", "default");

  21. Query

          import org.ensembl.mart.lib.Query;

          Query query = new Query();

          // query needs some information from the DatasetConfig
          query.setDataSource(config.getAdaptor().getDataSource());
          query.setMainTables(config.getStarBases());
          query.setPrimaryKeys(config.getPrimaryKeys());

  22. FieldAttribute/AttributeDescription

          import org.ensembl.mart.lib.config.AttributeDescription;
          import org.ensembl.mart.lib.FieldAttribute;

          AttributeDescription adesc =
              config.getAttributeDescriptionByInternalName("gene_stable_id");
          query.addAttribute(new FieldAttribute(
              adesc.getField(),
              adesc.getTableConstraint(),
              adesc.getKey()));

  23. Filter/FilterDescription

      There are three types of Filter that can be added to the query; all are created using the attributes of a FilterDescription:

      A. BasicFilter
      B. BooleanFilter (but watch for the two boolean 'flavors')
      C. IDListFilter

  24. FilterDescription

          import org.ensembl.mart.lib.config.FilterDescription;

          FilterDescription fdesc =
              config.getFilterDescriptionByInternalName("chr_name");

  25. BasicFilter

          import org.ensembl.mart.lib.BasicFilter;

          // The config system actually masks a lot of complexity
          // with regard to filters by requiring the internalName
          // again when calling the getXXX methods.
          // Here name holds that internalName (for example "chr_name").
          query.addFilter(new BasicFilter(
              fdesc.getField(name),
              fdesc.getTableConstraint(name),
              fdesc.getKey(name),
              "=",
              "22"));

  26. BooleanFilter

          import org.ensembl.mart.lib.BooleanFilter;

          // note there are different types of BooleanFilter:
          // "boolean" and "boolean_num"
          if (fdesc.getType(name).equals("boolean"))
              query.addFilter(new BooleanFilter(
                  fdesc.getField(name),
                  fdesc.getTableConstraint(name),
                  fdesc.getKey(name),
                  BooleanFilter.isNULL));
          else // "boolean_num"
              query.addFilter(new BooleanFilter(
                  fdesc.getField(name),
                  fdesc.getTableConstraint(name),
                  fdesc.getKey(name),
                  BooleanFilter.isNotNULL_NUM));

  27. IDListFilter

          import org.ensembl.mart.lib.IDListFilter;

          String[] ids = new String[] {
              "ENSG00000146556.4",
              "ENSG00000197194.1",
              "ENSG00000197490.1",
              "ENSG00000177693.1"
          };
          query.addFilter(new IDListFilter(
              fdesc.getField(name),
              fdesc.getTableConstraint(name),
              fdesc.getKey(name),
              ids));

  28. Engine

          import org.ensembl.mart.lib.Engine;
          import org.ensembl.mart.lib.FormatSpec;

          Engine engine = new Engine();
          engine.execute(
              query,
              new FormatSpec(FormatSpec.TABULATED, "\t"),
              System.out);

  29. Future of MartJ
      • In the future, MartJ will be refactored to use the more flexible architecture that we developed for the Perl-based software.