EMBOSS as a DAS Client

EMBOSS as a DAS Client. Peter Rice pmr@ebi.ac.uk Mahmut Uludag uludag@ebi.ac.uk 3rd March 2011. EMBOSS: A quick introduction. European Molecular Biology Open Software Suite Open source package for sequence analysis ANSI C source code GPL licensed applications, LGPL libraries

Presentation Transcript


  1. EMBOSS as a DAS Client
     Peter Rice (pmr@ebi.ac.uk), Mahmut Uludag (uludag@ebi.ac.uk)
     3rd March 2011

  2. EMBOSS: A quick introduction
     • European Molecular Biology Open Software Suite
     • Open source package for sequence analysis
     • ANSI C source code
     • GPL licensed applications, LGPL libraries
     • 200+ applications
     • 100+ third party applications in 15 associated packages
     • Project started 1996 at Sanger Centre and HGMP
     • Now based at EBI
     • Release 6.3.0 15th July 2010
     • Funded by UK-BBSRC and EMBL-EBI

  3. EMBOSS history
     • Project started at Sanger Centre and SEQNET August 1996
     • Alan moved from SEQNET 1997 (Wellcome funding)
     • Peter moved to Lion Bioscience 2000 (CCP11-BBSRC/MRC)
     • Peter moved to EBI 2003
     • HGMP closed 2005: Alan+Jon moved to EBI
     • BBSRC funding (limited) 2006-2009
     • BBSRC BBR funding 2009-2011
     • Major new developments
       • New data types
       • New data sources
       • Built-in ontologies

  4. EMBOSS command line interface
     • EMBOSS applications run from the command line
     • This is not the only interface
     • There are over 100 interfaces and packaged systems available
       • Web interfaces
       • Graphical user interfaces (GUIs)
       • Web services
     • All applications have a command definition file (.acd)
       • Defines all inputs, outputs, and other options
       • Read at startup
       • Contains all command line options with descriptions
       • Template for any other interface

  5. EMBOSS command line example

     % antigenic
     Input protein sequence(s): uniprot:actb1_fugru
     Minimum length of antigenic region [6]:
     Output report [actb1_fugru.antigenic]:

     % antigenic uniprot:actb1_fugru -auto

  6. EMBOSS ACD File

     application: antigenic [
       documentation: "Finds antigenic sites in proteins"
       groups: "Protein:Motifs"
     ]

     section: input [
       information: "Input section"
       type: "page"
     ]

       seqall: sequence [
         parameter: "Y"
         type: "proteinstandard"
       ]

     endsection: input

     section: required [
       information: "Required section"
       type: "page"
     ]

       integer: minlen [
         standard: "Y"
         minimum: "1"
         maximum: "50"
         default: "6"
         information: "Minimum length of antigenic region"
       ]

     endsection: required

     section: output [
       information: "Output section"
       type: "page"
     ]

       report: outfile [
         parameter: "Y"
         rformat: "motif"
         multiple: "Y"
         taglist: "int:pos=Max_score_pos"
       ]

     endsection: output
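Since the ACD file is described as the template for any other interface, here is a minimal Python sketch of how a wrapper might read one. It is not the EMBOSS ACD parser: it assumes only the simple `kind: name [ key: "value" ... ]` layout shown on the slide, and the names `parse_acd` and `check_integer` are hypothetical.

```python
import re

# A cut-down copy of the antigenic definition from the slide.
ACD_TEXT = '''
application: antigenic [
  documentation: "Finds antigenic sites in proteins"
  groups: "Protein:Motifs"
]
seqall: sequence [
  parameter: "Y"
  type: "proteinstandard"
]
integer: minlen [
  standard: "Y"
  minimum: "1"
  maximum: "50"
  default: "6"
  information: "Minimum length of antigenic region"
]
'''

# Assumption: attribute values never contain "]" or an embedded double quote.
BLOCK = re.compile(r'(\w+):\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
ATTR = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_acd(text):
    """Map each definition name to its kind and its quoted attributes."""
    definitions = {}
    for kind, name, body in BLOCK.findall(text):
        definitions[name] = {"kind": kind, "attrs": dict(ATTR.findall(body))}
    return definitions


def check_integer(definition, value):
    """Check a value against the minimum/maximum attributes of an integer."""
    attrs = definition["attrs"]
    return int(attrs["minimum"]) <= value <= int(attrs["maximum"])


acd = parse_acd(ACD_TEXT)
print(acd["minlen"]["attrs"]["information"])
print(check_integer(acd["minlen"], 6))
```

A web form or GUI could use the same dictionary to label each field with its `information` text and pre-fill it with `default`.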

  7. EMBOSS ACD File with EDAM Annotation

     application: antigenic [
       documentation: "Finds antigenic sites in proteins"
       groups: "Protein:Motifs"
       relations: "EDAM:0000201 topic Immunological analysis"
       relations: "EDAM:0000416 operation Epitope mapping"
     ]

     section: input [
       information: "Input section"
       type: "page"
     ]

       seqall: sequence [
         parameter: "Y"
         type: "proteinstandard"
         relations: "EDAM:0001219 data Pure protein sequence"
         relations: "EDAM:0000849 data Sequence record"
         relations: "EDAM:0002178 data 1 or more"
       ]

     endsection: input

     section: required [
       information: "Required section"
       type: "page"
     ]

       integer: minlen [
         standard: "Y"
         minimum: "1"
         maximum: "50"
         default: "6"
         information: "Minimum length of antigenic region"
         relations: "EDAM:0001249 data Sequence length"
       ]

     endsection: required

     section: output [
       information: "Output section"
       type: "page"
     ]

       report: outfile [
         parameter: "Y"
         rformat: "motif"
         multiple: "Y"
         taglist: "int:pos=Max_score_pos"
         relations: "EDAM:0001534 data Peptide immunogenicity report"
       ]

     endsection: output

  8. Documentation & books
     Three books at typesetting stage.
     • Administrators’ Manual
     • Users’ Manual
     • Developers’ Manual
     Concomitant major revision of EMBOSS website.
     Automation of website content addition.
     Books to form basis of new website content.

  9. EMBOSS: Sequences
     Uniform Sequence Address (USA): URL-style naming
     Derived from the familiar "VMS logical name" syntax used by SRS and GCG.

     database : entryname
     • embl : ecompa                      ID or accession can be used in this way
     • uniprot-id : opsd_bovin            SRS syntax for query by ID
     • embl-acc : x13776                  SRS syntax for query by accession

     format :: filename
     • fasta :: /users/pmr/paamir.fa      Filename with specific format
     • ecoompa.genbank                    With no format, can try all formats

     format :: filename : entryname
     • fasta :: unfinished : AH6.1        Most formats allow multiple sequences

     Also @listfile and asis::gctgactgactgatg

     Queries
     • database-field:query               SRS syntax for id, acc, sv, des, key, org
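The USA forms listed on this slide can be told apart with a few string tests. The Python sketch below is only an illustration and is not how EMBOSS resolves a USA (EMBOSS also consults its database definitions and can try file formats in turn). The function name `parse_usa` and the returned dictionary layout are hypothetical. Whitespace around the separators is stripped, so both the spaced slide notation and the compact command-line form (`uniprot:actb1_fugru`) are accepted.

```python
def parse_usa(usa):
    """Classify a USA string into one of the forms shown on the slide."""
    usa = usa.strip()

    # @listfile: a file holding one USA per line
    if usa.startswith("@"):
        return {"kind": "listfile", "filename": usa[1:].strip()}

    # format::filename, format::filename:entryname, or asis::sequence
    if "::" in usa:
        fmt, _, rest = usa.partition("::")
        fmt, rest = fmt.strip(), rest.strip()
        if fmt == "asis":
            return {"kind": "asis", "sequence": rest}
        filename, _, entry = rest.partition(":")
        return {"kind": "file", "format": fmt,
                "filename": filename.strip(),
                "entry": entry.strip() or None}

    # database:entryname or database-field:query
    # Assumption: database names contain no "-" and paths contain no ":".
    if ":" in usa:
        database, _, query = usa.partition(":")
        database, _, field = database.strip().partition("-")
        return {"kind": "database", "database": database,
                "field": field or None, "query": query.strip()}

    # Anything else is treated as a plain filename with no stated format
    return {"kind": "file", "format": None, "filename": usa, "entry": None}


for example in ["embl:ecompa", "uniprot-id : opsd_bovin",
                "fasta :: unfinished : AH6.1", "asis::gctgactgactgatg",
                "@listfile", "ecoompa.genbank"]:
    print(example, "->", parse_usa(example))
```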

  10. New data resources
     • Aim to read “all” public data resources
     • Follow cross-references (explicit and implied)
       • UniProt
       • EMBL/GenBank/DDBJ
       • Other
     • Servers
       • Multiple data resources through a single server definition
       • DAS, Ensembl, BioMart, WsEbeye, DbFetch, SRS
       • Cache files of resource definitions for server
     • Data resource catalogue (drcat)
       • 600+ data resources
       • Query terms and URLs
       • EDAM annotation of resources, formats, identifiers, terms

  11. Data resource catalogue (drcat)

     ID      ArachnoServer
     Acc     DB-0145
     Name    ArachnoServer
     Desc    Spider toxin database
     URL     http://www.arachnoserver.org
     Cat     Organism-specific databases
     Taxon   6845 | Arachnida
     EDAMres 0000621 | Organism-specific
     EDAMdat 0002400 | Toxin annotation
     EDAMid  0002578 | ArachnoServer ID
     Xref    SP_explicit | ArachnoServer ID;Toxin name
     Query   Toxin annotation | HTML | ArachnoServer ID | www.arachnoserver.org/toxincard.html?id=%s
     Example ArachnoServer ID | AS000014
     CCmisc  BMC Genomics 10:375-375(2009); [Pubmed: 19674480]
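The `Query` line pairs an identifier type with a URL template, and the `Example` line supplies an identifier to put in it. The Python sketch below shows how such a record might be read and the template filled in. It assumes the layout shown on the slide (a tag, whitespace, then `|`-separated fields) and treats `%s` as the placeholder for the identifier. The names `parse_drcat` and `query_url` are hypothetical, not part of EMBOSS.

```python
DRCAT_RECORD = """\
ID      ArachnoServer
Acc     DB-0145
Desc    Spider toxin database
URL     http://www.arachnoserver.org
Taxon   6845 | Arachnida
EDAMid  0002578 | ArachnoServer ID
Query   Toxin annotation | HTML | ArachnoServer ID | www.arachnoserver.org/toxincard.html?id=%s
Example ArachnoServer ID | AS000014
"""


def parse_drcat(record):
    """Return {tag: [fields, ...]}; a tag may occur on more than one line."""
    entry = {}
    for line in record.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(" ")
        fields = [field.strip() for field in value.split("|")]
        entry.setdefault(tag, []).append(fields)
    return entry


def query_url(entry, identifier):
    """Fill the first Query template of the entry with an identifier."""
    data_type, fmt, id_type, template = entry["Query"][0]
    return template.replace("%s", identifier)


entry = parse_drcat(DRCAT_RECORD)
id_type, example_id = entry["Example"][0]
print(query_url(entry, example_id))
```

Storing each tag as a list of lines keeps the sketch usable for records that carry several `Query` or `Xref` lines.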

  12. EMBOSS Data Types
     • Sequences
       • Nucleotide (DNA and RNA)
       • Protein
     • Features
       • Attached to sequences
       • Independent data objects
     • Bio-Ontologies (OBO)
     • Taxonomy (NCBI)
     • Data Resources
     • Assembled reads
     • Text
       • Text, HTML, XML

  13. New data types
     • Reuse “USA” syntax
       • [Server:] Dbname : identifier       Database has an access method
       • [Server:] Dbname-field : query      General field names
     • Data types: features, bio-ontologies, taxonomy, etc.
     • Access methods: HTTP, DAS, BioMart, Ensembl, ...
     • Multiple types and formats for a server/resource
       • type: "sequence features"
       • format: "embl fasta"

  14. EMBOSS Query Language
     • Query fields are now made general
       • Any field queriable by the access method (DAS, SRS, …)
       • Any index created by indexing applications
       • Any query term in the data resource catalogue
     • Multiple queries combined
       • For one data resource
       • AND, OR, … to combine queries

  15. DAS Server Definitions

     SERVER das [
       method: "dassource"
       type: "sequence, features"
       url: "http://www.dasregistry.org/das/"
       comment: "access sequence/feature sources listed on das registry (http://www.dasregistry.org/das/)"
       cachefile: "server.dassource"
     ]

  16. DAS Server Definitions

     SERVER ensembldas [
       method: "dassource"
       type: "sequence, features"
       url: "http://www.ensembl.org/das/"
       comment: "access sequence/feature sources on ensembl das server (http://www.ensembl.org/das/)"
       cachefile: "server.ensembldas"
     ]

  17. DAS Example

     DB Ensembl_Human_Genes [
       method: das
       type: "Sequence, Features"
       taxon: "9606"
       format: "das, dasgff"
       url: http://www.ebi.ac.uk/das-srv/genedas/das/Homo_sapiens.Gene_ID.reference
       example: "ENSG00000139618"
       comment: "The Ensembl human Gene_ID reference source, serving sequences and non-location features."
       hasaccession: "N"
       identifier: "segment"
       fields: "segment, type, category, categorize, feature_id"
     ]

  18. Ensembl DAS Example

     DB Felis_catus_CAT_prediction_transcript [
       method: das
       type: "Nucfeatures"
       taxon: "9685"
       format: "dasgff"
       url: http://www.ensembl.org/das/Felis_catus.CAT.prediction_transcript
       example: "scaffold_209987[1:550]"
       comment: "Annotation source for Felis_catus prediction_transcript"
       hasaccession: "N"
       identifier: "segment"
       fields: "segment, type, category, categorize, feature_id"
     ]
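In the DAS 1.x protocol, a `features` or `sequence` request takes a `segment` argument of the form `reference:start,stop`. The Python sketch below shows one way the `url` and `example` values of a DB definition could be turned into such a request. That `scaffold_209987[1:550]` corresponds to `segment=scaffold_209987:1,550` is an assumption about what the client does, not something stated on the slides, and `das_url` is a hypothetical helper. No request is sent.

```python
import re
from urllib.parse import urlencode

# Reference name, optionally followed by a [start:end] range.
SEGMENT = re.compile(r'^\s*([^\[\]\s]+)\s*(?:\[(\d+):(\d+)\])?\s*$')


def das_url(source_url, command, segment):
    """Build a DAS 1.x request URL for one segment (no network access)."""
    match = SEGMENT.match(segment)
    if match is None:
        raise ValueError("unrecognised segment: %r" % segment)
    reference, start, stop = match.groups()
    if start is None:
        das_segment = reference
    else:
        das_segment = "%s:%s,%s" % (reference, start, stop)
    # Keep ":" and "," literal, as DAS segment arguments are usually written.
    query = urlencode({"segment": das_segment}, safe=":,")
    return "%s/%s?%s" % (source_url.rstrip("/"), command, query)


print(das_url("http://www.ensembl.org/das/Felis_catus.CAT.prediction_transcript",
              "features", "scaffold_209987[1:550]"))
```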

  19. EMBOSS Query Language
     • das: ensembl_human_genes: ENSG00000139618
     • ensembldas: Felis_catus_CAT_prediction_transcript: scaffold_209987 [1:550]
     • das: Homo_sapiens_GRCh37_transcript: 10 [32889611:32973347]
     • das: uniprot: P00280
     • das: cath: 5pti
     • das: uniparc: UPI000000000A
     • das: Homo_sapiens_GRCh37_reference- {segment: 11 & type: supercontig}
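The examples above follow the `[Server:] Dbname : identifier` and `[Server:] Dbname-field : query` patterns, with `{...}` grouping several field queries. The Python sketch below splits such strings into their parts. It is not the EMBOSS query parser: it always expects a server prefix, and it handles only `&` inside the braces because that is the only operator the slide shows. The name `parse_query` and the result layout are hypothetical.

```python
import re

# Dbname, "-", then a braced group of field queries.
FIELD_GROUP = re.compile(r'^([\w.]+)\s*-\s*\{(.*)\}$')


def parse_query(text):
    """Split 'server: db: id' or 'server: db-{field: value & ...}'."""
    server, _, rest = text.partition(":")
    server, rest = server.strip(), rest.strip()

    match = FIELD_GROUP.match(rest)
    if match:
        fields = {}
        for term in match.group(2).split("&"):
            field, _, value = term.partition(":")
            fields[field.strip()] = value.strip()
        return {"server": server, "database": match.group(1),
                "fields": fields}

    database, _, identifier = rest.partition(":")
    return {"server": server, "database": database.strip(),
            "identifier": identifier.strip()}


print(parse_query("das: uniprot: P00280"))
print(parse_query(
    "das: Homo_sapiens_GRCh37_reference- {segment: 11 & type: supercontig}"))
```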

  20. EMBOSS Query Language: Future
     • Ontology-based searches of data resources
       • Taxonomy
       • EDAM terms
       • Resources
       • Data types
       • Identifiers
       • Descriptions
     • Search for applications matching data types
       • Sequences and features
       • Nucleotide and protein
       • …
     • Support for DAS advanced query ...

  21. Acknowledgements
     • EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam
     • RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
     • Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
     • LION: Mahmut Uludag, Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
     • National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
     • Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, ...
     • IBM, Hewlett-Packard, (Compaq), Apple, SGI, Sun, LION bioscience, SciTegic, Cambridge University Press
     • Open-Bio Foundation, Sourceforge, Debian, Fedora, CEH ... And the British Antarctic Survey

     http://emboss.sourceforge.net
     http://emboss.open-bio.org/wiki/Latest_developments
