
Blast2GO presentation @ StatSeq COST workshop

Blast2GO presentation @ StatSeq COST workshop. 21st-23rd April 2013, Helsinki, Finland. Friday 25th January 2013, Royal Melbourne Hospital. Why Blast2GO. Functional characterization of novel sequence data. Adapted to the high-throughput needs of biological laboratories.

ayala




Presentation Transcript


  1. Blast2GO presentation @ StatSeq COST workshop, 21st-23rd April 2013, Helsinki, Finland. Friday 25th January 2013, Royal Melbourne Hospital

  2. Why Blast2GO • Functional characterization of novel sequence data • Adapted to the high-throughput needs of biological laboratories • Extracting knowledge about the functioning of genomes

  3. Blast2GO Impact

  4. Outline • Concepts on Functional Annotation • The Blast2GO annotation framework • Visualization of functional data • Pathway analysis with Blast2GO

  5. Concepts of Functional Annotation What is functional annotation? How to annotate a large dataset?

  6. The Gene Ontology • Three branches: • Biological Process • Molecular Function • Cellular Component • Annotations are given to the most specific (lowest) level • True path rule: annotation at a given term implies annotation to all its parent terms • Annotation is given with an Evidence Code: • IDA: inferred from direct assay • TAS: traceable author statement • ISS: inferred from sequence similarity • IEA: inferred from electronic annotation • … (Diagram axis: more general → more specific)
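
The true path rule can be illustrated with a minimal Python sketch. The graph below is a simplified, hypothetical fragment (the term names and edges are assumptions chosen for readability, not the real GO structure); the function simply collects every ancestor of the directly annotated terms.

```python
# Simplified toy hierarchy: child term -> list of parent terms.
# These edges are illustrative only and do not reproduce the real Gene Ontology.
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}


def with_ancestors(terms, parents=PARENTS):
    """Return the given terms plus every ancestor implied by the true path rule."""
    implied = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in implied:
            continue  # already visited through another path
        implied.add(term)
        stack.extend(parents.get(term, []))
    return implied


direct = {"protein kinase activity"}
print(sorted(with_ancestors(direct)))
```

Annotating at the most specific level therefore loses nothing: the more general terms can always be recovered by walking up the graph.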

  7. Functional assignment Annotation Empirical Transference Literature reference Phylogeny Molecular interactions Biochemical assay Sequence analysis Structure Comparison Sequence homology Gene/protein expression Identification of folds Motif identification

  8. Annotation by similarity: concerns (Diagram: the GO terms of the HIT, GO1 to GO4, are transferred to the QUERY) • Level of homology (from ~40-60% is possible) • The overlap between hit and query; association between function and structure • The paralog problem: genes with similar sequences might have different functional specifications • The evidence for the original annotation • Balance between quality and quantity: depends on the use

  9. The Blast2GO annotation framework

  10. Application scheme cellular component biological process Fasta

  11. Application scheme biological process cellular component Fasta

  12. Application scheme cellular component biological process Fasta

  13. Basic annotation procedure (Diagram: three steps, Blast → Mapping → Annotation. Blast finds hits Hit1 to Hit4 for each query sequence Sq1 to Sq4; Mapping collects the GO terms attached to each hit; Annotation assigns a selected subset of those GO terms to each query sequence.)

  14. Annotation Rule • Let GO1…n be candidate annotations for sequence S1, obtained from hits Hi…k • We compute an annotation score AS for each GOi that depends on: • The similarity between sequence S1 and Hj • The evidence code of GOi • The existence of other neighboring GO candidates • The structure of the Gene Ontology • We define an arbitrary annotation threshold (AT) • S1 is annotated with GOi if its AS(GOi) > AT

  15. Annotation Rule Possibility of abstraction Similarity Requirement GO4 GO1 GO2 GO3 Quality of source annotation: IEA=0.7, IDA = 1, NR = 0.0, ... Annotation Score selectivity vs. specificity Cut-Off Value new annotation True-Path-Rule

  16. Blast2GO annotation rule - When I have a GO with ECw = 1 and I do not allow abstraction (GOw = 0), then the Annotation Score = % similarity - If the ECw < 1, my similarity requirement is higher to obtain the same Annotation Score - If I allow abstraction (GOw > 0), then with less similarity I can obtain the required Annotation Score at a parent node
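
The behaviour described on the annotation-rule slides can be sketched in Python. This is only an illustration under stated assumptions: the evidence-code weights are the three quoted on the slides, while the form of the abstraction term, (number of candidate child terms − 1) × GOw, and all function names and example values are assumptions rather than the tool's confirmed implementation.

```python
# Evidence-code weights quoted on the slides; other codes are not covered here.
EC_WEIGHTS = {"IDA": 1.0, "IEA": 0.7, "NR": 0.0}


def annotation_score(hits, n_child_candidates=0, go_weight=0.0, ec_weights=EC_WEIGHTS):
    """Score one candidate GO term.

    hits: list of (similarity_percent, evidence_code) pairs supporting the term.
    n_child_candidates: candidate child terms merging at this node (assumed input).
    go_weight: GOw, the reward for abstraction; 0 disables abstraction.
    """
    # Direct term: best similarity after down-weighting by evidence quality.
    direct = max(sim * ec_weights.get(code, 0.0) for sim, code in hits)
    # Abstraction term: assumed form, rewards several children meeting at a parent.
    abstraction = max(n_child_candidates - 1, 0) * go_weight
    return direct + abstraction


def is_annotated(score, threshold):
    """Apply the rule AS > AT."""
    return score > threshold


print(annotation_score([(80.0, "IDA")]))   # ECw = 1, GOw = 0: equals % similarity
print(annotation_score([(80.0, "IEA")]))   # ECw < 1: same hit scores lower
print(annotation_score([(50.0, "IDA")], n_child_candidates=3, go_weight=5.0))
```

With a hypothetical threshold of 55, the third call shows the abstraction effect: a 50% similar hit alone would not pass, but the parent node does once three child candidates contribute.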

  17. www.blast2go.com

  18. Start Blast2GO

  19. Blast2GO Application (1) Blast (2) Mapping (3) Annotation • Main Sequence Table • Any operation affects only the selected sequences! • Application statistics • Blast results • Application messages • Graph visualisation

  20. Load sequences

  21. Input data (in FASTA format, AA or nt)‏ >my_favourite_species_seq1 | still unknown gtgatggaaaagaaaagttttgttatcgtcgacgcatatgggtttctttttcgcgcgtattatgcgctgcctggattaagcacctcatacaattttcctgtaggaggtgtatatggttttataaacatacttttgaaacatctctctttccacgatgcagattatttagttgtggtatttgattcggggtcgaaaaattttcgtcacactatgtattccgaatacaaaactaatcgccctaaagcaccagaggatctgtcactacaatgtgctccgctacgtgaggctgttgaagcgtttaatattgtaagtgaagaagtgcttaactacgaagcagacgacgtaatagctacactctgtacaaaatatgcatctagtaatgttggagtgagaatactgtcagcagataaggatttactacaactcctaaatgataatgttcaagtttacgaccctataaaaagcagatacctcaccaatgaatacgttttagaaaaatttggtgtttcatcagataagttgcatattgatacggttgcatcgagttataatgagaaaattattctcagctaagctgtacaccgtttattacacactcgaaaggccgttag >my_favourite_species_seq2 | no clue ttgttagctaaaaaggaagactttcacacctttggtaatggtgttggctctgctggaacaggtggagttgtagtttctgcatccatgttgtctgcggatttttcaaatcttagagaagagatagcagcggttagtacggctggtgcagattggttacacattgatgtgatggatgggtgcttcgtccccagtttgactatgggtcctgtggtgatttccggcattaggaaatgtacaaatatgtttcttgatgtgcatttgatgattaatcgcccaggcgatcatctgaagagtgtggtagatgctggagctgataagatagagcacattcgcaagatgatagaggaaagctcatcaaccgcgaaaatcgctgttgatggtggtgtttcaacggataatgcccgggctgttatcgaggcaggtgcgaatatactcgttgttggaacggcgctgtttgctgctgacgatatgagtaaagttgtaagaactttaaaatcattttaa >my_favourite_species_seq3 | just sequenced gtgggactgctcatccctgtaggcagggtggctattttttgtgtaaaggcagtctttcatagtcttgtaccgccatactatctatggataactacaaagcagttttttgaggtgtggtttttctctcttcctatagtagcagttacatctttgtttacgggaggcgcgttagcccttcaggataccctcgtgggaagcgctaaagtatcagggtaatggagtttttactcctgcaagatgtaatagagggtctggtaaaagctgtatcgtttgggctggtaatttcgctagttgggtgttacaacgggtatcactgtgagataggcgcaaggggtgtaggaacagcgacaacaaaaacttcggtagcagcttctatgctcataattttgttaaactatataattactgttttttacgcgta >my_favourite_species_seq4 | we will see soon... 
atgtacgctgtatctctttcaaatttgcatgtctctttcaacaacaaggaggttttgaaaggtgttgacttggacatagcatggggggattccctggttatactgggagaatctggtagtggaaagtctgtactaacaaaggttgtattgggtctaatagtgccccaagagggaagtgttactgtagatggcaccaatattcttgagaataggcagggcatcaagaattttagtgttttgtttcaaaactgtgcgttatttgacagtcttacgatttgggaaaatgtagtattcaatttccgtaggaggcttcgtttagataaggataatgccaaggctttggctttacggggattggagcttgtgggattggacgccagtgtaatgaacgtgtatcctgtggagctatcaggcgggatgaaaaagcgcgtagctttggcaagagctattataggtagtcccaaaattctaattttggatgagccaacttcgggattggatcctataatgtcttcagtggt asdf asdf
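
Before loading a file, it can be useful to check that it really is FASTA and whether it holds nucleotides or amino acids. The following Python sketch is not part of Blast2GO; the function names are hypothetical, and the example records reuse the slide's headers with sequences shortened to their first bases.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def looks_like_nucleotides(seq):
    """Rough check: only a, c, g, t, u or n (case-insensitive)."""
    return bool(seq) and set(seq.lower()) <= set("acgtun")


EXAMPLE = """>my_favourite_species_seq1 | still unknown
gtgatggaaaag
aaaag
>my_favourite_species_seq2 | no clue
ttgttagctaaa
"""

for header, seq in parse_fasta(EXAMPLE):
    kind = "nt" if looks_like_nucleotides(seq) else "AA (or other)"
    print(header, len(seq), kind)
```

The nucleotide check is deliberately crude: a short protein made only of A, C, G and T residues would be misclassified, so treat it as a sanity check rather than a detector.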

  22. BLAST • Your email address • BLAST program (normally blastx) • BLAST database (many options) • E-Value (depends on the DB) • Number of HITs (use <= 20) • Recommended to save as XML • Human-readable seq. descriptions via BDA

  23. Additional BLAST params Set word size and filter Use your own server Minimum HSP length Filter by description Parsing options for own databases

  24. BLAST Results RED

  25. Blast Distribution Charts Evaluate the similarity of your sequences with public DBs

  26. Single Sequence Menu

  27. Mapping Results GREEN

  28. Annotation Menu BLAST based annotation Other Annotation modes Validation and Annex

  29. Annotation • Lets you set a minimum percentage of the HIT sequence that must be spanned by the QUERY sequence • This helps to avoid the problem of cis-annotation
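
A coverage filter of this kind can be sketched as follows. The sketch assumes 1-based, inclusive alignment coordinates on the hit, as BLAST reports them; the function names and the example threshold are hypothetical, and the way Blast2GO computes the value internally is not stated on the slide.

```python
def hit_coverage(hit_start, hit_end, hit_length):
    """Percentage of the hit sequence spanned by the alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    aligned = abs(hit_end - hit_start) + 1
    return 100.0 * aligned / hit_length


def passes_coverage(hit_start, hit_end, hit_length, min_coverage):
    """Keep the hit only if the alignment spans enough of it."""
    return hit_coverage(hit_start, hit_end, hit_length) >= min_coverage


# A query aligning to positions 1-50 of a 200-residue hit covers 25% of it.
print(hit_coverage(1, 50, 200))
print(passes_coverage(1, 50, 200, min_coverage=80))
print(passes_coverage(11, 190, 200, min_coverage=80))
```

A hit that is only partially covered may carry functions that belong to regions the query does not match, which is why a minimum coverage makes the transferred annotation safer.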

  30. Annotation Result BLUE

  31. Annotation Charts

  32. Annotation Charts Commonly, level 5 is the most abundant specificity level in the Gene Ontology

  33. Additional Annotation: ANNEX Recovers implicit biological process and cellular component GO terms based on molecular function annotations Molecular Function acts in is involved in Biological Process Cellular Component Myhre et al, Bioinformatics 2006

  34. Additional Annotation: InterProScan • Runs InterProScan searches at the EBI through Blast2GO • Results are stored on your computer as XML files. You can upload them later • Once you have completed your InterPro annotation, results can be transformed to GO terms and merged with the Blast annotation

  35. InterProScan Results Column with InterProScan results

  36. Additional Annotation: GOSlim • GOSlim is a reduced version of the Gene Ontology with a smaller vocabulary → helps to summarize information • After GOSlim transformation, sequences turn YELLOW • Different GOSlims are available in Blast2GO
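
One common way to picture a GOSlim transformation is to replace each specific term by the closest ancestor that belongs to the slim vocabulary. The Python sketch below follows that idea on a hypothetical toy graph; the term names, the slim set and the mapping strategy are assumptions for illustration and may differ from what Blast2GO does internally.

```python
# Toy hierarchy: child term -> parent terms (illustrative, not real GO edges).
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}

# Hypothetical slim vocabulary: a small set of broad terms.
SLIM_TERMS = {"catalytic activity", "molecular function"}


def to_slim(term, parents=PARENTS, slim_terms=SLIM_TERMS):
    """Map a term to the closest slim terms found along its ancestor paths."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing this path at the first slim term
            continue
        stack.extend(parents.get(current, []))
    return found


print(to_slim("protein kinase activity"))
```

Many specific annotations collapse onto the same few slim terms, which is what makes the result easier to summarize in charts.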

  37. Enzyme annotation and KEGG Maps GO → Enzyme Codes → KEGG maps

  38. Manual Curation • You can manually modify the annotation of particular sequences • If you tick this box, curated sequences turn purple

  39. Export Results Saves the complete B2G project (heavy) Export annotation results in different formats

  40. Export formats Also for import! .annot GeneSpring Format GoStat By Seq

  41. More export formats Export Sequence Table Export BestHit Data

  42. Sequence Selection Sequence Selection tool to obtain a selection based on annotation status

  43. Sequence Selection By Function By Name/Description

  44. View Menu Functions to switch between displaying IDs or descriptions for GO annotation or InterPro results

  45. Hands-on I: Annotating 10 seqs with Blast2GO

  46. Visualization How to understand the functional context of an annotated dataset

  47. Combined Graph Each term has a number of sequences associated Nodes can be coloured to indicate relevance Each term is displayed around its biological context Node shape to differentiate between direct and indirect annotation

  48. Combined Graph • Different GO branches • Reduces nodes by number of annotated sequences • Node data to be displayed • Criterion for highlighting and filtering nodes

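  49. Node information content • Accumulated by GO term (SequenceCount) • Incoming information (Node Score): $\mathrm{score}(g') = \sum_{g \in \mathrm{desc}(g')} \mathrm{seq}(g) \cdot \alpha^{\mathrm{dist}(g, g')}$ (Diagram: example graph showing sequence counts and node scores per node)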
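
The node score, the sum over descendant terms g of seq(g) · α^dist(g, g'), can be worked through on a small example. In this Python sketch the graph, the sequence counts and the value of α are all hypothetical; it also assumes that the node itself counts at distance 0 and that dist is the shortest path length in edges, neither of which is spelled out on the slide.

```python
from collections import deque

# Toy graph: parent term -> child terms, with sequences annotated directly to each term.
CHILDREN = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
SEQ_COUNT = {"root": 0, "a": 1, "b": 0, "c": 3}


def node_score(term, children=CHILDREN, seq_count=SEQ_COUNT, alpha=0.5):
    """Sum seq(g) * alpha**dist(g, term) over the term and all its descendants."""
    # Breadth-first search gives the shortest distance to every descendant.
    dist = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in dist:
                dist[child] = dist[current] + 1
                queue.append(child)
    return sum(seq_count.get(g, 0) * alpha ** d for g, d in dist.items())


for term in ("c", "a", "root"):
    print(term, node_score(term))
```

Because α is below 1, sequences annotated far below a node contribute less to it, so the score favours terms that sit close to where the annotations actually are.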

  50. Compacting Graphs by GO-Slim
