EMBOSS – an application suite for Bioinformatics

Presentation Transcript


  1. EMBOSS – an application suite for Bioinformatics • Shahid Manzoor • Adnan Niazi SLU Global Bioinformatics Centre

  2. EMBOSS: E – European, M – Molecular, B – Biology, O – Open, S – Software, S – Suite

  3. All Information • EMBOSS info at http://emboss.sourceforge.net/. • wEMBOSS info at http://wemboss.sourceforge.net/. • E-mail martin.norling@slu.se to get a username and password for wEMBOSS at http://ebiokit.hgen.slu.se/.

  4. What is EMBOSS • Open source molecular biology analysis package. • Handles a variety of common file formats. • Provides libraries for easy development of new applications. • Free software, licensed under the GPL (applications) and the LGPL (libraries). • Developed by Peter Rice, Alan Bleasby and Ian Longden; the wEMBOSS web interface was developed by Martin Sarachu and Marc Colet. • Available at http://emboss.sourceforge.net

  5. Features of EMBOSS • A comprehensive set of sequence analysis programs. • All common sequence formats and many alignment and structure formats are handled. • It runs on practically every UNIX you can think of (and likely some that you can't), plus Windows and OS X. • Every application has the same style of interface, so master one and you've mastered them all.

  6. Uses for EMBOSS • Sequence alignment. • Protein motif identification (including domain analysis). • Nucleotide sequence pattern analysis (for example, to identify CpG islands or repeats). • Presentation tools for publications.

  7. Programs in EMBOSS • More than 140 programs, small and large, in the package. • All programs share a common look and feel. • Easy to run from the command line. • Retrieval of sequence data from the web.

  8. The One Argument • -help • The -help argument displays short help text for any EMBOSS program.

  9. The One Command • wossname • wossname searches the short descriptions of all the other programs for keywords.

  10. Large collection of gene and protein analysis tools • Translation • Protein domain searching • Sequence retrieval • Alignments • Primer design • Restriction mapping

  11. [Workflow diagram. Inputs: DNA sequence 1, DNA sequence 2, protein sequence 1, protein sequence 2. Analyses: translation, dotplot, protein local/global alignment, multiple sequence alignment, motif and domain searching, physico-chemical properties.]

  12. Dotplots
      >SEQ1.fasta
      ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
      >SEQ2.fasta
      ATGGCTCCTCCCTTAGAATCTTAG
      For an exact match:
      Unix% dottup SEQ1.fasta SEQ2.fasta -wordsize 10 &
      For a similarity match:
      Unix% dotmatcher SEQ1.fasta SEQ2.fasta -window 10 -threshold 17 &

  13. Dotplots … Identity matrix:

           A    T    G    C
      A    5   -4   -4   -4
      T   -4    5   -4   -4
      G   -4   -4    5   -4
      C   -4   -4   -4    5

      Window size is the number of bases in a sliding window that is moved along each sequence and compared to generate a single data point on the plot. Window size must be an odd number. Mismatch limit determines how similar the two sequences in a window must be to "match". For example, if the window size is 9 and the mismatch limit is 2, then up to 2 mismatches in a 9-base window will still be classified as a match.

  14. Dotplots … Scoring one 10-base window with the identity matrix:

      CCTCCTTTGG
      CCTCCTTTGG    5  5  5  5  5  5  5  5  5  5    Score = 50

      CCTCCTTTGG
      CCTCCCTTAG    5  5  5  5  5 -4  5  5 -4  5    Score = 32

      Both windows read Pro Pro Leu (CCT and CCC both encode Pro; TTG and TTA both encode Leu), so the two mismatches are synonymous.
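The window scoring on slides 13 and 14 can be sketched in a few lines of Python. This is a minimal illustration, not dotmatcher's source: it assumes the 5/-4 identity matrix shown, sums per-position scores for every pair of equal-length windows, and records a dot when the sum reaches the threshold. The `>=` comparison and the function names `window_score` and `similarity_dots` are assumptions made for this sketch.

```python
def window_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Sum per-position scores for two equal-length windows (identity matrix)."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def similarity_dots(seq1: str, seq2: str, window: int, threshold: int) -> set[tuple[int, int]]:
    """Return 0-based (i, j) window starts whose score reaches the threshold."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            if window_score(seq1[i:i + window], seq2[j:j + window]) >= threshold:
                dots.add((i, j))
    return dots


if __name__ == "__main__":
    print(window_score("CCTCCTTTGG", "CCTCCTTTGG"))  # 50
    print(window_score("CCTCCTTTGG", "CCTCCCTTAG"))  # 32
    seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
    seq2 = "ATGGCTCCTCCCTTAGAATCTTAG"
    print(sorted(similarity_dots(seq1, seq2, window=10, threshold=17)))
```

With a threshold of 17, the window pair scoring 32 (starting at 0-based positions 21 in SEQ1 and 6 in SEQ2) is plotted, even though an exact word match of length 10 would miss it.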

  15. Dotplots • A dot plot is a simple graphical representation of identical residues between two sequences. • The X axis represents the first sequence (PHO5), • The Y axis represents the second sequence (PHO3) • A dot is plotted for each match between two residues of the sequences. • Diagonal lines reveal regions of identity between the two sequences.

  16. Dotplots … • The dot plot can be adapted to display only word matches, which correspond to a diagonal of dots in the letter-based dot plot. • Example: alignment of PHO5 and PHO3 coding sequences, with different word sizes.

  17. Detecting repeats with a dot plot • Sequence repeats are easily detected in a dot plot when a sequence is compared to itself. • The main diagonal is completely marked (by definition, since the sequence is identical to itself). • Repeats appear as line segments parallel to the main diagonal.
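Slides 15 to 17 describe word-match dot plots of the kind dottup draws. A minimal Python sketch, assuming exact word matches and one dot per matching word start (dottup's own drawing may differ); the sequence `GATTACAGATTACA` is hypothetical, chosen only because it contains a repeat. Comparing it with itself marks the whole main diagonal and adds a parallel diagonal offset by the repeat length of 7.

```python
def word_dots(seq1: str, seq2: str, word: int) -> set[tuple[int, int]]:
    """Return 0-based (i, j) where seq1[i:i+word] == seq2[j:j+word]."""
    # Index every word of seq2 by its start positions, then look up each word of seq1.
    index: dict[str, list[int]] = {}
    for j in range(len(seq2) - word + 1):
        index.setdefault(seq2[j:j + word], []).append(j)
    dots = set()
    for i in range(len(seq1) - word + 1):
        for j in index.get(seq1[i:i + word], []):
            dots.add((i, j))
    return dots


def render(seq1: str, seq2: str, dots: set[tuple[int, int]]) -> str:
    """Draw seq1 along the X axis and seq2 down the Y axis as ASCII text."""
    lines = ["  " + seq1]
    for j, base in enumerate(seq2):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq1)))
        lines.append(base + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    repeat = "GATTACAGATTACA"  # hypothetical sequence: GATTACA twice
    print(render(repeat, repeat, word_dots(repeat, repeat, word=4)))
```

Indexing the words of the second sequence first keeps the comparison close to linear in practice, instead of testing every pair of positions.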

  18. Plotorf
      >SEQ1.fasta
      ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA
      >SEQ2.fasta
      ATGGCTCCTCCCTTAGAATCTTAG
      Unix% plotorf SEQ1.fasta -stop TAA,TAG -out GA.plot &
      Unix% getorf SEQ1.fasta -minsize 5 -table 0 -find 1 -out GA.getorf &

  19. [Plotorf figure: reading frames 1, 2 and 3 on the forward strand and -1, -2 and -3 on the reverse strand]
      5' ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA 3'
      3' TACCCAGCACTTCTCTTACGAGGAGGAAACCTTAGAATT 5'
      Start and stop codons are located according to the options given to the program, and the region between a start codon and a stop codon is plotted as an open reading frame.

  20. Indication of full coding sequence? Alternative splice form?

  21. Using getorf:
      >_1 [17 - 37]
      MLLLWNL
      >_2 [1 - 36]
      MGREENAPPLES*
      (M: start methionine; *: stop codon)
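getorf and plotorf scan all six reading frames for a start codon followed by a stop codon. The sketch below is a simplified Python version, assuming the standard genetic code, ATG as the only start codon, and one ORF per start (a later in-frame ATG inside an open ORF is not reported separately). The frame labels (-1 to -3 for the reverse strand), the coordinate convention (1-based, stop codon included) and the `min_aa` cutoff are choices made for this sketch and may not match getorf's output or options exactly.

```python
from itertools import product

# Standard genetic code; codons are listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna: str, min_aa: int = 2):
    """Yield (frame, start, end, protein) for ATG...stop regions in all six frames.

    start/end are 1-based positions on the strand being read. An ORF that runs
    off the end of the sequence is reported without a trailing '*'.
    """
    dna = dna.upper()
    for strand, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            start, protein = None, []
            for pos in range(offset, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
                if start is None:
                    if aa == "M":  # open a new ORF at the first ATG
                        start, protein = pos, ["M"]
                    continue
                protein.append(aa)
                if aa == "*":  # close the ORF at the stop codon
                    if len(protein) - 1 >= min_aa:
                        yield strand * (offset + 1), start + 1, pos + 3, "".join(protein)
                    start, protein = None, []
            if start is not None and len(protein) >= min_aa:
                yield strand * (offset + 1), start + 1, start + 3 * len(protein), "".join(protein)


if __name__ == "__main__":
    for orf in find_orfs("ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"):
        print(orf)
```

For SEQ1 this reports MGREENAPPLES* in frame 1 (positions 1 to 39, stop codon included) and the unterminated MLLLWNL in frame 2 at positions 17 to 37, the same two ORFs that getorf lists on the slide.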

  22. >GA.fasta
      GREENAPPLES
      Unix% transeq SEQ1.fasta -frame 1 -table 0 -sbegin 4 -send 36 -out GA.fasta &
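The transeq call translates only part of SEQ1. A minimal Python sketch of that step, assuming the standard genetic code (selected on the slide with -table 0) and treating `begin` and `end` as 1-based, inclusive positions in the spirit of -sbegin and -send; the function name `translate_region` is hypothetical.

```python
from itertools import product
from typing import Optional

# Standard genetic code; codons are listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def translate_region(dna: str, begin: int = 1, end: Optional[int] = None) -> str:
    """Translate bases begin..end (1-based, inclusive), reading from the region's first base."""
    region = dna.upper()[begin - 1:end]  # end=None means "to the end of the sequence"
    codons = (region[i:i + 3] for i in range(0, len(region) - 2, 3))
    # Unknown or ambiguous codons become 'X'; a trailing partial codon is ignored.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    seq1 = "ATGGGTCGTGAAGAGAATGCTCCTCCTTTGGAATCTTAA"
    print(translate_region(seq1))         # MGREENAPPLES*
    print(translate_region(seq1, 4, 36))  # GREENAPPLES
```

GREENAPPLES is 11 residues, so it needs 33 bases: positions 4 to 36. Ending the region at 33 stops one codon early and gives GREENAPPLE.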

  23. Alignments
      >GA.fasta
      GREENAPPLES
      >A.fasta
      APPLES
      For a global alignment:
      Unix% needle GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -matrix EPAM250 &
      For a local alignment:
      Unix% water GA.fasta A.fasta -gapopen 10 -gapextend 0.5 -matrix EPAM250 &

  24. Alignments … The aim is to align two or more sequences in a biologically significant way. Gap penalty = 10; extension penalty = 0.5.
      Local (water):
      APPLES
      APPLES
      Global (needle):
      GREENAPPLES
      -----APPLES
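The difference between needle (global) and water (local) can be sketched with one affine-gap dynamic program (Gotoh's three-matrix recurrence). This is an illustration, not EMBOSS code, and it rests on several assumptions: a simple 5/-4 match/mismatch score stands in for EPAM250, a gap of length k costs gapopen + (k - 1) × gapextend, end gaps are charged in global mode (needle may treat end gaps differently), and only the score is returned, without a traceback.

```python
NEG_INF = float("-inf")


def affine_align_score(a: str, b: str, mode: str = "global", match: float = 5.0,
                       mismatch: float = -4.0, gap_open: float = 10.0,
                       gap_extend: float = 0.5) -> float:
    """Best alignment score with affine gaps; mode is 'global' or 'local'."""
    if mode not in ("global", "local"):
        raise ValueError("mode must be 'global' or 'local'")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    if mode == "global":
        # Leading gaps: one opening plus (length - 1) extensions.
        for i in range(1, n + 1):
            X[i][0] = -(gap_open + (i - 1) * gap_extend)
        for j in range(1, m + 1):
            Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if mode == "local":
                prev = max(prev, 0.0)  # a local alignment may start at any cell
            M[i][j] = prev + pair
            X[i][j] = max(M[i - 1][j] - gap_open, Y[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open, X[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend)
            best = max(best, M[i][j])
    if mode == "local":
        return best
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_align_score("GREENAPPLES", "APPLES", mode="local"))   # 30.0
    print(affine_align_score("GREENAPPLES", "APPLES", mode="global"))  # 18.0
```

Locally, APPLES matches APPLES for 6 × 5 = 30. Globally, the same matches must pay for one 5-residue gap, 10 + 4 × 0.5 = 12, giving 18. Keeping a separate gap matrix is what makes one long gap cheaper than several short ones.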

  25. [Diagram: APPLES and GREENAPPLES, each leading to pattern searching and physicochemical properties] It looks like the “apples” motif may be part of a larger domain.

  26. Physico-chemical properties
      Isoelectric point:
      Unix% iep GA.fasta -plot -step 0.5 -out GA.IEP &
      General properties:
      Unix% pepinfo GA.fasta -hwindow 8 -generalplot -hydropathyplot &
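iep estimates the isoelectric point, the pH at which a peptide's net charge is zero, and with -plot it draws charge against pH in steps such as -step 0.5. The sketch below shows the usual Henderson-Hasselbalch approach with a bisection search. The pKa values are assumed typical values, not read from EMBOSS (iep uses its own pKa data file), so its numbers may differ slightly.

```python
# Assumed pKa values for ionisable groups (illustrative; not EMBOSS's table).
PKA = {"N_TERM": 8.6, "C_TERM": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Approximate net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA["N_TERM"]))
    negative = 1.0 / (1.0 + 10 ** (PKA["C_TERM"] - ph))
    for residue in seq.upper():
        if residue in "KRH":  # basic side chains: fraction protonated
            positive += 1.0 / (1.0 + 10 ** (ph - PKA[residue]))
        elif residue in "DECY":  # acidic side chains: fraction deprotonated
            negative += 1.0 / (1.0 + 10 ** (PKA[residue] - ph))
    return positive - negative


def isoelectric_point(seq: str, low: float = 0.0, high: float = 14.0, tol: float = 1e-4) -> float:
    """Bisect for the pH where the net charge crosses zero (charge falls as pH rises)."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def charge_curve(seq: str, step: float = 0.5) -> list[tuple[float, float]]:
    """Net charge from pH 0 to 14 in fixed steps, like a charge-versus-pH plot."""
    return [(i * step, net_charge(seq, i * step)) for i in range(int(14 / step) + 1)]


if __name__ == "__main__":
    print(round(isoelectric_point("GREENAPPLES"), 2))
```

With one arginine against three glutamates and the C-terminus, GREENAPPLES is acidic; with these assumed pKa values the sketch puts its isoelectric point just under pH 4.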

  27. Physico-chemical properties [Figure: Venn diagram grouping the 20 amino acids into overlapping property classes: polar, positive, charged, small, tiny, hydrophobic, aromatic and aliphatic] The pepinfo graph of properties is based on this diagram.

  28. Physico-chemical properties [Annotated pepinfo plots] A non-polar region with small residues; a polar region to one side of a non-charged region.
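Polar and non-polar stretches like these are read off pepinfo's hydropathy plot, which averages a per-residue scale over a sliding window (-hwindow 8 on slide 26). A minimal sketch of that averaging, assuming the Kyte and Doolittle hydropathy scale (pepinfo reads its scale from a data file, which may differ) and reporting one value per window start; positive values indicate hydrophobic windows.

```python
# Kyte & Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 8) -> list[float]:
    """Mean hydropathy of each window of `window` residues, one value per start."""
    values = [KYTE_DOOLITTLE[residue] for residue in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    for start, score in enumerate(hydropathy_profile("GREENAPPLES", window=8), start=1):
        print(start, round(score, 2))
```

An 11-residue peptide with an 8-residue window gives four points; the first window, GREENAPP, averages -2.1 on this scale.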

  29. Pattern searching
      >GL.fasta
      GREENLEAVES
      >GA.fasta
      GREENAPPLES
      >RL.fasta
      REDLEAVES
      >RA.fasta
      REDAPPLES
      Alignment:
      GREENAPPL---ES
      -RE-DAPPL---ES
      GREEN---LEAVES
      -RE-D---LEAVES
      Pattern: [G](0,1)-R-[E](1,2)-[ND]-x(0,3)-L-x(0,3)-E-S

  30. Pattern searching
      pattern.fruit: [G](0,1)-[R]-[E](1,2)-[ND]-x(0,3)-[L]-x(0,3)-[E]-[S]
      Search a protein database:
      Unix% fuzzpro sptr:* pattern.fruit -mismatch 0 -out GA.fuzzpro &
      Nothing resembling this pattern is found in the database, but we could try scanning PRINTS (pscan) and PROSITE (patmatmotifs) with one of our sequences.
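The fruit pattern uses PROSITE-style syntax. The sketch below converts such a pattern into a Python regular expression and checks it against the four example sequences. It is an illustration only, not how fuzzpro works internally, and it handles just the elements used here: single residues, [..] alternatives, {..} exclusions, x, and (n) or (n,m) repeats; the < and > anchors are not supported. The alignment on slide 29 has gaps of zero to three residues on each side of the L, so the spacers are written x(0,3).

```python
import re

# One pattern element: residue, [alternatives], {excluded}, or x, plus optional (n) or (n,m).
ELEMENT = re.compile(r"^(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?$")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[G](0,1)-R-[E](1,2)-[ND]-x(0,3)-L-x(0,3)-E-S")
    print(regex)  # [G]{0,1}R[E]{1,2}[ND].{0,3}L.{0,3}ES
    for seq in ("GREENLEAVES", "GREENAPPLES", "REDLEAVES", "REDAPPLES"):
        print(seq, bool(re.search(regex, seq)))
```

All four sequences match. With fixed x(3) spacers the pattern no longer fits GREENAPPLES, because only two residues (ES) follow its L.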

  31. Some Programs

  32. Some Programs …

  33. More Information
