Basic terms:

shona
Presentation Transcript


  1. Basic terms: • Similarity: a measurable quantity. • Similarity is applied to proteins using the concept of conservative substitutions. • Identity: expressed as a percentage. • Homology: a specific term indicating a relationship by evolution.
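
As a rough illustration of identity as a percentage, the sketch below counts identical positions in two sequences that are already aligned to the same length. The name `percent_identity` and the decision to leave gapped columns out of the denominator are assumptions made for this example; tools differ in how they define the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("GATTACA", "GACTATA"))
```

Similarity would be computed the same way, except that conservative substitutions would also count toward the numerator.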

  3. Basic terms: • Orthologs: homologous sequences found in two or more species that have the same function (e.g. alpha-hemoglobin). • Paralogs: homologous sequences found in the same species that arose by gene duplication (e.g. alpha- and beta-hemoglobin).

  7. Pairwise comparison • Dotplot • All-against-all comparison: every position is compared with every other position. • Nucleic acids and proteins have polarity. • Typically only one direction makes biological sense: 5' to 3', or amino terminus to carboxyl terminus.

  8. Simple plot • Window: size of the sequence block used for comparison. In the previous example, window = 1. • Stringency: number of matches required to score positive. In the previous example, stringency = 1 (exact match required).

  9. DotPlot • WINDOW = 4; STRINGENCY = 2 • Sequence: GATCGTACCATGGAATCGTCCAGATCA • Window GATC scored at several registers along the sequence (+ means the stringency is met): + (4/4), - (0/4), - (0/4), + (2/4)
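
The window scoring on this slide can be sketched as follows. The helper name `window_matches` is hypothetical, and because the transcript does not preserve where each GATC was placed on the slide, the loop simply reports the first five registers.

```python
def window_matches(window: str, sequence: str) -> list[int]:
    """Number of matching positions when `window` is laid over each offset of `sequence`."""
    w = len(window)
    return [
        sum(a == b for a, b in zip(window, sequence[i:i + w]))
        for i in range(len(sequence) - w + 1)
    ]


SEQUENCE = "GATCGTACCATGGAATCGTCCAGATCA"  # sequence from the slide
STRINGENCY = 2                            # matches needed to score positive

counts = window_matches("GATC", SEQUENCE)
for offset, n in enumerate(counts[:5]):
    mark = "+" if n >= STRINGENCY else "-"
    print(f"offset {offset}: GATC {mark} ({n}/4)")
```

A register scores positive only when its match count reaches the stringency, so with stringency = 2 a 2/4 window is marked "+" while a 1/4 window is not.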

  10. Dot Plot • Compare two sequences in every register. • Vary the window size and stringency depending on the sequences being compared. • For nucleotide sequences, typically start with window = 21, stringency = 14. • For proteins, start with a smaller window: window = 3, stringency = 1 or 2. • It is important to test different stringencies.
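
A minimal sketch of a dot plot with adjustable window and stringency, assuming plain strings as input and a text rendering with `*` for a positive cell. The names `dot_plot` and `render` are illustrative and do not come from any particular program.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, stringency: int = 1) -> list[list[bool]]:
    """Cell [i][j] is True when the windows starting at seq_a[i] and seq_b[j]
    share at least `stringency` matching positions."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(matches >= stringency)
        plot.append(row)
    return plot


def render(plot: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' for a positive cell, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


seq = "GATCGTACCATGGAATCGTCCAGATCA"
print(render(dot_plot(seq, seq, window=4, stringency=3)))
```

Comparing a sequence with itself always produces the main diagonal. Lowering the stringency adds off-diagonal noise, which is why it is worth trying several settings.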

  11. Intergenic comparison • Nucleotide sequence contains three domains. • Positions 50-350: strong conservation. • An indel places the comparison out of register. • Positions 450-1300: slightly weaker conservation. • Positions 1300-2400: strong conservation.

  13. Scoring Alignments • Quality Score: • Score x for a match, -y for a mismatch. • Penalty for: • creating a gap • extending a gap

  21. Scoring Alignments • Quality Score: • Quality = [10 × (matches)] + [-1 × (mismatches)] - [(Gap Creation Penalty × number of gaps) + (Gap Extension Penalty × total length of gaps)] • The scoring scheme incorporates an evolutionary model: • Matches are conserved. • Mismatches are divergences. • Gaps are more likely to disrupt function, hence a greater penalty than for a mismatch. • Introduction of a gap (indel) is penalized more than extension of a gap.
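
The quality formula can be worked through on a small alignment. This is a sketch under stated assumptions: the two inputs are already aligned with `-` as the gap character, a "gap" is a run of consecutive gap characters in one sequence, and the default penalties (50 to create, 3 to extend) are placeholder values chosen for illustration, not values given in the slides.

```python
def quality_score(aligned_a: str, aligned_b: str, match: int = 10, mismatch: int = -1,
                  gap_creation: int = 50, gap_extension: int = 3, gap: str = "-") -> int:
    """Quality = match*matches + mismatch*mismatches
                 - (gap_creation*number_of_gaps + gap_extension*total_gap_length)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = n_gaps = gap_length = 0
    previous_gap_in = None  # which sequence held the gap in the previous column
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            raise ValueError("a column cannot contain two gap characters")
        if x == gap or y == gap:
            current = "a" if x == gap else "b"
            if current != previous_gap_in:
                n_gaps += 1          # a new gap is being created
            gap_length += 1          # every gapped column adds to the total length
            previous_gap_in = current
        else:
            previous_gap_in = None
            if x == y:
                matches += 1
            else:
                mismatches += 1
    return (match * matches + mismatch * mismatches
            - (gap_creation * n_gaps + gap_extension * gap_length))


print(quality_score("GAT-ACA", "GATTACA"))
```

With these placeholder penalties, a single one-column gap costs 53 points, far more than a mismatch, which reflects the evolutionary model described on the slide.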

  22. Z Score (standardized score) • $Z = \dfrac{\text{Score}_{\text{alignment}} - \text{Average Score}_{\text{random}}}{\text{Standard Deviation}_{\text{random}}}$

  23. Quality Score: Randomization • The program takes a sequence and randomizes it X times (user-selected). • It determines the average quality score and standard deviation for the randomized sequences. • Compare the randomized scores with the quality score to help determine whether the alignment is potentially significant.
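
The randomization test can be sketched as below. To keep the example short, it assumes an ungapped, position-by-position score (10 per match, -1 per mismatch) in place of a full alignment program; `ungapped_score` and `z_score` are hypothetical names, and the fixed seed is there only to make the run repeatable.

```python
import random
import statistics


def ungapped_score(a: str, b: str, match: int = 10, mismatch: int = -1) -> int:
    """Simplified stand-in for an alignment score: compare position by position."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def z_score(seq_a: str, seq_b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Z = (observed score - mean of shuffled scores) / std. dev. of shuffled scores."""
    rng = random.Random(seed)
    observed = ungapped_score(seq_a, seq_b)
    letters = list(seq_b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # randomize seq_b while keeping its composition
        shuffled_scores.append(ungapped_score(seq_a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; Z is undefined")
    return (observed - mean) / sd


seq = "GATCGTACCATGGAATCGTCCAGATCA"
print(z_score(seq, seq))
```

A Z score near zero means the real alignment scores no better than shuffled sequences of the same composition. A large positive value suggests the match is unlikely to arise by chance.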

  24. Randomization • It has become clear that sequences appear to evolve in a “word”-like fashion. • By analogy, the 26 letters of the alphabet are combined to make words, and the words actually communicate information. • Randomization should therefore occur at the level of strings of nucleotides (2-4) rather than single nucleotides.
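
One simple way to randomize at the level of short strings is to cut the sequence into fixed-size words and shuffle the words. This is an assumed scheme for illustration (the slides do not specify one), and `shuffle_words` is a hypothetical name. If the length is not a multiple of the word size, the last word is shorter.

```python
import random


def shuffle_words(sequence: str, word_size: int = 3, seed=None) -> str:
    """Shuffle non-overlapping words of `word_size` letters instead of single letters."""
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    rng = random.Random(seed)
    words = [sequence[i:i + word_size] for i in range(0, len(sequence), word_size)]
    rng.shuffle(words)
    return "".join(words)


print(shuffle_words("GATCGTACCATGGAATCGTCCAGATCA", word_size=3, seed=0))
```

Unlike a letter-by-letter shuffle, this keeps the composition of 2-4 nucleotide words intact, so the randomized sequences are a stricter baseline for the Z score.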

  28. Global Alignment • Global: compares all possible alignments of two sequences and presents the one with the greatest number of matches and the fewest gaps. • The alignment will “run” from one end of the longer sequence to the other end. • Best for closely related sequences. • Can miss short regions of strongly conserved sequence.
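
The slides do not name an algorithm. A standard way to find the best end-to-end alignment without enumerating every possibility is Needleman-Wunsch dynamic programming, sketched below. For brevity it assumes a single linear gap penalty rather than separate creation and extension penalties, and the scoring values are illustrative.

```python
def global_align(a: str, b: str, match: int = 10, mismatch: int = -1, gap: int = -5):
    """Needleman-Wunsch with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap      # leading gaps are penalized: the alignment is end to end
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

Because every residue of both sequences must appear in the result, a short conserved region inside otherwise unrelated sequences contributes little to the total, which is how global alignment can miss it.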

  31. Local Alignment • Identifies segments of alignment with the highest possible score. • Aligns sequences, then extends the aligned regions in both directions until the score falls to zero. • Best for comparing sequences whose relationship is unknown.
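
A common dynamic-programming formulation of local alignment is Smith-Waterman, which the slides do not name explicitly. The sketch below assumes a linear gap penalty and illustrative scoring values. The key difference from global alignment is that cell scores are never allowed to drop below zero, and the traceback stops when it reaches a zero.

```python
def local_align(a: str, b: str, match: int = 10, mismatch: int = -9, gap: int = -12):
    """Smith-Waterman with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_align("TTTTGATTACATTTT", "CCCCGATTACACCCC"))
```

Here the shared core is recovered even though the flanking regions are unrelated, which is the situation where a global alignment would be dominated by mismatches.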

  32. Global Alignment vs. Local Alignment (figure not reproduced in transcript)

  33. BLAST 2 • Basic Local Alignment Search Tool • E (expect) value: number of hits expected by random chance in a database of the same size. • Larger numerical value = lower significance. • Example query: HIV sequence
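
The slide gives no formula for the E value. One widely used form is the Karlin-Altschul estimate, E = K · m · n · e^(-λS), where m is the query length, n the database length, and S the raw score. The sketch below uses it with made-up values for K and λ. In practice these parameters depend on the scoring system, and BLAST also applies length corrections that are omitted here.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: int,
                 k: float = 0.1, lam: float = 0.3) -> float:
    """Karlin-Altschul style estimate E = K * m * n * exp(-lambda * S).
    k and lam are hypothetical placeholders, not fitted parameters."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


for s in (40, 60, 80):
    print(s, expect_value(s, query_len=300, db_len=1_000_000))
```

The estimate grows in proportion to database size and shrinks exponentially with score, which is why the same hit is less significant in a larger database and why a larger E means lower significance.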

  38. Both global and local alignment programs will (almost) always give a match. • It is important to determine whether the match is biologically relevant. • Not necessarily relevant: low-complexity regions, such as: • Sequence repeats (glutamine runs) • Transmembrane regions (high in hydrophobic residues) • If working with coding regions, you are typically better off comparing protein sequences, which have greater information content.
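
One rough way to flag low-complexity stretches such as glutamine runs is to measure Shannon entropy in a sliding window. This is an illustrative sketch, not the method of any specific filtering program. The window size of 12 and the threshold of 2.2 bits are arbitrary assumptions, and the protein string is invented for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Entropy in bits of the residue composition of `segment` (0 for a homopolymer)."""
    total = len(segment)
    counts = Counter(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_windows(sequence: str, window: int = 12, threshold: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy falls below `threshold`."""
    return [
        i for i in range(len(sequence) - window + 1)
        if shannon_entropy(sequence[i:i + window]) < threshold
    ]


protein = "MKTAYIAKQRDE" + "Q" * 12  # made-up sequence ending in a glutamine run
print(low_complexity_windows(protein))
```

Matches that fall entirely inside flagged windows deserve extra scrutiny, since compositionally biased regions can score well against each other without any evolutionary relationship.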
