Sequence Alignments

Chi-Cheng Lin, Ph.D., Associate Professor, Department of Computer Science, Winona State University – Rochester Center (clin@winona.edu).


Presentation Transcript


  1. Sequence Alignments • Chi-Cheng Lin, Ph.D. • Associate Professor • Department of Computer Science • Winona State University – Rochester Center • clin@winona.edu

  2. Sequence Alignments • Cornerstone of bioinformatics • What is a sequence? • Nucleotide sequence • Amino acid sequence • Pairwise and multiple sequence alignments • What alignments can help with • Determine the function of a newly discovered gene sequence • Determine evolutionary relationships among genes, proteins, and species • Predict the structure and function of a protein

  3. Why Align Sequences? • The draft human genome is available • Automated gene finding is possible • Gene: AGTACGTATCGTATAGCGTAA • What does it do? • One approach: Is there a similar gene in another species? • Align sequences with known genes • Find the gene with the “best” match

  4. Visualization of Sequence Alignment • Dot Plot • One of the simplest and oldest methods for sequence alignment • Visualization of regions of similarity • Assign one sequence to the horizontal axis • Assign the other to the vertical axis • Place a dot in every cell where the two sequences match • Diagonal lines mean adjacent regions of identity

  5. A Simple Example • Construct a simple dot plot for TAGTCGATG and TGGTCATC • The alignment is
        TAGTCGATG
        TGGTC-ATC
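The dot plot construction can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's own tool: the function name `dot_plot` is hypothetical, and the two sequences assume the flattened example text splits into TAGTCGATG and TGGTCATC.

```python
def dot_plot(horizontal, vertical):
    """Return one text row per base of `vertical`.

    A '*' marks a cell where the base on the vertical axis equals the
    base on the horizontal axis; '.' marks every other cell.
    """
    rows = []
    for v in vertical:
        rows.append("".join("*" if h == v else "." for h in horizontal))
    return rows


# Assumed split of the slide's example into two sequences.
seq_h = "TAGTCGATG"   # horizontal axis
seq_v = "TGGTCATC"    # vertical axis

print("  " + seq_h)
for base, row in zip(seq_v, dot_plot(seq_h, seq_v)):
    print(base + " " + row)
```

Runs of `*` along a diagonal in the printed grid correspond to stretches where the two sequences are identical.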

  6. Genes Accumulate Mutations over Time • Mistakes in gene replication or repair • Deletions, duplications • Insertions, inversions • Translocations • Point mutations • Environmental factors • Radiation • Oxidation

  7. Deletions • Codon deletion: ACG ATA GCG TAT GTA TAG CCG… • Effect depends on the protein, position, etc. • Almost always deleterious • Sometimes lethal • Frame shift mutation: ACG ATA GCG TAT GTA TAG CCG… → ACG ATA GCG ATG TAT AGC CG?… • Almost always lethal
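The difference between the two kinds of deletion is easy to see by re-reading the sequence in codons. The sketch below uses the slide's sequence. Removing the fourth codon (TAT) in the codon-deletion case is an assumption made for illustration, since the transcript does not show which codon was removed. The single-base deletion reproduces the frame-shifted sequence shown on the slide.

```python
def codons(seq):
    """Split a nucleotide string into consecutive groups of three."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


gene = "ACGATAGCGTATGTATAGCCG"

# Codon deletion: drop three whole bases (here the 4th codon, TAT;
# which codon is removed is an assumption for illustration).
codon_deleted = gene[:9] + gene[12:]

# Frame shift: drop a single base (the first T of TAT), so every
# downstream codon is read in a different frame.
frame_shifted = gene[:9] + gene[10:]

print(" ".join(codons(gene)))
print(" ".join(codons(codon_deleted)))
print(" ".join(codons(frame_shifted)))
```

After the codon deletion, the downstream codons are unchanged. After the single-base deletion, every downstream codon is different, which is why frame shifts are usually far more damaging.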

  8. Indels • When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
        ACGTCTGATACGCCGTATCGTCTATCT
        ACGTCTGAT---CCGTATCGTCTATCT

  9. The Genetic Code • Substitutions are mutations accepted by natural selection. • Synonymous: CGC → CGA • Non-synonymous: GAU → GAA
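Whether a substitution is synonymous depends only on whether the two codons encode the same amino acid. A minimal sketch follows, with a deliberately tiny codon table that covers just the slide's codons. The table and the function name `classify` are illustrative; a real analysis would use the full 64-codon genetic code.

```python
# Partial codon table (standard genetic code), limited to the slide's codons.
CODON_TO_AA = {
    "CGC": "Arg",
    "CGA": "Arg",
    "GAU": "Asp",
    "GAA": "Glu",
}


def classify(before, after):
    """Label a codon substitution as synonymous or non-synonymous."""
    same = CODON_TO_AA[before] == CODON_TO_AA[after]
    return "synonymous" if same else "non-synonymous"


for before, after in [("CGC", "CGA"), ("GAU", "GAA")]:
    print(before, "->", after, ":", classify(before, after))
```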

  10. Point Mutation Example: Sickle-cell Disease • Wild-type hemoglobin DNA: 3’----CTT----5’ • mRNA: 5’----GAA----3’ • Normal hemoglobin: ------[Glu]------ • Mutant hemoglobin DNA: 3’----CAT----5’ • mRNA: 5’----GUA----3’ • Mutant hemoglobin: ------[Val]------
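The step from template DNA to mRNA to amino acid in this example can be traced in code. This sketch assumes the DNA strand is the template strand written 3’ to 5’, as on the slide, so the mRNA read 5’ to 3’ is its base-by-base complement. The function name and the two-entry codon table are illustrative only.

```python
# DNA template base -> complementary RNA base.
COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Only the two codons that appear on the slide.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}


def transcribe_template(template_3_to_5):
    """Return the mRNA (5' to 3') for a template strand given 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


for label, dna in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe_template(dna)
    print(label, dna, "->", mrna, "->", CODON_TO_AA[mrna])
```

A single base change in the DNA (T to A) changes one mRNA base (A to U) and, with it, the encoded amino acid.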

  11. image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.

  12. Comparing Two Sequences • Point mutations, easy:
        ACGTCTGATACGCCGTATAGTCTATCT
        ACGTCTGATTCGCCCTATCGTCTATCT
      • Indels are difficult, must align sequences:
        ACGTCTGATACGCCGTATAGTCTATCT
        CTGATTCGCATCGTCTATCT
      • After alignment:
        ACGTCTGATACGCCGTATAGTCTATCT
        ----CTGATTCGC---ATCGTCTATCT
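Point mutations are "easy" because equal-length sequences can be compared position by position. A minimal sketch using the slide's first pair of sequences follows; the function name `point_differences` is hypothetical.

```python
def point_differences(a, b):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("position-by-position comparison needs equal lengths")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"

for pos in point_differences(seq1, seq2):
    print(pos, seq1[pos], "->", seq2[pos])
```

The same function raises an error for the second pair on the slide, whose lengths differ (27 and 20). That is why indels require an alignment step first.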

  13. Scoring a Sequence Alignment • Example • Match score: +1 • Mismatch score: 0 • Gap penalty: –1
        ACGTCTGATACGCCGTATAGTCTATCT
            ||||| |||   || ||||||||
        ----CTGATTCGC---ATCGTCTATCT
      • Matches: 18 × (+1) • Mismatches: 2 × 0 • Gaps: 7 × (–1) • Score = 18 + 0 + (–7) = +11 • Various scoring schemes exist.
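The tally on this slide can be reproduced with a short function. This is a sketch under the slide's scoring scheme (match +1, mismatch 0, gap –1). The function name and the convention that "-" marks a gap are assumptions of the example.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Count matches, mismatches and gap columns, then total the score."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1          # a column with a gap in either row
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # (18, 2, 7, 11)
```

Changing the keyword arguments shows how a different scoring scheme changes the total for the same alignment.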

  14. How can we find an optimal alignment? • Finding the alignment is computationally hard:
        ACGTCTGATACGCCGTATAGTCTATCT
        CTGAT---TCG-CATCGTC--T-ATCT
      • There are ~888,000 possibilities to align the two sequences given above. • Algorithms based on a technique called “dynamic programming” are used; they are beyond the scope of this workshop.
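For readers who want a glimpse of what dynamic programming looks like here, the sketch below implements one standard global-alignment algorithm (Needleman-Wunsch) with the scoring scheme from slide 13. The slides do not name a specific algorithm, so this choice is an assumption. The figure of ~888,000 is consistent with counting the ways to choose which 7 of 27 columns hold a gap, C(27, 7) = 888,030. That reading is an inference rather than something the slides state.

```python
from math import comb


def global_align(a, b, match=1, mismatch=0, gap=-1):
    """Needleman-Wunsch: return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(comb(27, 7))  # ways to place 7 gaps among 27 columns
best, row1, row2 = global_align("ACGTCTGATACGCCGTATAGTCTATCT",
                                "CTGATTCGCATCGTCTATCT")
print(best)
print(row1)
print(row2)
```

The table has only 28 × 21 cells, so the best score is found without enumerating hundreds of thousands of candidate alignments.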

  15. Global and Local Alignments • Global alignment – score the entire alignment • Local alignment – find the best matching subsequence • Why local sequence alignment? • Global alignment is useful only if the sequences to be aligned are very similar • Subsequence comparison between a DNA sequence and a genome • Identify • Conserved regions • Protein functional domains

  16. Example • Compare the two sequences:
        TTGACACCCTCCCAATT
        ACCCCAGGCTTTACACAG
      • Global alignment (does it look good?)
        TTGACACCCTCC-CAATT
            ||  ||   ||
        ACCCCAGGCTTTACACAG
      • Local alignment (does it look good?)
        ---------TTGACACCCTCCCAATT
                 || ||||
        ACCCCAGGCTTTACACAG--------
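Local alignment can also be computed by dynamic programming. The sketch below uses the Smith-Waterman algorithm, which the slides do not name, with an assumed scoring scheme of match +1, mismatch –1, gap –1. A negative mismatch score is needed so that poorly matching flanks are trimmed away, so this differs from the scheme on slide 13.

```python
def local_align(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: return (score, aligned_part_of_a, aligned_part_of_b)."""
    n, m = len(a), len(b)
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                       # start a fresh alignment
                          h[i - 1][j - 1] + sub,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, part1, part2 = local_align("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG")
print(score)
print(part1)
print(part2)
```

Unlike a global alignment, the result covers only the best-matching region of each sequence, which is the behavior wanted when searching for a conserved region or domain inside a longer sequence.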

  17. Where do we get sequences to work with? • Biological databases • NCBI Entrez (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=) • Wet labs • Simulations • Other people’s results • On-line education resources • BEDROCK (http://www.bioquest.org/bedrock/) • BLAST results
