
Protein Sequence



Presentation Transcript


  1. Protein Sequence • Amino Acid Composition • IEC • RP HPLC • Ancient Sequencing methods • Modern Sequencing methods • Sequencing the Gene • Then what?

  2. Amino Acid Composition • 1952 - Complete Acid Hydrolysis • Ion Exchange Chromatography with programmed buffer changes (~3 hr) • Post-column derivatization with ninhydrin or fluorescamine • 1980 - Complete Acid Hydrolysis • Precolumn derivatization with phenylisothiocyanate to phenylthiocarbamyl (PTC) amino acids • Reversed-Phase HPLC (~30 min)

  3. Sequencing • Sanger Endgroup Analysis • Modify the protein's free amines with fluorodinitrobenzene (FDNB, Sanger’s reagent) • Alternative reagent: dansyl chloride (fluorescent) • Hydrolyze protein • Separate by TLC • Identify N-terminal amino acid by Rf • Treat protein with aminopeptidase • Repeat until the end gets ragged • Use proteolytic fragments for simplicity

  4. Sequencing • Generate proteolytic fragments • Use more than one protease in separate experiments • Trypsin cleaves after Arg and Lys residues • Chymotrypsin cleaves after Phe, Tyr, Trp • Separate fragments (HV paper electrophoresis/HPLC) • Sequence all peptides independently • Assemble the sequence using overlap info • (Figure labels: Trypsin, Chymotrypsin)
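The cleavage rules on slide 4 can be sketched in a few lines of Python. This is a simplified illustration, not a model of real protease behavior: it assumes cleavage after every listed residue (it ignores, for example, the usual exception for trypsin before proline), and the function name `digest` and the test peptide are made up for this example.

```python
def digest(sequence, cleave_after):
    """Cut `sequence` on the C-terminal side of every residue in `cleave_after`."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal piece that does not end in a cleavage residue.
        fragments.append(sequence[start:])
    return fragments


TRYPSIN = "KR"         # cleaves after Lys (K) and Arg (R)
CHYMOTRYPSIN = "FYW"   # cleaves after Phe (F), Tyr (Y), Trp (W)

peptide = "AGKFTRWLSYEKMA"  # hypothetical peptide, for illustration only
tryptic = digest(peptide, TRYPSIN)             # ['AGK', 'FTR', 'WLSYEK', 'MA']
chymotryptic = digest(peptide, CHYMOTRYPSIN)   # ['AGKF', 'TRW', 'LSY', 'EKMA']
```

Because the two fragment sets break at different positions, each chymotryptic peptide spans a tryptic cut site (and vice versa), which is the overlap information used to order the pieces.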

  5. Automated Sequencing • Use proteolytic fragments • Sequence each peptide using automated Edman Degradation • Each Edman cycle removes one amino acid • Converts it to PTH amino acid for HPLC • Assemble the sequence using overlap info • (Figure labels: Trypsin, Chymotrypsin)

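  6. N-Terminal Edman Degradation • N-terminal amine of the peptide attacks phenylisothiocyanate • + H+: cleavage releases the anilinothiazolinone amino acid and the shortened peptide (N-1) • Rearrangement of the anilinothiazolinone gives the PTH-amino acid • PTH-amino acid absorbs at 260-275 nm and is RP-HPLC compatible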

  7. C-Terminal Edman Degradation • Activation of carboxyl by acetic anhydride • Attack by thiocyanate • Hydrolysis (+ H2O) releases the TH-amino acid and the shortened peptide (N-1)

  8. Alternative Sequencing - MS • Use non-fragmenting ionization • Electrospray ionization (ESI) + traditional mass spec • Matrix-assisted laser desorption/ionization + time-of-flight mass spec (MALDI-TOF) • Measures mass of mature, intact protein and/or complexes
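A measured intact mass is usually compared with the mass calculated from the sequence. The sketch below sums commonly tabulated average residue masses (rounded to four decimals) and adds one water for the free termini. The function name `average_mass` is made up for this example, and modifications, disulfides, and bound ligands are ignored.

```python
# Average residue masses in Da (amino acid minus one water), rounded.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the N-terminal H and C-terminal OH


def average_mass(sequence):
    """Average mass (Da) of an unmodified linear polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in sequence) + WATER


mass = average_mass("MGKARSMVLKHSTKARS")  # sequence taken from slide 17
```

A difference between the calculated and measured values points to processing or modification of the mature protein, which is the kind of information an intact-mass measurement adds to the sequence.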

  9. Sequencing the Gene • DNA synthesis in vitro requires • Template (the DNA you want to sequence) • Primer (complementary to the region upstream of where you want to sequence) • Polymerase • dXTPs, Mg2+ • Primer pairs with template, leaving a free 3’-OH group ready for action • As each dXTP base-pairs with the template, the 3’-OH attacks the α-phosphate of the dXTP, displacing PPi and forming a phosphodiester bond that extends the nascent DNA chain by one base

  10. The Polymerase Reaction • Elongation of a primer that is base-paired with a template • Requires a free 3’-OH group • (Figure: primer base-paired with the template strand, strand ends labeled 5’ and 3’, free 3’-OH on the primer, PPi released)

  11. Dideoxy Terminators • If only 2’,3’-dideoxy nucleoside triphosphates were used, the reaction would proceed for only one cycle because there would be no free 3’-OH group to attack the next dXTP • If a fraction of a percent of ONE 2’,3’-dideoxy nucleoside triphosphate (say ddTTP) were used • SOME polymer would be terminated EACH time that base was incorporated, i.e., each time dA occurs in the template. • If 1/1000th of the dTTP were ddTTP, then 1/1000th of the polymers would terminate at each dA in the template… the rest would continue • You would get many polymers of different sizes, each corresponding to the occurrence of a dA in the template • Use four separate reactions, one with ddTTP, one with ddATP, one with ddGTP, and one with ddCTP (and all other components) • For every base in the sequence, one of the four reaction mixtures contains a polymer that terminated at that base
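The four-reaction logic can be illustrated with a short Python sketch. It assumes an idealized reaction in which every possible termination product is present, counts product length from the 3’ end of the primer, and uses the template written on slide 12. The function name `termination_lengths` is made up for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_lengths(template_3to5, dd_base):
    """Lengths of extension products that end with the dideoxy base `dd_base`.

    `template_3to5` is the template read 3'->5', the order in which the
    polymerase copies it, so the new strand grows 5'->3' from left to right.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


template = "ATGTCACAGGACAGA"  # 3'->5', as on slide 12
reactions = {dd: termination_lengths(template, dd) for dd in "ATGC"}
# reactions["T"] lists a product for every dA in the template:
# [1, 6, 8, 11, 13, 15]
```

Each length from 1 to 15 shows up in exactly one of the four reactions, which is why reading the four gel lanes from small to large spells out the new strand.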

  12. Dideoxy Terminators • Use fluorescent or radioactive primer so you can see every polymer • Separate them by size (gel electrophoresis) • Read sequence of polymers from gel • Infer the sequence of the template by Watson-Crick pairing • Sequence of template: 3’ ATGTCACAGGACAGA 5’ • Base in polymer: 5’ TACAGTGTCCTGTCT 3’ • (Figure: gel labeled “Agarose gel” with lanes ddATP, ddTTP, ddCTP, ddGTP; fragments run from small to large)
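Reading the gel can be sketched as the reverse operation: sort all bands by size, note which lane each band sits in, and complement the result to get the template. The lane data below are hypothetical, and the function names `read_gel` and `infer_template` are made up for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(lanes):
    """Read the synthesized strand 5'->3' from band sizes.

    `lanes` maps each dideoxy base to the fragment lengths seen in its lane.
    """
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


def infer_template(new_strand_5to3):
    """Template read 3'->5', by Watson-Crick pairing with the new strand."""
    return "".join(COMPLEMENT[base] for base in new_strand_5to3)


lanes = {"T": [1, 4], "A": [2], "C": [3], "G": [5]}  # hypothetical bands
polymer = read_gel(lanes)            # 'TACTG' (5'->3')
template = infer_template(polymer)   # 'ATGAC' (3'->5')
```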

  13. A, T, G, and C. What are the Amino Acids? • Standard Genetic Code
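Applying the standard genetic code to a DNA coding strand can be sketched as follows. The table is built from the conventional compact layout (bases in the order T, C, A, G; `*` marks a stop codon). The function name `translate` and the example sequence are made up for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence in frame 0; '*' is a stop."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


protein = translate("ATGGGTAAAGCTTAA")  # 'MGKA*'
```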

  14. ORFs - Look for the longest open reading frame (the longest stretch uninterrupted by a stop codon)
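A minimal ORF search, assuming an ORF runs from an ATG to the first in-frame stop codon and that all six reading frames (three per strand) are scanned. Real gene finders use more evidence than length alone. The function names and the test sequence are made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(_COMPLEMENT)[::-1]


def longest_orf(dna):
    """Longest ATG...stop stretch over all six reading frames ('' if none)."""
    best = ""
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon of a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ORF in this frame
    return best


orf = longest_orf("GGATGGGTAAAGCTTAACC")  # 'ATGGGTAAAGCTTAA'
```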

  15. So, you’ve got the sequence… So what? • Next topic: Bioinformatics • Inferences based on homology

  16. Questions • Has the gene been sequenced before? (Will I be able to publish?) • What is the sequence of the protein encoded by the gene? • Has the protein been sequenced before? • Is the gene similar to one that has been sequenced before? • Did I sequence the right gene? • Will I be able to find structural or functional relatives? • Is the protein similar to one that has been sequenced before? • How similar? • What does the similarity mean? • Can I predict the function of the gene product, or is the predicted function consistent with what I know about the protein? • Can I get information about structural features of the gene product? • Secondary structure • Folding domains or other common patterns • Hydropathy profiles • How might predicted helices and/or sheets pack? • Is it likely to be a membrane protein, a transmembrane protein?

  17. Answers: Sequence Similarities and Similarity Searches Search sequence databases for homologous proteins. Find families of proteins that are similar to your protein. Use information about the structure and properties of the similar protein(s) to establish inferences about your protein. If the exact sequence is in the database, the similarity search routines will find that, too. Determine whether two sequences are related (or identical) by aligning them so that homologous regions are adjacent. For two identical sequences: MGKARSMVLKHSTKARS MGKARSMVLKHSTKARS

  18. But, what about: • Imperfect homology: MGKARSMLLKHSTKARS / MGKARTMVLKHSTRARS • Gaps/insertions: MGKARSMLLKHSLKARS / MGRA----LKHSLRART • And, how homologous is homologous?
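One crude answer to "how homologous" is to count identical positions in an existing alignment. The sketch below does that for the two examples on slide 18, writing the gap as dashes (an assumption about how the blank in the second example is meant). The function name `identity` is made up for this example, and identity alone says nothing about conservative substitutions such as Lys for Arg.

```python
def identity(aligned_a, aligned_b, gap="-"):
    """Return (identical positions, compared positions) for two aligned rows.

    Columns in which either row has a gap are not compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap
    ]
    same = sum(1 for x, y in pairs if x == y)
    return same, len(pairs)


imperfect = identity("MGKARSMLLKHSTKARS", "MGKARTMVLKHSTRARS")  # (14, 17)
gapped = identity("MGKARSMLLKHSLKARS", "MGRA----LKHSLRART")     # (10, 13)
```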

  19. Need • Similarity scores for pairs of amino acids • Method for dealing with gaps • Algorithms for comparing a sequence with a database • Ways to assess the degree of homology • Ways to link structural info with sequence info

  20. Dynamic Programming • Needleman-Wunsch Algorithm • Compares similarity of two proteins a & b at positions i & j: $NW_{i,j} = \max\left(NW_{i-1,j-1} + s(a_i, b_j),\; NW_{i-1,j} + g,\; NW_{i,j-1} + g\right)$ • $NW_{i-1,j-1}$ = running total • $s(a_i, b_j)$ = similarity between residue i of protein a and residue j of protein b • g = gap penalty • http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html

  21. Fill a Matrix with all possibilities • Simple example: s = 1 (match) or 0 (mismatch) and g = 0
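The recurrence on slide 20 and the matrix fill on slide 21 can be sketched in Python as below. The defaults follow the simple example (s = 1 for a match, 0 for a mismatch, g = 0). The function returns only the final score, with no traceback, and its name and parameter names are made up for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=0):
    """Global alignment score of sequences a and b by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    nw = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        nw[i][0] = i * gap
    for j in range(1, cols):
        nw[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            nw[i][j] = max(
                nw[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                nw[i - 1][j] + gap,    # gap in b
                nw[i][j - 1] + gap,    # gap in a
            )
    return nw[-1][-1]


score_free_gaps = needleman_wunsch("MGKARS", "MGRS")         # 4
score_penalized = needleman_wunsch("MGKARS", "MGRS", gap=-1)  # 2
```

With g = 0 the score is simply the number of residues that can be matched in order. A negative gap penalty (an assumption here, since the slides leave the sign of g open) makes the algorithm pay for each skipped residue.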

  22. Smith-Waterman • Always compare the NW terms to zero so that the running score never drops below zero: $NW_{i,j} = \max\left(NW_{i-1,j-1} + s(a_i, b_j),\; NW_{i-1,j} + g,\; NW_{i,j-1} + g,\; 0\right)$
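A minimal sketch of the zero-floored recurrence, returning the best local score anywhere in the matrix. The scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for illustration. The zero floor only matters when mismatches and gaps carry negative scores.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    sw = [[0] * cols for _ in range(rows)]  # borders stay at zero
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            sw[i][j] = max(
                sw[i - 1][j - 1] + s,
                sw[i - 1][j] + gap,
                sw[i][j - 1] + gap,
                0,  # restart the alignment instead of going negative
            )
            best = max(best, sw[i][j])
    return best


score = smith_waterman("XXMGKAYY", "ZZMGKAWW")  # 4, from the shared 'MGKA'
```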

  23. BLAST & FASTA • FASTA - great, but we won’t talk about it • much faster and more selective than SW, but less sensitive • BLAST - Basic Local Alignment Search Tool • less selective and more sensitive than FASTA, i.e., you may get more hits, but some of them may be wrong

  24. BLAST • Divide sequence into “words” of length W (e.g., BLASTp, initial W = 3) • Compare all W-length words • Retain only pairs with similarity above a threshold, T • Call them High-Scoring Pairs (HSPs) • Increase W, repeat with HSPs • Keep going while the score remains above a minimum similarity • Compare the result with what is expected by chance (E)
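The word-and-extend idea can be illustrated with a deliberately simplified sketch. It is not the BLAST algorithm itself: it seeds on exact W-letter matches only (no scoring matrix or threshold T) and extends each seed while residues stay identical. The function names and sequences are made up for this example.

```python
def seed_hits(query, subject, w=3):
    """Positions (i, j) where query and subject share an identical w-letter word."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def extend(query, subject, i, j, w=3):
    """Grow the seed at (i, j) in both directions while residues are identical."""
    left = 0
    while (i - left - 1 >= 0 and j - left - 1 >= 0
           and query[i - left - 1] == subject[j - left - 1]):
        left += 1
    right = w
    while (i + right < len(query) and j + right < len(subject)
           and query[i + right] == subject[j + right]):
        right += 1
    return query[i - left:i + right]


query, subject = "MGKARSMVLK", "TTKARSMQQ"  # hypothetical sequences
hits = seed_hits(query, subject)            # [(2, 2), (3, 3), (4, 4)]
segment = extend(query, subject, *hits[0])  # 'KARSM'
```

Indexing the query words once and scanning the subject against that index is what makes word-based searches fast compared with filling a full dynamic-programming matrix for every database entry.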

  25. Scoring Matrices - Making similarity quantitative • Compare the actual frequency to the frequency expected by chance alone. • Probability that alanine appears at position x in a protein = fraction of Ala in all proteins = $p_{Ala}$ • Probability that one protein has Ala at position x, and another protein has Gly? • = $p_{Ala}\,p_{Gly}$ • The frequency due to chance alone.

  26. Similarity • $q_{Ala,Gly}$ = ACTUAL frequency that Ala and Gly are at position x in two proteins (in your database) • $R_{i,j} = q_{i,j}/(p_i\,p_j)$ • Score: $S_{i,j} = \log_2(R_{i,j}) = \log_2\left(q_{i,j}/(p_i\,p_j)\right)$ • “Log-Odds Scores” • Remember Chou & Fasman?
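The log-odds score is a one-line calculation. The sketch below uses made-up frequencies, chosen as powers of two so the arithmetic is exact, and the function name `log_odds` is hypothetical.

```python
import math


def log_odds(q_ij, p_i, p_j):
    """log2 of the observed pair frequency over the frequency expected by chance."""
    return math.log2(q_ij / (p_i * p_j))


# Hypothetical background frequencies p_i = p_j = 0.25 (chance pairing = 0.0625).
favored = log_odds(0.125, 0.25, 0.25)       # +1.0: seen twice as often as chance
neutral = log_odds(0.0625, 0.25, 0.25)      # 0.0: seen exactly as often as chance
disfavored = log_odds(0.03125, 0.25, 0.25)  # -1.0: seen half as often as chance
```

Positive scores mark substitutions that occur more often than chance, and negative scores mark those that are selected against. Because the scores are logarithms, adding them along an alignment corresponds to multiplying the odds.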

  27. PAM Matrices • Margaret Dayhoff assembled the Atlas of Protein Sequence and Structure • Evolutionarily accepted mutations • Calculated qi,j for all aa’s in closely related proteins • These were accepted by Nature as similar/close enough • Generate half matrices: Point Accepted Mutation/Percent Accepted Mutations • Scaled so that PAM1 reflects 1 accepted mutation per 100 residues and PAM50 reflects 50 accepted mutations per 100 residues

  28. BLOSUM • Henikoff and Henikoff • BLOcks SUbstitution Matrix • BLOCKS is a database of aligned, ungapped blocks from related proteins

  29. BLAST Search • Go to BLAST Website • Enter Nucleotide or AA sequence • Choose BLAST type • Nucleotide-nucleotide: BLASTn • Protein-protein: BLASTp • 6-frame-translated nucleotide vs. protein: BLASTx • others

  30. Then? • Does it make sense? • Multisequence Alignment • Secondary structure prediction • Domains • Families

  31. Caveat It ain't what you don't know that'll kill you, it's what you know that ain't so.
