Folding and Misfolding Initiation Sites in Proteins By Chris Bystroff Rensselaer Polytechnic Institute Troy, New York, USA
Intermediates are not observed, but Folding is 2-state Unfolded Folded
folding pathways must exist something happens first...
Protein Misfolding diseases Alzheimer's disease Creutzfeldt-Jakob Disease (CJD)* Scrapie* Kuru* Huntington's Disease Parkinson's Disease Type-2 diabetes Familial Amyloid Polyneuropathy (FAP) *Prion-linked
Misfolding Diseases are Diseases of the Folding Pathway • Predicting protein structure using homology may be easier and faster, BUT • You can’t predict misfolding using homology. • You CAN predict misfolding by modeling the folding pathway.
Early folding events might be recorded in the database Short, recurrent sequence patterns could be folding Initiation sites recurrent part HDFPIEGGDSPMQTIFFWSNANAKLSHGY CPYDNIWMQTIFFNQSAAVYSVLHLIFLT IDMNPQGSIEMQTIFFGYAESAELSPVVNFLEEMQTIFFISGFTQTANSD INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR PPPMQTIFFVIVNYNESKHALWCSVD PWMWNLMQTIFFISQQVIEIPSMQTIFFVFSHDEQMKLKGLKGA Non-homologous proteins Nature has selected for these patterns because they speed folding.
I-sites ( ) w s aa d = å k kj i k seqs = P = ij w å k k seqs = Sequence Profiles Sequence alignment VIVAANRSA VIVSAARTA VIASAVRTA VIVDAGRSA VIASGVRTA VIVAAKRTA VIVSAVRTP Sequence profile VIVSAARTA VIVSAVRTP aa VIVDAGRTA VIVDAGRTA VIVSGARTP ••• ••• VIVDFGRTP VIVSATRTP VIVSATRTP VIVGALRTP VIVSATRTP VIVSATRTP VIASAARTA VIVDAIRTP Red = high prob ratio (LLR>1)Green = background prob ratio (LLR≈0)Blue = low prob ratio (LLR<-1) VIVAAYRTA VIVSAARTP VIVDAIRTP VIVSAVRTA VIVAAHRTA
I-sites Clustering sequence profiles | P P | ijl ikl i 1 , 20 l 1 , L Each dot represents a short profile similarity metric (log-likelihood correlation) A profile is a matrix of log-likelihood ratios (LLRs)
I-sites Type-I hairpin diverging type-2 turn Serine hairpin Frayed helix alpha-alpha corner glycine helix N-cap Proline helix C-cap Results: The I-sites Library Backbone angles: y=green, f=red Amino acids arranged from non-polar to polar
I-sites Why do I-sites exist? 1. Ancient conserved regions? 2. Folding initiation sites?
I-sites Patterns of conservation suggest independent folding 2. sidechain contacts 1. backbone angle constraints 3. negative design
I-sites NMR structures confirm independent folding diverging turn motif NMR structure of a 7-residue I-sites motif in isolation (Yi et al, J. Mol. Biol, 1998)
Peptide simulations support the folding initiation sites theory AAALDRMR AALEALLR AANRSHMP AARYKFIE ADFKAAVA AFDGETEI AKELVVVY AKGVETAD ARFTKRLG ATLEEKLN CNGGHWIA DAVTRYWP DEAIDAYI DELTRHIR DYVRSKIA EDLVERLK EELKQALR EEMVSKLK EKLLESLE EKPFGTSY EQIKAAVK FHMYFMLR FSVMNDAS FYSSYVYL GQLMALKQ HNLIEAFE IEHTLNEK IQNGDWTF KAAIAQLR KKYRPETD KNPDNVVG KPMGPLLV KQAHPDLK KQDKHYGY KSYLRSLR LDLHQTYL NAVWAAIK NETHSGRK NFLEVGEY NPVKESRH PAIISAAE PLQHHNLL PRDANTSH QDDARKLM QGIIDKLD QKMKTYFN QTLAQLSV RDFEERMN RIILDRHR RLLLKAYR RPIARMLS RVLGRDLF SCDVKFPI TEVMKRLV TLNEKRIL YASLRSLV YESHVGCR • Initially extended.• 800-900 waters added.• Ions added (Na, Cl)• 340°K Sequence simulateds
I-sites There is a correlation between I-sites sequence score and the simulations r=0.48 (all peptides)r=0.61 (trajectories > 20ns long)
HMMSTR Is there a motif “grammar”? Arrangement of I-sites motifs in proteins is highly non-random helix helix cap beta strand beta turn The dependencies can be modeled as a Markov chain
HMMSTR Finding the connectivity of I-sites motifs aligned profiles aligned structures HMM topology:
HMMSTR HMMSTR Hidden Markov Model for local protein STRucture Markov state pathways represent local structure motifs
Support Vector Machine based on HMMSTR SCOP benchmark of 54 sequence families 282-dimensional vector: Prob of each HMMSTR state. SVM-HMMSTR outperforms SVM-pairwise
What can HMMSTR tell us about misfolding? Can we predict misfolding, given the Xray structure? 1 2 3 HMMSTR secondary structure prediction Helix 3 is known to be the location of familial prion disease mutations. Human prionprotein fragment.(X-ray structure solved in 2002) 2 3 3 2 1 1 Knaus et al, NSB 8:770-4, 2001
Green Fluorescent Protein • Beta-barrel, with a central helix • Folds slowly (k=0.02s-1) • Aggregates in the cell if overexpressed without chaparonin. • Contains a misfolding initiation site: CFSRYPDH Proline helix C-cap
U=unfoldedM=misfoldedF=foldedA=aggregated F U A M corrrecly folded misfolded piece
Destabilizing the misfolding initiation site CFSLYPDH CFSRYPDH
HMMSTR-CM HMM State Contact Potential • G (p, q, s) represents the free energy of a motif-motif contact.
HMMSTR-CM Contact Maps Definition:
HMMSTR-CM Target Contact Potential Map • E (i, j) is the contact potential between every two residues i and j.
HMMSTR-CM Both axes: sequence Red: favorable contact Blue: unfavorable E(i,j)
HMMSTR-CM Features in a contact map can be interpreted as a TOPSdiagram helices strands
HMMSTR-CM Features in a contact map can be interpreted as a TOPSdiagram helices strands Which one is right?
HMMSTR-CM X non-polar amphipathic T0130 ab initio Prediction True contact map True Contact Map T0130 Contact energies
What does structure prediction tell us about the physics of folding? Check one: A. If we can predict protein structures, then we know how proteins fold. B. If we know how proteins fold, then we can predict protein structures.
HMMSTR:Chris Bystroff Vesteinn ThorssonDavid Baker Funding from:NSF, HHMI, RPI at NUS:Yuna HouMong-Li Lee Wynne Hsu Bystroff Lab:Yu Shao Donna CroneXin Yuan Rachel van DuyneKwang Kim www.bioinfo.rpi.edu/~bystrc/ HMMSTR says: Think Globally, Act Locally.
Prediction algorithms have Underlying principles Darwin = protein evolution. Principle: Proteins that evolved from common ancestor have the same fold. Algorithm: Detect homology.. Boltzmann = protein folding Principle: Proteins search conformational space, minimizing the free energy. Algorithm: Fold the protein.
2 4 5 9 6 3 8 1 7 2 1 10 3 11 4 Pathway based on nucleation sites and unfolding.
HMMSTR-CM Alternative (wrong) pathway. More polar More non-polar Given the assumptions, the pathway is almost unambiguous!!
HMMSTR-CM • A rule-based approach to fragment assembly • Uses the contact map representation • May be easily combined with templates
HMMSTR-CM Features in a contact map can be interpreted as a TOPSdiagram helices Alpha-alpha corner strands
HMMSTR-CM What are the rules for folding?
Assumptions: 1. We know how proteins unfold. 2. Folding is the reverse of unfolding.
HMMSTR-CM Unfolding a protein If we break the protein along the weakest fault lines....
HMMSTR-CM …the result are protein fragments of the size we can already predict accurately.
HMMSTR-CM Simplified representation
GFP is an 11-stranded anti-parallel beta barrel. The core of the protein contains non-polar (blue) and polar sidechains (red).
4 5 9 6 8 1 7 2 10 3 11 Are late folding peptides more antigenic? Experiment: Ab** raised against peptides for “early folding” loop (hairpin 5-6) and “late-folding” hairpin (10-11). Ab checked for binding to GFP. Early-folding peptide Late-folding peptide **Kindly provided by Dennis Metzger, AMC.
HMMSTR-CM If folding is the reverse of unfolding, what are the rules?
HMMSTR-CM 2. Right-handed+ non-polar pairing 1. more local 1. more local 3. Helix crowding
CASP5 Target T0157 HMMSTR-CM This is how the procedure is carried out on the contact map.