This presentation, led by Dr. Tamer Kahveci from the University of Florida, explores the intricate world of protein structures through bioinformatics. Proteins fold into complex three-dimensional shapes that are crucial for their function, with misfolding being linked to diseases such as sickle cell anemia and mad cow disease. This talk covers the primary, secondary, tertiary, and quaternary structures of proteins, their formation, and techniques for analyzing these structures, including X-ray crystallography and NMR. Understanding protein shapes is essential for drug design and identifying functional characteristics.
CAP5510 – Bioinformatics: Protein Structures. Tamer Kahveci, CISE Department, University of Florida
What and Why? • Proteins fold into a three-dimensional shape • Structure can reveal functional information that we cannot find from the sequence • Misfolded proteins can cause diseases • Sickle cell anemia, mad cow disease • Structures are used in drug design (Figures: hemoglobin; normal vs. sickled blood cells, E → V substitution; HIV protease inhibitor)
Goals • Understand protein structures • Primary, secondary, tertiary • Learn how protein shapes are • Determined • Predicted • Structure comparison (?)
A Protein Sequence
>gi|22330039|ref|NP_683383.1| unknown protein; protein id: At1g45196.1 [Arabidopsis thaliana]
MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL
DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW
SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY
SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA
QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM
Amino Acid Composition • Basic amino acid structure: an amino group (H2N–), a carboxyl group (–COOH), a hydrogen, and a side chain (R), all bonded to a central carbon • The side chain, R, varies for each of the 20 amino acids
The Peptide Bond • Dehydration synthesis • Repeating backbone: N–Cα–C–N–Cα–C • Convention – start at the amino terminus and proceed to the carboxy terminus
Peptidyl polymers • A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids. • We call the units of a protein amino acid residues. (Figure labels: amide nitrogen, carbonyl carbon)
Side chain properties • Carbon does not make hydrogen bonds with water easily – hydrophobic • O and N are generally more likely than C to hydrogen-bond with water – hydrophilic • We group the amino acids into three general groups: • Hydrophobic • Charged (positive/basic & negative/acidic) • Polar
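The slide names the groups but does not list their members. The sketch below assumes one common textbook assignment (charged split into positive and negative); the `GROUPS` table and both function names are hypothetical, and other sources place Gly, Cys, His, Tyr, or Trp differently.

```python
from collections import Counter

# Hypothetical grouping following one common textbook convention; other
# sources place Gly, Cys, His, Tyr, or Trp in different groups.
GROUPS = {
    "hydrophobic": set("AVLIMFWPG"),
    "positive": set("KRH"),  # charged, basic
    "negative": set("DE"),   # charged, acidic
    "polar": set("STNQCY"),
}


def residue_group(aa: str) -> str:
    """Return the group name for a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")


def group_composition(seq: str) -> dict[str, float]:
    """Fraction of residues in each group for a (non-empty) protein sequence."""
    counts = Counter(residue_group(aa) for aa in seq.upper())
    return {name: counts[name] / len(seq) for name in GROUPS}


if __name__ == "__main__":
    # First 20 residues of the Arabidopsis sequence (At1g45196.1) above.
    print(group_composition("MPSESSYKVHRPAKSGGSRR"))
```

Splitting "charged" into two groups keeps the basic/acidic distinction from the slide; merging them is a one-line change.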
More Polar Amino Acids And then there’s…
Planarity of the Peptide Bond • Psi (ψ) – the angle of rotation about the Cα–C bond • Phi (φ) – the angle of rotation about the N–Cα bond • The planar bond angles and bond lengths are fixed
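Given 3D backbone coordinates (for example from a PDB file), φ and ψ can be computed as torsion (dihedral) angles over four consecutive backbone atoms. The slide gives no formula; this is a minimal sketch of the standard torsion-angle computation in pure Python, and the helper names (`dihedral`, `_sub`, etc.) are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (between -180 and 180) about the p1-p2 bond.

    phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
    psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    # atan2(sin-like, cos-like) gives the signed angle from v to w
    # (positive = clockwise when looking from p1 toward p2).
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))` is 0 (cis) and replacing the last point with `(-1, 1, 0)` gives 180 (trans).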
Primary & Secondary Structure • Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT… • Secondary structure • Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet • The location and direction of these periodic, repeating structures is known as the secondary structure of the protein
The Alpha Helix 60°
Properties of the Alpha Helix • 60° • Hydrogen bonds between the C=O of residue n and the NH of residue n+4 • 3.6 residues/turn • 1.5 Å/residue rise • 100°/residue turn
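The helix numbers on this slide are consistent with one another. A quick arithmetic check using only the slide's values:

```latex
\begin{aligned}
3.6\ \tfrac{\text{residues}}{\text{turn}} \times 100^\circ/\text{residue} &= 360^\circ\ \text{per turn} \\
3.6\ \tfrac{\text{residues}}{\text{turn}} \times 1.5\ \text{\AA}/\text{residue} &= 5.4\ \text{\AA{} per turn (pitch)} \\
(n+4) - n = 4\ \text{residues} \;\Rightarrow\; 4 \times 100^\circ = 400^\circ, &\quad 4 \times 1.5\ \text{\AA} = 6.0\ \text{\AA}
\end{aligned}
```

So the C=O of residue n and the NH of residue n+4 sit slightly more than one turn apart along the helix axis.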
Properties of α-helices • 4 – 40+ residues in length • Often amphipathic or “dual-natured” • Half hydrophobic and half hydrophilic • If we examine many α-helices, we find trends… • Helix formers: Ala, Glu, Leu, Met • Helix breakers: Pro, Gly, Tyr, Ser
The beta strand (& sheet) (figure angle labels: −135°, +135°)
Properties of beta sheets • Formed of stretches of 5-10 residues in extended conformation • Parallel/antiparallel, contiguous/non-contiguous
Turns and Loops • Secondary structure elements are connected by regions of turns and loops • Turns – short regions of non-α, non-β conformation • Loops – larger stretches with no secondary structure • The sequences of turns and loops vary much more than those of secondary structure regions
Levels of Protein Structure • Secondary structure elements combine to form tertiary structure • Quaternary structure arises when multiple polypeptide chains assemble, as in multienzyme complexes
Protein Structure Example (PDB ID: 12as, 2 chains; figure labels: beta sheet, helix, loop)
Views of a Protein • Wireframe • Ball and stick
Views of a Protein • Spacefill • Cartoon • CPK colors: Carbon = green, black, or grey; Nitrogen = blue; Oxygen = red; Sulfur = yellow; Hydrogen = white
Mostly Helical Folding Motifs • Four helical bundle: • Globin domain:
α/β Motifs • α/β barrel:
Determining the Structure of a Protein: Experimental Methods • X-ray • NMR • As of August 2013, the structures of more than 85,000 proteins had been determined
X-Ray Crystallography • Discovery of X-rays (Wilhelm Conrad Röntgen, 1895) • Crystals diffract X-rays in regular patterns (Max von Laue, 1912) • The first X-ray diffraction pattern from a protein crystal (Dorothy Hodgkin, 1934)
X-Ray Crystallography • Grow protein crystals, each containing millions of copies of the protein • Takes months • Expose the crystal to an X-ray beam • Analyze the diffraction image with a computer • Average over many copies of the images • PDB • Not all proteins can be crystallized!
NMR • Nuclear Magnetic Resonance • Atomic nuclei resonate when exposed to an oscillating magnetic field • The resonance signals are detected by external sensors • Used to compute inter-atomic distances • Requires complex analysis • Typically limited to short sequences (< 200 residues) • More than one model can be derived from the NMR data
Determining the Structure of a Protein Computational Methods
The Protein Folding Problem • Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the secondary/tertiary/quaternary structure of the resulting protein be?” • Input: AAVIKYGCAL… • Output: φ1ψ1, φ2ψ2, …
Structure vs. Sequence • Observation: a protein with the same sequence (under the same conditions) folds into the same shape • A protein folds into the shape that minimizes its free energy • Protein folds in ~10-15 seconds.
Chou-Fasman methods • Use statistically obtained Chou-Fasman parameters • Each amino acid has • P(a): alpha • P(b): beta • P(t): turn • f(i), f(i+1), f(i+2), f(i+3): additional turn parameters (position-specific bend frequencies)
C.-F. Alpha Helix Prediction (1) (Table: P(a) and P(b) for each amino acid) • Find P(a) for all letters • Find 6 contiguous letters of which at least 4 have P(a) > 100 • Declare these regions as alpha helix nuclei
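The nucleation step can be sketched as a sliding window. This is a minimal illustration, not the published method's code: `TOY_PA` holds made-up values (NOT the Chou-Fasman parameter table), and `helix_nuclei` is a hypothetical name. Overlapping qualifying windows are merged into one region, which is an assumption about how "these regions" are combined.

```python
# Hypothetical toy P(a) values for illustration only; these are NOT the
# published Chou-Fasman parameters.
TOY_PA = {"A": 140, "E": 150, "L": 120, "M": 140, "K": 110, "G": 60}


def helix_nuclei(seq: str, pa: dict[str, float], window: int = 6,
                 min_hits: int = 4, threshold: float = 100) -> list[tuple[int, int]]:
    """Return merged half-open [start, end) regions built from windows of
    `window` residues containing at least `min_hits` residues with P(a) > threshold."""
    hits = []
    for i in range(len(seq) - window + 1):
        strong = sum(1 for aa in seq[i:i + window] if pa[aa] > threshold)
        if strong >= min_hits:
            hits.append((i, i + window))
    # Merge overlapping or touching windows into contiguous regions.
    merged: list[tuple[int, int]] = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(helix_nuclei("GGEALMKGGG", TOY_PA))  # [(0, 9)]
```

Half-open intervals make the later steps (extension, sums over a region) simple slices of the sequence.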
C.-F. Alpha Helix Prediction (2) • Extend the region in both directions until 4 consecutive letters with P(a) < 100 are found
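A minimal sketch of the extension step, following the slide's literal rule (each of the next four letters has P(a) < 100). `TOY_PA` is made-up illustrative data and `extend_helix` is a hypothetical name. Two assumptions are not on the slide: the four weak letters are excluded from the helix, and a shorter weak run that reaches a sequence end also stops growth.

```python
# Hypothetical toy P(a) values (NOT the published Chou-Fasman table).
TOY_PA = {"A": 140, "E": 150, "L": 120, "M": 140, "K": 110, "S": 80, "G": 60}


def extend_helix(seq: str, pa: dict[str, float], start: int, end: int,
                 threshold: float = 100, stop_len: int = 4) -> tuple[int, int]:
    """Grow the half-open region [start, end) outward until the next
    `stop_len` residues on each side all have P(a) < threshold."""
    def all_weak(segment: str) -> bool:
        return all(pa[aa] < threshold for aa in segment)

    # Extend to the right (toward the carboxy terminus).
    while end < len(seq) and not all_weak(seq[end:end + stop_len]):
        end += 1
    # Extend to the left (toward the amino terminus).
    while start > 0 and not all_weak(seq[max(0, start - stop_len):start]):
        start -= 1
    return start, end


print(extend_helix("GGGGAEALMKASGGGG", TOY_PA, 5, 9))  # (4, 11)
```

Some descriptions of Chou-Fasman instead stop when the average P(a) of the four letters drops below 100; changing `all_weak` to an average test would implement that variant.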
C.-F. Alpha Helix Prediction (3) • Find the sum of P(a) (Sa) and the sum of P(b) (Sb) over the extended region • If the region is long enough (>= 5 letters) and Sa > Sb, declare the extended region an alpha helix
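The acceptance test compares Sa and Sb over the extended region. A minimal sketch under the slide's rule (length at least 5 and Sa > Sb); `TOY_PA`/`TOY_PB` are made-up values, not the published table, and `accept_helix` is a hypothetical name that also returns the two sums for inspection.

```python
# Hypothetical toy parameters (NOT the published Chou-Fasman table).
TOY_PA = {"A": 140, "E": 150, "L": 120, "M": 140, "V": 100, "I": 110}
TOY_PB = {"A": 80, "E": 40, "L": 130, "M": 100, "V": 170, "I": 160}


def accept_helix(seq: str, pa: dict[str, float], pb: dict[str, float],
                 start: int, end: int, min_len: int = 5) -> tuple[bool, float, float]:
    """Return (accepted, Sa, Sb) for the half-open region [start, end)."""
    region = seq[start:end]
    sa = sum(pa[aa] for aa in region)
    sb = sum(pb[aa] for aa in region)
    return len(region) >= min_len and sa > sb, sa, sb


print(accept_helix("AEALMVIVIV", TOY_PA, TOY_PB, 0, 5))   # (True, 690, 430)
print(accept_helix("AEALMVIVIV", TOY_PA, TOY_PB, 5, 10))  # (False, 520, 830)
```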
C.-F. Beta Sheet Prediction • Same as for alpha helix, with P(a) replaced by P(b) • Resolving overlapping alpha helix & beta sheet regions • Compute the sum of P(a) (Sa) and the sum of P(b) (Sb) in the overlap • If Sa > Sb => alpha helix • If Sb > Sa => beta sheet
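Overlap resolution can be sketched as per-residue labeling: mark helix and sheet regions, then reassign each helix/sheet overlap by comparing Sa and Sb inside it. `TOY_PA`/`TOY_PB` are made-up values and `resolve_overlaps` is a hypothetical name; the slide does not cover Sa = Sb, so this sketch breaks ties toward helix as an arbitrary choice.

```python
# Hypothetical toy parameters (NOT the published Chou-Fasman table).
TOY_PA = {"A": 140, "E": 150, "L": 120, "M": 140, "K": 110, "V": 100, "I": 110}
TOY_PB = {"A": 80, "E": 40, "L": 130, "M": 100, "K": 75, "V": 170, "I": 160}


def resolve_overlaps(seq: str, pa: dict[str, float], pb: dict[str, float],
                     helices: list[tuple[int, int]],
                     sheets: list[tuple[int, int]]) -> str:
    """Label residues H (helix), E (sheet) or C (neither); regions are [start, end)."""
    labels = ["C"] * len(seq)
    for s, e in helices:
        labels[s:e] = ["H"] * (e - s)
    for s, e in sheets:
        labels[s:e] = ["E"] * (e - s)
    for hs, he in helices:
        for ss, se in sheets:
            lo, hi = max(hs, ss), min(he, se)
            if lo >= hi:
                continue  # this helix and sheet do not overlap
            overlap = seq[lo:hi]
            sa = sum(pa[aa] for aa in overlap)
            sb = sum(pb[aa] for aa in overlap)
            winner = "E" if sb > sa else "H"  # ties go to helix (arbitrary)
            labels[lo:hi] = [winner] * (hi - lo)
    return "".join(labels)


print(resolve_overlaps("AEALMKVIVI", TOY_PA, TOY_PB, [(0, 6)], [(4, 10)]))  # HHHHHHEEEE
```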
C.-F. Turn Prediction (Table: P(a), P(b), P(t), and f values for each amino acid) • An amino acid at position i is predicted as a turn if all of the following hold: • f(i)*f(i+1)*f(i+2)*f(i+3) > 0.000075 • Avg(P(t)) over positions i+k > 100, for k = 0, 1, 2, 3 • Sum(P(t)) > Sum(P(a)) and Sum(P(t)) > Sum(P(b)) over positions i+k (k = 0, 1, 2, 3)
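The three turn tests can be checked on every tetrapeptide. This is a minimal sketch with made-up parameters (`TOY_PA`, `TOY_PB`, `TOY_PT`, `TOY_F` are NOT the published table) and a hypothetical `predict_turns` function. It assumes each amino acid has four position-specific f values, so the first residue contributes f(i), the second f(i+1), and so on; it returns the start position of each qualifying tetrapeptide.

```python
# Hypothetical toy parameters (NOT the published Chou-Fasman table).
TOY_PA = {"A": 140, "L": 120, "N": 70, "P": 60, "D": 100, "G": 60}
TOY_PB = {"A": 80, "L": 130, "N": 90, "P": 55, "D": 55, "G": 75}
TOY_PT = {"A": 70, "L": 60, "N": 150, "P": 150, "D": 150, "G": 150}
# TOY_F[aa] = (f(i), f(i+1), f(i+2), f(i+3)): frequency of aa at each of the
# four positions of a turn.
TOY_F = {aa: (0.1, 0.1, 0.1, 0.1) for aa in "NPDG"}
TOY_F.update({aa: (0.05, 0.05, 0.05, 0.05) for aa in "AL"})


def predict_turns(seq, pa, pb, pt, f, min_bend=0.000075):
    """Return start positions i whose tetrapeptide seq[i:i+4] passes all turn tests."""
    starts = []
    for i in range(len(seq) - 3):
        tetra = seq[i:i + 4]
        bend = 1.0
        for k, aa in enumerate(tetra):
            bend *= f[aa][k]  # f(i) * f(i+1) * f(i+2) * f(i+3)
        st = sum(pt[aa] for aa in tetra)
        sa = sum(pa[aa] for aa in tetra)
        sb = sum(pb[aa] for aa in tetra)
        if bend > min_bend and st / 4 > 100 and st > sa and st > sb:
            starts.append(i)
    return starts


print(predict_turns("ALNPDGLA", TOY_PA, TOY_PB, TOY_PT, TOY_F))  # [2]
```

Descriptions differ on whether the whole tetrapeptide or only position i is labeled as turn; returning start positions leaves that choice to the caller.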
Other Methods for Secondary Structure Element (SSE) Prediction • Similarity searching (e.g., Predator) • Markov chains • Neural networks (e.g., PHD) • ~65% to 80% accuracy