Chapter 4 The Chemical Composition of Proteins
Homework II (cont’d): Ch. 6 problems 6, 7, 8 (optionally 10) and Ch. 7 problems 3, 6, 7, 9, 10 (optionally 12).
A useful site: www.cbi.pku.edu.cn. Use a BLAST search against the Swiss-Prot database; PDB is also available on this site. RasMol is available for Win98 if you don’t want to download Chemscape.
1. Proteins are macromolecules and vary in size
1.1 There is no simple relationship between size and function. Small proteins can be less than 10 kD (insulin, ~6 kD); large ones can exceed 1000 kD.
1.2 Proteins can be monomeric or oligomeric. A monomeric protein contains only one covalent structure (either a single polypeptide chain or several chains connected by covalent bonds). An oligomeric protein contains more than one covalent structure, held together by noncovalent interactions. Each covalent structure in an oligomeric protein is called a subunit (hence multisubunit proteins).
2. Proteins have characteristic amino acid compositions
2.1 Proteins can be hydrolyzed in acid or base to free α-amino acids. They are usually hydrolyzed in 6 M HCl at 110 °C for 24 hours.
2.2 The resulting characteristic proportion of the different amino acids, namely the amino acid composition, was used to distinguish proteins before the days of protein sequencing. Proteins are currently distinguished by their specific amino acid sequences (by 3-D structures in the future?).
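The idea of an amino acid composition in 2.2 can be illustrated with a minimal sketch that counts residues in a known sequence. This is an assumption-laden illustration: the function name `amino_acid_composition` and the 10-residue peptide are hypothetical, and a real hydrolysis experiment measures the composition without knowing the sequence.

```python
from collections import Counter


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each residue in a one-letter-code sequence."""
    seq = sequence.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    total = len(seq)
    return {aa: n / total for aa, n in sorted(counts.items())}


# Hypothetical 10-residue peptide, used only for illustration.
composition = amino_acid_composition("GAVLGKKEGA")
for aa, fraction in composition.items():
    print(f"{aa}: {fraction:.0%}")
```

Two peptides with the same composition can still have different sequences, which is why composition alone was eventually replaced by sequence as the distinguishing property.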
3. Some proteins contain chemical groups other than amino acids
3.1 Many proteins contain only amino acids, e.g., insulin, ribonuclease A, chymotrypsin.
3.2 Some proteins (conjugated proteins) contain other components.
3.2.1 Cytochrome c and myoglobin contain heme groups. Immunoglobulin G contains carbohydrate groups.
3.2.2 The non-amino acid parts are usually called prosthetic groups, and the protein part alone is called the apoprotein. (holoenzyme = apoenzyme + cofactors and/or prosthetic groups)
3.2.3 Prosthetic groups usually play important roles in protein function.
3.2.4 Conjugated proteins are usually classified according to the nature of their prosthetic groups.
Lipoproteins: lipids
Glycoproteins: carbohydrate groups
Metalloproteins: different metals (ions)
4. The amino acid sequence of short polypeptide chains can be determined by chemical methods
4.1 The amino acid sequence of a peptide chain is the identity and linking order of its amino acid residues. No other property so clearly distinguishes one peptide from another.
4.2 Sanger worked out the first amino acid sequence of a peptide (bovine insulin) in 1953.
He accomplished this by using 1-fluoro-2,4-dinitrobenzene to react with the N-terminal residues of short peptides produced by cleavage. 100 g of insulin was consumed over ten years to determine the sequence. The peptide chains were cut into 150 fragments of different lengths. He was awarded the 1958 Nobel Prize in Chemistry for this breakthrough.
4.3 The amino acid sequence of a short peptide can be efficiently determined by Edman degradation.
4.3.1 The uncharged terminal amino group is reacted with phenylisothiocyanate to form a phenylthiocarbamyl peptide.
4.3.2 The N-terminal amino acid residue is liberated as a cyclic phenylthiohydantoin (PTH) derivative under mildly acidic conditions, leaving the rest of the peptide chain intact.
4.3.3 The PTH derivative (and thus the amino acid residue) can be identified by chromatographic methods.
4.3.4 The newly exposed N-terminal amino acid residue can be identified by repeating the above procedure.
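The cyclic nature of Edman degradation can be sketched as a simple loop. This is only bookkeeping under stated assumptions: the function `edman_cycles` and the input peptide are hypothetical, and none of the chemistry (coupling, cleavage, chromatographic identification) is modeled.

```python
def edman_cycles(peptide: str, n_cycles: int) -> list[tuple[int, str, str]]:
    """Simulate repeated Edman cycles on a one-letter-code peptide.

    Each entry is (cycle number, residue released as its PTH derivative,
    peptide remaining after the cycle).
    """
    results = []
    remaining = peptide
    for cycle in range(1, n_cycles + 1):
        if not remaining:
            break  # nothing left to degrade
        # The N-terminal residue is removed; the rest stays intact.
        released, remaining = remaining[0], remaining[1:]
        results.append((cycle, released, remaining))
    return results


# Hypothetical 5-residue peptide, used only for illustration.
for cycle, pth, rest in edman_cycles("GIVEQ", 3):
    print(f"cycle {cycle}: PTH-{pth} identified, remaining peptide: {rest}")
```

Reading the released residues in cycle order gives the sequence from the N-terminus inward.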
4.4 The N-terminal amino acid sequence of a polypeptide chain can be easily obtained by using a fully automated sequenator.
4.4.1 The machine is designed based on the Edman degradation method.
4.4.2 The peptide is covalently linked to glass beads through its carboxyl terminus. One cycle of the Edman degradation is carried out in less than 2 hours.
4.4.3 Usually 50 (10-20) residues from the N-terminus can be routinely determined by the sequenator.
4.4.4 Less than a microgram (or picomoles?) of the peptide is needed for such a sequence determination.
5. Large proteins are cleaved into short peptides and then sequenced (the “divide and conquer” strategy!)
5.1 Disulfide bonds, if present, need to be broken first.
5.1.1 The PTH-cysteines would not be released if the cysteines were connected by disulfide bonds.
5.1.2 The disulfide bonds can be reduced by dithiothreitol (DTT) or β-mercaptoethanol, and then alkylated with iodoacetate to prevent re-formation of the disulfide bonds. Addition of iodoacetate in SDS-PAGE can prevent cross-linking by disulfide bonds between subunits.
5.2 Polypeptide chains are cleaved into short fragments by chemical or enzymatic methods and then sequenced by the Edman method.
5.2.1 Cyanogen bromide (CNBr) cleaves polypeptides on the carboxyl side of methionine residues.
5.2.2 A set of proteases cleave peptide chains adjacent to specific amino acid residues: trypsin, specifically on the carboxyl side of Arg and Lys; chymotrypsin, on the carboxyl side of Phe, Tyr, and Trp. (terminals?)
5.2.3 The fragments thus produced need to be separated (purified) by chromatographic or electrophoretic methods before they can be sequenced.
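The cleavage rules in 5.2.1 and 5.2.2 can be sketched as an in-silico digest. This is a simplified illustration: the helper `cleave_after`, the `REAGENTS` table, and the test peptide are hypothetical, and real enzyme behavior has exceptions that are ignored here (for example, trypsin cleaves poorly before proline).

```python
def cleave_after(sequence: str, residues: str) -> list[str]:
    """Cut a one-letter-code sequence on the carboxyl side of the given residues."""
    fragments = []
    start = 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment
    return fragments


# Residues after which each reagent cuts, as listed in the outline above.
REAGENTS = {"trypsin": "KR", "chymotrypsin": "FYW", "CNBr": "M"}

# Hypothetical peptide, used only for illustration.
peptide = "AGKFMERWLYSRG"
for name, sites in REAGENTS.items():
    print(f"{name}: {cleave_after(peptide, sites)}")
```

Each reagent yields a different fragment set from the same chain, which is what makes the overlap strategy of the next item possible.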
5.3 The polypeptide has to be cleaved by at least two sets of reagents to establish the order of the short peptides in the polypeptide.
5.3.1 A second set of short peptides that overlaps the first set is needed to put the first set in the correct order.
5.3.2 If the second set fails to provide appropriate overlapping sequences, a third or even further cleavage is needed.
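The overlap reasoning in 5.3 can be sketched as a brute-force search: try every ordering of the first fragment set and keep those whose concatenation contains every peptide of the second set. The function `order_fragments` and both fragment lists are hypothetical, and the factorial search is only practical for a handful of fragments.

```python
from itertools import permutations


def order_fragments(first_set: list[str], second_set: list[str]) -> list[str]:
    """Return every full sequence consistent with both fragment sets."""
    candidates = set()
    for ordering in permutations(first_set):
        joined = "".join(ordering)
        # An ordering is accepted only if all overlap peptides occur in it.
        if all(fragment in joined for fragment in second_set):
            candidates.add(joined)
    return sorted(candidates)


# Hypothetical fragments from two digests of the same (unknown) peptide.
tryptic = ["FMER", "G", "AGK", "WLYSR"]
chymotryptic = ["LY", "SRG", "AGKF", "MERW"]
print(order_fragments(tryptic, chymotryptic))
```

If more than one candidate is returned, the second set did not provide enough overlap, which corresponds to the case in 5.3.2 where a further cleavage is needed.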
5.4 The positions of disulfide bonds, if existing, need to be located. 5.4.1 This can be accomplished by comparing patterns of peptide fragments on electrophoresis gels with and without breaking the disulfide bonds.
6. The amino acid sequences of many proteins are currently deduced from their gene or cDNA (complementary DNA) sequences
6.1 The amino acid sequence of a protein is encoded by its corresponding gene. Every three bases constitute a codon, which is translated into a specific amino acid in the polypeptide chain.
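Deducing a protein sequence from DNA, as described in 6.1, can be sketched with the standard genetic code. Assumptions here: the input is a coding-strand DNA sequence already in the correct reading frame, and the function `translate` and the example sequence are hypothetical. The table is built from the conventional TCAG ordering of the standard code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA, reading codons until a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon ends the polypeptide
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical open reading frame, used only for illustration.
print(translate("ATGGGCATTGTGGAACAGTAA"))
```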
6.2 Genes encoding specific proteins are routinely isolated (cloned) in molecular biology laboratories.
6.2.1 Sequencing DNA is much easier (faster and more accurate) than sequencing a polypeptide. Genome projects and databases.
6.2.2 Amino acid sequences of proteins are mostly deduced from their DNA sequences nowadays!
6.2.3 The partial amino acid sequence of a protein can be used to isolate its gene.
6.2.4 Disulfide bonds cannot be deduced from DNA sequences and have to be determined directly.
6.2.5 Proteins can be studied much more efficiently when their genes are available! New techniques for large-scale processing are being developed (mass spectrometry).
A mass spectrometer measures the ratio of the mass to the electric charge (m/z) of a particle. ESI (electrospray ionization) is a method of ionizing, or charging, the macromolecule.
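The mass-to-charge ratio can be made concrete with a small calculation. This sketch assumes positive-ion mode, where a molecule of mass M picks up z protons, so that m/z = (M + z × 1.007276) / z; that charging model, the function names, and the 6000 Da example mass are assumptions for illustration, not taken from the lecture.

```python
PROTON_MASS = 1.007276  # Da, mass added per positive charge (assumed model)


def mz(mass: float, charge: int) -> float:
    """m/z of a molecule of the given neutral mass carrying `charge` protons."""
    return (mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz_value: float, charge: int) -> float:
    """Recover the neutral mass from an observed m/z and a known charge."""
    return charge * (mz_value - PROTON_MASS)


# Hypothetical protein of 6000 Da observed in several charge states.
for z in (3, 4, 5):
    print(f"z = {z}: m/z = {mz(6000.0, z):.3f}")
```

One molecule therefore appears as a series of peaks, one per charge state, and each peak gives back the same neutral mass once its charge is known.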
7. The function of a protein depends on its amino acid sequence
7.1 Each separate type of protein has a unique amino acid sequence.
7.1.1 Each of the ~3000 different proteins in an E. coli cell, and each of the ~100,000 in a human being, has a different amino acid sequence.
7.1.2 Proteins of different functions always have different amino acid sequences. Genomic annotation is the assignment of functions to genes based on sequence information.
7.2 Many human genetic diseases have been traced to the deficiency of a single enzyme or protein. The deficiency of many such enzymes is found to be caused by a change in a single amino acid residue, indicating that protein functions are determined by their amino acid sequences.
7.3 Proteins that have similar functions but come from different species are found to be very similar in amino acid sequence (cytochrome c, myoglobin, etc.). Sequence homology serves as a basis for phylogenetic trees.
7.4 Many proteins show variations in their amino acid sequences within the same population (individual polymorphism) or between different species.
7.4.1 Most such variations in amino acid sequence, especially within the same population, do not affect protein function.
7.4.2 This phenomenon of amino acid variation within the same population is called polymorphism and is widely used where identification of individuals is needed (in court, by examining DNA molecules).
7.4.3 Proteins of the same (or similar) function vary in their amino acid sequences in proportion to their evolutionary distance (the more distant the relationship, the less similar the sequences).
[Figure: Conservation and variation of cytochrome c sequences. 27 invariant residues (yellow), conservative substitutions (blue), nonconservative or variable residues (unshaded).]
8. The amino acid sequences of proteins provide important biochemical information (application of bioinformatic tools)
8.1 A newly revealed amino acid sequence (of a well-studied or an unknown protein) is usually compared with a large bank of stored sequences.
8.1.1 Thousands of sequences have been revealed and stored in computerized databases (e.g., Swiss-Prot, PIR, TrEMBL).
8.1.2 Sequence similarity usually reveals functional relatedness.
8.2 Homologous proteins share a common ancestor in evolution, hence are of the same function.
8.2.1 Homology describes the percentage of similarity (or identity) between two proteins (or DNA molecules), usually referring to sequence rather than structure.
8.2.2 Amino acid residues at a given position in two proteins (counting from the N-terminus as 1) can be identical, conservative (similar), or variable.
8.2.3 The amino acid sequences of two or more homologous proteins can be compared by sequence alignment.
8.2.4 Evolution can be studied quantitatively at the molecular level: phylogenetic trees are built from the number of residues that differ.
8.2.5 Highly conserved residues usually play important roles in protein structure and/or function. (This needs to be further exploited, e.g., how do they determine the 3D structure?)
8.2.6 Homologous proteins share the same three-dimensional structure (the sequence-structure-function paradigm, unless the structure is flexible and non-unique).
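The position-by-position comparison of 8.2.2 and the difference counts of 8.2.4 can be sketched for sequences that are already aligned. Everything named here is an assumption for illustration: the side-chain groups in `SIMILAR_GROUPS` are one simple way to call a substitution conservative, the three sequences are invented, and gaps are not handled.

```python
from itertools import combinations

# Assumed side-chain groups; substitutions within a group count as conservative.
SIMILAR_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "STNQ"]


def classify_position(a: str, b: str) -> str:
    """Label one aligned residue pair as identical, conservative, or variable."""
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "conservative"
    return "variable"


def compare_aligned(seq1: str, seq2: str) -> dict[str, int]:
    """Count identical, conservative, and variable positions in two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"identical": 0, "conservative": 0, "variable": 0}
    for a, b in zip(seq1, seq2):
        counts[classify_position(a, b)] += 1
    return counts


def difference_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], int]:
    """Number of differing residues for every pair of aligned sequences."""
    matrix = {}
    for (name1, s1), (name2, s2) in combinations(aligned.items(), 2):
        matrix[(name1, name2)] = len(s1) - compare_aligned(s1, s2)["identical"]
    return matrix


# Invented aligned sequences from three hypothetical species.
aligned = {
    "species_A": "GDVEKGKKIF",
    "species_B": "GDVEKGKKTF",
    "species_C": "GSAEKGATLF",
}
for pair, n_diff in difference_matrix(aligned).items():
    print(pair, n_diff)
```

A tree-building method would take such a matrix as input and join the pair with the fewest differences first; here that is species_A and species_B.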
8.3 Certain amino acid sequences often serve as signals (and are thus called signal peptides) that determine the cellular location, chemical modification, and half-life of proteins.
8.3.1 Different signal peptides lead proteins to various locations in the cell.
8.3.2 Such signal motifs are being identified.
8.4 There are well-conserved short sequences in proteins with specific functions (for binding, modification, regulation); most of them are expected to have a loop (relatively flexible) structure. Matching such short sequences against known functional sequences may shed light on the function of an unknown protein. Databases of sequences with known functions exist (e.g., PROSITE on ExPASy; an example is the ATP-binding loop).
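Searching for a short functional motif, as in 8.4, can be sketched by converting a PROSITE-style pattern into a regular expression. The pattern used below, `[AG]-x(4)-G-K-[ST]`, is the ATP/GTP-binding loop (P-loop) motif as we recall it from PROSITE; treat it as an assumption and check it against the database. The converter handles only simple pattern elements, and the function names and test sequence are hypothetical.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (elements joined by '-') to a regex."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"  # assumed ATP/GTP-binding loop pattern
# Illustrative test sequence, not taken from the lecture.
print(find_motif("MTEYKLVVVGAGGVGKSALTIQLIQNHFV", P_LOOP))
```

A match only suggests a possible function; short patterns also occur by chance, so a hit is a hypothesis to test rather than a conclusion.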
8.5 The amino acid sequence of a protein encodes its three-dimensional structure and activity (function) in turn. This is the sequence-structure-function paradigm. The “folding” codes are waiting to be deciphered!