L. Aravind Computational Biology Branch National Center for Biotechnology Information

Evolution of biocatalysis: structural and genomic trends. L. Aravind, Computational Biology Branch, National Center for Biotechnology Information. Nitrogenase. Fritz Haber. Bacterial nitrogen fixation. Haber process for NH3 synthesis. N2 + 8H+ + 8e- ↔ 2NH3 + H2.

Presentation Transcript


  1. Evolution of biocatalysis: structural and genomic trends. L. Aravind, Computational Biology Branch, National Center for Biotechnology Information

  2. Enzymes achieve incredible feats with relative ease. How do they do that?
     • Nitrogenase (bacterial nitrogen fixation): N2 + 8H+ + 8e- ↔ 2NH3 + H2, with 16 ATP → 16 ADP + 16 Pi; ambient pressure; 20-30°C; catalytic metal cluster with active molybdenum or vanadium.
     • Haber process for NH3 synthesis (Fritz Haber): N2(g) + 3H2(g) ↔ 2NH3(g) + ΔH; 200-900 atmospheres pressure; 450°C; iron catalyst (molybdenum promoter).
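
As a quick sanity check on the two ammonia-forming equations on this slide, the sketch below counts atoms and net charge on each side. It is only an illustration: the helper names (`totals`, `is_balanced`) and the tuple layout are assumptions made for this example, and the ATP hydrolysis that accompanies the nitrogenase reaction is left out.

```python
from collections import Counter

def totals(side):
    """Sum atoms and net charge over one side of a reaction.

    `side` is a list of (coefficient, {element: count}, charge) tuples.
    """
    atoms, charge = Counter(), 0
    for coeff, formula, q in side:
        for element, n in formula.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return dict(atoms), charge

def is_balanced(left, right):
    """True when both sides carry the same atoms and the same net charge."""
    return totals(left) == totals(right)

# Species written as (formula, charge); hypothetical shorthand for this sketch.
N2 = ({"N": 2}, 0)
H2 = ({"H": 2}, 0)
NH3 = ({"N": 1, "H": 3}, 0)
PROTON = ({"H": 1}, +1)
ELECTRON = ({}, -1)

# Nitrogenase: N2 + 8H+ + 8e- <-> 2NH3 + H2
nitrogenase = (
    [(1, *N2), (8, *PROTON), (8, *ELECTRON)],
    [(2, *NH3), (1, *H2)],
)

# Haber process: N2 + 3H2 <-> 2NH3
haber = ([(1, *N2), (3, *H2)], [(2, *NH3)])

print(is_balanced(*nitrogenase), is_balanced(*haber))
```

Both equations balance in atoms and charge; the difference between them lies in the conditions and the energy source, not in the overall chemistry.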

  3. Industrial and biological catalysts. Some metals are common to both industrial and biological catalysts (figure labels: zinc, iron, molybdenum, nickel, vanadium, magnesium). Yet the biological systems bypass the draconian temperature and pressure regimes, so the protein scaffold and small-molecule cofactors matter in some way.

  4. Where did the specificity come from? [Figure labels: LUCA; HIGH nucleotidyltransferases; aminoacyl-tRNA synthetases (translation); other Rossmannoids; ETFP A and B; PP-ATPases; photolyase; USPA; tRNA biogenesis; nucleotidyl transferase; ATPase, AMP binding; pyrophosphatase (ATP → AMP + PP). Annotated events: origin of the KMSK signature; origin of the bi-helical C-terminal module.]

  5. At the junction between the protein and RNA worlds: RNase PH. [Figure panels: RNase P, S5 domain (ribozyme); ribosomal protein S5, S5 domain (RNA-binding protein); RNase PH, active site region (enzyme).] Both RNase P and RNase PH share a common S5 domain that is found in several nucleic acid-binding contexts. RNase P has an active ribozyme, while RNase PH is entirely a protein enzyme.

  6. RNase PH: the active site was probably built onto the protein core with an additional protein motif. RNase P: the active site is on the ribozyme. [Figure labels: D protein; R ribozyme.] It is possible that the early nucleic acid-binding domains were in place even when ribozymes were still active. Catalytic activities then appeared in these proteins, slowly displacing the ribozymes.

  7. The echoes of a lost world. Analysis of phyletic patterns and higher-order evolutionary relationships shows that many ancient protein folds had paralogous representatives that can be traced back to LUCA.
     • Most of these ancient folds contain RNA-binding versions, and the particular representatives are often associated with RNA binding.
     • We have evidence for protein synthesis before the translation apparatus was in place.
     • A ribozyme makes all extant proteins.
     • This suggests a possible role for RNA, and the emergence of enzymes through displacement of ribozymes by protein enzymes.

  8. The continuing story of enzymes
     • General tendencies in enzyme evolution
       • Are there different temporal phases in which different catalytic activities were acquired by different folds?
       • Are there differences in the number of different catalytic activities accommodated by different folds?
       • What are their obvious structural determinants?
     • Invention of enzymes in the later phases of evolution
       • How do non-enzymatic domains become enzymes?
     • Similar active sites, different catalytic mechanisms
       • Example of diversification of the acidic active site in the Rossmannoid class of domains and the HAD superfamily
       • The tale of flaps, caps and squiggles in the HAD superfamily and solvent exclusion
     • Convergent evolution of active sites
       • The tale of two hydroxylases and the discovery of the deoxyhypusine hydroxylase
       • The case of moving fingers: convergent evolution of arginine fingers in different phosphohydrolases
       • Structural convergence: the knotted active site of SPOUT and SET domain methyltransferases
     • Does the scaffold matter in catalysis?
       • The mysterious case of the ISOCOT fold
     • Are there engines of innovation in the biosphere?
       • The bacterial engine of enzyme innovation

  9. Some fundamental concepts
     • Biocatalysis relies on a constellation of a few amino acid residues that are embedded in a distinct globular domain of the enzyme (the catalytic domain).
     • Generally, the set of catalytic residues is highly conserved during evolution, but variations occur in the substrate-binding and cofactor-binding sites. This is exploration of “substrate space”: essentially the same biochemical activity is adapted to a range of different substrates.
       • E.g. various Rossmann-fold methyltransferases transferring methyl groups from AdoMet to various substrates.
     • In contrast, enzymes with similar catalytic residues may explore considerable diversity in “reaction space” (different activities).
       • E.g. DnaG-type primases and most topoisomerases share a catalytic domain (the TOPRIM domain) with an identical core set of catalytic residues, yet catalyze two distinct reactions: nucleotidyltransferase (transferase; class 2 in the EC classification) and topoisomerase (isomerase; class 5 in the EC classification).
     • A more dramatic case of exploration of reaction space is seen in the stem glycolytic (Embden-Meyerhof) pathway: four of the glycolytic enzymes (fructose-1,6-bisphosphate aldolase, triose phosphate isomerase, enolase, and pyruvate kinase), which between them catalyze three distinct types of reaction, have the same structural scaffold, the TIM barrel.
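
The "reaction space" idea can be made concrete by counting how many top-level EC classes a single scaffold covers. The sketch below does this for the four TIM-barrel glycolytic enzymes and for the TOPRIM example. The EC numbers for the glycolytic enzymes are the standard ones; the TOPRIM entries carry only the class digit given on the slide, and the function name `reaction_classes` is a placeholder chosen for this example.

```python
# Top-level classes of the EC (Enzyme Commission) classification.
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
    7: "translocase",
}

# Glycolytic enzymes that share the TIM barrel scaffold.
TIM_BARREL_GLYCOLYTIC = {
    "fructose-1,6-bisphosphate aldolase": "4.1.2.13",
    "triose phosphate isomerase": "5.3.1.1",
    "enolase": "4.2.1.11",
    "pyruvate kinase": "2.7.1.40",
}

# TOPRIM-domain enzymes; only the class digit is taken from the slide.
TOPRIM = {
    "DnaG-type primase": "2.-.-.-",
    "topoisomerase": "5.-.-.-",
}

def reaction_classes(enzymes):
    """Return the sorted set of top-level EC class names for a group of enzymes."""
    return sorted({EC_CLASSES[int(ec.split(".")[0])] for ec in enzymes.values()})

print(reaction_classes(TIM_BARREL_GLYCOLYTIC))  # ['isomerase', 'lyase', 'transferase']
print(reaction_classes(TOPRIM))                 # ['isomerase', 'transferase']
```

Four enzymes on one scaffold fall into three EC classes because the aldolase and enolase are both lyases.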

  10. [Figure: two scatter plots, one for Saccharomyces cerevisiae and one for Pseudomonas aeruginosa. x-axis: Activities (0-20); y-axis: Number of Proteins (per 1000) (0-35). Folds labeled: P-loop NTPases, Classic Rossmann fold, TIM Barrel, RRM-like, HUP, DSBH, beta-propeller, RNAseH, Metallobetalactamase, P-Loop ATPase, double psi barrel, T-Fold, Rhodanese, PFL-like, HAD-like, HIT-like, Rossmann-fold.] Scatter plot of the number of distinct enzymatic activities vs. the number of representatives of common folds in the proteomes of the proteobacterium Pseudomonas aeruginosa and the yeast Saccharomyces cerevisiae. The number of activities in most common folds scales linearly with their prevalence in the proteome. However, there are some striking exceptions, such as the P-loop NTPases and classical Rossmann fold enzymes, which show extensive proliferation but apparently very little exploration of reaction space.
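
One simple way to look for folds that proliferate without gaining activities is to fit a straight line of prevalence against number of activities and flag folds that sit far above it. The sketch below is a minimal illustration of that idea, not the analysis behind the plot: the fold names and numbers in `toy_folds` are invented placeholders, not values read from the figure, and the threshold is arbitrary. An ordinary least-squares fit is itself pulled towards outliers, so a real analysis would probably want a more robust method.

```python
def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def overrepresented(folds, threshold):
    """Names of folds whose prevalence exceeds the fitted line by more than `threshold`.

    `folds` maps fold name -> (number of activities, proteins per 1000).
    """
    slope, intercept = fit_line(list(folds.values()))
    return sorted(
        name
        for name, (activities, prevalence) in folds.items()
        if prevalence - (slope * activities + intercept) > threshold
    )

# Made-up placeholder data: four folds on a line, one abundant fold with few activities.
toy_folds = {
    "fold_a": (2, 3.0),
    "fold_b": (4, 6.0),
    "fold_c": (8, 12.0),
    "fold_d": (12, 18.0),
    "fold_e": (3, 30.0),
}

print(overrepresented(toy_folds, threshold=10.0))  # ['fold_e']
```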

  11. Folds with few and many activities. [Figure: topology diagrams of the RRM-like fold, the beta-propeller and the P-loop NTPase, grouped as having many or few activities, with NH2 and COOH termini marked. Labels include: MSOR; PSUS; cyclase; polymerase; primase (polymerase); ACP; NDPK; dehydrogenase; phytase; arabinase; Walker A loop; Walker B aspartate.] For each fold, the positions of the catalytic residues from several representative examples are shown in red.

  12. Of shapes and functions • The TIM barrel, the beta-propellers, and the DSBH domain contain a central pocket that binds their substrates and/or cofactors, with an approximate cyclic symmetry. • The pocket that is inherent to these structures allows easy accommodation of diverse substrate molecules through low-specificity interactions. Subsequently, natural selection could act on these proteins to fix residues that impart interaction specificity and catalytic capacity. The intrinsic symmetry of the central pockets of these folds creates the potential for different catalytic residues to emerge on the surface of the substrate-binding site, providing for the evolution of a wide range of activities. • The two-layered RRM-like fold that consists of two helices packed against a four-stranded anti-parallel sheet represents another structural principle in the evolution of multicatalytic folds. • The main theme in this case appears to be the large exposed surface area of an entire sheet that is provided by the two-layered structure. • The P-loop and Rossmann folds are 3-layered sandwiches where a central sheet is protected on both sides by helices. This only leaves the loops for interactions and this configuration has been less favorable to explore reaction space.

  14. The UTRA domain: non-enzymatic domain to enzyme. [Figure labels: ancestral unit; dimerization and ligand binding; emergence of key polar residues; small-molecule-binding domain of HutC/FarR transcription factors; chorismate lyase.] Chorismate lyase reaction: chorismate → pyruvate + 4-hydroxybenzoate.

  16. Acidic active site Rossmannoid folds. [Figure: topology diagrams of the DHH, receiver, VWA, TOPRIM and classic HAD domains, showing strand order (S1-S6, with S2.1, S3.1, S3.2 and S4eq-S6eq variants) and the conserved active-site residues (K, D, DxD, E, S, T/S, DD/DxxxD). In the classic HAD domain the C0/C1 and C2 cap insertion points are marked.]

  17. Anatomy of the HAD catalytic domain. [Figure labels: flap; squiggle.]

  18. Elaboration of C1 caps. [Figure: structures illustrating C1 caps. PDB IDs shown: 1N9K, 1SU4, 1O08, 1TA0/1U7O, 1K1E, 1MH9, 1F5S, 1QYI. Protein labels: acid phosphatase; P-type ATPase; SDT1; CTD/MDP-1; 8KDO phosphatase; phosphoserine phosphatase; deoxyribonucleotidase; Zr25.]

  19. Elaboration of C1 caps. [Figure labels: HisB family (histidinol phosphate phosphatase family), with C, C, C and Zn marked. NagD clade. PSP family: 1FS5, serB (SerB subfamily); 1VJR, Tm1742 (CIN/AraL subfamily); 1Y8A, Af1437 (Af1437 subfamily). Cof clade: 1NF2, Tm0651 (Cof family); 1NRW, Ywpj (Cof family); 1U02, otsB (trehalose phosphate phosphatase family); 1L6R, Apc0014 (Cof family); 1XVI, Yedp (mannosyl-3-phosphoglycerate phosphatase family).]

  20. Enzymes may emerge from non-enzymatic ligand-binding domains by acquisition of key catalytic residues. The development of special structures around the active site plays a major role in influencing catalytic mechanisms.

  22. Active site convergence in hydroxylases: the elusive deoxyhypusine hydroxylase. eIF5A is a translation initiation factor found in archaea and eukaryotes (an ortholog of bacterial EF-P) and is required for the formation of the first peptide bond. Critical for its function is the modification of a highly conserved lysine: in archaea it is modified into deoxyhypusine, while in eukaryotes it is modified further into hypusine. This additional modification is critical for the fitness of all eukaryotic models studied to date. eIF5A is the only protein in the whole cell with the unusual amino acid hypusine. This amino acid and the activity that produces it appear to be unique to eukaryotes. While the enzyme catalyzing the first step is well known, deoxyhypusine hydroxylase eluded identification for 20 years. All that was known about it was that it might be metal-dependent. This is not surprising, given that most other hydroxylases show this dependence.

  23. The majority of hydroxylases can be unified into the double-stranded beta-helix (DSBH) fold. In addition to sharing a common fold, they share a peculiar HX[HQD] signature at the N-terminus and a conserved H at the C-terminus. They bind ligands in the interior and catalyze a range of monooxygenase or dioxygenase reactions. However, there are two major divisions within them that differ in their dependence on 2-ketoacid cofactors:
     • AraC-like DSBH: this class includes non-catalytic sugar-binding domains of prokaryotic transcription factors and the JOR domain, which was predicted to demethylate chromatin proteins.
     • 2OG-Fe dioxygenases: several new members were discovered in this class, including protein lysyl and prolyl hydroxylases and alkylated-base demethylases such as the DNA repair enzyme AlkB.
     Despite these enzymes having all the necessary credentials to function as a deoxyhypusine hydroxylase (DOHH), none of them has a phyletic pattern that mirrors that of DOHH. Experimental tests of enzymes of this family for DOHH activity also failed to uncover any activity.
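
The HX[HQD] signature described on this slide can be written directly as a regular expression. The sketch below scans a protein sequence for the signature and checks for a histidine further downstream. It is a toy illustration only: the function names, the `min_gap` parameter and the example sequence are invented for this sketch, and a three-residue pattern will match many unrelated proteins by chance, so it cannot stand in for profile-based or structure-based detection of the fold.

```python
import re

# H, any residue, then H, Q or D. The lookahead lets overlapping matches be reported.
SIGNATURE = re.compile(r"(?=(H.[HQD]))")

def signature_hits(sequence):
    """Return (0-based start, matched tripeptide) for every HX[HQD] occurrence."""
    return [(m.start(), m.group(1)) for m in SIGNATURE.finditer(sequence.upper())]

def has_dsbh_like_pattern(sequence, min_gap=1):
    """True if some HX[HQD] hit is followed, at least `min_gap` residues later, by an H."""
    seq = sequence.upper()
    return any("H" in seq[start + 3 + min_gap:] for start, _ in signature_hits(seq))

toy_sequence = "MKTAHIDGGSLLKVAEHRT"  # invented sequence, not a real protein

print(signature_hits(toy_sequence))         # [(4, 'HID')]
print(has_dsbh_like_pattern(toy_sequence))  # True
```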

  24. Deoxyhypusine hydroxylase turns out to be a HEAT repeat protein with a symmetric dyad of 4 repeats. High-throughput protein interaction studies recovered a HEAT repeat protein as the principal eIF5A interactor outside of the translation apparatus. [Figure: four HE motifs; the metal-chelating sites are boxed.] Typically, HEAT repeat proteins are involved in protein-protein interactions. But DOHH and a few related phycocyanobilin synthases are rare all-alpha-helical enzymes that use the HEAT repeats as their catalytic scaffold.

  25. Completely different scaffolds but similar active site

  26. The arginine finger in catalysis of phosphohydrolysis: how general is it? The nucleophilic attack by water on an NTP results in a hypercharged pentavalent intermediate, which needs to be stabilized for the reaction to proceed. In GTPases this was found to be mediated by the GTPase-activating protein (GAP), which provides an arginine finger that stabilizes the intermediate. [Figure: positions of arginine fingers (R) in PilT, SFI/II, AAA+ ATPase, STAND, DnaB and HerA/FtsK.]

  27. The tale of moving fingers in P-loop NTPases. Arginine fingers are widely utilized but are not conserved even within the P-loop NTPases.
     • They have evolved in at least 5 distinct families of enzymes and on at least 14 independent occasions in the P-loop NTPase fold. On at least one occasion, the P-loop NTPases have innovated a potassium finger, where a potassium is coordinated by an acidic residue.
     • Combining the spatial locations of R-fingers with the classification scheme for the P-loop NTPases suggests that the R-finger:
       • was probably absent in the ancestral version of the fold;
       • was received from a ribozyme, because arginine has been found to be a cofactor of phosphotransfer-catalyzing ribozymes;
       • has shifted position in the course of evolution.
     • This differential positioning of the R-finger allowed several different ways of coupling the free energy of NTP hydrolysis to different downstream motor functions. Thus, it seems to have been a major factor in the occupation of diverse sub-cellular niches by the P-loop NTPase fold.
     [Figure labels: arginine finger (R), methenyltetrahydrofolate synthetase; lysine finger (K), P-type ATPase (HAD superfamily).]

  28. Tale of two knots. The SPOUT superfamily of methyltransferases includes a vast group of RNA methylases that are prototyped by SpoU and TrmD. They mediate the transfer of -CH3 from AdoMet to various bases.
     • They differ from all classic methyltransferases in having a unique active site constellation.
     • The N-terminal motif involved in AdoMet (SAM) binding is a glycine-rich loop similar to that of other methylases.
     • But they have an additional C-terminal motif that is associated with a structural knot.
     [Figure labels: regular Rossmannoid fold; SPOUT; AdoMet; knot; rotation of C-terminal unit.]

  29. SET domain methylases have a knotted active site
     • The SET domain is a methyltransferase that is prevalent in eukaryotic chromosomal proteins.
     • Members of this superfamily methylate histones, other chromosomal proteins and cytoplasmic proteins such as RuBisCO and cytochromes.
     • Crystal structures suggested that it has a unique complex fold that is different from the classic methylases with the Rossmann domains and the SPOUT domains.
     • Phylogenetic and phyletic analysis suggests that this domain originated de novo in the eukaryotic lineage.
     • How did this happen?

  30. Origin of the SET domain through duplication of a simple unit. [Figure stages: ancestral simple 3-strand unit; existence as obligate ligand-binding dimer; duplication favoring knot formation; insertion/further duplication in loop: differentiation of two dimers. Labels: knot; AdoMet.]

  32. The ISOCOT domain is shared by sugar isomerases, eIF2B, DeoR transcription factors, acyl-CoA transferases and methenyltetrahydrofolate synthetase. [Figure: topology diagrams of ISOCOT domains, showing the core structure (strands S1-S6 with S4a, S4b and S6a; helices H0-H4; NH2 and COOH termini). Panels: MdcA; RpiA (metal-chelating flap, with H and C metal ligands); eIF2B (1t9k); MTHFS; NagB and Sol1; CoA transferase and CoA transferase N-terminal domain; all ISOCOT domains.]

  33. The ISOCOT domain is shared by sugar isomerases, eIF2B, DeoR transcription factors, acyl-CoA transferases and methenyltetrahydrofolate synthetase

  34. Unusual features of the ISOCOT fold. The ISOCOT fold is a derived version of the Rossmannoid fold with a specialized flap tucked under an archway.
     • Despite considerable catalytic diversity, the general location of the substrate-binding sites is similar in this fold.
     • There are unique extensions and inserts that form caps, which control access to the active site.
     • Although these common positions are involved in substrate interactions throughout the superfamily, there is considerable variety in the actual residues at these positions.
     • Interestingly, even ISOCOT fold enzymes catalyzing similar reactions may not share specific catalytic residues beyond the generic features of the fold:
       • Ribose phosphate isomerase and methylthioribose-1-phosphate isomerase
       • Oxoacid:acetyl-CoA transferase, malonate:acetyl-S-ACP transferase and citrate:acetyl-S-ACP transferase
     • In most of the other large enzymatic superfamilies, members catalyzing mechanistically similar reactions generally preserve a fixed set of highly conserved active site residues.
     • Thus, the ISOCOT fold indicates that certain substrate-binding scaffolds may, by themselves, play a major role in allowing particular catalytic activities, and show a lower dependence on strictly conserved residues.

  36. Bacterial versus archaeal inheritances of eukaryotic enzymes: the HAD superfamily
     • Bacteria, early transfers (total: 5): Deoxyribonucleotidase family; YniC subfamily (BPGM family); Dehr subfamily I (HAD family); CUT1/CECR5 subfamily (NagD family); Phosphomannomutase (PMM) family (Cof clade).
     • Bacteria, late transfers (total: 21): 8KDO phosphatase clade (vertebrates); cN-I nucleotidase clade (vertebrates); sEHCT/Acad10 subfamily (animals); Phosphohistidine/phospholysine phosphatase subfamily (animals); VSP subfamily (plants); Sucrose phosphate synthase C-terminal domain (SPSC) family (plants); Sucrose phosphate phosphatase (SPP) family (plants); CbbY subfamily (plants); HerA-associated family (plants); DOG subfamily (BPGM family) (fungi); NapD subfamily (PSP family) (fungi); PHM8-SDT1 subfamily (Sdt1p family) (fungi, plants, microsporidians); Dehr subfamily II (HAD family) (fungi, C. elegans, Giardia); YihX subfamily (Sdt1p family) (some fungi, plants); and 7 others.
     • Archaea, vertical inheritance (total: 1): MDP-1/RNA polymerase phosphatase.
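
The totals on this slide can be turned into a share of bacterial versus archaeal origin with a few lines of arithmetic. The sketch below uses only the three totals stated on the slide (5, 21 and 1); the dictionary keys and the function name are labels chosen for this example.

```python
# Totals as stated on the slide for eukaryotic HAD superfamily lineages.
had_inheritance = {
    "bacterial, early transfer": 5,
    "bacterial, late transfer": 21,
    "archaeal, vertical inheritance": 1,
}

def bacterial_fraction(counts):
    """Fraction of lineages whose label marks them as bacterial in origin."""
    bacterial = sum(n for label, n in counts.items() if label.startswith("bacterial"))
    return bacterial / sum(counts.values())

share = bacterial_fraction(had_inheritance)
print(f"{share:.1%} of the tallied lineages are of bacterial origin")  # 96.3%
```

On these numbers, 26 of the 27 tallied lineages trace to bacteria, and most of those are late transfers.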

  37. The currently available data suggest that:
     • In most major superfamilies of enzymes, the direction of flow of laterally transferred proteins is from bacteria to eukaryotes and from bacteria to archaea.
     • In eukaryotes, most major biochemical innovations related to neurotransmitter biosynthesis, poly- and oligosaccharide chains for glycoproteins, and novel substrate utilization are due to enzymes acquired relatively late in eukaryotic evolution.
     • Thus, the bacteria not only contributed to the fundamental aspects of eukaryogenesis, but also appear to be the chief providers of new biochemical activities.

  38. Acknowledgements
     • Aravind group: Vivek Anantharaman, Max Burroughs, Lakshminarayan Iyer
     • Collaborators: Eugene Koonin group (NCBI), Detlef Leipe group (NCBI), MH Park group (NICFD), Karen Allen group (Boston U)
