Metabolic Pathway I609: PhD Seminar Computational techniques in comparative genomics

Metabolic Pathway I609: PhD Seminar Computational techniques in comparative genomics Kwangmin Choi

Overview • Metabolism 101 • Resources • Network topology • Pathway representation • Metabolic pathway analysis using comparative genomics approach • Pathway evolution • Practical application

Metabolism 101

-Omics

Basic keywords • An enzyme is any of several complex proteins that are produced by cells and act as catalysts in specific biochemical reactions • A reaction is a process in which one or more substances are changed chemically into one or more different substances • Metabolism is a step by step modification of the initial molecule to shape it into another product. • Catabolism • Anabolism • A metabolite is any substance involved in metabolism either as a product of metabolism or necessary for metabolism (substrate)

Catalytic reaction by enzyme

EC number (from Wikipedia) • EC 1 : Oxidoreductases • EC 1.1 : Acting on the CH-OH group of donors • EC 1.1.1 : With NAD+ or NADP+ as acceptor • EC 1.1.1.1 : Alcohol dehydrogenase (alcohol as substrate)
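
The four-level hierarchy of an EC number can be unpacked mechanically. The following is a minimal sketch; the function name `ec_levels` and the lookup table are illustrative assumptions, with descriptions taken from the alcohol dehydrogenase example on this slide.

```python
def ec_levels(ec_number):
    """Return every prefix of an EC number, from class down to the full number."""
    parts = ec_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Descriptions for the single example used on the slide (EC 1.1.1.1).
DESCRIPTIONS = {
    "1": "Oxidoreductases",
    "1.1": "Acting on the CH-OH group of donors",
    "1.1.1": "With NAD+ or NADP+ as acceptor",
    "1.1.1.1": "Alcohol dehydrogenase (alcohol as substrate)",
}

for level in ec_levels("1.1.1.1"):
    print("EC", level, ":", DESCRIPTIONS[level])
```

Two enzymes that share a longer EC prefix catalyze more closely related reactions, which is why prefix matching is a common first step when comparing annotations.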

Metabolome and Metabolomics • Metabolome • complete set of small-molecule metabolites to be found within a biological sample, such as a single organism • Target for drug discovery: Biomarkers of physiological disease (diagnostics). • Target for metabolic engineering (e.g. Jurassic Park) • Metabolomics • "the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification" • Metabolomics can be regarded as the end point of the 'omics' era (genomics, transcriptomics, proteomics, metabolomics, etc.) • The changes in the metabolome are the ultimate answer of an organism to genetic alterations, disease, or environmental influences.

Metabolism: TCA cycle, glycolysis

Metabolic pathway/network • A metabolic pathway is a series of chemical reactions occurring within a cell, catalyzed by enzymes, and resulting in either the formation of a metabolic product to be used or stored by the cell, or the initiation of another metabolic pathway. • Pathways often require dietary minerals, vitamins and other cofactors • Networks of metabolite feedback pathways • regulate gene and protein expression, • can also mediate signaling between organisms. • Directed graph • Reaction stoichiometry: quantitative relationship between reactants and products • A substrate enters a metabolic pathway depending on the needs of the cell and the availability of the substrate. • Initiate another metabolic pathway (flux generating step).

Metabolic pathway: overview • Roche Applied Science "Biochemical Pathways" wall chart • http://www.expasy.ch/cgi-bin/show_thumbnails.pl

G5 region: a part of the TCA cycle

Metabolic modules: big themes

The size of metabolomes • An attraction of the metabolome has always been that it is numerically smaller, and thus more tractable, than the transcriptome or proteome. • The measured metabolome is greater than that encoded by the genome • it includes molecules acquired exogenously as drugs, foods or food additives, • also include molecules derived from the microflora of the host • Open system, not closed • Saccharomyces cerevisiae has some 1200 reactions and 650 metabolites • The curated human metabolome presently contains respectively some 1100/3300 reactions and 700/2700 metabolites.

The numbers of reactions and metabolites are underestimated • Some areas of metabolism are more ‘represented’ than others; transporters especially are highly under-represented (for their activities in transporting xenobiotics and pharmaceuticals) • Many enzymes have currently unknown substrates. • It is hard to discover molecules whose existence one does not suspect, and so some molecules might be reasonably prevalent but of unknown chemical identity. (In plants and yeast, most metabolites measured by gas chromatography-mass spectrometry are presently of uncertain identity.) {Genome evolution reveals biochemical networks and functional modules} Von Mering et al., PNAS vol.100(26), 2003

Public resources

KEGG: Kyoto Encyclopedia of Genes and Genomes • http://www.genome.jp/kegg • KEGG is a suite of databases. One of the most integrated pathway DBs • 335 pathways from 906 organisms • PATHWAY • holds the current knowledge on molecular interaction networks, including metabolic pathways, regulatory pathways, and molecular complexes • GENES • is a collection of gene catalogs for all the complete genomes and some partial genomes. Each gene catalog is computationally derived from public resources, and is manually reannotated for reconstruction of KEGG pathways. • KEGG GENES is associated with KEGG GENOME containing chromosome maps. • LIGAND Database • provides the linkage between chemical and biological aspects of life in the light of enzymatic reactions • COMPOUND/GLYCAN/REACTION • KO for manually curated ortholog groups • KEGG SSDB for computationally generated ortholog/paralog clusters and gene clusters

KEGG

MetaCyc • http://www.metacyc.org • MetaCyc is a database of nonredundant, experimentally elucidated metabolic pathways • More than 900 different organisms are represented • The majority of pathways occur in microorganisms and plants • More than 900 metabolic pathways are stored, with more than 6,000 enzymatic reactions and more than 12,000 associated literature citations • stores all enzyme-catalyzed reactions that have been assigned EC numbers by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology • also stores hundreds of additional enzyme-catalyzed reactions that have not yet been assigned an EC number • MetaCyc contains pathways involved in both primary and secondary metabolism

MetaCyc

Other resources

Network topology

Graph and Network • Graph • Well-known and important concept in discrete mathematics and computer science • Consists of a set of nodes and a set of edges • Node ⇔ Object • Edge ⇔ Relation between two objects • Network • Graph + Some information/meaning • We do not distinguish graphs from networks in this talk

Graph representation of metabolic pathway • Compound Network • Node ⇔ Chemical compound • Edge ⇔ Chemical reaction • Reaction Network • Node ⇔ Chemical reaction (or enzyme) • Edge ⇔ Chemical compound shared by two reactions
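
The two representations can be built from the same reaction list. The sketch below uses a hypothetical three-reaction toy pathway (`REACTIONS`, with made-up compound names A to E); it is an illustration of the two definitions above, not a procedure taken from KEGG or any other database.

```python
from itertools import combinations

# Hypothetical toy reactions: reaction id -> (substrates, products).
REACTIONS = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C", "D"}),
    "R3": ({"D"}, {"E"}),
}


def compound_network(reactions):
    """Nodes are compounds; each edge (substrate, product, reaction) is a reaction."""
    edges = set()
    for rid, (substrates, products) in reactions.items():
        for s in substrates:
            for p in products:
                edges.add((s, p, rid))
    return edges


def reaction_network(reactions):
    """Nodes are reactions; two reactions are linked by the compounds they share."""
    edges = set()
    for r1, r2 in combinations(sorted(reactions), 2):
        compounds1 = reactions[r1][0] | reactions[r1][1]
        compounds2 = reactions[r2][0] | reactions[r2][1]
        shared = compounds1 & compounds2
        if shared:
            edges.add((r1, r2, frozenset(shared)))
    return edges


print(sorted(compound_network(REACTIONS)))
print(sorted((r1, r2, sorted(c)) for r1, r2, c in reaction_network(REACTIONS)))
```

In this toy case R1 and R2 are linked through compound B and R2 and R3 through compound D, while R1 and R3 share nothing and stay unconnected.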

Graph and Biological Network • Metabolic network (KEGG) viewed as a graph • Node: Object, e.g. chemical compound • Edge: Relation between objects, e.g. chemical reaction

Degree/connectivity, k • How many links does the node have to other nodes? • Undirected network • Characterized by an average degree $\langle k \rangle = 2L/N$ • N nodes and L links • Directed network • Incoming degree, $k_{in}$ • Outgoing degree, $k_{out}$
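
The degree quantities defined on this slide can be computed directly from a list of links. This is a minimal sketch on a hypothetical four-node graph (`LINKS`); N is taken as the number of nodes that appear in at least one link, so isolated nodes are ignored.

```python
from collections import Counter

# Hypothetical toy graph, given as a list of links (u, v).
LINKS = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def degrees(links):
    """Undirected degree: number of links touching each node."""
    deg = Counter()
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return deg


def average_degree(links):
    """Average degree 2L/N, since every link contributes to two node degrees."""
    return 2 * len(links) / len(degrees(links))


def degree_distribution(links):
    """P(k): fraction of nodes that have degree k."""
    deg = degrees(links)
    counts = Counter(deg.values())
    return {k: n / len(deg) for k, n in counts.items()}


def in_out_degrees(links):
    """Directed reading: each link (u, v) points from u to v."""
    k_in, k_out = Counter(), Counter()
    for u, v in links:
        k_out[u] += 1
        k_in[v] += 1
    return k_in, k_out


print(average_degree(LINKS))        # 2.0
print(degree_distribution(LINKS))
```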

Graph and Degree (figure: example graph with nodes A to J) • Degree • Node with degree 1: J • Nodes with degree 2: B, C, F, G, H • Nodes with degree 3: E, I, A, D • P(k) (degree distribution): P(1)=0.1, P(2)=0.5, P(3)=0.4

Scale-Free Network • P(k) • Degree distribution • Frequency of nodes with degree k • Scale-free network • P(k) follows power law • Different from random networks

Poisson Distribution and Power-Law Distribution • Poisson distribution (random graph): $P(k) = e^{-\lambda}\lambda^{k}/k!$ (plot: P(k) vs. k) • Power-law distribution (scale-free graph): $P(k) \propto k^{-\gamma}$ (plot: log P(k) vs. log k)
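
The practical difference between the two distributions shows up in the tail. The sketch below evaluates both at a high degree; the values λ = 4 and the finite cutoff `k_max` used to normalize the power law are assumptions chosen for illustration, while γ = 2.24 is the metabolic-network exponent quoted later in these slides.

```python
import math


def poisson_pk(k, lam):
    """Poisson degree distribution of a random graph with mean degree lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def power_law_pk(k, gamma, k_max):
    """Power law k^(-gamma), normalized over k = 1..k_max (cutoff is an assumption)."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm


lam, gamma, k_max = 4.0, 2.24, 1000
for k in (1, 5, 20, 50):
    print(k, poisson_pk(k, lam), power_law_pk(k, gamma, k_max))
```

At k = 50 the Poisson probability is vanishingly small while the power-law probability is still appreciable, which is why highly connected hubs are expected in scale-free networks but not in random ones.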

Random Network vs. Scale-free Network • Random network: $P(k) = e^{-\lambda}\lambda^{k}/k!$ • Scale-free network: $P(k) \propto k^{-\gamma}$ (figure: network growth with m0 = 3, m = 2; nodes labelled with the fractions 2/6, 3/10, 2/10, 4/14, 2/14)

Scale-free Networks in Real World • Metabolic networks: γ ≈ 2.24 (depending on species) • Node ⇔ Chemical compound • Edge ⇔ Chemical reaction (almost equivalently, enzyme) • Protein interaction networks: γ ≈ 2.2 • Node ⇔ Protein • Edge ⇔ Interaction between two proteins • WWW: γ ≈ 2.1 • Node ⇔ Web page • Edge ⇔ Link between web pages • Movie stars: γ ≈ 2.3 • Node ⇔ Actor/Actress • Edge ⇔ Act in the same movie

Compound Network vs. Reaction Network (figure: a compound network and the corresponding reaction network)

Line Graph Transformation and Metabolic Network • Correspondence • Compound network ⇔ Original network • Reaction network ⇔ Transformed network • But this correspondence is not precise • We will consider a more realistic transformation later

Line Graph Transformation (figure: a graph G with edges a to e and its line graph L(G), nodes labelled with their degrees) • Edge in G ⇒ Node in L(G) • Two nodes in L(G) are joined by an edge if the corresponding edges in G share an endpoint
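
The transformation rule can be written in a few lines. This is a minimal sketch on a hypothetical four-edge graph (`G_EDGES`), not the graph drawn on the slide.

```python
from itertools import combinations

# Hypothetical toy graph G: edge label -> pair of endpoint nodes.
G_EDGES = {
    "a": (1, 2),
    "b": (2, 3),
    "c": (2, 4),
    "d": (3, 4),
}


def line_graph(edges):
    """Edges of L(G): pairs of G-edges that share at least one endpoint."""
    result = set()
    for e1, e2 in combinations(sorted(edges), 2):
        if set(edges[e1]) & set(edges[e2]):
            result.add((e1, e2))
    return result


print(sorted(line_graph(G_EDGES)))
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Edges a, b and c all meet at node 2, so they form a triangle in L(G); a and d share no endpoint and stay unconnected. For a simple graph, an edge (u, v) of G ends up with degree deg(u) + deg(v) - 2 in L(G).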

Physical Line Graph Transformation (figure: compounds C1, C2, C3, C5, C6)

Main Result (figure labels: a node of degree k in G; nodes of degree around k, at least k, in L(G)) • If $P(k) \propto k^{-\gamma}$ in G, then $P(k) \propto k^{-\gamma+1}$ in L(G) • Assuming that there is no assortative mixing in G (i.e., no node has a preference for high- or low-degree neighbors) • Intuitive Proof • A node v of degree k in G has k edges • These k edges correspond to k nodes in L(G) • Each of these k nodes in L(G) has degree around k • From v in G, we have k nodes of degree around k in L(G) • Thus, $P(k) \propto k \cdot k^{-\gamma} = k^{-\gamma+1}$ in L(G)
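
The intuitive proof can be written out as a short chain of proportionalities. The notation $N_G(k)$ and $N_{L(G)}(k)$ for the number of nodes of degree $k$ is introduced here for illustration, and the final line plugs in the metabolic-network exponent γ ≈ 2.24 quoted earlier in these slides as a worked example.

```latex
\begin{align*}
N_G(k) &\propto k^{-\gamma}
  && \text{nodes of degree $k$ in $G$} \\
N_{L(G)}(k) &\propto k \cdot N_G(k)
  && \text{each such node yields $k$ nodes of degree $\approx k$ in $L(G)$} \\
  &\propto k \cdot k^{-\gamma} = k^{-\gamma+1} \\
\gamma \approx 2.24 \;&\Rightarrow\; P(k) \propto k^{-1.24} \text{ in } L(G)
\end{align*}
```

The exponent drops by one, so the transformed (reaction-like) network has an even heavier tail than the original compound network.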

Scale-free topology in biological systems • {Network biology: understanding the cell's functional organization}, Barabási et al., Nature Reviews Genetics 5, 101-113, 2004 • Growth process and Preferential attachment • The network emerges through the subsequent addition of new nodes (1, 2 in red) • The nodes that appeared early in the history of the network are the most connected ones • e.g. CoA, NAD, GTP • Nodes prefer to connect to more connected nodes. (1 >> 2) • Growth and preferential attachment generate hubs through a ‘rich-gets-richer’ mechanism • Error tolerance • Attack vulnerability • Gene duplication as the origin of preferential attachment • Duplicated genes produce identical proteins that interact with the same partner. Therefore, each protein that is in contact with a duplicated protein gains an extra link • Proteins with a large number of interactions tend to gain links more often, as it is more likely that they interact with the protein that has been duplicated.
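
Growth with preferential attachment is easy to simulate. The following is a minimal sketch, not the exact model from the cited review: it starts from a ring of m0 = 3 nodes and adds nodes with m = 2 links each (the values shown on the earlier random vs. scale-free slide), and it picks distinct targets by rejection, which only approximates strict degree-proportional sampling. The function name and the `seed` parameter are assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m0=3, m=2, seed=0):
    """Grow a graph in which new nodes prefer well-connected targets.

    Starts from a ring of m0 nodes (m0 >= 3 assumed), then links each new
    node to m distinct existing nodes drawn in proportion to their degree.
    """
    rng = random.Random(seed)
    edges = [(i, (i + 1) % m0) for i in range(m0)]
    # Every node appears in `stubs` once per link end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m0, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(200)
degree = Counter(node for edge in edges for node in edge)
print("most connected nodes:", degree.most_common(3))
```

In runs of this kind the most connected nodes tend to be the ones added early, which mirrors the ‘rich-gets-richer’ point made on the slide.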

XML representation

XML representation • Machine-readable • Easy for data exchange • SBML (for systems biology) • KGML (for KEGG) • BioPAX (for MetaCyc + etc.) • XIN (for DIP) • Etc.
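
Reading a pathway from XML takes only a standard parser. The sketch below uses a simplified, SBML-style fragment written for this example: the element names follow SBML's `listOfSpecies` / `listOfReactions` layout, but the namespace, most attributes and the cofactors of the reaction are omitted, so it is not a valid SBML file.

```python
import xml.etree.ElementTree as ET

# Simplified SBML-style fragment (assumption: no namespace, minimal attributes).
DOC = """
<sbml>
  <model id="toy">
    <listOfSpecies>
      <species id="glucose"/>
      <species id="g6p"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def read_reactions(xml_text):
    """Return {reaction id: (reactant ids, product ids)} from the fragment."""
    root = ET.fromstring(xml_text.strip())
    reactions = {}
    for rxn in root.iter("reaction"):
        reactants = [ref.get("species")
                     for ref in rxn.findall("listOfReactants/speciesReference")]
        products = [ref.get("species")
                    for ref in rxn.findall("listOfProducts/speciesReference")]
        reactions[rxn.get("id")] = (reactants, products)
    return reactions


print(read_reactions(DOC))
```

Real SBML, KGML and BioPAX files carry namespaces and much richer annotation, so a dedicated library is usually a better choice than hand-written parsing.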

XIN

KGML

BioPAX

SBML

Metabolic pathway analysis using comparative genomics approach

Context-based analysis • A metabolic reconstruction is an attempt to develop a detailed overview of an organism’s metabolism from an analysis of genomic sequence. • Metabolic reconstructions can reveal new aspects of metabolism in well-studied organisms • It supports inference of pathways on the basis of the presence or absence of relevant genes. (missing gene, metabolic hole etc.) • Combining inferred pathways into hierarchical blocks produces metabolic charts specific for a particular organism and connected to individual genes
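
Inference by presence or absence of genes reduces to a set comparison. The sketch below checks the first four glycolysis steps against a hypothetical genome annotation; the gene identifiers and the annotation itself are invented for illustration, and the function name is an assumption.

```python
# Reference pathway: step name -> EC number of the enzyme that catalyzes it.
PATHWAY_STEPS = {
    "hexokinase": "2.7.1.1",
    "glucose-6-phosphate isomerase": "5.3.1.9",
    "6-phosphofructokinase": "2.7.1.11",
    "fructose-bisphosphate aldolase": "4.1.2.13",
}

# Hypothetical annotation of one genome: gene id -> assigned EC number.
GENOME_ANNOTATION = {
    "gene_001": "2.7.1.1",
    "gene_002": "5.3.1.9",
    "gene_003": "4.1.2.13",
}


def find_metabolic_holes(pathway_steps, genome_annotation):
    """Return the pathway steps for which no gene carries the required EC number."""
    present = set(genome_annotation.values())
    return sorted(step for step, ec in pathway_steps.items() if ec not in present)


holes = find_metabolic_holes(PATHWAY_STEPS, GENOME_ANNOTATION)
print("missing steps:", holes)
```

A reported hole is only a candidate missing gene: the step may be carried out by an unannotated or non-orthologous enzyme, which is what the comparative approaches on the following slides try to identify.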

Goal : “Find the missing links” • Long-term goal • Simulation of whole cell – the virtual cell. • Mid-term goal • Predict cell reaction to change in environment. • Predict cell reaction to gene knockout/modification • Current stage • Describe and calculate network behavior • Assign functions to all proteins • Identify all regulatory events • Fill the gaps!

{Missing genes in metabolic pathways: a comparative genomics approach}, Osterman and Overbeek, Current Opinion in Chemical Biology, Vol.7, 2003

Gene cluster • Genes from the same pathway tend to cluster on prokaryotic chromosomes. • ‘functional coupling’ • Clustering of orthologous FASII-related genes (with corresponding colors to (b)) provided key evidence for the identification of two novel enzymes (missing genes) involved with SFA and UFA II pathways: fabK (11b) and fabM

Gene fusion • This technique involves searches for a pair of genes from one genome that appear to be fused into a single gene within another genome, providing further evidence of potential functional coupling. • Since its introduction, the protein fusion approach has been implemented and successfully applied for genome-wide hypothetical protein analysis.
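
The fusion search can be sketched as a lookup over protein-family assignments. Everything in the example is hypothetical: the gene names, the family labels, and the assumption that a prior homology search has already assigned each gene an ordered list of families.

```python
from itertools import combinations

# Hypothetical inputs: gene id -> protein families found along its sequence.
GENOME_A = {"geneA1": ["famX"], "geneA2": ["famY"], "geneA3": ["famZ"]}
GENOME_B = {"geneB1": ["famX", "famY"], "geneB2": ["famZ"]}


def fusion_links(genome_a, genome_b):
    """Find pairs of separate genes in A whose families are fused in one gene of B."""
    family_to_genes = {}
    for gene, families in genome_a.items():
        if len(families) == 1:
            family_to_genes.setdefault(families[0], []).append(gene)
    links = set()
    for fused_gene, families in genome_b.items():
        for f1, f2 in combinations(sorted(set(families)), 2):
            for g1 in family_to_genes.get(f1, []):
                for g2 in family_to_genes.get(f2, []):
                    links.add((g1, g2, fused_gene))
    return links


print(fusion_links(GENOME_A, GENOME_B))
```

Here geneA1 and geneA2 are predicted to be functionally coupled because their families occur together in geneB1, while geneA3 has no fusion partner.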
