Path finding in metabolic graphs

Jacques.van.Helden@ulb.ac.be
Université Libre de Bruxelles, Belgium
Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)
http://www.bigre.ulb.ac.be/

Graph-based analysis of metabolic networks: traversal rules for metabolic graphs

Traversal rules
• How should the following be treated?
  • Reaction reversibility
  • Pool metabolites

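Direct traversal of reversible reactions
• Reaction 4.2.1.52: L-aspartic semialdehyde + pyruvate <=> dihydrodipicolinic acid + H2O
• Valid pathways (from a substrate to a product, in either direction)
  • L-aspartic semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid
  • dihydrodipicolinic acid --> 4.2.1.52 --> L-aspartic semialdehyde
• Invalid pathway (from a substrate to another substrate of the same reaction)
  • L-aspartic semialdehyde --> 4.2.1.52 --> pyruvate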

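Mutual exclusion of reverse reactions
• Reactions (each direction is represented by a separate node)
  • 4.2.1.52: L-aspartic semialdehyde + pyruvate --> dihydrodipicolinic acid + H2O
  • 4.2.1.52 reverse: dihydrodipicolinic acid + H2O --> L-aspartic semialdehyde + pyruvate
• Invalid pathway (a single path must not use both 4.2.1.52 and 4.2.1.52 reverse)
  • L-aspartic semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid --> 4.2.1.52 reverse --> pyruvate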
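The two traversal rules above can be illustrated with a small sketch. It assumes a bipartite directed graph (compound --> reaction --> compound) in which each reversible reaction gets a forward and a "reverse" node. The data layout and the names build_graph and shortest_path are illustrative assumptions, not the implementation used for the results in these slides.

```python
from collections import deque

# Toy data: the single reaction shown on the slides. The dict layout and all
# function names are illustrative assumptions.
REACTIONS = {
    "4.2.1.52": (["L-aspartic semialdehyde", "pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"]),
}

def build_graph(reactions):
    """Build arcs compound -> reaction -> compound, one node per direction."""
    arcs, base = {}, {}
    for rid, (substrates, products) in reactions.items():
        for node, inputs, outputs in ((rid, substrates, products),
                                      (rid + " reverse", products, substrates)):
            base[node] = rid  # both directions share the same base reaction
            for compound in inputs:
                arcs.setdefault(compound, []).append(node)
            arcs.setdefault(node, []).extend(outputs)
    return arcs, base

def shortest_path(arcs, base, source, target):
    """Breadth-first search over simple paths that never use both directions
    of the same reaction (exhaustive, so only suitable for small graphs)."""
    queue = deque([([source], frozenset())])
    while queue:
        path, used = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in arcs.get(node, []):
            if nxt in path:
                continue
            if nxt in base:
                if base[nxt] in used:
                    continue  # forward and reverse are mutually exclusive
                queue.append((path + [nxt], used | {base[nxt]}))
            else:
                queue.append((path + [nxt], used))
    return None

ARCS, BASE = build_graph(REACTIONS)
print(shortest_path(ARCS, BASE, "L-aspartic semialdehyde", "dihydrodipicolinic acid"))
print(shortest_path(ARCS, BASE, "L-aspartic semialdehyde", "pyruvate"))
```

Because arcs only lead from substrates into a reaction node and from that node out to products, a substrate-to-substrate shortcut cannot be built from a single node, and the mutual-exclusion check blocks the detour through the reverse node.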

Ubiquitous compounds
• Reactions
  • 4.2.1.52: L-aspartic semialdehyde + pyruvate --> dihydrodipicolinic acid + H2O
  • 3.5.1.18: succinyl diaminopimelate + H2O --> succinate + LL-diaminopimelic acid
• Invalid pathway (H2O is not a valid intermediate between the two reactions)
  • L-aspartic semialdehyde --> 4.2.1.52 --> H2O --> 3.5.1.18 --> LL-diaminopimelic acid

Ubiquitous compounds
• Jeong et al. (2000)
  • Calculate the network diameter, i.e. the average length of the shortest path between two compounds.
  • Show that when ubiquitous compounds ("hubs" in their terminology) are removed, the diameter increases.
  • Compare the metabolic network diameter between different organisms.
  • “Surprising” result: the network diameter does not depend on the number of enzymes found in the organism.
  • But: for this comparison, all compounds were considered as valid intermediates, including H2O, NADP, ATP, H+, ...
Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.
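To make the effect of hubs on the diameter concrete, here is a minimal sketch on a hypothetical five-node undirected network (a chain A-B-C-D plus H2O linked to every compound). The toy network and the function names are assumptions made for illustration; this is not the data or the code of Jeong et al.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all connected pairs of nodes."""
    total, pairs = 0, 0
    for a, b in combinations(sorted(adj), 2):
        d = bfs_distances(adj, a).get(b)
        if d is not None:
            total += d
            pairs += 1
    return total / pairs

def remove_nodes(adj, removed):
    """Copy of the graph without the removed nodes and their edges."""
    return {n: [m for m in nbrs if m not in removed]
            for n, nbrs in adj.items() if n not in removed}

# Hypothetical toy network: chain A-B-C-D, with H2O linked to every compound.
TOY = {
    "A": ["B", "H2O"],
    "B": ["A", "C", "H2O"],
    "C": ["B", "D", "H2O"],
    "D": ["C", "H2O"],
    "H2O": ["A", "B", "C", "D"],
}

with_hub = average_path_length(TOY)
without_hub = average_path_length(remove_nodes(TOY, {"H2O"}))
print(with_hub, without_hub)
```

In this toy case the average distance grows once the hub is removed, because A and D are no longer two steps apart through H2O.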

Traversal of reversible reactions
• Fell & Wagner (Nat Biotechnol 2000; 18: 1121-2)
  • Select a sub-network (energy metabolism and small molecule biosynthesis in E. coli).
  • Discard ubiquitous compounds.
  • Identify the "center" of the network: glutamate, followed by pyruvate.
  • But: reactions can be traversed from substrate to substrate or from product to product.
• Jeong et al. (Nature 2000; 407: 651-654)
  • Calculate the network diameter.
  • But: reactions can be traversed from substrate to substrate or from product to product.

Graph-based analysis of biochemical networks: path finding

Applications of path finding to biochemical networks
• Two-ends path finding
  • Inferring metabolic pathways that convert compound A to compound B.
  • Measuring the functional distance between enzyme pairs: the minimal number of steps between the reactions they catalyze.
• Single-end path finding
  • Signal transduction:
    • Starting from a membrane receptor, detect potential target genes (downstream path finding).
    • Starting from a transcription factor, find potential regulatory pathways (upstream path finding).

A graph of compounds and reactions
Reactions from KEGG
• Compound nodes
  • 10,166 compounds (only 4,302 used by one reaction)
• Reaction nodes
  • 5,283 reactions
• Arcs
  • 10,685 substrate --> reaction (7,297 non-trivial)
  • 10,621 reaction --> product (6,828 non-trivial)

Metabolic pathways as subgraphs
Escherichia coli
• 4,219 genes (Blattner)
• 967 enzymes (Swissprot)
• 159 pathways (EcoCyc)

Graph-based analysis of biochemical networks: inferring metabolic pathways by path finding

Pathway enumeration
• Kuffner et al. (Bioinformatics 2000; 16: 825-836).
  • All possible paths from glucose to pyruvate, with maximal length 9: 500,000 possible paths.
• Adding constraints
  • Selecting "complete" pathways, i.e. where all side reactants are ubiquitous.
  • Constraint on pathway width
    • Width 2: 541 pathways
    • Width 1: 170 pathways
[Figure: path finding between a source compound and a target compound in the graph of reactions and compounds yields potential metabolic pathways.]
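The combinatorial growth of enumerated paths can be seen even on a tiny example. The sketch below enumerates all simple paths up to a maximal number of reactions by depth-first search. The toy network and the name enumerate_paths are hypothetical; the constraints on completeness and width mentioned above are not implemented here.

```python
# Hypothetical toy network: compound -> list of (reaction, product) pairs.
STEPS = {
    "A": [("r1", "B"), ("r2", "C")],
    "B": [("r3", "D")],
    "C": [("r4", "D"), ("r5", "B")],
    "D": [],
}

def enumerate_paths(steps, source, target, max_reactions):
    """Yield every simple path from source to target that uses at most
    max_reactions reactions. A path alternates compound, reaction, compound."""
    def extend(compound, path, visited):
        if compound == target:
            yield path
            return
        if len(path) // 2 >= max_reactions:  # reactions already used in path
            return
        for reaction, product in steps.get(compound, []):
            if product not in visited:
                yield from extend(product, path + [reaction, product],
                                  visited | {product})
    yield from extend(source, [source], {source})

for path in enumerate_paths(STEPS, "A", "D", 3):
    print(" --> ".join(path))
```

Raising max_reactions from 2 to 3 already adds a third path in this four-compound network, which hints at why additional constraints are needed on a graph with thousands of reactions.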

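Test case: methionine biosynthesis
[Figure: annotated pathways in S. cerevisiae and E. coli.]
• Common part
  • L-aspartate --> 2.7.2.4 --> L-aspartyl-4-P --> 1.2.1.11 --> L-aspartic semialdehyde --> 1.1.1.3 --> L-homoserine
• S. cerevisiae branch
  • L-homoserine --> 2.3.1.31 --> O-acetyl-homoserine --> 4.2.99.10 --> homocysteine
• E. coli branch
  • L-homoserine --> 2.3.1.46 --> alpha-succinyl-L-homoserine --> 4.2.99.9 --> cystathionine --> 4.4.1.8 --> homocysteine
• Common end
  • homocysteine --> 2.1.1.14 --> L-methionine --> 2.5.1.6 --> S-adenosyl-L-methionine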

Raw graph: from L-aspartate to L-methionine
• The 5 shortest paths from L-aspartate to L-methionine in the raw graph
  • L-aspartic acid --> 6.3.5.4 --> AMP --> 6.1.1.10 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
  • L-aspartic acid --> 4.3.1.1 --> NH3 --> 4.4.1.11 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.5.1.31 --> L-methionine
• All these paths convert L-aspartate to L-methionine in 2 reaction steps.
• In all these cases, the intermediate compound belongs to the group of highly connected nodes in the metabolic graph.
• These compounds cannot be considered as valid intermediates between these reactions.

Filtered graph: discarding pool metabolites
• To avoid irrelevant shortcuts, a set of highly connected compounds is discarded from the graph.
• The selection is fine-tuned manually
  • some compounds are maintained (e.g. S-adenosyl-L-methionine, ...).
  • others, although less connected, are removed (e.g. pyruvate, CMP).
• Filtered out: H2O, ATP, NAD, NADH, NADPH, NADP, O2, ADP, Pi, CoA, CO2, PPi, NH3, UDP, AMP, pyruvate, acetyl-CoA, L-glutamate, 2-oxoglutarate, H2O2, acceptor, reduced acceptor, acetate, GDP, oxalacetic acid, succinic acid, GTP, CMP, UTP, H+, UMP, CDP, reduced ferredoxin, H2, FADH2
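As a minimal sketch of the filtering step, the code below removes a set of pool metabolites from a toy directed graph and reruns a breadth-first shortest-path search. The arcs are copied from two of the paths listed in these slides; the POOL subset and the names discard and shortest_path are assumptions made for illustration.

```python
from collections import deque

# Arcs taken from two paths listed in the slides (toy subset of the graph).
ARCS = {
    "L-aspartic acid": ["6.3.5.4", "2.6.1.12"],
    "6.3.5.4": ["AMP"],
    "AMP": ["6.1.1.10"],
    "6.1.1.10": ["L-methionine"],
    "2.6.1.12": ["L-alpha-alanine"],
    "L-alpha-alanine": ["2.6.1.44"],
    "2.6.1.44": ["glycine"],
    "glycine": ["2.6.1.73"],
    "2.6.1.73": ["L-methionine"],
}
POOL = {"H2O", "ATP", "AMP", "NH3"}  # small subset of the filtered-out list

def discard(arcs, excluded):
    """Return a copy of the graph without the excluded compound nodes."""
    return {node: [m for m in out if m not in excluded]
            for node, out in arcs.items() if node not in excluded}

def shortest_path(arcs, source, target):
    """Unweighted shortest path by breadth-first search, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in arcs.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

raw = shortest_path(ARCS, "L-aspartic acid", "L-methionine")
filtered = shortest_path(discard(ARCS, POOL), "L-aspartic acid", "L-methionine")
print(" --> ".join(raw))
print(" --> ".join(filtered))
```

On this subset the raw search takes the two-reaction shortcut through AMP, while the filtered search falls back on the three-reaction path through L-alpha-alanine and glycine.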

Filtered graph: choice of excluded compounds
• Where to set the limit ?
  • Seems obvious for H2O (1615), NADH (569), ...
  • What about ATP (435) ?
  • And pyruvate ?
  • And NH3 ?
• Depends on the reaction/pathway considered
  • e.g. ATP is a valid intermediate in nucleotide biosynthesis
• Depends on the atoms being transferred during the reaction
  • e.g. NADH gives one proton
• Depends on the focus of the question
  • e.g. in an analysis of energy metabolism, ATP and NAD will matter

Filtered graph: from L-aspartate to L-methionine
• The 5 shortest paths from L-aspartate to L-methionine in the filtered graph
  • L-aspartic acid --> 2.6.1.35 --> glycine --> 2.6.1.73 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.41 --> d-methionine --> 5.1.1.2 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.2 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 4.1.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
• These paths use valid intermediate compounds.
• However, they are much shorter (2 or 3 intermediate reactions) than the annotated methionine pathway.
• The intermediate compounds and reactions are not part of the annotated pathway.

Path finding in a weighted graph
• Principle
  • Each compound node is assigned a weight proportional to its connectivity degree.
  • All compounds are allowed for path finding, but the cost is higher for highly connected compounds.
  • This reduces the probability of using a pool metabolite as an intermediate between two successive reactions.
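The principle can be sketched with a node-weighted variant of Dijkstra's algorithm, where entering a node costs its weight. In this toy example only the degree of H2O (1615) comes from these slides; the other degrees, the cost of 1 per reaction node, and the name lightest_path are assumptions for illustration, not the weighting scheme actually used for the published results.

```python
import heapq

# Toy subset of arcs taken from paths listed in the slides.
ARCS = {
    "L-aspartic acid": ["3.5.1.15", "2.7.2.4"],
    "3.5.1.15": ["H2O"],
    "H2O": ["3.4.13.12"],
    "3.4.13.12": ["L-methionine"],
    "2.7.2.4": ["L-4-aspartyl phosphate"],
    "L-4-aspartyl phosphate": ["1.2.1.11"],
    "1.2.1.11": ["L-aspartic 4-semialdehyde"],
    "L-aspartic 4-semialdehyde": ["1.1.1.3"],
    "1.1.1.3": ["L-homoserine"],
    "L-homoserine": ["2.3.1.31"],
    "2.3.1.31": ["o-acetyl-L-homoserine"],
    "o-acetyl-L-homoserine": ["2.5.1.49"],
    "2.5.1.49": ["L-methionine"],
}
# Compound weight = connectivity degree. Only the H2O value is from the
# slides; the value 5 for the other compounds is a hypothetical placeholder.
DEGREE = {
    "H2O": 1615,
    "L-4-aspartyl phosphate": 5,
    "L-aspartic 4-semialdehyde": 5,
    "L-homoserine": 5,
    "o-acetyl-L-homoserine": 5,
    "L-methionine": 5,
}

def lightest_path(arcs, degree, source, target):
    """Dijkstra on node weights: entering a compound costs its degree,
    entering a reaction costs 1 (assumption). Returns (cost, path) or None."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt in arcs.get(node, []):
            heapq.heappush(heap, (cost + degree.get(nxt, 1), nxt, path + [nxt]))
    return None

cost, path = lightest_path(ARCS, DEGREE, "L-aspartic acid", "L-methionine")
print(cost, " --> ".join(path))
```

With these weights the two-reaction shortcut through H2O costs far more than the five-reaction path through L-homoserine, so the longer path is returned without removing any compound from the graph.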

Weighted graph: methionine biosynthesis
• Search of the 5 shortest paths from L-aspartate to L-methionine
• Weighted graph (compound weight = connectivity)
  • L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.46 --> o-succinyl-L-homoserine --> 2.5.1.48 --> L-cystathionine --> 2.5.1.49 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
[Figure labels: E. coli pathway, yeast pathway.]

Heme biosynthesis (Saccharomyces cerevisiae)
[Figure: the annotated pathway compared with the paths found in the raw graph, in the filtered graph and in the weighted graph.]
Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

Alignment between inferred and annotated pathways: threonine biosynthesis

Evaluation of inferred paths (KEGG/LIGAND network, aMAZE pathways)
• Comparison between inferred paths and annotated pathways, based on intermediate reactions (those not provided as source and target)
  • True Positive (TP): inferred and annotated
  • False Positive (FP): inferred, not annotated
  • False Negative (FN): annotated, not inferred
  • True Negative (TN): not inferred, not annotated
• Sensitivity: Sn = TP/(TP + FN)
• Positive predictive value (specificity): PPV = TP/(TP + FP)
• Accuracy: Acc = (Sn + PPV)/2
Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.
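The three measures defined above can be computed directly from the sets of intermediate reactions. The sketch below is a straightforward transcription of the formulas; the function name evaluate and the example reaction sets are hypothetical and do not reproduce any result from the cited study.

```python
def evaluate(inferred, annotated):
    """Return (Sn, PPV, Acc) for two collections of intermediate reactions."""
    inferred, annotated = set(inferred), set(annotated)
    tp = len(inferred & annotated)   # inferred and annotated
    fp = len(inferred - annotated)   # inferred, not annotated
    fn = len(annotated - inferred)   # annotated, not inferred
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sn, ppv, (sn + ppv) / 2

# Hypothetical example: an inferred path versus an annotated pathway.
inferred = ["1.2.1.11", "1.1.1.3", "2.3.1.31"]
annotated = ["1.2.1.11", "1.1.1.3", "2.3.1.46", "4.2.99.9"]
sn, ppv, acc = evaluate(inferred, annotated)
print(f"Sn={sn:.2f} PPV={ppv:.2f} Acc={acc:.2f}")
```

Here TP = 2, FP = 1 and FN = 2. True negatives are not needed, since neither Sn nor PPV depends on them.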

Evaluation of inferred paths (EcoCyc network, EcoCyc pathways)
• Same comparison between inferred paths and annotated pathways, based on intermediate reactions (those not provided as source and target), with the same definitions of TP, FP, FN and TN.
• Sensitivity: Sn = TP/(TP + FN)
• Positive predictive value (specificity): PPV = TP/(TP + FP)
• Accuracy: Acc = (Sn + PPV)/2
Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

Inferred paths versus KEGG/LIGAND pathway maps
• Each inferred path is compared to the 85 pathway maps, and the significant correspondences are retained (hypergeometric test).
• X axis
  • number of intermediate reactions in the inferred path
• Y axis
  • number of reactions in common with a KEGG pathway
• Values
  • number of inferred paths
• On the diagonal
  • inferred paths completely included in one KEGG pathway.
• Inferred length
  • Raw graph < Filtered graph < Weighted graph
• Consistency with KEGG
  • Raw graph < Filtered graph < Weighted graph
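The slides do not detail how the hypergeometric test is parameterized. One plausible reading, sketched below, asks how likely it is to observe at least the given number of common reactions when the reactions of the inferred path are drawn at random from all reactions. The parameter names and the example numbers are assumptions.

```python
from math import comb

def hypergeom_pvalue(common, path_size, map_size, universe):
    """P(X >= common), where X is the number of reactions shared between a
    random set of path_size reactions and a pathway map of map_size
    reactions, both drawn from a universe of reactions."""
    upper = min(path_size, map_size)
    favourable = sum(comb(map_size, x) * comb(universe - map_size, path_size - x)
                     for x in range(common, upper + 1))
    return favourable / comb(universe, path_size)

# Hypothetical numbers: 3 intermediate reactions, all found in a map of 4
# reactions, out of a universe of 10 reactions.
p = hypergeom_pvalue(common=3, path_size=3, map_size=4, universe=10)
print(p)
```

A small p-value indicates that the overlap between the inferred path and the pathway map is unlikely to arise by chance; the significance threshold is left to the user.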

Graph-based analysis of biochemical networks: distances between enzyme pairs in the metabolic graph

Functional distance between enzymes
• The length of the shortest path between two reactions can be considered as a measure of their functional distance.
• By extension, one can estimate the functional distance between two enzymes as the length of the shortest path between the reactions they catalyze.
• Example of application: interpretation of pairs of fused genes
  • Two enzymatic functions can be carried by a single gene in one genome, and by two separate genes in another genome, as the result of a gene fusion event.
  • Are such fusion events preferentially observed between functionally related enzymes ?
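A minimal sketch of this distance, assuming a directed bipartite graph (reaction --> compound --> reaction) and counting one step per reaction-to-reaction hop. The arcs come from the methionine pathway shown earlier; the function names, and the choice to take the minimum over both directions and over all reactions catalyzed by each enzyme, are assumptions.

```python
from collections import deque

# Arcs from the annotated methionine pathway (toy subset of the graph).
ARCS = {
    "L-aspartic acid": ["2.7.2.4"],
    "2.7.2.4": ["L-4-aspartyl phosphate"],
    "L-4-aspartyl phosphate": ["1.2.1.11"],
    "1.2.1.11": ["L-aspartic 4-semialdehyde"],
    "L-aspartic 4-semialdehyde": ["1.1.1.3"],
    "1.1.1.3": ["L-homoserine"],
}

def reaction_distance(arcs, start, end):
    """Number of reaction-to-reaction steps from start to end, or None.
    Two reactions linked by one compound are at distance 1."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            return dist[node] // 2  # two arcs per reaction-to-reaction step
        for nxt in arcs.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

def enzyme_distance(arcs, reactions_a, reactions_b):
    """Smallest distance between any reaction of enzyme A and any reaction
    of enzyme B, trying both directions; None if no path exists."""
    found = []
    for ra in reactions_a:
        for rb in reactions_b:
            for start, end in ((ra, rb), (rb, ra)):
                d = reaction_distance(arcs, start, end)
                if d is not None:
                    found.append(d)
    return min(found) if found else None

print(enzyme_distance(ARCS, ["2.7.2.4"], ["1.1.1.3"]))
```

The same search could be run on the weighted graph by replacing the breadth-first search with a weighted shortest-path search.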

Shortest path finding with gene fusion pairs
• Data source for fusion pairs:
  • Tsoka and Ouzounis (Nat Genet 2000; 26: 141-2)
• Calculation of metabolic distances between fused pairs:
  • van Helden et al. (2002) In Bioinformatics and Genome Analysis. Springer-Verlag, Berlin Heidelberg, Vol. 38.
[Figure: shortest path finding between enzyme A and enzyme B in the graph of reactions and compounds gives the functional distance between enzymes; fusion pairs are compared with random pairs.]

Is the metabolic world so small ?
• Distributions of metabolic distances (number of reactions)
  • Between pairs of reactions belonging to the same metabolic pathway.
  • Between randomly chosen pairs of reactions.
[Figure: three panels (A: raw graph, B: filtered graph, C: weighted graph). X axis: metabolic distance (number of steps). Y axis: number of paths. Each panel compares pairs of reactions in the same metabolic pathway with random pairs of reactions.]
Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

Enzymes from the same pathway versus random pairs
[Figure: weighted distance and number of steps.]
Didier Croes (2005). PhD thesis.

Enzyme pairs within the same operon
Didier Croes (2005). PhD thesis.

Enzymes involved in gene fusion events
Didier Croes (2005). PhD thesis.

Enzyme pairs from two-hybrid data (DIP)
• Analysis: interactions between enzymes
Didier Croes (2005). PhD thesis.

Enzyme pairs from complexes detected by mass spectrometry (Tap-Tag: Gavin et al., 2002)
• Analysis: interactions between enzymes
Didier Croes (2005). PhD thesis.

Enzyme pairs from complexes detected by mass spectrometry (HMS-PCI: Ho et al., 2002)
• Metabolic distances between enzymes
Didier Croes (2005). PhD thesis.

Path finding - summary
• Raw graph
  • The shortest path is generally irrelevant.
  • Simple path enumeration returns innumerable false positives.
• Filtered graph (pool metabolites discarded)
  • Improves the relevance of inferred paths.
  • However, the gap that can be inferred is still restricted (2-3 intermediate steps): for larger gaps, shortcuts are returned.
• Weighted graph (compound connectivity degree)
  • Improves the relevance of inferred paths.
  • Makes it possible to detect alternative pathways.
  • Makes it possible to infer paths over remarkably large distances (several cases with 6-7 intermediate reactions).

Restrictions and perspectives
• The weighting criterion (compound connectivity) is too simplistic.
  • Indeed, it could easily be improved by assigning a weight to enzymes as well
    • Organism-specific: degree of evidence for the presence/absence of enzymes (biochemical characterization, by similarity, ...).
    • Condition-specific: level of expression and/or regulation of enzyme-coding genes.
• This approach does not rely on any biochemical property of the molecules.
  • True; in practice, however, it already gives potentially interesting results.
  • Biochemical criteria could be integrated in the graph representation
    • "Intra-reaction" pathways of functional groups of atoms.
    • Energy costs of reactions.
• Two-ends path finding requires specifying the two ends of a pathway, which are generally not known (and are somewhat arbitrarily defined).
  • We are working on an extension that builds paths from multiple seeds (same approach as in van Helden et al., 2002, but with the weighted graph).
    • Gene expression data (Karoline Faust)
    • Operons (Rekin’s Janky)
• Accuracy values reflect the capability to recover the pathways annotated in the database, but we still have no idea how well the approach can be extended to discover new pathways.
