CGAP/CMAP Pathway Database
Carl Schaefer
February 26, 2003
Why Spend Effort on Pathways?
• Target as process vs. target as molecule
  • In the end, what matters is a hyperactive process (e.g. mitosis), not just an over-expressed protein
• Phenotype classification
  • Higher-level feature than transcript abundance
Why Spend Effort on a Pathway Database?
• A picture may be worth a thousand words ...
  • but a computable representation is even better
• Make assumptions explicit
• Combine sources of data
  • KEGG, BioCarta, ...
• Merge data from separate pathways
  • E.g. BioCarta’s “Cyclins and Cell Cycle Regulation” and “Cyclin E Destruction Pathway”
• Causal framework for quantitative simulation/analysis
  • ... when the data becomes available
Basics
• Model a causal network
• Be composable (novel pathways)
• Cope with lack of knowledge
• Promote understanding
Model a Causal Network
• Graph (nodes & edges)
• Distinguish two kinds of nodes (molecules & processes)
• Allow labels on nodes and edges
  • molecule-type (compound, protein, complex, rna)
  • molecule-id (...)
  • process-type (reaction, binding, modification, translocation, transcription, cell process)
  • edge-type (input, output, agent, inhibitor)
  • activity-state (active, inactive)
  • location (extracellular, transmembrane, cytoplasm, nucleus)
  • reversible (yes, no)
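The graph model on this slide maps naturally onto a few record types. The following is a minimal Python sketch, not the CGAP/CMAP schema: the class names (`Molecule`, `Process`, `Edge`, `CausalNetwork`) are hypothetical, only a subset of the labels is modeled, and an unspecified label is represented as `None`. The usage lines rebuild atom 411 from the S-expression listing at the end of this section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class MoleculeType(Enum):
    COMPOUND = "compound"
    PROTEIN = "protein"
    COMPLEX = "complex"
    RNA = "rna"


class ProcessType(Enum):
    REACTION = "reaction"
    BINDING = "binding"
    MODIFICATION = "modification"
    TRANSLOCATION = "translocation"
    TRANSCRIPTION = "transcription"
    CELL_PROCESS = "cell process"


class EdgeType(Enum):
    INPUT = "input"
    OUTPUT = "output"
    AGENT = "agent"
    INHIBITOR = "inhibitor"


@dataclass(frozen=True)
class Molecule:
    """Molecule node: basic id plus instance-specific labels (None = unspecified)."""
    molecule_type: MoleculeType
    molecule_id: str
    activity_state: Optional[str] = None  # "active" or "inactive"
    location: Optional[str] = None        # e.g. "cytoplasm", "nucleus"


@dataclass(frozen=True)
class Process:
    """Process node, identified here by its atom-id (an assumption)."""
    process_type: ProcessType
    atom_id: str
    reversible: bool = False


@dataclass(frozen=True)
class Edge:
    """Labeled edge connecting a molecule node to a process node."""
    edge_type: EdgeType
    molecule: Molecule
    process: Process


@dataclass
class CausalNetwork:
    edges: list[Edge] = field(default_factory=list)

    def add(self, process: Process, edge_type: EdgeType, molecule: Molecule) -> None:
        self.edges.append(Edge(edge_type, molecule, process))

    def molecules(self) -> set[Molecule]:
        return {e.molecule for e in self.edges}

    def processes(self) -> set[Process]:
        return {e.process for e in self.edges}


# Atom 411: FASN (agent, molecule-id 8423) converts compound 4872 into 4873.
net = CausalNetwork()
r411 = Process(ProcessType.REACTION, "411", reversible=True)
net.add(r411, EdgeType.AGENT, Molecule(MoleculeType.PROTEIN, "8423"))
net.add(r411, EdgeType.INPUT, Molecule(MoleculeType.COMPOUND, "4872"))
net.add(r411, EdgeType.OUTPUT, Molecule(MoleculeType.COMPOUND, "4873"))
```

Making the node classes frozen keeps them hashable, so two edges that mention an identical molecule (same id and same labels) collapse to one node in `molecules()`.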
Composable
• “Atomic pathway”
  • a process node
  • immediately adjacent molecules
  • the connecting edges
• Join atomic pathways on identical molecules
  • ... and maybe on molecule subtype relation
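Joining atomic pathways on identical molecules amounts to indexing every atom by the molecules it touches and merging equal keys into a single node. A minimal sketch, assuming a hypothetical encoding where each atom is a process id plus `(edge-type, molecule key)` pairs and the key already folds in the instance-specific labels; the two atoms are 411 and 412 from the listing at the end of this section. Joining on a subtype relation would need a subsumption test instead of key equality.

```python
from collections import defaultdict

# Hypothetical encoding: (process id, [(edge type, molecule key), ...]).
# The molecule key stands for a molecule-id plus its instance-specific labels,
# so "identical molecules" means identical keys.
Atom = tuple[str, list[tuple[str, str]]]


def join_atoms(atoms: list[Atom]) -> dict[str, set[str]]:
    """Merge atoms into one network: each molecule key becomes a single node,
    mapped to the set of processes that touch it."""
    index: dict[str, set[str]] = defaultdict(set)
    for process_id, edges in atoms:
        for _edge_type, molecule_key in edges:
            index[molecule_key].add(process_id)
    return dict(index)


def shared_molecules(atoms: list[Atom]) -> dict[str, set[str]]:
    """Keep only the molecules that actually link two or more atoms."""
    return {m: ps for m, ps in join_atoms(atoms).items() if len(ps) > 1}


atoms = [
    ("411", [("agent", "8423"), ("input", "4872"), ("output", "4873")]),
    ("412", [("agent", "8423"), ("input", "4873"), ("output", "4874")]),
]
print(sorted(shared_molecules(atoms)))  # ['4873', '8423']
```

Here the two atoms join on the shared enzyme (8423) and on compound 4873, which is the output of 411 and the input of 412.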
Digression on Identifying Molecules
• p16 and p53 are clearly different, but ...
  • How about NP_000068 and NP_478103 (variants of p16)?
  • How about AKT inactive and AKT active?
  • How about C5, C5a and C5b?
  • How about p53 in cytoplasm and p53 in nucleus?
• What if you know ...
  • there exist two different things, but
  • you don’t know which one participates in the interaction
Identifying Molecules: Uneasy Compromise
• Can distinguish molecules by
  • basic molecule-id
  • instance-specific labels (location, activity-state, ...) [like states]
• Same molecule-ids but different instance-specific labels:
  • location
  • modifications like phosphorylation
• Different molecule-ids:
  • splice variants
  • modifications like C5 → C5a, C5b
• molecule-id families and unspecified label values allow for deliberate ambiguity
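The compromise can be made concrete with two comparisons: strict identity (same molecule-id and identical labels) and a looser compatibility test in which an unspecified label matches anything. A minimal sketch; `MoleculeInstance`, `same_instance` and `compatible`, and the use of `"p53"` as a molecule-id, are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoleculeInstance:
    molecule_id: str                      # basic identity
    location: Optional[str] = None        # None = unspecified
    activity_state: Optional[str] = None  # None = unspecified


def same_instance(a: MoleculeInstance, b: MoleculeInstance) -> bool:
    """Strict identity: same molecule-id and identical instance labels."""
    return a == b


def compatible(a: MoleculeInstance, b: MoleculeInstance) -> bool:
    """Loose match: same molecule-id, and every label either agrees or is
    unspecified on at least one side (deliberate ambiguity)."""
    if a.molecule_id != b.molecule_id:
        return False
    pairs = ((a.location, b.location), (a.activity_state, b.activity_state))
    return all(x is None or y is None or x == y for x, y in pairs)


p53_cytoplasm = MoleculeInstance("p53", location="cytoplasm")
p53_nucleus = MoleculeInstance("p53", location="nucleus")
p53_anywhere = MoleculeInstance("p53")

print(same_instance(p53_cytoplasm, p53_nucleus))  # False
print(compatible(p53_anywhere, p53_nucleus))      # True
```

Under this scheme C5a and C5b would simply carry different molecule-ids, so neither comparison treats them as the same molecule.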
Identifying Molecules: Complexes
• Two complexes have the same molecule-id only if their components are identical (in molecule-id and other labels)
  • makes the computation for joins easier, but ...
  • obscures relationships
    • ksr:mek:erk is completely distinct from ksr:mek+:erk+
• Unresolved: showing relations within a complex
  • Within tnf:tnfr:fad, tnf binds to tnfr
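One way to implement the rule "same molecule-id only if the components are identical" is an order-independent key built from the sorted components. A minimal sketch, assuming a component is just `(molecule-id, activity-state)` and that the slide's "+" suffix means `activity_state="active"`; `complex_key` is a hypothetical helper.

```python
from typing import Optional

# A component is (molecule-id, activity-state); None means unspecified.
Component = tuple[str, Optional[str]]


def complex_key(components: list[Component]) -> tuple[Component, ...]:
    """Order-independent identity for a complex: two complexes get the same
    key only if their components (ids and labels) are identical."""
    return tuple(sorted(components, key=lambda c: (c[0], c[1] or "")))


ksr_mek_erk = complex_key([("ksr", None), ("mek", None), ("erk", None)])
same_components = complex_key([("erk", None), ("ksr", None), ("mek", None)])
active_form = complex_key([("ksr", None), ("mek", "active"), ("erk", "active")])

print(ksr_mek_erk == same_components)  # True
print(ksr_mek_erk == active_form)      # False
```

The key makes joins a plain equality test, but it also shows the drawback named on the slide: nothing in the two keys records that the complexes differ only in the activity state of two members, and the flat component list cannot say which member binds which.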
Lack of Knowledge
• Hierarchy of label values
  • e.g., edge-type: incoming-edge → agent
• Hierarchy of molecule ids
  • GO id
  • Gene product
  • Specific protein
• Families of molecules
• “Handbook”
  • E.g.: “for Raf-1, ‘active-1’ means phosphorylation at S259”
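A hierarchy of label values lets a query at a general level (incoming-edge) match data recorded at a specific level (agent), and lets data be entered at the general level when the specific value is unknown. A minimal sketch; only the incoming-edge → agent link comes from the slide, and the rest of the `PARENT` table is a hypothetical illustration. The same walk would work for a molecule-id hierarchy such as GO id → gene product → specific protein.

```python
# Hypothetical child -> parent links for edge-type values.
PARENT = {
    "agent": "incoming-edge",
    "inhibitor": "incoming-edge",
    "input": "incoming-edge",
    "output": "outgoing-edge",
    "incoming-edge": "edge",
    "outgoing-edge": "edge",
}


def ancestors(value: str) -> list[str]:
    """Return `value` followed by each more general value up to the root."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` or one of its ancestors."""
    return general in ancestors(specific)


print(ancestors("agent"))                   # ['agent', 'incoming-edge', 'edge']
print(subsumes("incoming-edge", "agent"))   # True
print(subsumes("agent", "incoming-edge"))   # False
```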
Promote Understanding
• Hide unwanted detail
  • prune common molecules
  • encapsulate sub-pathways
• Query by connectedness (cause & effect)
• Find patterns
Query by Connectedness: Predecessors/Successors
atom-id = 411
direction = forward
degree = 3
prune common compounds
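The query parameters on this slide (atom-id, direction, degree, prune) suggest a bounded breadth-first search over atoms. A minimal sketch under stated assumptions: "degree" is read as the number of process steps, an atom is reduced to its incoming molecules (input, agent, inhibitor) and outgoing molecules (output), and `connected` is a hypothetical function. Atoms 411 and 412 follow the listing at the end of this section; atoms 900 to 903 and molecules `m1` to `m3` are invented only to show pruning and the degree limit.

```python
from collections import deque

# Hypothetical atom table: atom-id -> (incoming molecules, outgoing molecules).
Atoms = dict[str, tuple[frozenset[str], frozenset[str]]]


def connected(atoms: Atoms, start: str, degree: int,
              direction: str = "forward",
              prune: frozenset[str] = frozenset()) -> dict[str, int]:
    """Return atom-id -> number of process steps from `start` for every atom
    reachable within `degree` steps.

    forward:  follow the current atom's outputs to atoms that take them in.
    backward: follow the current atom's incoming molecules to their producers.
    Molecules in `prune` (e.g. common compounds) are never followed.
    """
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    src, dst = (1, 0) if direction == "forward" else (0, 1)

    # Index: molecule -> atoms that have it on the `dst` side.
    index: dict[str, set[str]] = {}
    for atom_id, sides in atoms.items():
        for m in sides[dst]:
            index.setdefault(m, set()).add(atom_id)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if dist[current] == degree:
            continue  # do not expand beyond the requested degree
        for m in atoms[current][src] - prune:
            for nxt in index.get(m, ()):
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    return dist


atoms: Atoms = {
    "411": (frozenset({"8423", "4872"}), frozenset({"4873"})),
    "412": (frozenset({"8423", "4873"}), frozenset({"4874"})),
    # Hypothetical atoms, added only for illustration:
    "900": (frozenset({"4874"}), frozenset({"m1", "H2O"})),
    "901": (frozenset({"H2O"}), frozenset({"m2"})),
    "902": (frozenset({"m1"}), frozenset({"m3"})),
    "903": (frozenset({"m3"}), frozenset()),
}

print(connected(atoms, "411", degree=3, prune=frozenset({"H2O"})))
# {'411': 0, '412': 1, '900': 2, '902': 3}
```

Without pruning, atom 901 is also reached through water, which is the kind of spurious connection that pruning common compounds is meant to hide. Which compounds count as "common" is a curation choice, not something this sketch decides.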
Patterns
• Templates for atomic pathways:
  process-type=modification::
    molecule-type=protein[1]:edge-type=agent::
    molecule-type=protein[2]:edge-type=input:activity-state=inactive::
    molecule-type=protein[2]:edge-type=output:activity-state=active
• Maybe multi-process templates (e.g., a cascade)
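The template above can be read as a set of label constraints on an atom's edges, where the subscripts [1] and [2] act as variables: both protein[2] edges must refer to the same molecule. A minimal backtracking matcher sketch; the dict-of-labels encoding and the names `match`, `match_atom` and `ACTIVATION` are hypothetical, and the MEK/ERK atom is an invented example.

```python
from typing import Optional

# Hypothetical encoding: each edge is a dict of labels (including "edge-type"
# and "molecule-id"); a template entry is (variable, required labels).
Labels = dict[str, str]
Template = list[tuple[str, Labels]]


def match(template: Template, edges: list[Labels],
          bindings: Optional[dict[str, str]] = None,
          used: frozenset[int] = frozenset()) -> Optional[dict[str, str]]:
    """Assign each template entry to a distinct edge whose labels include the
    required ones; entries sharing a variable must bind to the same
    molecule-id. Returns the bindings, or None if nothing matches."""
    bindings = {} if bindings is None else bindings
    if not template:
        return bindings
    (var, required), rest = template[0], template[1:]
    for i, edge in enumerate(edges):
        if i in used or any(edge.get(k) != v for k, v in required.items()):
            continue
        mol = edge["molecule-id"]
        if bindings.get(var, mol) != mol:
            continue  # variable already bound to a different molecule
        result = match(rest, edges, {**bindings, var: mol}, used | {i})
        if result is not None:
            return result
    return None


def match_atom(process_type: str, template: Template, atom: dict) -> Optional[dict[str, str]]:
    """Check the process-type, then match the edge template."""
    if atom["process-type"] != process_type:
        return None
    return match(template, atom["edges"])


# The slide's template: protein[1] (agent) turns inactive protein[2] active.
ACTIVATION: Template = [
    ("1", {"molecule-type": "protein", "edge-type": "agent"}),
    ("2", {"molecule-type": "protein", "edge-type": "input", "activity-state": "inactive"}),
    ("2", {"molecule-type": "protein", "edge-type": "output", "activity-state": "active"}),
]

# Hypothetical atom: MEK activates ERK.
atom = {"process-type": "modification", "edges": [
    {"edge-type": "agent", "molecule-type": "protein", "molecule-id": "MEK"},
    {"edge-type": "input", "molecule-type": "protein", "molecule-id": "ERK",
     "activity-state": "inactive"},
    {"edge-type": "output", "molecule-type": "protein", "molecule-id": "ERK",
     "activity-state": "active"},
]}
print(match_atom("modification", ACTIVATION, atom))  # {'1': 'MEK', '2': 'ERK'}
```

This sketch does not force different variables to bind different molecules, so an autophosphorylation atom would also match; whether that is wanted depends on how the templates are meant to be used. Multi-process templates such as a cascade would need the variables to be shared across several atoms.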
What Do We Need?
• Computation model of pathway interactions
• Persistent data model
• Tools:
  • data input
  • query and analysis
  • visualization
• Data, data, data, ...
What Do We Have?
• Computation model: mostly worked out
• Persistent data model: mostly worked out
• Tools:
  • working on data input
  • have a query/analysis tool
    • joins, prunes, finds predecessors/successors
    • produces graph output
    • extracts first-order patterns
  • using GraphViz to produce SVG diagrams
• Data, data, data ...
  • Loaded KEGG into database
  • Next: ~30 BioCarta pathways related to apoptosis, cell-cycle regulation and histone deacetylase activity
( reaction ( atom-id "411" ) ( reversible "yes" )
  ( agent ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
  ( input ( edge-seq-id "2" )
    ( compound ( molecule-id "4872" ) ( KG "C05746" ) ( AS "3-oxohexanoyl-[acp]" ) ) )
  ( output ( edge-seq-id "4" )
    ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) ) )
( reaction ( atom-id "412" ) ( reversible "yes" )
  ( agent ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
  ( input ( edge-seq-id "2" )
    ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) )
  ( output ( edge-seq-id "3" )
    ( compound ( molecule-id "4874" ) ( KG "C05748" ) ( AS "trans-hex-2-enoyl-[acp]" ) ) ) )
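The listing above is an S-expression dump of atoms 411 and 412. A minimal Python reader can turn it into nested lists, assuming only parentheses, straight double-quoted strings without escapes, and bare symbols occur, and that each molecule record is flat like the ones shown (a complex with nested components would need more work). `parse` and `summarize` are hypothetical helpers, not part of the original tool.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a bare symbol such as atom-id.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')
MOLECULE_TYPES = {"compound", "protein", "complex", "rna"}


def parse(text: str) -> list:
    """Parse S-expression text into nested lists of strings (quotes stripped)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok[1:-1] if tok.startswith('"') else tok
        items = []
        while True:
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[pos] == ")":
                pos += 1
                return items
            items.append(read())

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def summarize(form: list) -> tuple:
    """Return (process-type, atom-id, [(edge-type, molecule-id), ...])."""
    process_type, *fields = form
    atom_id, edges = None, []
    for f in fields:
        if f[0] == "atom-id":
            atom_id = f[1]
        elif f[0] in ("agent", "input", "output", "inhibitor"):
            molecule = next(g for g in f[1:] if g[0] in MOLECULE_TYPES)
            edges.append((f[0], dict(molecule[1:])["molecule-id"]))
    return process_type, atom_id, edges


SAMPLE = '''
( reaction ( atom-id "411" ) ( reversible "yes" )
  ( agent ( edge-seq-id "1" )
    ( protein ( molecule-id "8423" ) ( LL "2194" ) ( EC "2.3.1.85" ) ( AS "FASN" ) ) )
  ( input ( edge-seq-id "2" )
    ( compound ( molecule-id "4872" ) ( KG "C05746" ) ( AS "3-oxohexanoyl-[acp]" ) ) )
  ( output ( edge-seq-id "4" )
    ( compound ( molecule-id "4873" ) ( KG "C05747" ) ( AS "d-3-hydroxyhexanoyl-[acp]" ) ) ) )
'''
print(summarize(parse(SAMPLE)[0]))
# ('reaction', '411', [('agent', '8423'), ('input', '4872'), ('output', '4873')])
```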
digraph G {
  // Atom 411: process node drawn as a small filled box
  1 [shape="box", height="0.2", width="0.2", fontsize="10", style="filled", color="black", label=""];
  // Agent (enzyme) edge, drawn in green
  2 -> 1 [color="green" ];
  2 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="EC:2.3.1.85"];
  // Input edge
  3 -> 1 [color="black" ];
  3 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="3-oxohexanoyl-[acyl-carrier protein]"];
  // Output edge
  1 -> 4 [color="black" ];
  4 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="d-3-hydroxyhexanoyl-[acyl-carrier protein]"];
  // Atom 412: joined to atom 411 on the shared enzyme (node 2) and on node 4
  5 [shape="box", height="0.2", width="0.2", fontsize="10", style="filled", color="black", label=""];
  2 -> 5 [color="green" ];
  4 -> 5 [color="black" ];
  5 -> 6 [color="black" ];
  6 [shape="plaintext", height="", width="", fontsize="14", color="black", style="", label="trans-hex-2-enoyl-[acp]"];
}
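A digraph of this shape can be generated directly from the atoms. A minimal sketch, assuming the hypothetical `(atom-id, [(edge-type, molecule-id), ...])` encoding and a molecule-id → label table; `to_dot` is not the original tool, it emits fewer attributes than the listing above, and it does not escape double quotes inside labels. Because a molecule shared by several atoms is emitted once, the output reproduces the node numbering and edges of the listing for atoms 411 and 412.

```python
# Only agent edges are colored (green), as in the listing; others are black.
EDGE_COLOR = {"agent": "green"}


def to_dot(atoms, labels):
    """Render atomic pathways as GraphViz DOT text. Processes become small
    filled boxes, molecules become plain-text labels, and a molecule shared
    by several atoms is emitted once, which joins the atoms visually."""
    lines = ["digraph G {"]
    ids = {}  # (kind, id) -> DOT node number

    def node(key):
        is_new = key not in ids
        if is_new:
            ids[key] = len(ids) + 1
        return ids[key], is_new

    for atom_id, edges in atoms:
        p, _ = node(("process", atom_id))
        lines.append(f'  {p} [shape="box", height="0.2", width="0.2", '
                     f'style="filled", color="black", label=""];')
        for edge_type, mol in edges:
            m, is_new = node(("molecule", mol))
            if is_new:
                lines.append(f'  {m} [shape="plaintext", label="{labels.get(mol, mol)}"];')
            color = EDGE_COLOR.get(edge_type, "black")
            # Output edges point from process to molecule; all others point in.
            src, dst = (p, m) if edge_type == "output" else (m, p)
            lines.append(f'  {src} -> {dst} [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)


atoms = [
    ("411", [("agent", "8423"), ("input", "4872"), ("output", "4873")]),
    ("412", [("agent", "8423"), ("input", "4873"), ("output", "4874")]),
]
labels = {"8423": "EC:2.3.1.85", "4872": "3-oxohexanoyl-[acp]",
          "4873": "d-3-hydroxyhexanoyl-[acp]", "4874": "trans-hex-2-enoyl-[acp]"}
print(to_dot(atoms, labels))
```

Saved to a file, the output can be rendered with GraphViz, for example `dot -Tsvg pathway.dot -o pathway.svg`.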