Potential Drug Target Discovery on PPI Networks

Raheleh Salari, SFU

• Pathogens are becoming more drug resistant; infectious diseases are on the rise.
• Emerging diseases (e.g. avian flu) may result in a global pandemic!
• Rational drug design: the search for "magic bullets" is failing.
• Combinatorial therapies are needed, i.e. multiple drug targets.

Computational identification of drug targets
• Protein-protein interaction (PPI) networks: nodes are proteins, edges are interactions.
• Goal: identify protein targets on PPI networks whose “removal” disrupts several “essential” pathways/complexes as well as their possible “backup” paths in the PPI network.
• Targets should have no human orthologs.

Example: H. pylori chemotaxis pathway and its associated PPI subnetwork (figure)

PPI networks + pathways
• Strategy: disrupt all possible communication paths between the “endpoint” (source/sink) pairs of essential pathogenic pathways (multicut).
• Weighted node sparsest cut:
  • Input: node weights (large for human orthologs; small for essential proteins, surface proteins, and easy targets) and the essentiality of each source/sink pair (quantifying how important the pathway is to survival).
  • Output: a node cut C minimizing W(C) / ecc(C), where
    • W(C) = total weight of the nodes in C
    • ecc(C) = total essentiality of the pathways disrupted by removing C
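To make the objective concrete, here is a minimal Python sketch that scores one candidate cut C by W(C) / ecc(C). It assumes an undirected adjacency dict, a dict of node weights, and pathways given as (source, sink, essentiality) triples; the function names `reachable` and `sparsest_cut_ratio` are hypothetical, not taken from the original work. A pathway is counted as disrupted when its source can no longer reach its sink once the nodes of C are removed.

```python
from collections import deque

def reachable(adj, start, removed):
    """Set of nodes reachable from start in G - removed (breadth-first search)."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def sparsest_cut_ratio(adj, weight, pairs, cut):
    """Hypothetical scorer: W(C) / ecc(C) for a candidate node cut C.

    adj:    undirected adjacency dict {node: set of neighbors}
    weight: node weights (assumed: large for human orthologs)
    pairs:  list of (source, sink, essentiality) triples
    cut:    set of removed nodes C
    """
    w_c = sum(weight[v] for v in cut)
    # A pair is disrupted if its sink is unreachable from its source in G - C
    # (removing the source or sink itself also counts as disrupting it).
    ecc_c = sum(e for s, t, e in pairs if t not in reachable(adj, s, cut))
    return float("inf") if ecc_c == 0 else w_c / ecc_c

# Toy network: hub "b" has low weight, the other nodes are heavy.
adj = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"b"}}
weight = {"a": 5.0, "b": 1.0, "c": 5.0, "d": 5.0}
pairs = [("a", "c", 2.0), ("d", "c", 1.0)]
print(sparsest_cut_ratio(adj, weight, pairs, {"b"}))  # W = 1, ecc = 3
```

Removing the cheap hub `b` disrupts both pathways, so it scores better than removing the heavier sink `c` (ratio 5/3); a search procedure would compare many such candidates.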

Approximation algorithms
• DSC (number of endpoint pairs is $O(\log n)$): $O(\log n)$ approximation by a straightforward generalization of multicut algorithms (check every subset of source/sink pairs); running time $O(n^3 \log^2 n)$ [Goldberg & Tarjan 88].
• LP (number of source/sink pairs unbounded): $O(n^{1/2})$ approximation via a polynomial-time rounding algorithm [Hajiaghayi & Raecke 07].
• The two methods give identical results on the H. pylori PPI network and slightly different results on the E. coli PPI network.
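A basic building block for this kind of node-cut computation is the minimum-weight node cut between a single source and sink. The sketch below obtains it with the standard node-splitting reduction plus max-flow/min-cut. For brevity it uses Edmonds-Karp (BFS augmenting paths) in place of the Goldberg & Tarjan push-relabel method cited above, and it assumes a symmetric adjacency dict in which every node is a key and the source and sink are not adjacent. The function name is hypothetical.

```python
from collections import deque

INF = float("inf")

def min_weight_node_cut(adj, weight, s, t):
    """Minimum-weight node set (excluding s and t) whose removal separates s from t.

    Node splitting: each node v becomes an arc (v, "in") -> (v, "out") with
    capacity weight[v] (infinite for s and t); each edge {u, v} becomes arcs
    (u, "out") -> (v, "in") and (v, "out") -> (u, "in") with infinite capacity,
    so a finite minimum cut can only use node arcs.
    Assumes: adj is symmetric, every node is a key, s and t are not adjacent.
    """
    cap = {}  # residual capacities cap[x][y]

    def add_arc(x, y, c):
        cap.setdefault(x, {}).setdefault(y, 0)
        cap[x][y] += c
        cap.setdefault(y, {}).setdefault(x, 0)  # reverse residual arc

    for v in adj:
        add_arc((v, "in"), (v, "out"), INF if v in (s, t) else weight[v])
    for u in adj:
        for v in adj[u]:
            add_arc((u, "out"), (v, "in"), INF)

    source, sink = (s, "out"), (t, "in")
    while True:
        # Edmonds-Karp: BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximum
        path, y = [], sink
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        delta = min(cap[x][y] for x, y in path)
        for x, y in path:
            cap[x][y] -= delta
            cap[y][x] += delta

    # After the last BFS, parent holds every node reachable from the source;
    # the saturated node arcs leaving that set form the minimum cut.
    return {v for v in adj if (v, "in") in parent and (v, "out") not in parent}

adj = {"s": {"a", "b"}, "a": {"s", "m"}, "b": {"s", "m"},
       "m": {"a", "b", "t"}, "t": {"m"}}
weight = {"a": 2, "b": 2, "m": 3}
print(min_weight_node_cut(adj, weight, "s", "t"))  # {'m'}: weight 3 beats {a, b} at 4
```

Node splitting is what turns node weights (the quantity the slides care about) into arc capacities that an ordinary max-flow routine understands.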

Input: E. coli signaling pathways

Pathway              | Source | Sink | Target(s) | Method(s)
OmpR Family          | PhoR   | PhoA |           |
OmpR Family          | TorS   | TorA |           |
NarL Family          | NarG   | NarI |           |
NarL Family          | NarQ   | FrdA | dnaK*     | DSC, LP
DNA polymerase       | dnaE   | holA |           |
DNA polymerase       | dnaE   | holD |           |
ABC Transporter      | CysP   | CysA |           |
Bacterial Chemotaxis | Tar    | MotA | cheW*     | DSC, LP
Bacterial Chemotaxis | DPPA   | MotA | mtlD      | DSC

Input: E. coli essential complexes

Complex             | Source | Sink | Target(s)                                  | Method(s)
RNA Polymerase      | infB   | rpoN |                                            |
RNA Polymerase      | hepA   | greB | rpoA*+, rpoB*+, rplC*, rpoC*, rpsB*, rpsE* | DSC, LP
ACP                 | Ffh    | aidB |                                            |
IscS                | hscA   | fdhD | lpdA*, lysU, aceF*, aceE, iscS*, rpsE*     | DSC, LP
DNA polymerase      | sbcB   | priA |                                            |
Ribosome associated | hlpA   | uvrC |                                            |
Ribosome associated | cafA   | kdsA |                                            |

Input: H. pylori signaling pathways

Pathway                 | Source | Sink | Target(s) | Method(s)
Ribosomal Proteins      | rplD   | rplP |           |
Ribosomal Proteins      | rplI   | rpsF | HP1223    | DSC, LP
Protein Export          | SecD   | YidC |           |
Type III Secretion Sys. | FliF   | FlhA |           |
Type IV Secretion Sys.  | cag12  | trbI | HP0933    | DSC, LP
Two-component Sys.      | TrpB   | TrpE | HP0452    | DSC, LP
Two-component Sys.      | AtoS   | AtoB | HP0149    | DSC, LP
Flagellar Assembly      | FliG   | FliN | HP0823    | DSC, LP
Flagellar Assembly      | FliD   | FlhA | FabE      | DSC, LP
Bacterial Chemotaxis    | CheW   | MotB |           |
ABC Transporters        | OppA   | OppF | msrAB     | DSC, LP
DNA Polymerase          | dnaE   | dnaN | HP0241    | DSC, LP

PPI networks only
• Strategy: disrupt as many “potential” pathways as possible (balanced cut).
• Minimum weighted node separator problem: a node set C is a β-balanced separator if removing C partitions the remaining nodes into V’ and V’’, with no edges between them, such that $\min\{|V'|, |V''|\} > \beta |V|$.
• Input: node weights (small weights indicate essentiality, targetability, etc.; human orthologs have large weights).
• Output: a β-balanced separator C with minimum total weight.
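The β-balanced condition can be checked directly: remove C, collect the sizes of the remaining connected components, and test whether they can be grouped into two sides that each contain more than β|V| nodes. Since the components share no edges, any grouping is a valid separation, so a subset-sum over component sizes suffices. The sketch below does this and, purely for illustration on tiny graphs, finds a minimum-weight separator by exhaustive search (exponential time, unlike the approximation algorithms and heuristics discussed here). All function names are hypothetical.

```python
from itertools import combinations

def component_sizes(adj, removed):
    """Sizes of the connected components of G - removed (iterative DFS)."""
    seen, sizes = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes

def is_beta_balanced(adj, cut, beta):
    """True if the components of G - C can be grouped into sides V', V''
    with min(|V'|, |V''|) > beta * |V| (subset-sum over component sizes)."""
    n = len(adj)
    sizes = component_sizes(adj, cut)
    total = sum(sizes)
    sums = {0}  # every achievable size of side V'
    for size in sizes:
        sums |= {x + size for x in sums}
    return any(min(x, total - x) > beta * n for x in sums)

def min_weight_separator(adj, weight, beta):
    """Exhaustive search (tiny graphs only) for a minimum-weight
    beta-balanced node separator; returns (cut, weight)."""
    best, best_w = None, float("inf")
    nodes = list(adj)
    for k in range(len(nodes) + 1):
        for cut in combinations(nodes, k):
            w = sum(weight[v] for v in cut)
            if w < best_w and is_beta_balanced(adj, set(cut), beta):
                best, best_w = set(cut), w
    return best, best_w

# Path a-b-c-d-e-f; with beta = 0.2 each side needs at least 2 of the 6 nodes.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d", "f"}, "f": {"e"}}
weight = {"a": 1, "b": 3, "c": 4, "d": 2, "e": 3, "f": 1}
print(min_weight_separator(path, weight, 0.2))  # ({'d'}, 2)
```

Removing the light endpoint `f` alone is cheaper, but it leaves one component of five nodes, so it fails the balance test; `d` is the cheapest cut that passes.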

Approximation algorithms, heuristics
• The $O(\log n)$ approximation [Leighton & Rao 99] performs poorly in practice.
• The $O(\sqrt{\log n})$ approximation [Arora & Kale 07] is only slightly better.
• Greedy heuristics that target nodes with maximum degree (GDeg) or maximum betweenness (GBet) perform relatively poorly.
• Heuristics motivated by several combinatorial observations were devised (HMWS).
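For comparison, a degree-based greedy baseline in the spirit of GDeg could look like the sketch below. The slides do not state GDeg's exact stopping rule, so this version assumes a simple one: keep deleting the highest-degree node of the remaining graph until every connected component has fewer than (1 − β)n nodes. It ignores node weights, breaks degree ties by iteration order, and the function names are hypothetical.

```python
def largest_component(adj, removed):
    """Size of the largest connected component of G - removed (iterative DFS)."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def greedy_degree_cut(adj, beta):
    """Hypothetical GDeg-style baseline: delete max-degree nodes until every
    component of G - C has fewer than (1 - beta) * n nodes (assumes beta < 1)."""
    n = len(adj)
    removed = set()
    while largest_component(adj, removed) >= (1 - beta) * n:
        # Degree is recomputed in the remaining graph G - removed.
        v = max((u for u in adj if u not in removed),
                key=lambda u: sum(1 for w in adj[u] if w not in removed))
        removed.add(v)
    return removed

star = {"h": {"a", "b", "c", "d", "e"},
        "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}, "e": {"h"}}
print(greedy_degree_cut(star, 0.15))  # {'h'}
```

On a star the hub is the obvious choice, but degree alone says nothing about how balanced the resulting pieces are, which is one plausible reason such greedy rules do poorly on real PPI networks.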

Comparison of HMWS, GDeg and GBet methods

E. coli pathways disrupted (cut size 28, β = 0.15)
1. Ribosome
2. Pyruvate metabolism
3. Butanoate metabolism
4. Citrate cycle (TCA cycle)
5. Glycolysis/Gluconeogenesis
6. Alanine and aspartate metabolism
7. Glycine, serine and threonine metabolism
8. Valine, leucine and isoleucine degradation
9. Pyrimidine metabolism
10. Purine metabolism
11. RNA polymerase
12. Lysine biosynthesis
13. Aminoacyl-tRNA biosynthesis
14. Two-component system (NarL family) *
15. Bacterial chemotaxis *
16. ABC transporters (Iron complex) *

E. coli known drug targets (re)discovered (cut size 28, β = 0.15)

Gene name | Drug(s)
rpoA      | Rifabutin
rpoB      | Rifampin, Rifaximin
rpsJ      | Nitrofurantoin
rpsD      | Clomocycline, Demeclocycline, Doxycycline, Lymecycline, Minocycline, Oxytetracycline, Tetracycline, Tigecycline

H. pylori disrupted pathways (cut size 17, β = 0.15)
1. Purine metabolism
2. Pyrimidine metabolism
3. RNA polymerase
4. Caprolactam degradation
5. Flagellar assembly
6. Urease complex
7. Ribosomal proteins *
8. Oxidative phosphorylation (F-type ATPase) *
9. Epithelial cell signaling in H. pylori infection *
10. DNA polymerases *
11. Bacterial chemotaxis *
12. Oxidative phosphorylation (F-type ATPase) *
13. Protein export (Sec-dependent pathway) *
14. Two-component system (NtrC family) *
15. ABC transporters (Iron complex) *
16. Flagellar assembly *
17. Type IV secretion system *

Acknowledgements
• Cenk Sahinalp (SFU, CompBio)
• Fereydoun Hormozdiari (SFU, CompBio)
• Vineet Bafna (UCSD)
• Phuong Dao (SFU, CompBio)
• SFU CTEF: Bioinformatics for combating infectious diseases program
• NSERC, CRC program, MSFHR

HMWS
• RWB: compute random walk betweenness for all nodes, in $O(n^3)$ time on a sparse graph.
• Split: return an initial cut such that every connected component has fewer than $(1-\beta)n$ nodes.
• Merge: partition the components into two groups, each with more than $\beta n$ nodes.
• Cut: repeat the whole procedure.
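The Merge step amounts to a small 0/1 subset-sum problem over component sizes. The sketch below assumes the Split step has already produced the list of component sizes and that n is the number of nodes in the original network. It records how each achievable total was first reached, picks the most balanced total, and returns the two groups of component indices, or None if no grouping gives both sides more than βn nodes. It is an illustrative reading of the slide, not the HMWS implementation, and the function name is hypothetical.

```python
def merge_components(sizes, n, beta):
    """Split component indices into two groups, each covering > beta * n nodes.

    choice[total] = (previous_total, index) records how each achievable total
    was first reached, so the chosen group can be reconstructed.
    Returns (group_a, group_b), or None if no grouping is balanced enough.
    """
    choice = {0: None}
    for i, size in enumerate(sizes):
        for total in list(choice):  # snapshot: each component is used at most once
            if total + size not in choice:
                choice[total + size] = (total, i)
    grand = sum(sizes)
    best = max(choice, key=lambda t: min(t, grand - t))  # most balanced split
    if min(best, grand - best) <= beta * n:
        return None
    group_a, t = [], best
    while choice[t] is not None:
        t, i = choice[t]
        group_a.append(i)
    group_b = [i for i in range(len(sizes)) if i not in group_a]
    return sorted(group_a), group_b

# Components left by a hypothetical Split step on a 12-node network.
print(merge_components([4, 3, 2, 1], n=12, beta=0.15))  # ([1, 2], [0, 3])
```

Because Split already guarantees every component has fewer than (1 − β)n nodes, no single component can dominate a side, which makes a balanced grouping easy to find in practice.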
