TORQUE: Topology-Free Querying of Protein Interaction Networks

Sharon Bruckner1, Falk Hüffner1, Richard M. Karp2, Ron Shamir1, and Roded Sharan1 1 School of Computer Science, Tel Aviv University 2 International Computer Science Institute, Berkeley, CA

Our goal: network querying • Start with a protein-protein interaction network of some species A. • We seek subnetworks that match complexes or pathways. • Network querying: given a protein complex from another species B, identify the subnetwork of A that is most similar to it. • Why network querying? • A match hints at an evolutionarily conserved region. • It lets us infer the functionality of the matched region.

Previous Methods • Assume knowledge of the interactions within the query complex (the topology). • Look for a match in the network with the same topology. • Examples: QNet (Dost et al, 2008), GraphFind (Ferro et al, 2008).

No need for topology! Interaction information is noisy and incomplete, and for some species it is not available.

The problem Input: • Graph G=(V,E) , |V|=n, |E|=m • Color set {1,2,...,k} • A coloring of network vertices

The problem We seek: Is there a connected subgraph of G that has exactly one vertex of each color? Call such a subgraph “colorful”.
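The definition above can be made concrete with a small checker. This is a minimal sketch, not part of TORQUE: the function name `is_colorful_connected`, the adjacency-dict representation `adj`, and the `color` map are assumptions made for illustration. It only verifies a given candidate vertex set and does not search for one.

```python
from collections import deque

def is_colorful_connected(adj, color, subset, k):
    """Check whether `subset` induces a connected subgraph of `adj`
    that contains exactly one vertex of each color 1..k.

    adj   : dict mapping each vertex to a list of its neighbors (hypothetical layout)
    color : dict mapping each vertex to its color in 1..k
    """
    subset = set(subset)
    if not subset:
        return False
    # Exactly one vertex per color: the sorted colors must be 1, 2, ..., k.
    if sorted(color[v] for v in subset) != list(range(1, k + 1)):
        return False
    # Breadth-first search restricted to vertices of the subset.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in subset and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == subset
```

For example, on the path a - b - c - d with colors 1, 2, 3, 2, the set {a, b, c} is colorful, while {a, c, d} has the right colors but is not connected.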

ABOUT THE PROBLEM • NP-complete • Hard even when the graph is a tree with max degree 3 (via reduction from 3SAT; Fellows et al, 2007) • Our contributions: • A fixed-parameter dynamic programming algorithm • An integer linear program • Fast heuristics • An implementation that combines the above

DEFINING THE BASIC DP ALGORITHM • Every connected subgraph has a spanning tree. • Hence every colorful connected subgraph has a colorful spanning tree. • Instead of looking for a colorful subgraph, look for a colorful tree. Input: A graph where each vertex is colored by one of k colors. Output: A colorful tree (in the scored version, the highest-scoring colorful tree).

Dynamic Programming Algorithm (Fellows et al, 2008) IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets • Row for each vertex • Column for each subset of colors, in increasing size • Table entry (v, S): score of the best tree rooted in v that is colored exactly by S

Dynamic Programming Algorithm • The last column contains, for every vertex v, the highest-scoring tree • rooted in v • colored by all the colors of the query! • Running time: O(3^k |E|).
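The table described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the TORQUE implementation: colors are numbered 0..k-1 and stored as bitmasks, the score of a tree is taken to be the sum of hypothetical per-vertex weights, and the names `best_colorful_tree`, `adj`, `color`, and `weight` are made up for the example. An entry T[v][S] is built by splitting S into a part S1 kept at the root v and a part S2 covered by a tree rooted at a neighbor u.

```python
def best_colorful_tree(adj, color, weight, k):
    """Return the highest total vertex weight of a colorful tree,
    or None if no colorful tree exists.

    adj    : dict vertex -> list of neighbors
    color  : dict vertex -> color in 0..k-1
    weight : dict vertex -> numeric score (an assumption of this sketch)
    """
    NEG = float("-inf")
    full = (1 << k) - 1
    # T[v][S]: best score of a tree rooted at v whose vertices use exactly color set S.
    T = {v: [NEG] * (full + 1) for v in adj}
    for v in adj:
        T[v][1 << color[v]] = weight[v]
    # Proper submasks are numerically smaller, so increasing order is a valid schedule.
    for S in range(1, full + 1):
        for v in adj:
            cv = 1 << color[v]
            if not (S & cv) or S == cv:
                continue
            rest = S ^ cv          # colors other than the root's own color
            best = NEG
            S2 = rest
            while S2:              # every non-empty S2 that avoids the root color
                S1 = S ^ S2        # S1 always contains the root color
                if T[v][S1] > NEG:
                    for u in adj[v]:
                        cand = T[v][S1] + T[u][S2]
                        if cand > best:
                            best = cand
                S2 = (S2 - 1) & rest
            T[v][S] = best
    best = max((T[v][full] for v in adj), default=NEG)
    return None if best == NEG else best
```

Each pair (S1, S2) is visited once per edge direction, which is where the 3^k factor in the running time comes from: every color is in S1, in S2, or outside S.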

Example: computing a table entry T(v, S) [figure not reproduced; vertices u, v, w]

Extension 1: Allowing deletions – matching with fewer colors

Extension 2: Allowing insertions: special non-colored vertices, arbitrary vertices

Allowing non-colored insertions • For j insertions, we would expect running time O(3^(k+j) m). • Can show: O(3^k m j). • Make j copies of each column, and recursively solve: B(v, S, j’) = highest score of a tree rooted in v, colored by S, using exactly j’ insertions
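The table B(v, S, j’) can be illustrated with a simplified, unscored variant that only asks whether a match exists. This is a sketch under assumptions, not the authors' recurrence: non-colored vertices are marked with `color[v] = None`, the function name `min_insertions` is hypothetical, and the merge step splits the insertion budget directly, so it does not attempt to reach the O(3^k m j) bound stated above.

```python
def min_insertions(adj, color, k, max_ins):
    """Smallest number j <= max_ins of non-colored vertices needed to connect
    one vertex of each color 0..k-1, or None if max_ins does not suffice.

    color[v] is an int in 0..k-1, or None for a non-colored vertex.
    """
    full = (1 << k) - 1
    # B[v][S][j]: a tree rooted at v covers exactly color set S with j insertions.
    B = {v: [[False] * (max_ins + 1) for _ in range(full + 1)] for v in adj}
    for v in adj:
        if color[v] is None:
            if max_ins >= 1:
                B[v][0][1] = True           # a lone inserted vertex
        else:
            B[v][1 << color[v]][0] = True   # a lone colored vertex
    for S in range(1, full + 1):
        for j in range(max_ins + 1):
            for v in adj:
                if B[v][S][j]:
                    continue
                ok = False
                # Extension: a non-colored root on top of a neighbor's tree.
                if color[v] is None and j >= 1:
                    ok = any(B[u][S][j - 1] for u in adj[v])
                # Merge: attach a neighbor's tree (colors S2) to a tree at v (colors S1).
                S2 = (S - 1) & S
                while S2 and not ok:
                    S1 = S ^ S2
                    for j1 in range(j + 1):
                        if B[v][S1][j1] and any(B[u][S2][j - j1] for u in adj[v]):
                            ok = True
                            break
                    S2 = (S2 - 1) & S
                B[v][S][j] = ok
    for j in range(max_ins + 1):
        if any(B[v][full][j] for v in adj):
            return j
    return None
```

Colored vertices cannot be reused because the color sets S1 and S2 are disjoint. A non-colored vertex may be counted twice in a table entry, which only overestimates j, so the minimum returned is still the true minimum.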

Formula & Example [figure not reproduced: example graph on vertices a–g] Running time: O(3^k m j)

Extension 3: ALLOWING MULTIPLE COLORS PER VERTEX

Putting it together… [figure not reproduced: worked example with numeric scores]

A second approach • Formulate the problem as an integer linear program (ILP). • Use efficient ILP solvers.

ILP at a glance
• Want: a subset T of the vertices.
• Formulate colorfulness:
  • Only vertices in T are colored.
  • Every vertex gets at most one color.
  • Every color is given to at most one vertex.
• Formulate connectivity: find a flow such that
  • only vertices in T can be involved in the flow;
  • there is a flow of k-1, with a single sink and k-1 sources;
  • every source is connected to the sink via flow edges.

The Integer Linear Program
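The program itself is not reproduced on this slide. As a rough illustration, one flow-based formulation consistent with the outline of colorfulness and connectivity constraints could look as follows. This is a sketch under assumptions, not necessarily the authors' program: it covers only the exact-match case (no insertions or deletions), and the symbols c, x, r, f, and Γ(v) (the set of colors vertex v may take) are introduced here for illustration.

```latex
\begin{align*}
& c_{v,\gamma} \in \{0,1\}
  && v \in V,\ \gamma \in \Gamma(v)
  && \text{($v$ is in $T$ with color $\gamma$)} \\
& x_v = \sum_{\gamma \in \Gamma(v)} c_{v,\gamma} \le 1
  && v \in V
  && \text{(at most one color per vertex)} \\
& \sum_{v \,:\, \gamma \in \Gamma(v)} c_{v,\gamma} = 1
  && \gamma \in \{1,\dots,k\}
  && \text{(each color used exactly once)} \\
& r_v \in \{0,1\},\quad r_v \le x_v,\quad \sum_{v \in V} r_v = 1
  &&
  && \text{(a single sink, chosen inside $T$)} \\
& 0 \le f_{vw} \le (k-1)\,x_v,\quad f_{vw} \le (k-1)\,x_w
  && \{v,w\} \in E
  && \text{(flow only between vertices of $T$)} \\
& \sum_{w \,:\, \{v,w\} \in E} f_{vw} \;-\; \sum_{w \,:\, \{v,w\} \in E} f_{wv} = x_v - k\,r_v
  && v \in V
  && \text{(sources send 1, sink absorbs $k-1$)}
\end{align*}
```

In this sketch every non-sink vertex of T has a net outflow of one unit, the sink has a net inflow of k-1, and vertices outside T carry no flow, so each chosen vertex must reach the sink through chosen vertices only. A scoring objective would be added on top of these constraints.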

Heuristic Speedups
• First do data reduction:
  • Only 5% of the vertices are associated with one or more query colors.
  • Many non-colored vertices are too far from any colored vertex to be useful.
• For each remaining connected component:
  • Try a shortest-paths-based heuristic that does not allow mismatches.
  • If this fails: if there are few colors but the instance is large, use dynamic programming; otherwise, use the ILP.
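The data-reduction step can be sketched as a multi-source breadth-first search from all colored vertices. This is a minimal illustration, assuming a hypothetical hop threshold `max_dist` beyond which a non-colored vertex is discarded; the slide does not state the actual rule used, and the names `reduce_network`, `adj`, and `colored` are made up for the example.

```python
from collections import deque

def reduce_network(adj, colored, max_dist):
    """Keep only vertices within `max_dist` hops of some colored vertex.

    adj      : dict vertex -> list of neighbors
    colored  : iterable of vertices associated with at least one query color
    max_dist : hypothetical distance threshold (an assumption of this sketch)
    Returns the induced adjacency dict on the kept vertices.
    """
    dist = {v: 0 for v in colored}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand beyond the threshold
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    keep = set(dist)
    return {v: [w for w in adj[v] if w in keep] for v in keep}
```

The connected components of the reduced network can then be handled one at a time, as the slide describes.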

Implementation, Experiments & Results

Experiments • We applied our method to query complexes within: • yeast (5430 proteins, 39936 interactions), • fly (6650 proteins, 21275 interactions) • human (7915 proteins, 28972 interactions). • Queries: • yeast, fly, human • bovine, mouse, and rat.

Comparison with other methods • Most previous work tested queries with a known topology. • We compare our results with those of QNet (Dost et al, 2008), designed to tackle topology-based queries. • QNet uses color coding to tackle the subgraph homeomorphism problem, allowing insertions and deletions.

Comparison with QNet

Results Evaluation
• Functional coherence: used GO TermFinder for functional enrichment in T.
• Specificity: looked at the overlap between T and known complexes in the target species, compared to the overlap between random subgraphs and the known complexes.
• Corrected for multiple testing using FDR (q<0.05).
• Quality match: functionally coherent and specific.
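The slide does not say which FDR procedure was used. As an illustration of what such a correction does, here is a sketch of the widely used Benjamini-Hochberg procedure, which is an assumption on our part; the function name `benjamini_hochberg` and the input list of p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values, in the same order as `pvalues`."""
    m = len(pvalues)
    # Indices sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

A match would then count as significant when its q-value is below the 0.05 threshold quoted above.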

Selected results

Summary • The PPI network querying problem motivates the colorful connected subgraph problem. • A fixed-parameter dynamic programming algorithm, allowing insertions, deletions, and multiple colors per vertex, along with an ILP formulation and heuristics, obtains good results.
Thanks: Nir Yosef, the TAU Computational Genomics group, and the Computational System Biology group. Israel Science Foundation, Edmond J. Safra Bioinformatics Program, Tel Aviv Univ.

References
• [FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP’07, volume 4596, pages 340–351. Springer-Verlag, 2007.
• [N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006.
• [BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31–43. Springer, 2008.
• [AYZ95] N. Alon, R. Yuster, and U. Zwick. Color coding. Journal of the ACM, 42:844–856, 1995.
• [DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. QNet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7):913–925, 2008.
