
TORQUE: Topology-Free Querying of Protein Interaction Networks

Sharon Bruckner 1, Falk Hüffner 1, Richard M. Karp 2, Ron Shamir 1, and Roded Sharan 1. 1 School of Computer Science, Tel Aviv University. 2 International Computer Science Institute, Berkeley, CA. To appear in RECOMB 09.


Presentation Transcript


  1. TORQUE: Topology-Free Querying of Protein Interaction Networks Sharon Bruckner 1, Falk Hüffner 1, Richard M. Karp 2, Ron Shamir 1, and Roded Sharan 1. 1 School of Computer Science, Tel Aviv University. 2 International Computer Science Institute, Berkeley, CA. To appear in RECOMB 09

  2. The problem Input: • Graph G=(V,E), |V|=n, |E|=m • Color set C={1,2,...,k} • A function c: V → C assigning each vertex v the color c(v).

  3. The problem We seek: Is there a connected subgraph of G that has exactly one vertex of each color? Call such a subgraph “colorful”.
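The definition can be made concrete with a brute-force check. The following is only a sketch for tiny inputs, not the method of the talk: the function names, the adjacency-dict representation, and the toy graph are all assumptions introduced for illustration.

```python
from itertools import product


def is_colorful_connected(adj, color, vertices, k):
    """True if `vertices` is connected in `adj` and uses each color 1..k exactly once."""
    vs = set(vertices)
    if not vs or sorted(color[v] for v in vs) != list(range(1, k + 1)):
        return False
    # Depth-first search that never leaves the candidate vertex set.
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in vs and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vs


def brute_force_colorful(adj, color, k):
    """Try every way of picking one vertex per color; exponential, tiny inputs only."""
    classes = [[v for v in adj if color[v] == c] for c in range(1, k + 1)]
    for choice in product(*classes):
        if is_colorful_connected(adj, color, choice, k):
            return set(choice)
    return None


# Toy instance (hypothetical): path a - b - c - d with colors 1, 2, 1, 3.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": 1, "b": 2, "c": 1, "d": 3}
match = brute_force_colorful(adj, color, 3)
```

On this toy path the only colorful connected vertex set is {b, c, d}: picking a for color 1 leaves d unreachable without passing through c, which would repeat color 1.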

  4. But why? • Our graph = A protein-protein interaction network of some species. • Our colors = set of proteins from another species that constitute a complex. Each network vertex is given the color of the protein in that set most similar to it.

  5. But why? (continued) • What is the meaning of a match? • It hints at an evolutionarily conserved region. • The functionality of the matched subgraph may be inferred from that of the complex.

  6. About the problem • NP-complete • Hard even when the graph is a tree with maximum degree 3, by reduction from 3SAT ([FFHV07]) • But! We know that the number of colors k is relatively small. • Solution: A fixed-parameter algorithm! A problem is fixed-parameter tractable with respect to a parameter k if an instance of size n can be solved in time f(k) · n^O(1), where f is an arbitrary function depending only on k (see e.g. [N06]).
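To get a feel for why a small k matters, the sketch below compares the two growth rates that appear later in the talk: the 3^k · m bound reported for the dynamic program and the n^k count of candidate subgraphs. The network size is the yeast network quoted in the Experiments slide; the query size k = 8 and the function name are assumptions chosen purely for illustration.

```python
def step_counts(n, m, k):
    """Return (3^k * m, n^k): an FPT-style bound versus naive enumeration."""
    return 3**k * m, n**k


# Yeast network from the Experiments slide; k = 8 is a hypothetical query size.
fpt, naive = step_counts(n=5430, m=39936, k=8)
print(f"3^k * m = {fpt:,}")    # 262,020,096
print(f"n^k     = {naive:.2e}")
```

The first figure is a few hundred million elementary steps, while the second has about thirty decimal digits: the exponential part of the fixed-parameter bound depends only on k, not on the network size.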

  7. Defining the basic algorithm • Every connected subgraph has a spanning tree. • Hence every colorful connected subgraph has a colorful spanning tree. • Instead of looking for a colorful subgraph, look for a colorful tree. Decision version: Input: A graph where each vertex is colored by one of k colors. Output: Is there a colorful tree? Optimization version: Input: A graph where each vertex is colored by one of k colors. Output: What is the highest-scoring colorful tree?

  8. Dynamic Programming Algorithm IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets. • Row for each vertex • Column for each subset of colors, in increasing size. [Figure: the table, with one row per vertex; the highlighted entry holds the score of the best tree rooted in v3 that is colored exactly by S3.]

  9. Dynamic Programming Algorithm • The last column contains, for every vertex v, the highest-scoring tree • rooted in v • colored by all the colors of the query! • Running time: O(3^k · m).
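The table described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not define the score, so the sketch assumes it is the sum of edge weights; colors are numbered 0..k-1 so that color sets fit in a bitmask; and the function name and input format are hypothetical. The recurrence used is the standard one for this kind of table: a tree rooted at v with colors S splits into a tree rooted at v with colors S1 and a tree rooted at a neighbor u with colors S2, where S1 and S2 partition S.

```python
def best_colorful_tree(n, edges, color, k):
    """Best total edge weight of a colorful tree, or None if none exists.

    n: number of vertices, labeled 0..n-1
    edges: list of (u, v, weight) undirected edges
    color: color[v] in 0..k-1 (the slides use 1..k); k >= 1 is assumed
    """
    NEG = float("-inf")
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    full = (1 << k) - 1
    # B[v][S]: best score of a tree rooted at v whose colors are exactly bitmask S.
    B = [[NEG] * (full + 1) for _ in range(n)]
    for v in range(n):
        B[v][1 << color[v]] = 0.0  # a single vertex is a tree with no edges

    # Increasing numeric order suffices: every proper submask of S is smaller than S.
    for S in range(1, full + 1):
        if S & (S - 1) == 0:
            continue  # singleton columns are already initialized
        for v in range(n):
            cv = 1 << color[v]
            if not S & cv:
                continue  # the root's own color must belong to S
            rest = S ^ cv
            best = NEG
            S2 = rest
            while S2:  # S2 runs over the non-empty subsets of S without c(v)
                S1 = S ^ S2  # S1 keeps c(v); S1 and S2 partition S
                if B[v][S1] > NEG:
                    for u, w in adj[v]:
                        # Attach the tree rooted at neighbor u (colors S2) below v.
                        best = max(best, B[v][S1] + B[u][S2] + w)
                S2 = (S2 - 1) & rest
            B[v][S] = best

    answer = max((row[full] for row in B), default=NEG)
    return None if answer == NEG else answer
```

For example, on a triangle with edge weights 1, 2 and 5 and three distinct colors, `best_colorful_tree(3, [(0, 1, 1), (1, 2, 2), (0, 2, 5)], [0, 1, 2], 3)` picks the two heaviest edges. The 3^k factor comes from the loop structure: each pair (S, S2) with S2 inside S corresponds to assigning every color to "S2", "S1", or "outside S", and each such pair is combined with every edge.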

  10. Example [Figure: worked example of computing a table entry B(v, { }) on a small graph with vertices u, v and w.]

  11. Allowing deletions – matching with fewer colors ?

  12. Allowing deletions – matching with fewer colors • Simply look at all columns with color sets of size at least k - num_dels
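The column scan for deletions can be sketched as below. The table layout is an assumption: B[v][S] is indexed by vertex and by a bitmask S over k colors, with negative infinity marking entries for which no tree exists. The slide does not say how trees over color sets of different sizes should be ranked against each other; this sketch simply takes the highest score, and the function name is hypothetical.

```python
def best_with_deletions(B, k, num_dels):
    """Scan every column whose color set has at least k - num_dels colors.

    Returns (score, root, color_mask) for the best entry, or None if all
    qualifying entries are empty.
    """
    NEG = float("-inf")
    best = None
    for v, row in enumerate(B):
        for S in range(1, 1 << k):
            if bin(S).count("1") < k - num_dels:
                continue  # too many query colors would be deleted
            if row[S] > NEG and (best is None or row[S] > best[0]):
                best = (row[S], v, S)
    return best


NEG = float("-inf")
# Hypothetical table: two vertices with colors 0 and 1 and no edge between them.
table = [[NEG, 0.0, NEG, NEG],
         [NEG, NEG, 0.0, NEG]]
```

With this table no match exists when no deletions are allowed, while one deletion admits the single-vertex trees. No extra table columns are needed: the same table answers the question for every value of num_dels.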

  13. Allowing Insertions: Special non-colored vertices or arbitrary vertices

  14. Allowing non-colored insertions • For j insertions, we would expect: • Running time: O(3^(k+j) · m). • Actually, • Running time: O(3^k · m · j). • Simply make j copies of each column, and answer the question: B(v, S, j') = What is the highest-scoring tree, rooted in v, colored by S, using exactly j' insertions?

  15. Formula & Example [Figure: the recurrence formula and an example graph on vertices a, b, c, d, e, f, g.] Running time: O(3^k · m · ins)

  16. Details • For every vertex v and color subset S, the algorithm will accurately find the best tree among those having the minimal number of insertions. • Once B(v,S,j) < ∞ for some j, the value for j+i will never be computed! • Cannot guarantee that B(v,S,j+i) will have exactly j+i insertions. [Figure: example with vertices v and u.]

  17. Allowing multiple colors per vertex – use color-coding

  18. Implementation, Experiments & Results

  19. Experiments • We applied our method to query complexes within: • yeast (5430 proteins, 39936 interactions), • fly (6650 proteins, 21275 interactions) • human (7915 proteins, 28972 interactions). • Queries: • yeast, fly, human • bovine, mouse, and rat.

  20. Implementation comments We color the graph according to the similarity between the network and query proteins. In practice, in some problem instances the number of colors was not significantly smaller than the graph size. This is a result of data reduction in the cases where many network vertices were not sufficiently similar to any query vertex. Therefore, the dynamic programming algorithm is supplemented by an ILP algorithm and some heuristics to handle these instances!

  21. Comparison with other methods • Most previous work tested queries with a known topology. • We compare our results with those of QNet ([DSGRBS08]), designed to tackle topology-based queries. • QNet is also based on dynamic programming and color coding.

  22. Selected results

  23. Summary The colorful connected subgraph problem is motivated by the PPI network querying problem. A fixed-parameter dynamic programming algorithm, allowing insertions, deletions, and multiple colors per vertex, along with an ILP formulation and heuristics, obtains good results. Thanks: The ACGT group (Igor, Ofer, Chaim, Seagull, Guy…), Nir Yosef. Israel Science Foundation, Edmond J. Safra Bioinformatics Program, Tel Aviv Univ.

  24. References • [FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP’07, volume 4596 of LNCS, pages 340–351. Springer, 2007. • [N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006. • [BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31–43. Springer, 2008. • [AYZ95] N. Alon, R. Yuster, and U. Zwick. Color-coding. Journal of the ACM, 42:844–856, 1995. • [DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. QNet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7):913–925, 2008.
