This study presents a novel framework for predicting missing protein-protein interactions within signaling pathways using network-based techniques. We propose various algorithms, including a greedy approach and Dijkstra's algorithm, to identify edges that minimize total shortest distances between source and target proteins. The effectiveness of our methods is evaluated through their ability to reduce the cost function and increase prediction accuracy. Future applications include enhancing communication flows in biological networks and optimizing routing in complex systems.
A Network-based Approach for Predicting Missing Pathway Interactions Ankush Bansal 658347261
Outline • Protein-Protein Interactions and Signaling pathway • Problem statement and variations of problem • Shortest distance algorithm • Greedy approach to predict missing edges • Other approaches • Results • Future Applications
Signaling Pathway • Sub-networks of proteins that communicate via a series of interactions • Contains upstream proteins • Source proteins transmit information to a set of target proteins
Problem Statement Searching for missing edges that will maximally decrease the shortest-path distances between sources & targets
1. Shortcuts Given a weighted digraph G = (V, E), a set of sources S ⊆ V, and a set of targets T ⊆ V, find k edges to add that minimize the total shortest-path distance over all source-target pairs.
2. Shortcuts-X (Restricted) Given a weighted digraph G = (V, E), a set of sources S ⊆ V, a set of targets T ⊆ V, and a maximum of r allowable hops, find k edges to add that minimize the total shortest-path distance over all source-target pairs, using paths of at most r edges.
3. Shortcuts-SS (Single Source) Given a weighted digraph G = (V, E), a set of sources S ⊆ V, and a set of targets T ⊆ V, find k edges to add that minimize the total shortest-path distance between each target and its single closest source.
4. Shortcuts-X-SS (Restricted, Single Source) Given a weighted digraph G = (V, E), a set of sources S ⊆ V, a set of targets T ⊆ V, and a maximum of r allowable hops, find k edges to add that minimize the total shortest-path distance between each target and its single closest source, using paths of at most r edges.
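The two objectives in the problem variants above can be sketched as follows. This is a minimal illustration, assuming the pairwise distances have already been computed and are stored in a dictionary keyed by (source, target); the function names and the `dist` layout are hypothetical, not taken from the slides.

```python
from math import inf

def cost_all_pairs(dist, sources, targets):
    """Shortcuts / Shortcuts-X objective: sum of d(s, t) over all pairs.

    `dist` maps (source, target) -> shortest-path distance; a missing key
    is treated as an unreachable pair (assumed convention).
    """
    return sum(dist.get((s, t), inf) for s in sources for t in targets)

def cost_single_source(dist, sources, targets):
    """Shortcuts-SS / Shortcuts-X-SS objective: each target contributes
    only the distance to its single closest source."""
    return sum(min(dist.get((s, t), inf) for s in sources) for t in targets)

# Toy example with two sources and two targets.
dist = {("s1", "t1"): 3, ("s2", "t1"): 6, ("s1", "t2"): 5, ("s2", "t2"): 2}
print(cost_all_pairs(dist, ["s1", "s2"], ["t1", "t2"]))      # 16
print(cost_single_source(dist, ["s1", "s2"], ["t1", "t2"]))  # 5
```

For the hop-restricted variants the same functions apply, provided `dist` holds distances computed over paths of at most r edges.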
Dijkstra’s Algorithm • Greedy algorithm to solve the single-source shortest-path problem • Does not work with negative edge weights • Time complexity O(|E| + |V| log |V|) with a Fibonacci heap, where |E| is the number of edges and |V| is the number of vertices
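A minimal sketch of Dijkstra's algorithm, assuming the graph is stored as a dictionary of the form {u: {v: weight}} (a representation chosen here for illustration). It uses Python's binary heap from `heapq`, which gives O((|E| + |V|) log |V|) rather than the Fibonacci-heap bound quoted on the slide.

```python
import heapq
from math import inf

def dijkstra(graph, source):
    """Single-source shortest distances; all weights must be non-negative.

    `graph` maps each node to a dict {neighbor: edge weight}.
    Returns a dict of distances for every node reachable from `source`.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, inf):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {"s": {"a": 1, "b": 4}, "a": {"b": 2, "t": 6}, "b": {"t": 1}, "t": {}}
print(dijkstra(graph, "s"))  # {'s': 0.0, 'a': 1.0, 'b': 3.0, 't': 4.0}
```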
Bellman-Ford Algorithm • Another algorithm to find shortest distances from one source node • Can handle negative edge weights, provided no negative cycle is reachable from the source • Time complexity is O(|V|·|E|), where |V| and |E| are the numbers of vertices and edges respectively
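A minimal sketch of Bellman-Ford, assuming the graph is given as a node list plus a list of (u, v, weight) triples (an illustrative representation). It relaxes every edge up to |V| - 1 times and then makes one extra pass to detect a negative cycle.

```python
from math import inf

def bellman_ford(nodes, edges, source):
    """Single-source shortest distances; negative edge weights are allowed.

    `edges` is a list of (u, v, weight) triples.
    Raises ValueError if a negative cycle is reachable from `source`.
    """
    dist = {v: inf for v in nodes}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have converged early
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist

nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 4), ("s", "b", 5), ("b", "a", -3), ("a", "t", 2)]
print(bellman_ford(nodes, edges, "s"))  # {'s': 0.0, 'a': 2.0, 'b': 5.0, 't': 4.0}
```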
Greedy Algorithm to predict pathway-consistent edges • Adds k edges iteratively, choosing at each step the edge that maximally reduces the cost function • The number of non-existent (candidate) edges is n(n-1) - m, where n and m are the numbers of nodes and directed edges • Naively, evaluating even a single candidate edge requires recalculating the shortest-path length from each source to each target
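The candidate count n(n-1) - m can be checked on a toy graph. The helper below is a hypothetical illustration that enumerates every ordered pair of distinct nodes not already joined by a directed edge.

```python
from itertools import permutations

def missing_edges(nodes, edges):
    """Return all ordered pairs (u, v), u != v, that are not existing edges."""
    existing = set(edges)
    return [(u, v) for u, v in permutations(nodes, 2) if (u, v) not in existing]

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
candidates = missing_edges(nodes, edges)
# n = 3, m = 2, so n(n-1) - m = 4 candidate edges.
print(len(candidates))  # 4
```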
Greedy Algorithm to predict pathway-consistent edges (cont’d) • The trick is to pre-compute the shortest-path distance from every source to every other node, and from every node to every target • Then, for a candidate edge (u,v), check the following condition: d_prev(s,u) + d(u,v) + d_prev(v,t) < d_prev(s,t) • If it holds, adding (u,v) shortens the distance between s and t
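The check above can be sketched as a function that scores a candidate edge without re-running a shortest-path algorithm. This is a minimal illustration assuming non-negative weights and two precomputed tables: `d_src[s][x]` for d_prev(s, x) and `d_tgt[t][x]` for d_prev(x, t). The names and table layout are hypothetical.

```python
from math import inf

def cost_after_adding(d_src, d_tgt, sources, targets, u, v, w):
    """Total source-target distance if edge (u, v) with weight w were added.

    For each pair the new distance is the smaller of the old distance and
    the length of the best path routed through the new edge.
    """
    total = 0.0
    for s in sources:
        for t in targets:
            old = d_src[s].get(t, inf)
            via = d_src[s].get(u, inf) + w + d_tgt[t].get(v, inf)
            total += min(old, via)
    return total

def best_edge(d_src, d_tgt, sources, targets, candidates):
    """Pick the (u, v, w) candidate that yields the lowest total cost."""
    return min(candidates,
               key=lambda e: cost_after_adding(d_src, d_tgt, sources, targets, *e))

# Path graph s -> a -> b -> t with unit weights, so d_prev(s, t) = 3.
d_src = {"s": {"s": 0, "a": 1, "b": 2, "t": 3}}
d_tgt = {"t": {"s": 3, "a": 2, "b": 1, "t": 0}}
candidates = [("a", "t", 1), ("s", "b", 1), ("s", "t", 1), ("b", "a", 1)]
print(best_edge(d_src, d_tgt, ["s"], ["t"], candidates))  # ('s', 't', 1)
```

Each candidate costs one lookup per source-target pair, which is constant when the numbers of sources and targets are treated as fixed. The tables only need to be refreshed once per greedy step, after the chosen edge is added.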
Complexity Improved from O(n^2) · O(|E| + |V| log |V|) per step to O(n^2) · O(1) + O(|E| + |V| log |V|) per step
Hop-Restricted Greedy Algorithm • Uses a modified version of the Bellman-Ford algorithm that calculates shortest paths using at most r edges • r is typically set to 5 edges between a target and its closest source • Time complexity is O(n^2) · O(1) + O(r|E|) per step
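One way to restrict Bellman-Ford to paths of at most r edges is to run exactly r relaxation rounds, with each round reading only the distances from the previous round. The sketch below assumes the same (u, v, weight) edge-list representation; it is an illustration of the idea, not necessarily the exact modification used in the slides.

```python
from math import inf

def hop_limited_distances(nodes, edges, source, r):
    """Shortest distances from `source` over paths that use at most r edges.

    Each round relaxes against a snapshot of the previous round, so a path
    can grow by only one edge per round. Runs in O(r * |E|).
    """
    dist = {v: inf for v in nodes}
    dist[source] = 0.0
    for _ in range(r):
        prev = dict(dist)  # snapshot: distances using one fewer edge
        for u, v, w in edges:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist

nodes = ["s", "a", "b", "t"]
edges = [("s", "t", 10), ("s", "a", 1), ("a", "b", 1), ("b", "t", 1)]
print(hop_limited_distances(nodes, edges, "s", 2)["t"])  # 10
print(hop_limited_distances(nodes, edges, "s", 3)["t"])  # 3
```

With r = 2 the cheap three-edge route is not allowed, so the direct edge of weight 10 is the best option; with r = 3 the route through a and b becomes available.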
Other Algorithms to predict missing interactions • Direct-ST • Predicts direct edges from sources to targets • Reduces the cost function maximally • Betweenness • Predicts edges that are highly “central” to the sources and targets • The “betweenness centrality” of an edge is the number of all-pairs shortest paths that use it • Considers only source-target pairs
Other Algorithms to predict missing interactions (cont’d) • Jaccard • Adds an edge between the two proteins with the highest weighted Jaccard coefficient J(u,v)
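A minimal sketch of a weighted Jaccard score between two proteins, assuming the common min/max form over neighbor edge weights. The slide does not spell out which variant is used, so this definition, the function name, and the input layout are all assumptions made for illustration.

```python
def weighted_jaccard(weights_u, weights_v):
    """Weighted Jaccard similarity (assumed min/max form).

    Each argument maps a neighbor to the weight of the edge joining it to
    the protein; a missing neighbor counts as weight 0.
    """
    neighbors = set(weights_u) | set(weights_v)
    num = sum(min(weights_u.get(x, 0.0), weights_v.get(x, 0.0)) for x in neighbors)
    den = sum(max(weights_u.get(x, 0.0), weights_v.get(x, 0.0)) for x in neighbors)
    return num / den if den else 0.0

u = {"a": 1.0, "b": 0.5}
v = {"b": 1.0, "c": 0.5}
score = weighted_jaccard(u, v)  # shared neighbor b contributes 0.5 of a total 2.5
```

Under this form the score is 1 when both proteins have identical weighted neighborhoods and 0 when they share no neighbors.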
Results (Criteria for evaluation) • Ability to reduce the cost function • Ability to predict edges that lie within the STRING potential edges • Ability to predict edges that lie within the STRING potential edges and HOG-related nodes
Summary • A new framework for predicting missing edges that lie “in-between” given sets of sources and targets • The greedy algorithm substantially reduced source-target distances by adding only a few edges • Shortcut edges form alternate paths for signal flow, which give the pathway a greater degree of robustness
Summary (cont’d) • For Shortcuts, adding 3 edges reduced the distance for 27 out of 55 pairs • Similarly, for Shortcuts-X, adding 3 edges reduced the distance for 18 pairs • Hop-restricted objectives tend to select central nodes through which much of the signal flows
Future Applications • Reducing routing lags or increasing information flow between entities in a network • This pathway-specific context can be applied to other species with such data