PPI Network Alignment 陳琨、朱安強、林晏禕、翁翊鐘 陳縕儂、呂哲安、楊孟翰
Biological Techniques • How do we measure protein interactions? • Two-hybrid screens • Co-immunoprecipitation
Two-hybrid screens (figure labels: UAS, reporter gene (LacZ)) • A. Regular transcription of the reporter gene • B. One fusion protein only (Gal4-BD + Bait): no transcription • C. One fusion protein only (Gal4-AD + Prey): no transcription • D. Two fusion proteins with interacting Bait and Prey
Co-immunoprecipitation (figure labels: X, Y, antibody, Protein A, known viral protein, unknown protein)
Protein-Protein Interaction Networks • Proteins are nodes • Interactions are edges • Figure: yeast PPI network
Network comparisons • Query for a module • Predict functions of a module • Predict protein functions • Validate protein interactions • Predict protein interactions
Random network • Connect each pair of nodes with probability p • The expected number of edges is pN(N-1)/2 • The degree distribution is approximately Poisson • Nodes with high degree are rare
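The random-network model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `random_network`, `expected_edges` and `degree_counts` are hypothetical and do not come from the slides.

```python
import random
from itertools import combinations

def random_network(n, p, seed=None):
    """Edge list of a random graph on nodes 0..n-1: each pair is linked with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def expected_edges(n, p):
    """Expected number of edges, p * N * (N - 1) / 2."""
    return p * n * (n - 1) / 2

def degree_counts(n, edges):
    """Degree of every node, as a list indexed by node id."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Compare one sampled graph with the expectation and look at the largest degree.
n, p = 200, 0.05
edges = random_network(n, p, seed=0)
print(len(edges), expected_edges(n, p), max(degree_counts(n, edges)))
```

In a sampled graph the edge count stays close to the expectation and the maximum degree stays close to the mean, which is the "high-degree nodes are rare" point of the slide.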
Scale-free network • Power-law degree distribution • A few hubs and many low-degree nodes • When a node is added to the network, it prefers to link to hubs
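The "new nodes prefer hubs" rule can be illustrated with a small preferential-attachment sketch. The slides do not name a specific growth model; the version below (a fully connected starting core, `m` links per new node, and the name `preferential_attachment`) is an assumption made for illustration.

```python
import random

def preferential_attachment(n, m=2, seed=None):
    """Grow a graph to n nodes (n > m); each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = preferential_attachment(500, m=2, seed=0)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
# A handful of early nodes typically end up with far more links than the rest.
print(sorted(degree.values(), reverse=True)[:5])
```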
The Network Alignment Problem • Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks • Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Brian P. Kelley, Roded Sharan, Richard M. Karp, Taylor Sittler, David E. Root, Brent R. Stockwell, and Trey Ideker (2003) PathBLAST
Protein Similarity • Homologous proteins: two proteins that have common ancestry. • Orthologous proteins: two proteins from different species that diverged after a speciation event. • Paralogous proteins: two proteins from the same species that diverged after a duplication event. Source: Roded Sharan, Protein-protein Interaction: Network Alignment Lecture Note
PathBLAST • PathBLAST is a strategy for aligning two protein interaction networks to elucidate their conserved pathways. • This method identifies pairs of interaction paths, drawn from the networks of different species or from different processes within a species, where proteins at equivalent path positions share strong sequence homology. Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.
Alignment Graph • Vertical solid lines: protein-protein interactions. • Horizontal dotted lines: significant sequence similarity. • Node: a homologous protein pair. • Link: a protein interaction relation of one of three types: direct, gap, or mismatch. Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.
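A minimal sketch of how such an alignment graph could be built is given below. The slide only names the three link types; the definitions used here are an assumption based on the usual reading of the PathBLAST description: "direct" means both protein pairs interact directly, "gap" means one pair interacts directly and the other shares a common neighbor, and "mismatch" means both pairs are linked only through a common neighbor. The names `separation` and `alignment_graph` are hypothetical.

```python
def separation(adj, a, b):
    """1 if a and b interact directly, 2 if they share a neighbor, else None."""
    if b in adj.get(a, set()):
        return 1
    if adj.get(a, set()) & adj.get(b, set()):
        return 2
    return None

def alignment_graph(adj1, adj2, homolog_pairs):
    """Nodes are homologous pairs (a, b); links are typed direct, gap or mismatch.

    adj1, adj2: dict mapping each protein to the set of its interaction partners.
    homolog_pairs: list of (protein in network 1, protein in network 2).
    """
    # Assumed link definitions, keyed by (separation in net 1, separation in net 2).
    edge_type = {(1, 1): "direct", (1, 2): "gap", (2, 1): "gap", (2, 2): "mismatch"}
    nodes = list(homolog_pairs)
    links = {}
    for i, (a1, b1) in enumerate(nodes):
        for a2, b2 in nodes[i + 1:]:
            if a1 == a2 or b1 == b2:
                continue  # a protein is not linked to itself
            key = (separation(adj1, a1, a2), separation(adj2, b1, b2))
            if key in edge_type:
                links[((a1, b1), (a2, b2))] = edge_type[key]
    return nodes, links

# Toy example: a three-protein path in species 1, a triangle in species 2.
adj1 = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
adj2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
nodes, links = alignment_graph(adj1, adj2, [("A", "a"), ("B", "b"), ("C", "c")])
for link, kind in links.items():
    print(link, kind)
```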
Yeast & Bacteria PPI Alignment graph • The global yeast-bacteria alignment graph is compared with alignment graphs of randomized networks obtained by permuting the protein names. • This suggests that both species share conserved interaction pathways. • “Direct” interactions are rare. • “Mismatches” and “gaps” are permitted, which helps to overcome false negatives. Source: Roded Sharan, Protein-protein Interaction: Network Alignment Lecture Note
Scoring Function • p(v) is the probability of true homology within the protein pair represented by v. • q(e) is the probability that the protein-protein interactions represented by e are real (not false positives). • The background probabilities are the expected values of p(v) and q(e) over the global alignment graph.
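The formula itself did not survive on this slide. A log-odds score consistent with the definitions above sums, over the nodes and edges of a path, the logarithm of each probability divided by its background value. The sketch below assumes that form; the base-10 logarithm and the name `path_score` are assumptions, not taken from the slide.

```python
import math

def path_score(p_values, q_values, p_background, q_background):
    """Log-odds score of one path through the alignment graph.

    p_values: p(v) for each node v on the path.
    q_values: q(e) for each edge e on the path.
    p_background, q_background: expected p(v) and q(e) over the whole graph.
    """
    node_term = sum(math.log10(p / p_background) for p in p_values)
    edge_term = sum(math.log10(q / q_background) for q in q_values)
    return node_term + edge_term

# A path whose nodes and edges are all more reliable than background scores above 0.
print(path_score([0.9, 0.8, 0.95, 0.7], [0.6, 0.5, 0.6], 0.3, 0.2))
```

With this form, a path that is no better than background scores 0, and each node or edge contributes independently, so high-scoring paths can be extended one step at a time.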
Pathways & Protein Complexes • PathBLAST is used to find conserved paths, and then overlapping paths are merged into complexes. Source: Roded Sharan, Protein-protein Interaction: Network Alignment Lecture Note
Yeast vs. Bacteria • Orthologous Pathways • Select the 150 highest-scoring pathways of length four from the alignment graph. • After combining overlapping pathways, they fall into 5 network regions. • The region in the right-hand figure is the union of 6 paths. • Its proteins have similar functions. • Solid link: direct interactions; dotted link: gaps or mismatches. Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.
Yeast vs. Yeast • Paralogous Pathways • Proteins were not allowed to pair with themselves or their neighbors. • Analyzed the 150 highest-scoring pathway alignments of length 4 from the alignment graph. • The aligned pathways are distinct but homologous in function. Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.
Pathway Queries • PathBLAST identified two other well-known MAPK pathways as the highest-scoring hits, indicating that the algorithm was sufficiently sensitive and specific to identify known paralogous pathways. Source: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. PNAS, 2003.
Identification of Protein Complexes Roded Sharan, Trey Ideker, Brian P. Kelley, Ron Shamir, Richard M. Karp: Identification of Protein Complexes by Comparative Analysis of Yeast and Bacterial Protein Interaction Data. Journal of Computational Biology 12(6): 835-846 (2005)
Flashback • [Input] The alignment graph of 2 PPI networks. • We can already find conserved linear pathways. • This is not the end: how can we go further?
Motivation • Finding more complex conserved structures is of practical interest. • [Reduction] Now we can merge overlapping paths into complexes. • Or we can develop another model to identify conserved complexes.
A New Model: The Main Idea • How do you recognize protein complexes? • Dense Subgraphs • Comparative Analysis • "When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean -- neither more nor less." (Lewis Carroll, Through the Looking-Glass)
Dense Subgraph: Likelihood • Likelihood Formula 0.1: given an induced subgraph C = (Vc, Ec), • L(C) = |Ec| / { ½ * |Vc| * ( |Vc| - 1 ) } • It makes sense: subgraphs with more edges have higher likelihood. • But this only considers the structure of the graph. • Problems of link analysis are often data-dependent.
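Formula 0.1 is simply the edge density of the induced subgraph, which can be computed as in the sketch below. The function name `density` and the edge-list input format are assumptions made for illustration.

```python
def density(nodes, edges):
    """|Ec| / (|Vc| * (|Vc| - 1) / 2) for the subgraph induced by `nodes`."""
    nodes = set(nodes)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    # Count each undirected edge once, and only if both endpoints are in the subset.
    induced = {frozenset((u, v)) for u, v in edges
               if u in nodes and v in nodes and u != v}
    return len(induced) / (len(nodes) * (len(nodes) - 1) / 2)

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(density({1, 2, 3}, edges))     # the triangle: every pair is connected
print(density({1, 2, 3, 4}, edges))  # adding node 4 lowers the density
```

The score depends only on how many edges are present, not on which ones, which is the limitation the revised formulas address.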
Dense Subgraph: Likelihood(Cont.) • Likelihood Formula 0.1: given an induced subgraph, • L(C) = |Ec|/ { ½ * |Vc| * ( |Vc| - 1 ) } • Likelihood Formula 0.2: given an induced subgraph, • What the hell is it?
Dense Subgraph: Likelihood(Cont.) • What behavior do we expect from the revised formulas? • Higher likelihood: dense graphs score higher. • Adjustment: the weakest link. • Bonus: an interaction that had low probability actually occurs.
Dense Subgraph: Likelihood(Cont.) • Higher likelihood: dense graphs score higher. • We assume that every two proteins in a complex interact with some probability p (0.8 is used in this work). • We can use this model as a baseline for comparing density.
Dense Subgraph: Likelihood(Cont.) • Adjustment: the weakest link! • p(u,v) is defined as the fraction of graphs in FG that include the edge (u,v). • FG: the family of graphs on the vertex set V with the same degree sequence as G. • Edges incident on vertices with higher degrees have higher probability.
Dense Subgraph: Likelihood(Cont.) • Likelihood Formula 0.2: given an induced subgraph, • What the hell is it? • For p(u,v) = 0.2, the two cases (edge present, edge absent) give 4 and 1/4, i.e. 0.8/0.2 and 0.2/0.8 with p = 0.8. • For p(u,v) = 0.6, they give 4/3 and 1/2, i.e. 0.8/0.6 and 0.2/0.4. • It makes sense! We emphasize the weakest link.
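Formula 0.2 itself is missing from the slide, but the worked numbers above match a likelihood ratio in which each present edge contributes p / p(u,v) and each absent pair contributes (1 - p) / (1 - p(u,v)), with p = 0.8. The sketch below assumes that reconstruction. The names `pair_factors` and `log_likelihood_ratio`, the parameter name `beta` for the complex-model probability, and the dictionary format for p(u,v) are all hypothetical.

```python
import math
from itertools import combinations

def pair_factors(p_uv, beta=0.8):
    """Factors for one node pair: (edge present, edge absent)."""
    return beta / p_uv, (1 - beta) / (1 - p_uv)

def log_likelihood_ratio(nodes, edges, p_null, beta=0.8):
    """Log of the assumed Formula 0.2 for the subgraph induced by `nodes`.

    edges: iterable of (u, v) pairs present in the subgraph.
    p_null: dict mapping frozenset({u, v}) to p(u, v), with 0 < p(u, v) < 1.
    """
    present = {frozenset(e) for e in edges}
    total = 0.0
    for u, v in combinations(sorted(set(nodes)), 2):
        pair = frozenset((u, v))
        on, off = pair_factors(p_null[pair], beta)
        total += math.log(on if pair in present else off)
    return total

print(pair_factors(0.2))  # unlikely edge: large reward if present, mild penalty if absent
print(pair_factors(0.6))  # likely edge: small reward if present, larger penalty if absent
```

Working in log space turns the product over pairs into a sum, so the score of a subgraph is the total weight of its node pairs, which is convenient for the search step later.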
Dense Subgraph: Likelihood(Cont.) • Likelihood Formula 0.1: given an induced subgraph, • L(C) = |Ec|/ { ½ * |Vc| * ( |Vc| - 1 ) } • Likelihood Formula 0.2: given an induced subgraph, • Likelihood Formula 0.3: given an induced subgraph,
The Main Idea Revisited • How do you recognize protein complexes? • Dense Subgraphs • We have a revised formula for density in a PPI network. • Comparative Analysis
Comparative Analysis • Idea: If a structure occurs in different species, it is likely to be biologically meaningful. • How do you define dense substructures on alignment graphs?
Comparative Analysis(Cont.) • Consider two subsets U1 = { u1, ..., uk } and V2 = { v1, ..., vk }, where Θ: U1 → V2 is a many-to-many correspondence. • Since you already have • You may derive the formula 1.1 as follows: • Does it make sense?
Comparative Analysis(Cont.) • Θ is useful information: • You have the formula 1.2: { A/(A+B) }/ {X/(X+Y)}
The Main Idea Revisited • How do you recognize protein complexes? • Dense Subgraphs • We have a revised formula for density in a PPI network. • Comparative Analysis • We have a revised formula for density in an alignment network.
Search the Complexes • Now we only need to find heavy subgraphs in the alignment graph. • The problem is NP-hard.
Search the Complexes(Cont.) • [Seed] Compute a seed around each node v. • [Restrict the Size] Keep seeds small! • [Refined Seed] Enumerate all subsets of the seed that have size 3 and contain v. • [Local Search] Iteratively modify the refined seed. • [Output Heavy Subgraphs] For each node, we record at most k heaviest subgraphs. • [Filtering overlapping ones] Greedy method is used!
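The steps above can be sketched as a small greedy heuristic. This is an illustration under several assumptions rather than the authors' implementation: the weight of a subgraph is taken to be the sum of pairwise weights, with a fixed penalty for pairs that have no recorded weight; the seed is a node plus its heaviest neighbors; local search adds or removes one node at a time; and overlap is measured relative to the smaller set. All names and default parameters (`weight`, `local_search`, `find_heavy_subgraphs`, `max_seed`, `max_size`, `max_overlap`) are hypothetical.

```python
from itertools import combinations

def weight(sub, w, missing=-1.0):
    """Sum of pairwise weights in `sub`; pairs absent from w cost `missing`."""
    return sum(w.get(frozenset(pair), missing) for pair in combinations(sub, 2))

def local_search(start, candidates, w, max_size):
    """[Local Search] Add or remove one node while the weight strictly improves."""
    current, best = set(start), weight(start, w)
    improved = True
    while improved:
        improved = False
        moves = [current | {x} for x in candidates - current if len(current) < max_size]
        moves += [current - {x} for x in current if len(current) > 3]
        for cand in moves:
            score = weight(cand, w)
            if score > best:
                current, best, improved = cand, score, True
                break
    return frozenset(current), best

def find_heavy_subgraphs(adj, w, max_seed=6, max_size=8, k=1, max_overlap=0.8):
    results = []
    for v in adj:
        # [Seed] v plus its heaviest neighbors; [Restrict the Size] cap at max_seed.
        nbrs = sorted(adj[v], key=lambda u: w.get(frozenset((u, v)), 0.0), reverse=True)
        seed = {v, *nbrs[:max_seed - 1]}
        found = {}
        # [Refined Seed] every size-3 subset of the seed that contains v.
        for pair in combinations(seed - {v}, 2):
            sub, score = local_search({v, *pair}, set(adj), w, max_size)
            found[sub] = score
        # [Output Heavy Subgraphs] keep at most k heaviest per node.
        results += sorted(found.items(), key=lambda t: t[1], reverse=True)[:k]
    # [Filtering overlapping ones] greedy scan by decreasing weight.
    kept = []
    for sub, score in sorted(set(results), key=lambda t: t[1], reverse=True):
        if all(len(sub & other) / min(len(sub), len(other)) <= max_overlap
               for other, _ in kept):
            kept.append((sub, score))
    return kept

# Two triangles joined by a single edge: each triangle should come out as a complex.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
w = {frozenset(p): 1.0 for p in
     [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]}
for sub, score in find_heavy_subgraphs(adj, w):
    print(sorted(sub), score)
```

Because the problem is NP-hard, a heuristic of this kind only guarantees a local optimum for each starting subset; enumerating all size-3 refined seeds is what keeps the starting points diverse while the seed-size cap keeps that enumeration cheap.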
The Main Idea Revisited • How do you recognize protein complexes? • Dense Subgraphs • We have a revised formula for density in a PPI network. • Comparative Analysis • We have a revised formula for density in an alignment network. • Finally, we have a practical method to search for complexes!