Systematic Analysis of Interactome: A New Trend in Bioinformatics

KOCSEA Technical Symposium 2010 Systematic Analysis of Interactome:A New Trend in Bioinformatics Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University

History of Bioinformatics Stage 1. Sequence Analysis • Gene sequencing • Sequence alignment • Homolog search • Motif finding

History of Bioinformatics Computational Biology Stage 1. Sequence Analysis Stage 2. Structure Analysis • Gene sequencing • Sequence alignment • Homolog search • Motif finding • Protein folding • Homolog search • Binding site prediction • Function prediction

History of Bioinformatics Computational Biology Functional Genomics Stage 1. Sequence Analysis Stage 2. Structure Analysis Stage 3. Expression Analysis • Gene sequencing • Sequence alignment • Homolog search • Motif finding • Protein folding • Homolog search • Binding site prediction • Function prediction • Function prediction • Gene clustering • Sample classification

History of Bioinformatics Computational Biology Functional Genomics Systems Biology Stage 1. Sequence Analysis Stage 2. Structure Analysis Stage 3. Expression Analysis Stage 4. Network Analysis • Gene sequencing • Sequence alignment • Homolog search • Motif finding • Protein folding • Homolog search • Binding site prediction • Function prediction • Function prediction • Gene clustering • Sample classification • Network modeling • Interaction prediction • Function prediction • Pathway identification • Module detection

Biological Networks • Definition • Maps of biochemical reactions, interactions, regulations between genes or proteins • Importance • Provide insights into the mechanisms of molecular function within a cell • Significant resource for functional characterization of genes or proteins • Require computational and systematic approaches • Examples • Metabolic networks • Protein-protein interaction networks • Genetic interaction networks • Gene regulatory networks (Signal transduction networks)

Protein Interaction Networks • Determination • Experimental methods: Y2H, MS, Protein Microarray • Computational methods: Homolog search, Gene fusion analysis, Phylogenetic profiles • Genome-scale protein-protein interactions  Interactome • Representation • Un-weighted, undirected graph • Challenges • Unreliability • Large scale • Complex connectivity

Network Re-structuring • Strategy • To resolve complex connectivity • Converts the complex graph to • a hierarchical tree structure • Uses the concepts of path strength, • functional linkage, and centrality • Process • Input: a protein interaction network • Output: a list of functional modules unweighted network edge weighting weighted network functional linkage measurement score matrix network restructuring structured network hub confidence measurement hubs network clustering clusters

Path Strength • Path Strength Model • Assumption: each node in a path chooses a succeeding edge based on the weighted • probability • Path Strength Factors • Edge weights • Path length • Node weighted degree

Functional Linkage shortest path length threshold • Measurements • Path strength of the strongest path between two nodes • Computational problem • Needs a heuristic approach • Uses a user-specified threshold of the max path length • Formula • k-length path strength: • Functional linkage:

Network Restructuring • Centrality • Weighted closeness: • Algorithm • Computes centrality for each node a • Selects a set of ancestor nodes, T(a), of a by • Selects a parent node, p(a), of a by • Example

Hub Confidence • Measurement • Selects a set of child nodes, D(a), of a by • Selects a set of descendent nodes, La, of a by • Computes the hub confidence, H(a), of a by • Example

Clustering • Algorithm • Iteratively select a hub a with the highest hub confidence • Output the sub-tree La including a as a cluster (functional module) • Cluster Depth • The max path length from the root of the sub-tree to a leaf • Example

Topological Assessment of Hubs • Network Vulnerability • Random attack: repeatedly disrupt a randomly selected node • Degree-based hub attack: repeatedly disrupt the highest degree node • Structural hub attack: repeatedly disrupt the node with the highest hub confidence • For each iteration, observes the largest component • Results

Biological Assessment of Hubs • Protein Lethality • Determines lethal / viable proteins by knock-out experiment • Lethality represents functional essentiality • Orders proteins by degree and hub confidence • Observes the cumulative proportion of lethal proteins for every 10 proteins • Results

Topological Assessment of Clusters • Modularity • A combined measure of density within each cluster and separability among clusters • Estimated by the ratio of the number of edges within a cluster (sub-graph) to the number of all edges starting from the nodes in the cluster (sub-graph) • Observes the average modularity of clusters with respect to the cluster depth • Results • More specific function module has higher modularity • Justify the general-to-specific concepts of hierarchical functional modules

Biological Assessment of Clusters • f-Measure • Compares each output cluster X with the real functional annotation Y (from MIPS) • Recall = (# of common proteins of X and Y) / (# of proteins in Y) • Precision = (# of common proteins of X and Y) / (# of proteins in X) • f-measure = 2 × Recall × Precision / (Recall + Precision) • Results • Compared with the results from previous hierarchical clustering methods, e.g., edge-betweenness (top-down approach) and ProDistIn (bottom-up-approach)

Conclusion • Motivation • Significant functional knowledge in protein interaction networks (interactome) • Complex connectivity • Contributions • Convert an unstructured network to a structured network • Conserve functional information through pathways • High network vulnerability, low functional lethality at hubs as a drug target • Applicable to various fields, e.g., social networks, WWW • Foundation of structural dynamics during network evolution

Questions ? • Reference • Y.-R. Cho and A. Zhang, “Identification of functional modules by converting • interactome networks into hierarchical ordering of proteins”. BMC Bioinformatics, • 11(Suppl 3):S3, 2010

Systematic Analysis of Interactome: A New Trend in Bioinformatics