Abstract
• Background: This work describes a candidate gene prioritization method based on protein-protein interaction network (PPIN) analysis and compares it with ToppGene, a functional annotation-based prioritization method.
• Results: For the first time, the PageRank and HITS algorithms and the K-Step Markov method, all used in Web and social network analysis, are applied to a PPIN to prioritize disease candidate genes.
• Conclusion: PPIN-based candidate gene prioritization performs better than any other individual gene feature or annotation. It can be used successfully for disease candidate gene prioritization.

Background-1
• Most current disease candidate gene identification and prioritization methods rely on functional annotations from different data sources: GO, pathways, domains, and expression data.
• In their recent work, the authors used a functional prioritization method named ToppGene, which integrates functional data with mouse phenotype data. ToppGene outperformed the other published functional prioritization methods.
• These methods are limited by the coverage of gene functional annotation:
  - only a fraction of the human genome is annotated with pathways and phenotypes
  - 2/3 of all genes have at least one functional annotation
  - 1/3 are yet to be annotated

Background-2 Different approach
• In this study, the authors applied, for the first time, social- and Web-network analysis algorithms to a PPIN to prioritize disease candidate genes.
• The PPIN is represented as an unweighted, undirected, simple graph G(V, E): genes are nodes and interactions are edges, so V is the set of all genes and E the set of all interactions. The set of known disease genes (seeds) is denoted R.
• The prioritization approaches are based on the methods of White and Smyth, whose framework of four successive problem formulations defines how to rank nodes in the unweighted graph G(V, E).
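The graph representation described on this slide can be sketched in a few lines. This is only an illustration: the function name `build_ppin` and the gene labels G1 to G4 are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict

def build_ppin(interactions):
    """Build an unweighted, undirected simple graph as {gene: set of neighbours}."""
    graph = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue          # simple graph: ignore self-interactions
        graph[a].add(b)       # sets silently drop duplicate edges
        graph[b].add(a)       # storing both directions makes the edge undirected
    return dict(graph)

# Hypothetical toy interactions (assumption, not the study's PPIN).
interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G1"),
                ("G3", "G4"), ("G4", "G4"), ("G2", "G1")]
ppin = build_ppin(interactions)

seeds = {"G1"}                   # R: known disease genes
candidates = set(ppin) - seeds   # genes still to be ranked
edge_count = sum(len(nbrs) for nbrs in ppin.values()) // 2
```

Storing neighbours in sets keeps the graph simple (no duplicate edges, no self-loops) without any extra bookkeeping.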

Methods-1 White and Smyth problem formulations:
1. Given G, where t and r are both nodes in G, compute the importance I(t|r) of node t with respect to the root r.
2. Given G and a root node r in G, rank all vertices in T, a subset of the vertices of G: for each node t in T, compute I(t|r).
3. Given G and a set of root nodes R in G, rank all vertices in T. I(t|R) is the average importance of t over the nodes in R: $I(t|R) = \frac{1}{|R|} \sum_{r \in R} I(t|r)$.
4. Given G, rank all nodes, where R = T = V.
• The solution to formulation 3 is what this study needs: the problem is to prioritize a set of genes in the network based on their importance to a set of root genes (genes known to be associated with a disease).
• The importance of a gene to the set of root genes is simply the average of its importance with respect to each individual root gene.
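Formulation 3 can be illustrated with a short sketch that averages a pairwise importance I(t|r) over the root set and then ranks the candidates. The function names and the toy importance values are assumptions made for illustration only. Any of the algorithms could supply the pairwise importance.

```python
def importance_to_roots(importance, t, roots):
    """I(t|R): the mean of I(t|r) over all root nodes r in R."""
    return sum(importance(t, r) for r in roots) / len(roots)

def rank_candidates(importance, candidates, roots):
    """Rank the candidate nodes T by decreasing I(t|R)."""
    scored = [(t, importance_to_roots(importance, t, roots)) for t in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical pairwise importances I(t|r); the numbers are made up.
toy_scores = {
    ("A", "R1"): 0.5,   ("A", "R2"): 0.25,
    ("B", "R1"): 0.125, ("B", "R2"): 0.125,
}

def toy_importance(t, r):
    return toy_scores[(t, r)]

ranking = rank_candidates(toy_importance, ["B", "A"], ["R1", "R2"])
```

Here candidate A scores (0.5 + 0.25) / 2 = 0.375 and candidate B scores 0.125, so A is ranked first.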

Methods-2
• The remaining task is to compute I(t|r), the importance of node t with respect to a root node r.
• They used three algorithms from White and Smyth's methods:
1. PageRank
2. HITS
3. K-Step Markov
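As a rough illustration of two of these algorithms, the sketch below implements a root-biased PageRank (a random walk that jumps back to the root set with probability `beta`) and a K-step Markov score (the accumulated visit probability over the first K steps of a walk started at the roots). It is a minimal pure-Python sketch that assumes every node has at least one neighbour. The parameter values, function names and toy graph are assumptions, not the study's settings, and HITS is not covered.

```python
def walk_step(graph, dist):
    """Advance a probability distribution by one uniform random-walk step."""
    nxt = {v: 0.0 for v in graph}
    for u, mass in dist.items():
        if mass:
            share = mass / len(graph[u])   # assumes u has at least one neighbour
            for v in graph[u]:
                nxt[v] += share
    return nxt

def root_prior(graph, roots):
    """Uniform distribution over the root set R, zero elsewhere."""
    return {v: (1.0 / len(roots) if v in roots else 0.0) for v in graph}

def pagerank_with_priors(graph, roots, beta=0.3, iterations=100):
    """Root-biased PageRank; beta is the probability of jumping back to R."""
    prior = root_prior(graph, roots)
    rank = dict(prior)
    for _ in range(iterations):
        walked = walk_step(graph, rank)
        rank = {v: (1 - beta) * walked[v] + beta * prior[v] for v in graph}
    return rank

def k_step_markov(graph, roots, k=6):
    """Sum of visit probabilities over steps 1..k of a walk started in R."""
    dist = root_prior(graph, roots)
    total = {v: 0.0 for v in graph}
    for _ in range(k):
        dist = walk_step(graph, dist)
        for v in graph:
            total[v] += dist[v]
    return total

# Hypothetical toy graph: R is linked to A and B, and A is linked to C.
toy = {"R": {"A", "B"}, "A": {"R", "C"}, "B": {"R"}, "C": {"A"}}
pr_scores = pagerank_with_priors(toy, {"R"})
ksm_scores = k_step_markov(toy, {"R"}, k=2)
```

Both updates are linear in the starting distribution, so starting from a uniform prior over R gives the same scores as averaging the per-root results.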

Methods-3
Human protein interaction network
• Human protein-protein interactions were extracted from the NCBI Entrez Gene FTP site (BIND, BioGRID, HPRD), giving a network with 8340 nodes and 27250 edges.
Evaluation of PPIN for gene prioritization
• They used the same training data as in their previous study, comprising 19 diseases from the OMIM (Online Mendelian Inheritance in Man) and GAD (Genetic Association Database) databases, with a total of 693 associated genes. 589 genes were used in the cross validation.
Cardiac septal defect candidate gene prioritization
• From NCBI’s OMIM database, 166 OMIM records labeled “atrial septal defect” were extracted. 81 genes were mapped to these records and used as the training set. 431 genes (from interactions) were used for ranking (test set).

Rank-based ROC curves were plotted, and AUC values were used to measure performance quantitatively. 13 conditions were tested (3 algorithms with different parameter settings), each repeated 5 times.

Results-1 Cross validation
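The AUC used to summarize the cross-validation runs can be read as the probability that a held-out disease gene scores higher than a background gene. The sketch below computes it in that pairwise form. The function name and the scores are hypothetical, and the study's exact ROC construction may differ.

```python
def rank_based_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correctly ordered pair.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical prioritization scores, for illustration only.
held_out_disease_genes = [0.9, 0.4]
background_genes = [0.8, 0.3, 0.2, 0.1]
auc = rank_based_auc(held_out_disease_genes, background_genes)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every disease gene ranked above every background gene.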

Results-2

• Combining functional annotation-based and PPIN-based methods is more effective for identifying and ranking disease candidate genes.

Results-3 Top 20 ranked genes
• Mice with a deletion of Erbb2 show ventricular septal defects (VSD), suggesting that the human ortholog ERBB2 could be a potential candidate gene for VSD.
• Mouse embryos lacking the p300 protein (EP300 gene) show ventricular septal defects.
• A truncated CBP protein (CREBBP gene) leads, in mice, to atrial and ventricular septal defects.
* Genes associated with cardiac development or malformation: 15 for ToppGene, 14 for the PPIN-based method.
# Genes associated with septal defects: 6 for ToppGene, 3 for the PPIN-based method.

Results-4 Prioritized candidate genes for cardiac septal defects using both functional annotation-based and PPIN-based methods.

Results-5 AUC of different feature sets. Red bars indicate the AUC scores based on each feature set, and blue bars are the corresponding random controls.

Conclusions-1
• The PageRank, HITS and K-Step Markov algorithms were applied to a literature-based, manually curated protein interaction network.
• Goal: to prioritize disease candidate genes. Known disease-related genes were used as a training set ("seeds"), and the candidate genes were ranked.
• Network-based methods are generally not as effective as the integrated functional annotation-based methods.
• Compared with the individual functional annotation features, however, network-based methods perform better than every single annotation.
• Therefore, PPINs can be a good feature for disease candidate gene prioritization, especially when the genes lack all other functional annotations or are sparsely annotated.

Conclusions-2
• Limitations: as with functional annotation-based methods, performance depends on data quality, here the quality of the interaction data (missing interactions and false positives).
Solutions:
• Fit the model better to biological networks (e.g., using weighted nodes, i.e. genes or proteins, or weighted edges, i.e. interactions).
• Integrate the method with other methods (e.g., combining results from functional annotation-based methods and expression profiles with network-based approaches).
• Using both functional annotations and PPIN-based topological parameters is expected to better facilitate the discovery and prioritization of disease genes.
