
Information-Theoretic Dissimilarity for Clustering Gene Expression Profiles

DESCRIPTION

This paper presents a novel approach to smooth gene expression data and measure dissimilarity between gene profiles using a Kullback-Leibler (KL) based clustering method. The proposed two-step process involves modeling gene expression profiles via Gaussian Radial Basis Functions (GRBF) and calculating location-based match dissimilarities. The methodology effectively analyzes noisy, time-dependent gene expression data and applies clustering techniques like k-medoids. Results demonstrate that this model-based approach is a powerful tool for exploratory data analysis, suitable for various time-series data settings.


Presentation Transcript


An Information-theoretic Dissimilarity For Clustering Gene Expression Profiles Models

Jyotsna Kasturi, Raj Acharya, Shruthi Prabhakara
Department of Computer Science and Engineering, Penn State University

INTRODUCTION

• A new method to smooth gene expression data and measure expression dissimilarity between genes is presented [Kasturi, J. and Acharya, R., IJCNN 2008].
• A Kullback-Leibler (KL) based clustering method is proposed for analyzing noisy, time-dependent gene expression data.

The method is a two-step process:
• Modeling gene expression profiles using Gaussian Radial Basis Functions (GRBF).
• Computing a location-based match dissimilarity between gene profile distributions, followed by clustering.

MODELING GENE EXPRESSION PROFILES

Let G = {g1, g2, …, gN} denote the data matrix containing the expression levels of N genes measured over time. The expression profile of each gene gi can be approximated by a linear combination of ni non-linear basis functions. The parameters of the GRBF model are learned using back propagation.

• ni: number of Gaussian components
• Θ = (µ, σ): mean and width of the distribution

[Figure: Observed data (circles), linear fit (dotted line), GRBF fit with 4 Gaussian components (solid line), and individual components (dash-dotted line).]

KL-BASED DISSIMILARITY BETWEEN GENE PROFILE DISTRIBUTIONS

KL Location Match (KL-LocMatch) is a new dissimilarity measure that uses a matching strategy: it calculates the distance from every Gaussian component in one gene to its closest paired component in the other gene. The normalized weight of the k-th component is denoted βk (its defining equation is not preserved in this transcript). The parameter τ is the threshold by which components that contribute significantly to the shape of an expression profile are selected, based on their mixture weights.

[Figure: Location-based match between a GRBF with 3 components and a GRBF with 4 components.]
[Figure: Two Gaussian Radial Basis Functions whose components are used in the KL divergence approximation according to their mixture weights.]

CLUSTERING

Clustering is performed with the k-medoid procedure on the RBF-fitted genes using the KL-LocMatch dissimilarity, which may be made symmetric by taking the sum of the two asymmetric dissimilarities.

[Figure: Davies-Bouldin cluster validity index calculated for the number of clusters.]
[Figure: Clusters obtained using the location-based match approximation with threshold values of (A) 50%, (B) 70%, (C) 80%, (D) 100%.]

CONCLUSION

• A new model-based approach to smoothing gene expression data from time-dependent experiments and measuring the dissimilarity between profiles is proposed. Validation on real data shows that the method is a powerful tool for exploratory data analysis and for clustering gene expression data.
• The method can be applied to evenly or unevenly spaced time-series data.
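
The GRBF model in the modeling step is described only in words in the transcript. One standard way to write such a model is sketched below. The weight symbol w_{ik}, the basis-function symbol φ_{ik}, the residual r_i, and the squared-error objective E_i are assumptions introduced for illustration and are not taken from the poster; only n_i, µ and σ appear in the source.

```latex
\begin{align*}
\hat{g}_i(t) &= \sum_{k=1}^{n_i} w_{ik}\,\phi_{ik}(t),
  \qquad
  \phi_{ik}(t) = \exp\!\left(-\frac{(t-\mu_{ik})^2}{2\sigma_{ik}^2}\right) \\[4pt]
E_i &= \frac{1}{2}\sum_{t} r_i(t)^2,
  \qquad
  r_i(t) = g_i(t) - \hat{g}_i(t) \\[4pt]
\frac{\partial E_i}{\partial w_{ik}} &= -\sum_{t} r_i(t)\,\phi_{ik}(t) \\[4pt]
\frac{\partial E_i}{\partial \mu_{ik}} &= -\sum_{t} r_i(t)\,w_{ik}\,\phi_{ik}(t)\,\frac{t-\mu_{ik}}{\sigma_{ik}^2} \\[4pt]
\frac{\partial E_i}{\partial \sigma_{ik}} &= -\sum_{t} r_i(t)\,w_{ik}\,\phi_{ik}(t)\,\frac{(t-\mu_{ik})^2}{\sigma_{ik}^3}
\end{align*}
```

Under these assumptions, learning by back propagation amounts to gradient descent on E_i using the three derivatives above. The sums run only over the observed time points t, so nothing in the formulation requires the samples to be evenly spaced.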
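
The dissimilarity and clustering steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The closed-form KL divergence between two univariate Gaussians is standard. Everything else is a guess at details the transcript does not preserve: components are selected by accumulating the largest normalized weights until they reach τ, each selected component is paired with the component of the other profile whose mean is nearest, the per-pair KL values are combined as a β-weighted sum, and a simple alternating k-medoids loop stands in for whichever k-medoid variant the authors used. The function names and the four toy profiles are hypothetical.

```python
import numpy as np

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form KL( N(mu1, s1^2) || N(mu2, s2^2) ) for 1-D Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5

def select_components(w, tau):
    """ASSUMED rule: keep the heaviest components until their normalized
    weights sum to at least tau; return their indices and renormalized weights."""
    beta = np.abs(w) / np.abs(w).sum()
    order = np.argsort(-beta)                      # heaviest first
    cum = np.cumsum(beta[order])
    n_keep = min(int(np.searchsorted(cum, tau - 1e-12)) + 1, len(w))
    keep = order[:n_keep]
    return keep, beta[keep] / beta[keep].sum()

def kl_loc_match(f, g, tau=0.8):
    """Asymmetric dissimilarity from profile f to profile g.
    A profile is a tuple (weights, means, widths) of equal-length sequences."""
    wf, mf, sf = (np.asarray(a, dtype=float) for a in f)
    _, mg, sg = (np.asarray(a, dtype=float) for a in g)
    keep, beta = select_components(wf, tau)
    total = 0.0
    for k, b in zip(keep, beta):
        j = np.argmin(np.abs(mg - mf[k]))          # ASSUMED: match by nearest mean
        total += b * kl_gauss(mf[k], sf[k], mg[j], sg[j])
    return total

def symmetric_dissimilarity(f, g, tau=0.8):
    """Sum of the two asymmetric dissimilarities, as described in the text."""
    return kl_loc_match(f, g, tau) + kl_loc_match(g, f, tau)

def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed dissimilarity matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:                       # member with least total distance
                costs = D[np.ix_(members, members)].sum(axis=0)
                new[c] = members[np.argmin(costs)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

# Hypothetical fitted profiles: two peak early, two peak late.
profiles = [
    ([1.0, 0.2], [2.0, 6.0], [1.0, 1.0]),
    ([0.9, 0.3], [2.2, 6.1], [1.1, 1.0]),
    ([1.0, 0.2], [8.0, 4.0], [1.0, 1.0]),
    ([1.1, 0.1], [8.3, 4.2], [0.9, 1.0]),
]
n = len(profiles)
D = np.array([[symmetric_dissimilarity(profiles[i], profiles[j])
               for j in range(n)] for i in range(n)])
medoids, labels = k_medoids(D, k=2)
print(labels)
```

Because clustering needs only the matrix D, the same loop works for any value of τ; rerunning it with τ set to 0.5, 0.7, 0.8 and 1.0 mirrors the threshold comparison described in the clustering figures.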
