This presentation covers the importance of statistical methods for analyzing temporal microarray data in the context of systems biology. Motivated by applications in agriculture, energy, gene therapy and waste management, it addresses the challenges posed by the massive datasets that molecular biology experiments generate. The study focuses on understanding leaf senescence in Arabidopsis thaliana: it addresses biological and technical variation, and uses normalization, clustering and causal network inference to reveal gene interactions and functions.
Statistical Techniques for Temporal Microarray Data Analysis. Ritesh Krishna, Department of Computer Science. WPCCS, July 1, 2008
Why should you listen to my talk? • Systems biology is everybody’s playground in this room – image processing, algorithms, parallel processing, etc. • Importance of systems biology in today’s context: • Agriculture • Energy sources (biofuels) • Gene therapy • Waste clean-up
Use of Computational Techniques • Massive data generated by molecular biology experiments • Need to analyse output files produced in various formats, store bulk data, retrieve it quickly and precisely, and, most importantly, understand the behaviour and patterns in the data
How are these experiments performed? • Microarrays: a major revolution in the world of molecular biology • No longer limited to one gene per experiment • Possible to monitor the expression levels of thousands of genes simultaneously
An example - Arabidopsis thaliana • Popular in plant biology as a model plant • One of the smallest plant genomes • First plant genome to be sequenced • Present study • This study aims to understand the leaf senescence process in Arabidopsis. • Senescence refers to the biological processes of a living organism approaching an advanced age; in plants it is caused by age and by stress. • It is a programmed event that responds to a wide range of external and internal signals and is tightly regulated by different genes and proteins.
Experimental Design [Figure: experimental workflow with panels labelled "Dye", "Laser" and "Quantitative Data"; 16 replicates in total]
Issues with data • Biological variation vs. technical variation • Technical variation: sample bias, dye bias, slide bias, variation in experimental conditions, scanning and imaging errors, human errors • Massive dataset with ~31,000 genes • The goal is to understand the functioning of certain sets of genes (a needle in the haystack)
Step one – Clean the raw data using normalization • Assess the different sources of technical bias • Remove correlations between replicates so that they can be treated as independent of each other • Fit a multivariate error model in which the residuals associated with each gene are normally distributed with mean zero and constant variance • Propose statistical tests for evaluating the effects of normalization
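The talk does not spell out its error model, so the following is only a minimal sketch under stated assumptions: the data are a complete gene-by-replicate matrix of log-scale values, and each replicate array (slide) adds a constant bias. It fits the two-way additive model y_gr = mu + a_g + b_r + e_gr by least squares, removes the estimated array bias b_r, and returns the residuals e_gr, which are the quantities the error model above assumes to be normal with mean zero and constant variance. All function names and data here are hypothetical.

```python
from statistics import mean, pvariance


def fit_additive_model(y):
    """Least-squares fit of y[g][r] = mu + gene[g] + array[r] + e[g][r].

    Assumption (not the talk's model): y is a complete gene-by-replicate
    matrix of log-scale values and each replicate array/slide adds a
    constant bias. For balanced data the estimates are row and column means.
    """
    n_genes, n_reps = len(y), len(y[0])
    mu = mean(v for row in y for v in row)
    gene = [mean(row) - mu for row in y]
    array_bias = [mean(y[g][r] for g in range(n_genes)) - mu for r in range(n_reps)]
    resid = [[y[g][r] - mu - gene[g] - array_bias[r] for r in range(n_reps)]
             for g in range(n_genes)]
    return mu, gene, array_bias, resid


def remove_array_bias(y, array_bias):
    """Subtract each replicate's estimated bias from its column."""
    return [[v - array_bias[r] for r, v in enumerate(row)] for row in y]


def residual_variance_by_gene(resid):
    """Per-gene residual variance: a rough check of the constant-variance
    assumption (the values should be of similar size across genes)."""
    return [pvariance(row) for row in resid]


# Hypothetical log-ratios: 3 genes x 2 replicates; replicate 2 reads 1 unit high.
y = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
mu, gene, array_bias, resid = fit_additive_model(y)
print(array_bias)                          # estimated per-replicate bias
print(remove_array_bias(y, array_bias))    # replicates now agree for each gene
```

Real dye and slide effects are often intensity dependent, so a constant per-array shift is the simplest possible case rather than a complete normalization.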
Step two – Clustering • Reduce the dimension of the data • Genes with similar expression profiles fall in the same cluster
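The slide does not name a clustering algorithm. As one common choice, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python, assuming each gene is represented by its expression time course. Profiles are z-scored first so that genes cluster by the shape of their response rather than its absolute level. The function names, the fixed initial centroids and the toy profiles are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Z-score a time course so clustering groups genes by shape, not level."""
    m, s = mean(profile), pstdev(profile)
    return [(v - m) / s if s > 0 else 0.0 for v in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, init_idx, n_iter=100):
    """Lloyd's k-means; init_idx picks the starting centroids (deterministic)."""
    centroids = [list(profiles[i]) for i in init_idx]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: no profile changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical time courses over 4 time points: three rising, three falling genes.
raw = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3.5],
       [4, 3, 2, 1], [8, 6, 4, 2], [3, 2.5, 1, 0]]
profiles = [standardize(p) for p in raw]
labels, centroids = kmeans(profiles, init_idx=[0, 3])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Replacing thousands of profiles by a handful of centroids is what reduces the dimension of the data; in practice the number of clusters and the initialization would need to be chosen with care, for example by comparing several random starts.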
Circadian Circuit [Figure: network diagram with nodes ELF4, TOC1, LFY, CCA1]
[Figure: gene network diagram with nodes ERS2, ERS1, ETR2, ETR1, CTR1, EIN2, EIN6, EIN4, EIN3, EIL2, EIL1, EIL4, EIL3, EIL5, ERF1, PDF1.2]
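The slides show inferred gene networks only as figures and do not describe the inference method. Purely to illustrate how temporal data can suggest the direction of an interaction, the sketch below proposes an edge i -> j when gene i's expression at time t is strongly correlated with gene j's expression at time t + 1. This lag-correlation heuristic is swapped in for illustration: it is not the study's causal network inference method, and a lagged correlation alone cannot establish causation. Gene names, the threshold and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def lagged_edges(series, threshold=0.9, lag=1):
    """Candidate directed edges (i, j, r): gene i at time t vs gene j at t + lag.

    Heuristic only: a strong lagged correlation suggests, but does not prove,
    that gene i influences gene j.
    """
    edges = []
    for i, xi in series.items():
        for j, xj in series.items():
            if i == j:
                continue
            r = pearson(xi[:-lag], xj[lag:])
            if abs(r) >= threshold:
                edges.append((i, j, round(r, 3)))
    return edges


# Hypothetical time courses: B copies A one time step later; C is unrelated.
series = {
    "A": [0, 2, 1, 3, 0, 2],
    "B": [0, 0, 2, 1, 3, 0],
    "C": [1, 1, 2, 2, 1, 1],
}
print(lagged_edges(series))  # → [('A', 'B', 1.0)]
```

With real microarray time courses (few time points, thousands of genes) a screen like this yields many spurious edges, so candidate links would need multiple-testing control and biological validation.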