This study explores the analysis of microarray data from the GNF tissue database using R, detailing techniques such as clustering, positional co-regulation, and insights into tissue-specific gene expression patterns, including apoptotic configurations. The research highlights significant findings from the GNF Expression Atlas, including correlations among neighboring genes, the prediction of gene function from shared expression patterns, and the investigation of poorly characterized genes like Top1MT. Furthermore, patterns of apoptosis in various tissues are analyzed, and array quality is examined through probe-level analysis and regional bias images.
Microarray Data Analysis Using R • Studies in Tissue Databases • Mark Reimers, NCI
Outline • The GNF tissue database • Exploratory analysis - clustering • Positional co-regulation • Insight via co-regulation • Apoptotic configuration of tissues • Probe level analysis
The GNF Expression Atlas • Su et al. (PNAS 2004) hybridized 150 samples from 61 tissues to Affymetrix U133A and custom arrays • Variation in gene expression (as proportion of transcriptome) • 95% of genes show at least one 2-fold change among the 61 tissues • 37% show more than a 2-fold difference between their lowest 10% and highest 10% of values
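The two variation measures on this slide can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code behind the slide. It assumes linear-scale (not log) expression values, reads "lowest 10% and highest 10%" as the means of the bottom and top deciles of each gene's values, and uses invented toy profiles; the function names are hypothetical.

```python
def max_fold_change(values):
    """Largest ratio between any two tissues: highest value over lowest."""
    return max(values) / min(values)

def decile_fold_change(values):
    """Mean of the top 10% of values over the mean of the bottom 10%.
    (One possible reading of the slide; an assumption.)"""
    ordered = sorted(values)
    k = max(1, len(ordered) // 10)          # number of values in a decile
    low = sum(ordered[:k]) / k
    high = sum(ordered[-k:]) / k
    return high / low

def fraction_exceeding(expr, stat, threshold=2.0):
    """Fraction of genes whose statistic reaches the threshold."""
    hits = sum(1 for values in expr.values() if stat(values) >= threshold)
    return hits / len(expr)

# Invented toy data: 3 genes measured in 20 tissues (linear scale).
expr = {
    "geneA": [100, 104, 98, 101, 97, 103, 99, 102, 100, 96,
              105, 98, 101, 99, 100, 102, 97, 103, 100, 101],
    "geneB": [50, 55, 48, 52, 400, 380, 51, 49, 53, 47,
              50, 54, 52, 48, 51, 49, 50, 53, 52, 50],
    "geneC": [80, 90, 100, 170, 85, 95, 88, 92, 84, 91,
              86, 94, 89, 93, 87, 96, 90, 85, 92, 88],
}

frac_any_2fold = fraction_exceeding(expr, max_fold_change)
frac_decile_2fold = fraction_exceeding(expr, decile_fold_change)
print(frac_any_2fold, frac_decile_2fold)
```

In the toy data, geneC passes the first criterion only because of a single outlying tissue, which is why the decile-based measure is the stricter of the two.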
Clustering samples • All biological replicates are nearest neighbors • Dendrogram reflects the discrepancy between healthy and cancerous samples
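The replicate check on this slide can be illustrated with a small sketch. It assumes Pearson correlation as the similarity measure (the slide does not say which distance was used) and a hypothetical naming scheme in which replicates share a tissue prefix, such as `liver_1` and `liver_2`. The profiles are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nearest_neighbors(samples):
    """Map each sample to the other sample it correlates with most strongly."""
    result = {}
    for name, profile in samples.items():
        scored = [(pearson(profile, other_profile), other)
                  for other, other_profile in samples.items() if other != name]
        result[name] = max(scored)[1]
    return result

def tissue_of(sample_name):
    """Hypothetical convention: 'liver_2' is replicate 2 of tissue 'liver'."""
    return sample_name.rsplit("_", 1)[0]

# Invented toy data: 4 samples, 4 genes each.
samples = {
    "liver_1": [9.0, 2.0, 1.0, 5.0],
    "liver_2": [8.8, 2.1, 1.2, 5.1],
    "heart_1": [1.0, 8.0, 3.0, 2.0],
    "heart_2": [1.2, 7.9, 3.1, 2.2],
}

nn = nearest_neighbors(samples)
replicates_pair_up = all(tissue_of(s) == tissue_of(nn[s]) for s in samples)
print(nn, replicates_pair_up)
```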
Co-regulation of Nearby Genes • Some groups of genes next to one another on chromosome show high correlation across tissues
Significance of Co-regulation • How often would such correlations happen ‘by chance’, e.g. by selecting genes at random? • Three random measures would have correlation greater than 0.6 with p < 10^-20! • However, 3 genes selected at random from the atlas have probability ~10^-3 of having all pairwise correlations > 0.6 • In 30,000 positions, we should expect to see about 30 by chance • 156 regions of high correlation were found • Many are paralogs • Perhaps 50% false discovery rate among the rest
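The scan and the chance expectation described on this slide can be sketched as follows. The window size of 3 and the 0.6 cutoff come from the slide; the function names, the assumption that profiles are already ordered by chromosomal position, and the toy profiles are illustrative only.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coregulated_windows(ordered_profiles, size=3, cutoff=0.6):
    """Start indices of windows of adjacent genes in which every
    pairwise correlation exceeds the cutoff."""
    hits = []
    for start in range(len(ordered_profiles) - size + 1):
        window = ordered_profiles[start:start + size]
        if all(pearson(a, b) > cutoff for a, b in combinations(window, 2)):
            hits.append(start)
    return hits

# Invented toy data: 5 genes in chromosomal order, 6 tissues each.
profiles = [
    [1, 5, 2, 8, 3, 4],
    [2, 2, 9, 1, 7, 3],
    [2.2, 2.1, 8.5, 1.3, 7.2, 3.1],
    [1.8, 2.5, 9.2, 0.9, 6.5, 3.4],
    [5, 1, 3, 6, 2, 9],
]
hits = coregulated_windows(profiles)

# Chance expectation from the slide: 30,000 positions, each with
# probability about 10^-3 of passing when genes are drawn at random.
positions = 30_000
p_random_triple = 1e-3
expected_by_chance = positions * p_random_triple
observed = 156
print(hits, expected_by_chance, observed)
```

The empirical null (random triples drawn from the atlas) matters here: genes in the atlas are not independent random measures, so the 10^-20 figure greatly overstates significance.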
Prediction of Function • Zhang et al. (J. Biol. 2004, 3:21) hybridized 55 mouse tissues to spotted oligo arrays • Hypothesis: genes with similar tissue expression patterns share similar function • Able to recover the GO biological process of known genes with better than 50% accuracy for many categories • Extended prediction to 1,092 uncharacterized transcripts
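The hypothesis on this slide can be illustrated with a simple nearest-neighbor vote: an unannotated gene receives the category most common among the annotated genes whose tissue profiles correlate best with it. This voting scheme is swapped in for illustration and is not claimed to be the classifier used by Zhang et al.; the category labels and profiles are invented.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def predict_category(query, annotated, k=3):
    """Majority category among the k annotated genes most correlated
    with the query profile."""
    ranked = sorted(annotated,
                    key=lambda item: pearson(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy data: (profile across 5 tissues, category label).
annotated = [
    ([9, 8, 1, 1, 2], "muscle contraction"),
    ([8, 9, 2, 1, 1], "muscle contraction"),
    ([7, 9, 1, 2, 2], "muscle contraction"),
    ([1, 2, 9, 8, 1], "immune response"),
    ([2, 1, 8, 9, 2], "immune response"),
    ([1, 1, 7, 9, 1], "immune response"),
]

uncharacterized = [8, 8, 1, 2, 1]
predicted = predict_category(uncharacterized, annotated)
print(predicted)
```

Accuracy figures such as the "better than 50%" on the slide would come from applying a predictor like this to genes whose category is already known and counting how often the known label is recovered.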
Investigation of Poorly Characterized Gene - Top1MT • 10-fold variation in expression (odd for a ‘housekeeping gene’) • >50 genes with expression highly correlated (> 0.75) with Top1MT across the tissue database • A large proportion are splicing factors • Top1MT has an odd splice junction in intron 1, and may depend critically on abundant splicing factors
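The search for genes co-expressed with a gene of interest can be sketched as below. The 0.75 cutoff is taken from the slide; the function name, the placeholder gene names other than TOP1MT, and all expression values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_partners(target, expr, cutoff=0.75):
    """Genes whose profile correlates with the target above the cutoff,
    strongest first."""
    scored = [(pearson(expr[target], profile), gene)
              for gene, profile in expr.items() if gene != target]
    return [gene for r, gene in sorted(scored, reverse=True) if r > cutoff]

# Invented toy data: 4 genes measured in 5 tissues.
expr = {
    "TOP1MT": [2, 4, 8, 3, 6],
    "geneA": [2.1, 4.2, 7.5, 3.3, 5.9],
    "geneB": [8, 3, 2, 7, 4],
    "geneC": [1, 5, 9, 2, 7],
}

partners = correlated_partners("TOP1MT", expr)
print(partners)
```

The resulting gene list is what would then be examined for shared annotation, as with the splicing factors noted on the slide.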
Apoptosis Patterns • Majority of epithelial tissues show common pattern (indisposed to apoptosis) • Blood cells show variety of patterns
Exploration of Probe Sets • Examine correlation of probe sets across 150 samples • All but one probe verified to match latest Unigene build for gene • Probes organized by position in 3’ end • Color scale: red = 1; white = < 0
Quality of Arrays • Regional bias images