
Bio277 Lab 3: Finding Transcription Factor Binding Motifs


Presentation Transcript


  1. Bio277 Lab 3: Finding Transcription Factor Binding Motifs
     Adapted from a lab written by Prof. Terry Speed.
     Jess Mar, Department of Biostatistics, Quackenbush Lab, DFCI (jmar@hsph.harvard.edu)

  2. Outline
     • Analyze cell cycle gene expression data.
     • Cluster the cell cycle data using hierarchical clustering.
     • Visualize the cell cycle clusters.
     • Find motifs in these clusters and visualize them using sequence logos.

  3. The Cell Cycle

  4. Cell Cycle Data Set
     Paper: Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 1998, p. 3273.
     • Experiments assayed mRNA expression patterns over at least one full cell cycle.
     • Custom cDNA microarray platform.
     • RNA samples from Saccharomyces cerevisiae cell culture.
     • 3 methods of synchronization: α-factor arrest, cdc15, elutriation.
     • Today's data: α-factor arrest (blocks cell division in G1).
     • ~6000 genes x 17 time points.
     • Sampled at 7 min intervals over 120 min, starting at time zero.
     • See http://cellcycle-www.stanford.edu

  5. Experimental Data
     From the ~6000 yeast genes, we have chosen to focus on those involved in key biological processes (such as the cell cycle, oxidative phosphorylation and nucleotide metabolism). Read the data into R:
         dat <- read.table("ccexpdata.txt", header=TRUE, sep="\t")
     Objective: find transcription factor binding sites implicated in the cell cycle.
     • How do we search for these binding sites?
     • Where do we begin to search?
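Later steps use row.names(dat) as gene names and pass dat straight to dist(), so the gene IDs need to end up as row names rather than as a data column. The sketch below assumes (we have not seen the file) that ccexpdata.txt has gene IDs in its first column and one numeric column per time point; it writes a tiny hypothetical table to a temporary file and reads it back with row.names = 1. If the header row has one fewer field than the data rows, read.table() uses the first column as row names automatically.

```r
# Sketch: make the first column (gene IDs) the row names so the table is purely numeric.
# The toy table is hypothetical; it only mimics the layout we assume ccexpdata.txt has.
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\ttp1\ttp2\ttp3",
             "YAL001C\t0.15\t-0.22\t0.07",
             "YAL002W\t-0.30\tNA\t0.41"), tmp)

dat.toy <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)
print(dim(dat.toy))          # genes x time points
print(rownames(dat.toy))     # gene IDs, later used by row.names(dat)
print(sum(is.na(dat.toy)))   # missing values that dist() will have to cope with
```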

  6. Linking Gene Expression and Promoters
     One canonical representation of gene regulation: genes that are regulated by the same transcriptional program share similar expression patterns. But co-expression does not always imply co-regulation. We look at the upstream promoter regions of co-expressed genes to see whether they share common sequence patterns. Statistically over-represented patterns are potential transcription factor binding sites.
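One common way to make "statistically over-represented" concrete is a hypergeometric test: if K of N background promoters contain a motif, how surprising is it that k of the n promoters in a cluster contain it? The sketch below uses base R's phyper(); all counts are hypothetical, and this is one possible test rather than the method any particular motif tool uses.

```r
# Sketch: hypergeometric test for motif over-representation in one cluster.
# All four counts are hypothetical, chosen only for illustration.
N <- 6000   # promoters in the background (roughly one per yeast gene)
K <- 300    # background promoters that contain the motif
n <- 40     # promoters in the cluster
k <- 12     # cluster promoters that contain the motif

# P(X >= k) when n promoters are drawn without replacement from N, of which K carry the motif
p.value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
expected <- n * K / N
cat("expected:", expected, " observed:", k, " p-value:", signif(p.value, 3), "\n")
```

Because phyper() returns P(X <= q) by default, the upper tail P(X >= k) is obtained with q = k - 1 and lower.tail = FALSE.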

  7. Building Gene Expression Clusters
         distMat <- dist(dat, method="euclidean")
         clustObj <- hclust(distMat)   # complete linkage by default
         plot(clustObj)
     How many clusters should we use? Here we cut the tree into 15:
         cluster.labels <- cutree(clustObj, k=15)
         print(table(cluster.labels))
     The cluster size distribution looks like:
         barplot(table(cluster.labels), xlab="Cluster", ylab="Number of genes")
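Euclidean distance groups genes by absolute expression level as well as by the shape of their profiles. A frequently used alternative for expression data (not part of the original lab) is 1 minus the Pearson correlation, which groups genes whose profiles rise and fall together regardless of offset or scale. The sketch below compares the two on simulated profiles: 10 genes share one sinusoidal shape (half of them shifted and scaled) and 5 follow a phase-shifted shape.

```r
# Sketch: Euclidean vs. correlation distance on simulated profiles (not lab data).
set.seed(1)
tp <- 1:17                                   # 17 time points, as in the lab data
shape1 <- sin(2 * pi * tp / 17)
shape2 <- cos(2 * pi * tp / 17)
prof <- rbind(
  t(replicate(5, 3 * shape1 + 2 + rnorm(17, sd = 0.2))),  # shape 1, shifted and scaled
  t(replicate(5, shape1 + rnorm(17, sd = 0.2))),          # shape 1, unscaled
  t(replicate(5, shape2 + rnorm(17, sd = 0.2)))           # shape 2
)
rownames(prof) <- paste0("gene", 1:15)

# Euclidean distance, complete linkage (as on the slide)
labels.euc <- cutree(hclust(dist(prof)), k = 2)

# cor() correlates columns, so transpose to correlate genes (rows)
d.cor <- as.dist(1 - cor(t(prof), use = "pairwise.complete.obs"))
labels.cor <- cutree(hclust(d.cor, method = "average"), k = 2)

print(rbind(euclidean = labels.euc, correlation = labels.cor))
```

With Euclidean distance the shifted copies of shape 1 split off on their own; with correlation distance all 10 shape-1 genes stay together and the shape-2 genes form the second cluster.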

  8. Visualizing Clusters
     Let's plot the first 8 clusters:
         par(mfrow=c(2,4))
         for (i in 1:8) {
           titleLab <- paste("Cluster ", i, sep="")
           expr.prof <- as.matrix(dat[cluster.labels == i, ])
           # draw the first gene's profile to set up the axes ...
           plot(expr.prof[1, ], ylim=range(expr.prof, na.rm=TRUE), type="l",
                xlab="Time", ylab="Expression", main=titleLab)
           # ... then overlay the profile of every gene in the cluster
           apply(expr.prof, 1, lines)
         }

  9. Visualizing Clusters (continued): clusters 9 to 15
         par(mfrow=c(2,4))
         for (i in 9:15) {
           titleLab <- paste("Cluster ", i, sep="")
           expr.prof <- as.matrix(dat[cluster.labels == i, ])
           plot(expr.prof[1, ], ylim=range(expr.prof, na.rm=TRUE), type="l",
                xlab="Time", ylab="Expression", main=titleLab)
           apply(expr.prof, 1, lines)
         }
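The two loops above repeat the same plotting code. A compact alternative, sketched below, wraps it in a function and draws all profiles of a cluster with a single matplot() call, which plots one line per column (hence the transpose). The function name plot.cluster and the toy matrix are hypothetical stand-ins for dat and cluster.labels; drop = FALSE keeps a one-gene cluster as a matrix if the input is a matrix rather than a data frame.

```r
# Sketch: plot every expression profile of cluster i in one matplot() call.
# plot.cluster() is a hypothetical helper; toy data stand in for dat / cluster.labels.
plot.cluster <- function(expr, labels, i) {
  expr.prof <- as.matrix(expr[labels == i, , drop = FALSE])
  # matplot() draws one line per column, so transpose: columns = genes
  matplot(t(expr.prof), type = "l", lty = 1, col = "grey40",
          xlab = "Time point", ylab = "Expression",
          main = paste0("Cluster ", i, " (n = ", nrow(expr.prof), ")"))
  invisible(nrow(expr.prof))   # return the cluster size without printing it
}

set.seed(2)
toy <- matrix(rnorm(20 * 17), nrow = 20)   # 20 hypothetical genes x 17 time points
toy.labels <- rep(1:4, each = 5)
par(mfrow = c(2, 2))
sizes <- sapply(1:4, function(i) plot.cluster(toy, toy.labels, i))
```

With the lab data the call would be, for example, sapply(1:8, function(i) plot.cluster(dat, cluster.labels, i)) after par(mfrow=c(2,4)).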

  10. Exporting Expression Clusters
     Write the gene names in each cluster to a text file:
         for (i in 1:15) {
           cluster.genes <- row.names(dat)[cluster.labels == i]
           fileName <- paste("cluster", i, ".txt", sep="")
           write(cluster.genes, fileName)   # one gene name per line
         }
     Are they there?
         dir()

  11. Retrieving Promoter Sequences
     Let's focus on Cluster 12. We can retrieve the promoter sequences for these genes using RSAT (Regulatory Sequence Analysis Tools): http://rsat.scmbb.ulb.ac.be/rsat//RSAT_home.cgi
     When working on yeast genomics, another great resource is the Saccharomyces Genome Database (SGD): http://www.yeastgenome.org/
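If the retrieved promoters are saved as a FASTA file, they can be brought back into R without extra packages. The sketch below is a minimal FASTA reader under the assumption of a well-formed file (every header followed by at least one sequence line); the function name read.fasta.simple and the example records are hypothetical, and a dedicated package such as Biostrings would be more robust for real work.

```r
# Sketch: minimal base-R FASTA reader (hypothetical helper, assumes a well-formed file).
read.fasta.simple <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                            # drop blank lines
  is.header <- startsWith(lines, ">")
  ids <- sub("^>\\s*(\\S+).*", "\\1", lines[is.header])   # first word after '>'
  group <- cumsum(is.header)                               # record index for each line
  # join the sequence lines of each record into one string
  seqs <- tapply(lines[!is.header], group[!is.header], paste, collapse = "")
  setNames(toupper(as.character(seqs)), ids)
}

# Example with a hypothetical two-record file
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">YAL001C upstream", "acgtac", "GGTT", "", ">YAL002W", "TTTTAAAA"), tmp)
promoters <- read.fasta.simple(tmp)
print(nchar(promoters))
```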

  12. TF Motif Finding Tools
     • MEME: http://meme.sdsc.edu/meme/meme.html
     • AlignACE: http://atlas.med.harvard.edu/cgi-bin/alignace.pl
     • BioProspector: http://ai.stanford.edu/~xsliu/BioProspector/
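To get a feel for what "over-represented pattern" means before using these tools, the sketch below implements a naive word-counting approach. It is not the algorithm used by MEME, AlignACE or BioProspector (those build probabilistic motif models). For each k-mer it counts how many cluster sequences contain it and ranks words with a one-sided binomial test against the fraction of background sequences containing it. The helper names, the pseudocount, and the toy sequences (random background, cluster promoters with a planted ACGCGT, the MCB element) are all assumptions for illustration; only one strand is scanned.

```r
# Hypothetical helper: number of sequences that contain each k-mer
count.kmers <- function(seqs, k) {
  per.seq <- lapply(seqs, function(s) {
    starts <- seq_len(max(nchar(s) - k + 1, 0))
    unique(substring(s, starts, starts + k - 1))   # distinct k-mers in this sequence
  })
  table(unlist(per.seq))
}

# Hypothetical helper: rank k-mers by a one-sided binomial test of the cluster
# count against the background rate (with a pseudocount for unseen words)
kmer.enrichment <- function(cluster.seqs, background.seqs, k = 6, pseudo = 1) {
  fg <- count.kmers(cluster.seqs, k)
  bg <- count.kmers(background.seqs, k)
  words <- names(fg)
  fg.count <- as.numeric(fg)
  bg.count <- as.numeric(bg)[match(words, names(bg))]
  bg.count[is.na(bg.count)] <- 0                  # word never seen in the background
  bg.rate <- (bg.count + pseudo) / (length(background.seqs) + pseudo)
  p.value <- pbinom(fg.count - 1, length(cluster.seqs), bg.rate, lower.tail = FALSE)
  res <- data.frame(word = words, in.cluster = fg.count, bg.rate = bg.rate,
                    p.value = p.value, stringsAsFactors = FALSE)
  res[order(res$p.value), ]
}

# Toy data: random background promoters; cluster promoters carry a planted ACGCGT
set.seed(3)
random.seq <- function(n) paste(sample(c("A", "C", "G", "T"), n, replace = TRUE), collapse = "")
background <- replicate(200, random.seq(100))
cluster <- replicate(20, paste0(random.seq(50), "ACGCGT", random.seq(44)))
top <- kmer.enrichment(cluster, background, k = 6)
print(head(top, 3))
```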

  13. Making Sequence Logos
     • WebLogo: http://weblogo.berkeley.edu/logo.cgi
     • SEQLOGO: http://ep.ebi.ac.uk/EP/SEQLOGO/
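A sequence logo stacks the letters at each motif position to a height equal to that position's information content, IC_j = log2(4) + sum_b p_bj log2(p_bj) bits for DNA, and scales each letter by its frequency. The sketch below computes these quantities from a few hypothetical aligned sites; it omits the small-sample correction that logo tools may apply, so its numbers need not match a tool's output exactly.

```r
# Sketch: information content behind a DNA sequence logo (toy aligned sites).
sites <- c("ACGCGT", "ACGCGA", "ACGCGT", "TCGCGT")

site.matrix <- do.call(rbind, strsplit(sites, ""))    # sites x positions
counts <- sapply(seq_len(ncol(site.matrix)), function(j)
  table(factor(site.matrix[, j], levels = c("A", "C", "G", "T"))))  # bases x positions
freqs <- sweep(counts, 2, colSums(counts), "/")       # column-wise base frequencies

# IC_j = 2 + sum_b p_bj * log2(p_bj), taking 0 * log2(0) = 0
info <- 2 + colSums(ifelse(freqs > 0, freqs * log2(freqs), 0))
# letter heights in the logo: base frequency times the column's information content
heights <- sweep(freqs, 2, info, "*")
print(round(info, 3))
```

Fully conserved columns reach the 2-bit maximum; the first and last columns, where one of four sites differs, carry about 1.19 bits.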

  14. TRANSFAC Database
     http://www.gene-regulation.com/pub/databases.html#transfac
     A database of eukaryotic transcription factors and the cis-acting regulatory sites they bind. Its tables:
     • SITE: transcription factor binding sites within eukaryotic genes.
     • GENE: the gene to which a site (or group of sites) belongs.
     • FACTOR: the proteins that bind these sites.
     • CELL: brief information about the cellular source of the proteins shown to interact with the sites.
     • CLASS: background information about transcription factor classes.
     • MATRIX: nucleotide distribution matrices for the binding sites of transcription factors.
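A nucleotide distribution matrix of the kind held in the MATRIX table can be turned into a log-odds position weight matrix and slid along a promoter to score every window. The sketch below assumes a made-up 4 x 6 count matrix, a pseudocount of 1 per base, and a uniform background of 0.25; score.sequence is a hypothetical helper that scans one strand only and returns NA for windows containing non-ACGT letters.

```r
# Sketch: log-odds PWM from a (made-up) count matrix, used to scan a sequence.
counts <- matrix(c(
  # pos1 pos2 pos3 pos4 pos5 pos6
     8,   0,   0,   0,   0,   1,   # A
     0,  10,   0,  10,   0,   0,   # C
     0,   0,  10,   0,  10,   0,   # G
     2,   0,   0,   0,   0,   9),  # T
  nrow = 4, byrow = TRUE, dimnames = list(c("A", "C", "G", "T"), NULL))

# column-wise probabilities with a pseudocount of 1 per base, then log2 odds vs. 0.25
probs <- sweep(counts + 1, 2, colSums(counts) + 4, "/")
pwm <- log2(probs / 0.25)

# Hypothetical helper: PWM score of every window (forward strand only)
score.sequence <- function(seq, pwm) {
  bases <- strsplit(toupper(seq), "")[[1]]
  w <- ncol(pwm)
  n.windows <- length(bases) - w + 1
  if (n.windows < 1) return(numeric(0))
  sapply(seq_len(n.windows), function(start) {
    window <- bases[start:(start + w - 1)]
    # pick pwm[base, position] for each position of the window and add them up
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(w))])
  })
}

scores <- score.sequence("TTTTACGCGTTTTT", pwm)
print(round(scores, 2))
print(which.max(scores))   # start of the best-scoring window
```

sweep() is used instead of plain division because dividing a 4 x 6 matrix by a length-6 vector would recycle the vector down the rows, not across the columns.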

  15. Public Data Repositories for Gene Expression Studies
     • ArrayExpress (EBI): http://www.ebi.ac.uk/microarray-as/aer/?#ae-main[0]
       2701 experiments available. Expression profiles derived from 180 experiments, 112,510 genes available.
     • GEO (NCBI): http://www.ncbi.nlm.nih.gov/projects/geo/
       3916 expression platforms. 174,783 samples.
