Identification of Transcription Factor Binding Sites

Presentation Transcript


  1. Identification of Transcription Factor Binding Sites Presenting: Mira & Tali

  2. Goal • Regulatory regions, each containing AGCCA • Motif – Binding site??? AGCCA

  3. Why Bother? UNDERSTAND Gene expression regulation Co-regulation

  4. Difficulties • Multiple factors for a single gene • Variability in binding sites • The nature of variability is NOT well understood • Usually Transitions • Insertions and deletions are uncommon • Location, location, location…

  5. Experimental methods • EMSA – Electrophoretic mobility shift assay • Nuclease protection assay NOT ENOUGH!!!!!

  6. So, what can we do? • Find conserved sequences in regulatory regions: 1. Define what you want to find 2. Define what is a good result 3. Decide how to find it…

  7. Principal Methods: Global optimum • Enumerative methods • Going over ALL possibilities • Taking the best one • Disadvantage: limited to small search spaces • Advantage: certainty
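
As a rough illustration of the enumerative idea on this slide, the sketch below scores every possible word of length k and keeps the best one. Raw occurrence count is used as the score purely for simplicity (an assumption; the methods discussed later use statistical scores), and the sequences are made-up examples.

```python
from collections import Counter
from itertools import product


def count_kmers(sequences, k):
    """Count how often each k-mer occurs across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def best_kmer(sequences, k):
    """Go over ALL 4**k candidate words and return the most frequent one."""
    counts = count_kmers(sequences, k)
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    # Counter returns 0 for words that never occur, so every candidate is scored.
    return max(candidates, key=lambda word: counts[word])


# Hypothetical regulatory regions sharing the word AGCCA.
promoters = ["TTAGCCATT", "GGAGCCAGG", "CAGCCACC"]
print(best_kmer(promoters, 5))
```

The candidate space grows as 4**k, which is why this approach is limited to small search spaces.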

  8. Principal Methods: Local optimum • Gibbs sampling, AlignACE • Start somewhere (arbitrary) • Next step direction – proportional to what we “gain” from it • We can get anywhere with some probability • Disadvantage: you can never know… • Advantage: generally good results, faster
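
The steps on this slide can be sketched as a very small Gibbs site sampler. This is a simplified illustration, not AlignACE or any published implementation: it assumes exactly one site per sequence, a fixed motif width, a pseudocount of 1, and no background model.

```python
import random


def gibbs_motif_search(sequences, k, iterations=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Start somewhere (arbitrary): a random site in every sequence.
    starts = [rng.randrange(len(s) - k + 1) for s in sequences]
    for _ in range(iterations):
        held_out = rng.randrange(len(sequences))
        # Build a count profile from all sequences except the held-out one.
        profile = [{base: 1.0 for base in "ACGT"} for _ in range(k)]
        for idx, (seq, start) in enumerate(zip(sequences, starts)):
            if idx == held_out:
                continue
            for j in range(k):
                profile[j][seq[start + j]] += 1.0
        total = len(sequences) - 1 + 4  # real counts plus four pseudocounts
        # Weight every candidate start by how well it fits the profile.
        seq = sequences[held_out]
        weights = []
        for start in range(len(seq) - k + 1):
            w = 1.0
            for j in range(k):
                w *= profile[j][seq[start + j]] / total
            weights.append(w)
        # Sample the next position in proportion to the "gain"; any position
        # can be chosen with some probability.
        starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


promoters = ["TTAGCCATT", "GGAGCCAGG", "CAGCCACC"]
print(gibbs_motif_search(promoters, 5))
```

Because the procedure is stochastic, different seeds can end in different local optima, which is the "you can never know" caveat above.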

  9. Articles Overview • Identifying motifs • Expression patterns • Phylogenetic footprinting • Identifying networks • Common motifs in expression clusters • Combinatorial analysis

  10. Discovery of novel transcription factor binding sites by statistical overrepresentation. S. Sinha, M. Tompa. Goal: Identify binding sites in yeast • Identify over-represented upstream sequences • Use sets of co-regulated genes • Enumeration: YMF algorithm

  11. What constitutes a motif? (tailored for S. cerevisiae) • In S. cerevisiae typically 6-10 conserved bases – the motif • Spacers varying in length (1-11 bp) • Usually located in the middle • Example: ACCNNNNNNGTT • Taken from SCPD – S. cerevisiae promoter database
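
A motif of this shape (two conserved halves around a variable spacer) can be expressed as a regular expression. The helper name and the test sequence below are hypothetical; the half-sites ACC and GTT and the 1-11 bp spacer range come from the slide.

```python
import re


def gapped_motif_regex(left, right, min_gap, max_gap):
    """Compile a pattern: left half, min_gap..max_gap arbitrary bases, right half."""
    return re.compile(f"{left}[ACGT]{{{min_gap},{max_gap}}}{right}")


pattern = gapped_motif_regex("ACC", "GTT", 1, 11)
match = pattern.search("TTACCAAAAAAGTTGG")
if match:
    print(match.start(), match.group())
```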

  12. How do we measure motifs? • Z-score – Motif over-representation • Pmax(X) – Probability of Z-score >= X
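
A z-score of this kind compares the observed number of motif occurrences with the number expected under a background model, in units of standard deviation. The sketch below assumes an independent-bases background and a binomial variance, which ignores overlapping occurrences; both are simplifications chosen for illustration and are not the background model used by YMF.

```python
import math


def z_score(observed, expected, std_dev):
    """Number of standard deviations by which observed exceeds expected."""
    return (observed - expected) / std_dev


def iid_background_stats(motif, sequences, base_freq):
    """Expected count and standard deviation under independent bases (assumption)."""
    p = math.prod(base_freq[base] for base in motif)
    n = sum(max(len(s) - len(motif) + 1, 0) for s in sequences)  # candidate positions
    expected = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    return expected, std_dev


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
promoters = ["TTAGCCATT", "GGAGCCAGG", "CAGCCACC"]
expected, std_dev = iid_background_stats("AGCCA", promoters, uniform)
print(z_score(3, expected, std_dev))
```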

  13. YMF algorithm (Yeast Motif Finder) INPUT: • A set of promoter regions • Transition matrix • Motif length l (modest values, e.g. 6) • Maximum number of spacers allowed w (e.g. 11)
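
One of the inputs listed here is a transition matrix describing the background sequence. As a hedged sketch, the code below estimates a first-order Markov transition matrix from a set of sequences; the order of the model and the pseudocount are assumptions made for illustration, and the slide does not say how the matrix is obtained.

```python
def transition_matrix(sequences, pseudocount=1.0):
    """Estimate P(next base | previous base) from adjacent base pairs."""
    counts = {a: {b: pseudocount for b in "ACGT"} for a in "ACGT"}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for prev, row in counts.items():
        total = sum(row.values())
        matrix[prev] = {nxt: c / total for nxt, c in row.items()}
    return matrix


background = transition_matrix(["TTAGCCATT", "GGAGCCAGG", "CAGCCACC"])
print(background["A"])
```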

  14. YMF algorithm, post-processing: • FindExplanators: artificial overrepresentation, e.g. TCACGCT (motif), CACGCTA (artifact) • Co-expression score • W-score
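
The motif/artifact pair on this slide differs only by a one-base shift, so the artifact scores well simply because it overlaps the real motif. The check below flags such shifted copies; it is an illustrative helper with a hypothetical name, not the FindExplanators procedure itself.

```python
def is_shifted_copy(motif, other, max_shift=1):
    """True if `other` equals `motif` shifted by up to max_shift positions."""
    if len(motif) != len(other) or motif == other:
        return False
    for shift in range(1, max_shift + 1):
        # Compare the overlapping parts after sliding one word along the other.
        if motif[shift:] == other[:-shift] or other[shift:] == motif[:-shift]:
            return True
    return False


print(is_shifted_copy("TCACGCT", "CACGCTA"))
```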

  15. Experiments • Validate YMF results • Running YMF on regulons with known binding sites (SCPD) • Run YMF on MIPS catalogs (MIPS - Munich Information center for Protein Sequences) • Functional • Mutant phenotype

  16. Validation

  17. New binding sites or false positives?

  18. A novel site candidate

  19. Further research • Validation of novel binding sites and transcription factors • Modification of the algorithm to be applicable for other organisms

  20. Systematic determination of genetic network architecture. Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, George M. Church. Goal: Identify co-regulated networks of genes in yeast • Cluster by expression patterns • Identify upstream sequence patterns • AlignACE: Aligns Nucleic Acid Conserved Elements

  21. Clusters • Cluster – a group of genes with a similar expression pattern • Cluster’s members • Tend to participate in common processes • Tend to be co-regulated

  22. 10-54 Clusters

  23. Identifying motifs • Using AlignACE, 18 motifs were found in 12 clusters. • 7 of the motifs found were identified experimentally. And what about the others?

  24. Scanning for more binding sites • Once a significant motif was found the whole genome was scanned for it • Most motifs were cluster specific
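
Scanning a genome for a discovered motif is commonly done with a position weight matrix. The sketch below builds log-odds weights from a few aligned sites and reports every window scoring above a threshold. The sites, the uniform background of 0.25, the pseudocount, and the threshold are all assumptions for illustration; the slide does not specify how the scan was performed.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds weight matrix from equal-length aligned sites (uniform background)."""
    pwm = []
    for j in range(len(sites[0])):
        column = {base: pseudocount for base in "ACGT"}
        for site in sites:
            column[site[j]] += 1
        total = sum(column.values())
        pwm.append({b: math.log2((c / total) / 0.25) for b, c in column.items()})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(sequence) - k + 1):
        score = sum(pwm[j][sequence[i + j]] for j in range(k))
        if score >= threshold:
            hits.append((i, score))
    return hits


pwm = build_pwm(["AGCCA", "AGCCA", "AGCCT"])
print(scan("TTTAGCCATTT", pwm, threshold=5.0))
```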

  25. Why so few motifs? • Too stringent rules for defining a “significant” motif • Post transcriptional regulation (mRNA stability) • Some clusters represent “noise”

  26. “Tightness” of a cluster • How close the members of a cluster are to its mean • There is a strong correlation between the presence of significant motifs and the “tightness” of a cluster
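
One simple way to quantify tightness is the average Euclidean distance from each member's expression profile to the cluster mean, where smaller values mean a tighter cluster. This particular formula is an assumption for illustration, since the slide only gives the verbal definition.

```python
import math


def cluster_tightness(profiles):
    """Mean Euclidean distance of expression profiles to their cluster mean."""
    n_points = len(profiles[0])
    mean = [sum(p[t] for p in profiles) / len(profiles) for t in range(n_points)]
    return sum(math.dist(p, mean) for p in profiles) / len(profiles)


# Hypothetical expression profiles (one list of measurements per gene).
print(cluster_tightness([[0.0, 0.0], [2.0, 0.0]]))
```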

  27. Things to remember • Discovering regulons and motifs using expression based clustering • Minimal biases • Validation as a methodology for new organisms • Identifying expected cis-regulatory motif EACH TIME!!

  28. Identifying regulatory networks by combinatorial analysis of promoter elements, by Yitzhak Pilpel, Priya Sudarsanam & George M. Church. Goals: • Identify motif combinations affecting expression patterns in yeast • Understand the transcriptional network

  29. Basic definitions • Expression coherence (EC) score • Synergistic motifs – EC(a&b) > EC(a\b), EC(b\a)
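
The slide leaves the EC definition blank. The sketch below assumes one common reading, namely the fraction of gene pairs in a set whose expression profiles lie within a distance threshold of each other, and then applies the synergy condition given on the slide. The threshold, the distance measure, and the example profiles are assumptions.

```python
import math
from itertools import combinations


def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose profiles are closer than threshold (assumed definition)."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if math.dist(a, b) < threshold)
    return close / len(pairs)


def is_synergistic(ec_both, ec_a_only, ec_b_only):
    """Slide condition: EC(a&b) > EC(a\\b) and EC(a&b) > EC(b\\a)."""
    return ec_both > max(ec_a_only, ec_b_only)


print(expression_coherence([[0, 0], [0, 1], [5, 5]], threshold=1.5))
print(is_synergistic(0.5, 0.1, 0.2))
```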

  30. Methods: A database of motifs Gene sets Calculating EC score Significant synergistic combinations Understanding the effect of individual and combination of motifs Visualizing the transcriptional network

  31. GMC • GMC – Gene Motif Combination. Motif numbers: (m1, m2, m3, m4, m5) = (1,0,1,1,0) • Synergistic motif combination- EC(n motifs) > max(EC(n-1 motifs)) • GMC – what is it good for?
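
A GMC can be represented as a 0/1 tuple over a fixed motif list, and genes sharing the same tuple can then be grouped together. The gene and motif names below are hypothetical; only the (1,0,1,1,0) encoding comes from the slide.

```python
def gmc_vector(gene_motifs, all_motifs):
    """Encode which motifs a gene's promoter contains as a 0/1 tuple."""
    return tuple(1 if motif in gene_motifs else 0 for motif in all_motifs)


def group_by_gmc(genes_to_motifs, all_motifs):
    """Group genes that share exactly the same motif combination."""
    groups = {}
    for gene, motifs in genes_to_motifs.items():
        groups.setdefault(gmc_vector(motifs, all_motifs), []).append(gene)
    return groups


motifs = ["m1", "m2", "m3", "m4", "m5"]
genes = {
    "geneA": {"m1", "m3", "m4"},
    "geneB": {"m1", "m3", "m4"},
    "geneC": {"m2"},
}
print(group_by_gmc(genes, motifs))
```

Each group of genes can then be scored with an expression-based measure to see whether the combination is associated with a coherent expression pattern.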

  32. Combinograms Clustering GMCs

  33. Combinograms – what are they good for? • They help visualize the “single motif - specific expression pattern” connection • They also show which motif is more critical in determining the expression pattern.

  34. Motif synergy map: visualizing transcription networks

  35. Conclusion • The importance of the combinogram • The importance of the motif synergy map

  36. Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes. Lee Ann McCue, William Thompson, C. Steven Carmack, Michael P. Ryan, Jun S. Liu, Victoria Derbyshire and Charles E. Lawrence. Goals: • Identifying novel TF binding sites in E. coli • Describing the transcription regulatory network. Approach: • Finding orthologs • Identify upstream sequence patterns • Local optimum: Gibbs sampling algorithm

  37. Methods: One E. coli gene and orthologs • Data set • Gibbs sampling algorithm • Motif • MAP score – a measure of overrepresentation of a motif

  38. Applying the method on a small scale – Validation • Choosing 190 E. coli genes. • Creating 184 data sets. • Running the Gibbs sampling algorithm. • More than 67% success in the prediction for the most probable motif.

  39. Motif Model

  40. Identification of the YijC binding sites • A strongly predicted site was upstream of the fabA, fabB and yqfA genes. • Chromatography – identifying the factor.

  41. Identifying the YijC binding sites and predicting gene function • Mass spectrometry identification – YijC • Predicting a function for yqfA. weight fabA fabB yqfA fadB

  42. Applying the method genome wide • Choosing 2113 E.coli ORFs. • For 2097 a TF-binding site was predicted.

  43. MAP scores – ortholog distribution (panels: Study set, Full set)

  44. Adding binding sites for known TFs • Building a TF binding site model for known TFs. • Scanning E.coli upstream regions. • 187 new probable sites.

  45. Building a regulatory network • Required steps: • Identifying motif models • Clustering the models • Problem: • Specificity

  46. Conclusion • What have we gained so far? • A better prediction of gene function. • New possibilities for identifying TF binding sites and the TFs that bind them!
