
Evolutionary and genomic approaches to find gene regulatory sequences





  1. Evolutionary and genomic approaches to find gene regulatory sequences. Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller, Francesca Chiaromonte, Anton Nekrutenko, Kateryna Makova, Stephan Schuster, Ross Hardison. University of California at Santa Cruz: David Haussler, Jim Kent. Children’s Hospital of Philadelphia: Mitch Weiss. NimbleGen: Roland Green. University of Nebraska, Lincoln, February 14, 2007

  2. Major goals of comparative genomics • Identify all DNA sequences in a genome that are functional • Selection to preserve function • Adaptive selection • Determine the biological role of each functional sequence • Elucidate the evolutionary history of each type of sequence • Provide bioinformatic tools so that anyone can easily incorporate insights from comparative genomics into their research

  3. Known types of gene regulatory regions G.A. Maston, S.K. Evans, M.R. Green (2006) Ann. Rev. Genomics & Human Genetics 7:29-59.

  4. Regulatory regions tend to be clusters of transcription factor binding sites. Figure: binding sites for sequence-specific factors in the SV40 promoters and enhancer

  5. Properties of known regulatory regions • Binding sites for transcription factors, many with sequence specificity • Clusters of binding sites • Conventional promoters encompass major start sites for transcription • Conserved over evolutionary time???

  6. Structures involved in transcription are probably more complex. Middle image: HeLa cell; green, active transcription (Br-UTP label); red, all nucleic acids. Sides: EM spreads of transcripts. Peter R. Cook, Oxford University, http://users.path.ox.ac.uk/~pcook/images/Images.html

  7. Domain opening is associated with movement to non-heterochromatic regions. Schübeler, Francastel, Cimbora, Reik, Martin, Groudine (2000) Genes & Dev. 14: 940-950

  8. Other possible activities for sequences involved in gene regulation • Opening or closing a chromosomal domain • Moving a gene to or away from a transcription factory • Controlling how long a gene stays in a transcription factory • Long association: high-level expression; really long gene • Short association: lower-level expression; rapid regulation • Are these conserved over evolutionary time?

  9. Three modes of evolution. Sequence matches at longer phylogenetic distances could reflect purifying selection. Sequence differences at closer phylogenetic distances could reflect adaptive evolution.

  10. Conservation vs. Constraint • Conserved sequences are those that align between two species thought to be descended from a common ancestor • Constrained sequences show evidence in their alignments of negative (purifying) selection • E.g. change at a rate significantly slower than “neutral” DNA
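
As a toy illustration of "change at a rate significantly slower than neutral DNA", the sketch below asks how surprising it would be to see at most k differences in a segment of n aligned sites if each site changed with a neutral probability p. This is a plain one-sided binomial tail probability, not phastCons or any method from the talk; the neutral rate and the counts are hypothetical.

```python
from math import comb

def lower_tail_pvalue(k: int, n: int, p_neutral: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p_neutral).

    A small value means the segment changed less than expected for
    neutral DNA, i.e. evidence of constraint (purifying selection).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p_neutral**i * (1 - p_neutral) ** (n - i)
               for i in range(k + 1))

# Hypothetical example: 100 aligned sites, an assumed neutral per-site
# difference probability of 0.3, and only 15 observed differences.
p = lower_tail_pvalue(15, 100, 0.3)
print(f"P(<= 15 differences | neutral) = {p:.2e}")
```

Real constraint scores model the whole phylogeny and local rate variation; the binomial tail only captures the core idea of "fewer changes than neutral".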

  11. Ideal cases for interpretation. Figure panels plot similarity along the chromosome. Human vs mouse: segments under negative (purifying) selection stand out from neutral DNA; these are DNA segments with a function common to divergent species. Human vs rhesus: similarity and P (not neutral) reveal segments under positive (adaptive) selection against a background of neutral DNA; these are DNA segments in which change is beneficial to at least one of the two species.

  12. Messages about evolutionary approaches to predicting regulatory regions • Regulatory regions are conserved, but not all to the same phylogenetic distance. • Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential). • Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity. • In vivo occupancy by GATA-1 suggests other activities in addition to enhancers. • Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.

  13. Finding all gene regulatory regions is a challenge for comparative genomics • Known regulatory regions for the HBB complex • 23 total • 19 conserved (align) between human and mouse • Many show no significant difference from bulk (neutral) DNA in a measure of constraint (phastCons)

  14. Two extremes of constraint in TRRs

  15. ENCODE projects • ENCODE (ENCyclopedia Of DNA Elements): consortium aiming to find function for all human DNA sequences • Phase I focused on 1% of human DNA • 30 Mb, 44 regions • About 10 regions had known genes of interest (CFTR, HOX) • Others were chosen to get a sampling of regions varying in gene density and alignability with mouse • Major areas • Genes and transcripts • Transcriptional regulation • Chromatin structure • Multiple sequence alignment • Variation in human populations

  16. Biochemical assays for protein-binding sites in DNA • Purified protein and naked DNA • Chromatin immunoprecipitation (ChIP): DNA sites occupied by a protein inside cells

  17. ChIP-on-chip to examine many sites

  18. Putative transcriptional regulatory regions = pTRRs • Antibodies vs 10 sequence-specific factors: • Sp1, Sp3, E2F1, E2F4, cMyc, STAT1, cJun, CEBPe, PU1, RA Receptor A • High resolution ChIP-chip platforms: Affymetrix and NimbleGen • Data from several different labs in ENCODE consortium • High likelihood hits for ChIP-chip • 5% false discovery rate • Supported by chromatin modification data • Modified histones in chromatin: H4Ac, H3Ac, H3K4me, H3K4me2, H3K4me3, etc. • DNase hypersensitive sites (DHSs) or nucleosome depleted sites • Result: set of 1369 pTRRs
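
The slide reports ChIP-chip hits at a 5% false discovery rate but does not say how that rate was controlled. One standard procedure is Benjamini-Hochberg; the sketch below applies it to a list of per-region p-values. The p-values are hypothetical, and this is not necessarily the procedure the ENCODE labs used.

```python
def benjamini_hochberg(pvalues: list[float], fdr: float = 0.05) -> list[bool]:
    """Return one significant/not-significant flag per p-value (Benjamini-Hochberg)."""
    m = len(pvalues)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is called significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Hypothetical p-values for six tiled regions.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
print(benjamini_hochberg(pvals))
```

Only the first two regions pass here: the third-ranked p-value (0.039) exceeds its threshold of 3/6 × 0.05 = 0.025, and no later rank recovers.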

  19. A small fraction of cis-regulatory modules are conserved from human to chicken • About 4% of pTRRs, 4% of DNase HSs, and 4-7% of promoters active in multiple cell lines • These tend to regulate genes whose products control transcription and development. Figure time axis: 91, 173, 310 and 450 million years. David King

  20. Most pTRRs are conserved in eutherian mammals. Percentage of each class that aligns no further than:

      Clade          pTRRs   DNase HSs   Promoters
      Primates       11%     3%          1-13%
      Eutherians     70%     71%         63%
      Marsupials     14%     21%         16-28%
      Tetrapods      4%      4%          4-7%
      Vertebrates    1%      1%          2-4%

      Figure time axis: 91, 173, 310 and 450 million years. Within aligned noncoding DNA of eutherians, need to distinguish constrained DNA (purifying selection) from neutral DNA.
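
Because each percentage on slide 20 is the share of a class whose alignments stop at a given clade, summing from the most distant clade inward gives the share that aligns at least that far. A small sketch for the pTRR and DNase HS columns (the promoter column is given as ranges and is left out):

```python
# Percentages from slide 20: share of each class that aligns no further
# than the given clade, ordered from closest to most distant.
CLADES = ["Primates", "Eutherians", "Marsupials", "Tetrapods", "Vertebrates"]
NO_FURTHER_THAN = {
    "pTRRs": [11, 70, 14, 4, 1],
    "DNase HSs": [3, 71, 21, 4, 1],
}

def at_least_as_far(shares: list[int]) -> list[int]:
    """Cumulative share aligning at least as far as each clade."""
    result = []
    running = 0
    # Walk from the most distant clade back toward primates.
    for share in reversed(shares):
        running += share
        result.append(running)
    return list(reversed(result))

for cls, shares in NO_FURTHER_THAN.items():
    print(cls, dict(zip(CLADES, at_least_as_far(shares))))
```

For pTRRs this gives 89% aligning at least to eutherians (that is, beyond primates) and 5% aligning at least to tetrapods.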

  21. Measures of conservation and constraint capture only a subset of pTRRs. Figure: fraction of pTRRs overlapping an MCS (multi-species conserved sequence) for composite alignability and phastCons, each background-rate corrected. The measures range from alignment alone (aligns, but no inference about purifying selection), through measures that allow a range of constraint, to stringent constraint.

  22. Different measures perform better on specific functional regions (ROC curves: sensitivity vs. 1-specificity)
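
Comparisons like this are usually drawn as ROC curves. The sketch below shows how one (1-specificity, sensitivity) point per threshold is computed from scores on known functional and known neutral regions. The scores and labels are hypothetical; nothing here reproduces the measures on the slide.

```python
def roc_points(scores: list[float], labels: list[bool]) -> list[tuple[float, float]]:
    """Return (1 - specificity, sensitivity) for each distinct score threshold.

    A region is predicted functional when its score >= threshold;
    labels[i] is True for a known functional (positive) region.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        sensitivity = tp / positives            # true positive rate
        one_minus_specificity = fp / negatives  # false positive rate
        points.append((one_minus_specificity, sensitivity))
    return points

# Hypothetical scores for four functional and four neutral regions.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, True, False, False, False, True, False]
for fpr, tpr in roc_points(scores, labels):
    print(f"1-specificity={fpr:.2f}  sensitivity={tpr:.2f}")
```

A measure whose curve rises faster toward the top-left corner discriminates that functional class better.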

  23. Examples of clade-specific pTRRs

  24. Messages about evolutionary approaches to predicting regulatory regions • Regulatory regions are conserved, but not all to the same phylogenetic distance. • Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential). • Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity. • In vivo occupancy by GATA-1 suggests other activities in addition to enhancers. • Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.

  25. Regulatory potential (RP) to distinguish functional classes

  26. Good performance of ESPERR regulatory potential (RP) for gene regulatory regions. Francesca Chiaromonte, James Taylor
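
In broad terms, an RP score is a log-odds ratio: how much better a region's alignment is explained by a model trained on known regulatory regions than by one trained on neutral DNA. The sketch below is a deliberately simplified stand-in: two first-order Markov chains over a hypothetical two-symbol alphabet of alignment columns ("M" for match, "X" for mismatch) with made-up transition probabilities. ESPERR itself learns its alphabet and model structure from training data, which this does not attempt.

```python
import math

# Hypothetical transition probabilities P(next | current) for a toy
# alignment-column alphabet: "M" = matching column, "X" = mismatch.
# Real RP models are trained on reference regulatory and neutral alignments.
REGULATORY = {"M": {"M": 0.8, "X": 0.2}, "X": {"M": 0.6, "X": 0.4}}
NEUTRAL = {"M": {"M": 0.6, "X": 0.4}, "X": {"M": 0.4, "X": 0.6}}

def log_odds_score(columns: str) -> float:
    """Average per-transition log2 odds of the regulatory vs the neutral model."""
    if len(columns) < 2:
        raise ValueError("need at least two alignment columns")
    total = 0.0
    for prev, cur in zip(columns, columns[1:]):
        total += math.log2(REGULATORY[prev][cur] / NEUTRAL[prev][cur])
    return total / (len(columns) - 1)

print(log_odds_score("MMMMXMMM"))  # mostly matches: positive score
print(log_odds_score("XXMXXXMX"))  # mostly mismatches: negative score
```

Averaging per transition (an assumption of this sketch) keeps scores comparable between regions of different length.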

  27. Messages about evolutionary approaches to predicting regulatory regions • Regulatory regions are conserved, but not all to the same phylogenetic distance. • Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential). • Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity. • In vivo occupancy by GATA-1 suggests other activities in addition to enhancers. • Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.

  28. Conservation of predicted binding sites for transcription factors. Figure: binding site for GATA-1
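
A minimal sketch of checking whether a predicted GATA-1 site is conserved: scan one gapped sequence of a pairwise alignment for the WGATAR consensus (W = A/T, R = A/G) on either strand, and keep only the sites where the aligned sequence also matches at the same alignment columns. The sequences are hypothetical; real analyses use multi-species alignments and further rules.

```python
import re

# WGATAR and its reverse complement YTATCW (Y = C/T).
GATA_FORWARD = re.compile(r"[AT]GATA[AG]")
GATA_REVERSE = re.compile(r"[CT]TATC[AT]")

def motif_hits(seq: str) -> list[tuple[int, int]]:
    """(start, end) alignment-column spans where seq matches WGATAR on either strand.

    Gap characters stay in place, so a motif interrupted by a gap is not reported.
    """
    hits = []
    for pattern in (GATA_FORWARD, GATA_REVERSE):
        # Zero-width lookahead so overlapping matches are all reported.
        for m in re.finditer(f"(?=({pattern.pattern}))", seq.upper()):
            hits.append((m.start(1), m.end(1)))
    return sorted(hits)

def conserved_hits(ref: str, other: str) -> list[tuple[int, int]]:
    """Motif spans in ref whose aligned columns in other also form the motif."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    other_hits = set(motif_hits(other))
    return [span for span in motif_hits(ref) if span in other_hits]

# Hypothetical mouse/human alignment: first site conserved, second not.
mouse = "CCAGATAAGGC-TTGATAGCA"
human = "CCTGATAAGGCATTGCTAGCA"
print(conserved_hits(mouse, human))
```

Requiring the motif at identical columns is a strict definition; a looser one might allow small shifts within the alignment.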

  29. Genes Co-expressed in Late Erythroid Maturation • G1E-ER cells: proerythroblast line lacking the transcription factor GATA-1 • Can be rescued by expressing an estrogen-responsive form of GATA-1 • Rylski et al., Mol Cell Biol. 2003

  30. Predicted cis-Regulatory Modules (preCRMs) Around Erythroid Genes B:Yong Cheng, Ross, Yuepin Zhou, David King F:Ying Zhang, Joel Martin, Christine Dorman, Hao Wang

  31. preCRMs with conserved consensus GATA-1 BS tend to be active on transfected plasmids

  32. preCRMs with conserved consensus GATA-1 BS tend to be active after integration into a chromosome

  33. Examples of validated preCRMs

  34. Correlation of Enhancer Activity with RP Score

  35. Validation status for 99 tested fragments

  36. preCRMs with High RP and Conserved Consensus GATA-1 Tend To Be Validated

  37. CACC box helps distinguish validated from nonvalidated preCRMs. The same analysis, with the same parameters, was run on all validated preCRMs and on all nonvalidated preCRMs, and the outputs were compared. Consensus for EKLF binding site: CCNCMCCCW. Ying Zhang
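
Consensus strings such as CCNCMCCCW use IUPAC ambiguity codes (N = any base, M = A/C, W = A/T). A small sketch that turns such a consensus into a regular expression and counts its matches on both strands of a sequence; the test sequences are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. M = A/C pairs with K = T/G).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def consensus_to_regex(consensus: str) -> re.Pattern:
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))

def count_sites(consensus: str, seq: str) -> int:
    """Count (possibly overlapping) matches on seq and on its reverse complement."""
    forward = consensus_to_regex(consensus)
    reverse = consensus_to_regex(consensus.upper().translate(COMPLEMENT)[::-1])
    seq = seq.upper()

    def count(pattern: re.Pattern) -> int:
        return sum(1 for _ in re.finditer(f"(?={pattern.pattern})", seq))

    n = count(forward)
    # Skip the reverse scan for palindromic consensuses to avoid double counting.
    if reverse.pattern != forward.pattern:
        n += count(reverse)
    return n

print(count_sites("CCNCMCCCW", "GGCCACACCCTGG"))
```

The same helper converts WGATAR into `[AT]GATA[AG]`, so GATA-1 and EKLF sites can be scanned with one routine.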

  38. Messages about evolutionary approaches to predicting regulatory regions • Regulatory regions are conserved, but not all to the same phylogenetic distance. • Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential). • Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity. • In vivo occupancy by GATA-1 suggests other activities in addition to enhancers. • Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.

  39. preCRMs with conserved consensus GATA-1 binding sites are usually occupied by that protein: ChIP assay

  40. Design of ChIP-chip for occupancy by GATA-1 (figure labels: 50, 50, 100 bp) • Non-overlapping tiling array with 50 bp probes and 100 bp resolution (NimbleGen) • Covers mouse chr7:57225996-123812258 (~70 Mbp) • Antibody against the ER portion of the GATA-1-ER protein in rescued G1E-ER4 cells. Yong Cheng, with Mitch Weiss & Lou Dore (CHOP), Roland Green (NimbleGen)
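
A quick check of the array design. It assumes that each 50 bp probe is followed by a 50 bp gap, so probe starts are 100 bp apart (consistent with the 50/50/100 figure labels, but still an assumption), and treats the chr7 coordinates as 1-based and inclusive, as in the UCSC browser.

```python
START, END = 57_225_996, 123_812_258  # mouse chr7 interval from the slide
PROBE_LEN = 50   # bp per probe
STEP = 100       # bp between probe starts (50 bp probe + assumed 50 bp gap)

span_bp = END - START + 1                     # inclusive coordinates
n_probes = (span_bp - PROBE_LEN) // STEP + 1  # probes lying entirely in the span

print(f"interval length: {span_bp:,} bp ({span_bp / 1e6:.1f} Mb)")
print(f"probes needed:   {n_probes:,}")
```

This gives an interval of about 66.6 Mb, which the slide rounds to ~70 Mbp, and roughly 666,000 probes under the assumed layout.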

  41. Signals in known occupied sites in the Hbb LCR (HS1, HS2, HS3): 1) a cluster of high signals; 2) a “hill” shape of the signals

  42. Peak Finding Programs • TAMALPAIS (Mark Bieda, Peggy Farnham’s lab): focuses more on the cluster of signals; 4 thresholds based on the number of consecutive probes with signals in the 98th or 95th percentile • MPEAK (Bing Ren’s lab): focuses more on the “hill” shape of the signal; 4 thresholds, for a series of probes with at least one that is 3, 2.5, 2 or 1 standard deviations above the mean
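
Simplified versions of the two criteria described for these programs, applied to per-probe log ratios in genomic order. The percentile, run length and SD cutoffs are hypothetical parameters, and neither function reproduces TAMALPAIS or MPEAK, which apply further rules.

```python
import math
import statistics

def nearest_rank_percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def cluster_peaks(signal: list[float], q: float = 98.0, min_run: int = 3) -> list[tuple[int, int]]:
    """Cluster criterion: runs of >= min_run consecutive probes at or above the
    q-th percentile. Returns half-open (start, end) probe-index spans."""
    cutoff = nearest_rank_percentile(signal, q)
    peaks, run_start = [], None
    for i, value in enumerate(signal + [-math.inf]):  # sentinel closes a final run
        if value >= cutoff and run_start is None:
            run_start = i
        elif value < cutoff and run_start is not None:
            if i - run_start >= min_run:
                peaks.append((run_start, i))
            run_start = None
    return peaks

def hill_peaks(signal: list[float], n_sd: float = 2.0) -> list[tuple[int, int]]:
    """Hill criterion: stretches of probes above the mean that contain at least
    one probe n_sd standard deviations above the mean. Returns half-open spans."""
    mean = statistics.mean(signal)
    sd = statistics.stdev(signal)
    peaks, run_start = [], None
    for i, value in enumerate(signal + [-math.inf]):
        if value > mean and run_start is None:
            run_start = i
        elif value <= mean and run_start is not None:
            if max(signal[run_start:i]) >= mean + n_sd * sd:
                peaks.append((run_start, i))
            run_start = None
    return peaks

# Hypothetical log ratios for 20 consecutive probes.
signal = [0.1, 0.2, 0.0, 0.1, 2.5, 3.0, 2.8, 0.2, 0.1, 0.0,
          -0.1, 0.3, 2.9, 0.1, 0.0, 0.2, 0.1, -0.2, 0.0, 0.1]
# With only 20 probes, an 80th-percentile cutoff stands in for the 98th/95th.
print(cluster_peaks(signal, q=80.0, min_run=3))
print(hill_peaks(signal, n_sd=2.0))
```

On this toy signal the cluster criterion keeps only the three-probe block at indices 4-6, while the hill criterion also reports the single tall probe at index 12, which illustrates why the two programs can disagree on individual hits.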

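  43. ChIP-chip hits for GATA-1 occupancy. Technical replicates of ChIP-chip with antibody against GATA-1-ER. Mpeak: 275 hits in both replicates; TAMALPAIS: 276 hits in both replicates. Venn diagram of the two programs: 59 (Mpeak only), 216 (both), 60 (TAMALPAIS only). 321 total ChIP-chip hits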

  44. ChIP-chip hits validate at a high rate. Validation was determined by quantitative PCR of ChIP DNA. 19 of the 321 hits were tested and 13 (~70%) were validated; the validation rate is similar at different thresholds. 9 regions were “hits” in only one of the two technical replicates; none were validated.

  45. Association of WGATAR and conservation with ChIP-chip Hits • 249 of the 321 hits (78%) have WGATAR motifs, the binding site for GATA-1 • Of those 249 hits, 112 (45%) have a GATA-1 binding motif conserved between mouse and at least one non-rodent species
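
The nested percentages on this slide can be checked directly. The last line is derived here rather than stated on the slide: it expresses the 112 hits with a conserved motif as a share of all 321 hits.

```python
total_hits = 321
with_wgatar = 249           # hits containing at least one WGATAR motif
with_conserved_motif = 112  # of those, hits whose motif is conserved in >= 1 non-rodent species

print(f"{with_wgatar / total_hits:.0%} of hits have a WGATAR motif")
print(f"{with_conserved_motif / with_wgatar:.0%} of those have a conserved motif")
print(f"{with_conserved_motif / total_hits:.0%} of all hits have a conserved motif")
```

So roughly a third of all occupied sites carry a conserved consensus GATA-1 motif.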

  46. Expected and unexpected ChIP-chip hits

  47. Distribution of ChIP-chip hits on 70Mb of mouse chr7 Yong Cheng, Yuepin Zhou and Christine Dorman

  48. Almost half the GATA-1 ChIP-chip hits increase expression of a transgene in K562 cells: 24 of the 56 fragments with ChIP-chip hits tested (43%) were validated. Figure categories: no GATA-1; GATA-1 occupied sites by ChIP-chip

  49. Conserved and nonconserved ChIP-chip hits can be active as enhancers. Figure panels: conserved, active; conserved, not active; not conserved, active

  50. Messages about evolutionary approaches to predicting regulatory regions • Regulatory regions are conserved, but not all to the same phylogenetic distance. • Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential). • Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity. • In vivo occupancy by GATA-1 suggests other activities in addition to enhancers. • Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
