
Comparative Genomics and Evolution

Comparative Genomics and Evolution. Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006. McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008).

Presentation Transcript


  1. Comparative Genomics and Evolution • Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006. • McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008). Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008). Image source: http://mbbnet.umn.edu

  2. “Forces shaping the fastest evolving regions in the human genome” by Katherine S. Pollard et al.

  3. What’s the difference? Image sources: http://pro.corbis.com, http://www.science.psu.edu

  4. What’s the difference? • Humans have greater “brainpower”, for example creativity, problem solving, and language. • Which part of the genome is responsible? Image source: http://www.spaceflight.esa.int

  5. What’s the difference? • Human and chimpanzee DNA is 98% similar • The 2% difference is 29 million bases (mostly in non-coding DNA) Image source: http://en.wikipedia.org

  6. Comparative Genomics • Human and rodent genomes are often compared to identify conserved (presumably functional) elements. • Human and chimpanzee genomes are compared to understand what is uniquely human about our genome. Image source: http://genome.ucsc.edu

  7. Comparative Genomics • Look at HARs in the human genome. • HAR: human accelerated region, with a high rate of nucleotide substitution in humans but a low rate in other vertebrates. • The fastest-evolving is HAR1, a novel RNA gene expressed during development of the neocortex (involved in language and conscious thought).

  8. HARs • About 100 bp long, mostly non-coding. • Their function is likely gene regulation. • They seem to have been under strong negative selection up to the common ancestor of chimp and human. • Rapid positive selection then started in humans only. Image source: http://www.shutterstock.com

  9. Finding HARs • Evolutionary tree of vertebrates, based on the comparison of conserved regions in whole-genome alignments between species. • Branch lengths are given in substitutions per base or in millions of years. Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome.

  10. Finding HARs • Find HARs using the likelihood ratio test (LRT). • In statistical hypothesis testing, the likelihood ratio (Λ) is the ratio of the maximum probabilities of the observed data under the null hypothesis and under the alternative hypothesis. • The LRT decides between the two hypotheses based on the value of the likelihood ratio.
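
To make the statistic concrete, the likelihood ratio can be written in the orientation used for the HAR analysis (Λ = log(P2 / P1), alternative over null). In this sketch D stands for the aligned data of one region and Θ1, Θ2 for the parameter spaces of model 1 and model 2; these symbols are assumed notation, not taken from the slides. Because model 1 is a special case of model 2, the maximized likelihood under model 2 can never be smaller, so Λ is never negative.

```latex
\Lambda = \log\frac{P_2}{P_1}
        = \log\frac{\max_{\theta \in \Theta_2} L(\theta \mid D)}
                   {\max_{\theta \in \Theta_1} L(\theta \mid D)}
        \;\ge\; 0,
\qquad \Theta_1 \subset \Theta_2
```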

  11. Finding HARs • Two models were used for the genome-wide LRT. • Model 1 (null): the human substitution rate is held proportional to the other substitution rates in the evolutionary tree. • Model 2 (alternative): the human substitution rate can be accelerated relative to the rates in the rest of the tree.

  12. Finding HARs [Figure: all the conserved alignments, for human and the other vertebrates]

  13. Finding HARs: Model 1 [Figure: the 1st, 2nd, and 3rd sets of rates are determined, then all rates, human included, are scaled by the same amount]

  14. Finding HARs: Model 2 [Figure: all rates are scaled by the same amount, but the human rates are scaled separately]

  15-18. Finding HARs: the procedure (built up over slides 15-18) • Identify regions conserved between human and other vertebrates (34,498 of them). • For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree; obtain P1 (maximum probability under model 1). • Loop over all conserved regions. For each region: (a) fit model 2, finding the acceleration of the human rates for that region that maximizes the likelihood of the tree, and obtain P2 (maximum probability under model 2); (b) calculate the LRT statistic for the region as Λ = log(P2 / P1).
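
The per-region loop can be sketched in Python under a deliberately simplified model. This is an assumption of the example, not the phylogenetic substitution model used by Pollard et al.: each region is reduced to a hypothetical count of substitutions on the human branch, and model 1 is summarized by the count it expects there (mu, assumed to come from the globally fitted proportional rates). Model 2 multiplies that expectation by a region-specific acceleration a ≥ 1. The names `Region`, `region_lrt`, and `lrt_all` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical per-region summary (not the paper's data format)."""
    name: str
    human_subs: int        # substitutions observed on the human branch
    expected_human: float  # expected human-branch substitutions under model 1


def log_poisson(k: int, mu: float) -> float:
    """Log-probability of observing k events under a Poisson(mu) model."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def region_lrt(region: Region) -> tuple[float, float]:
    """Return (acceleration, Lambda) for one region.

    Model 1: human-branch count ~ Poisson(mu), mu fixed by the global fit.
    Model 2: human-branch count ~ Poisson(a * mu), with a >= 1 fitted here.
    """
    k, mu = region.human_subs, region.expected_human
    a_hat = max(1.0, k / mu)             # constrained maximum-likelihood acceleration
    log_p1 = log_poisson(k, mu)          # log P1 under model 1
    log_p2 = log_poisson(k, a_hat * mu)  # log P2 under model 2 at its maximum
    return a_hat, log_p2 - log_p1        # Lambda = log(P2 / P1) >= 0


def lrt_all(regions: list[Region]) -> dict[str, float]:
    """Loop over all regions and return Lambda for each, keyed by region name."""
    return {r.name: region_lrt(r)[1] for r in regions}


if __name__ == "__main__":
    regions = [Region("conserved", 2, 2.5), Region("accelerated", 12, 2.0)]
    for name, lam in lrt_all(regions).items():
        print(f"{name}: Lambda = {lam:.2f}")
```

The first region has fewer human substitutions than model 1 expects, so its best acceleration is 1 and Λ = 0; the second has six times the expected count and gets a large Λ. Constraining a ≥ 1 makes the test one-sided, so regions evolving more slowly in humans are not flagged.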

  19. Finding HARs • A large Λ indicates an HAR, but how large is large enough? • Run 1 million simulations of the 34,498 conserved alignments. • Generate each simulation using the model 1 proportional rates (no human acceleration). • Repeat the LRT calculation for each simulation. • Then, for each region, find the proportion of simulated LRT values that are larger than its observed LRT. • That proportion is an empirical p-value indicating whether the region is an HAR.
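
A scaled-down Python sketch of this simulation step. It assumes a simplified stand-in for the real model: each region is summarized by a hypothetical human-branch substitution count and the count expected under model 1 (mu), counts are simulated under model 1 with a small Poisson sampler, and all simulated Λ values are pooled into one null distribution. The slides describe 1 million simulations of full alignments under the fitted phylogenetic model; the Poisson model, the function names, and the simulation sizes here are assumptions for illustration. The p-value counts simulated values at least as large as the observed one (≥ rather than strictly >, a common convention).

```python
import bisect
import math
import random


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def lrt(k: int, mu: float) -> float:
    """Lambda = log(P2 / P1) for a Poisson count with acceleration a >= 1."""
    a_hat = max(1.0, k / mu)
    return k * math.log(a_hat) - (a_hat - 1.0) * mu


def empirical_p_values(observed, expected, n_sims, seed=0):
    """For each region, p = fraction of simulated Lambdas >= its observed Lambda.

    observed: human-branch substitution counts, one per region.
    expected: model-1 expected counts (mu), one per region.
    The null pools n_sims simulated copies of every region, drawn under model 1.
    """
    rng = random.Random(seed)
    null = sorted(
        lrt(sample_poisson(mu, rng), mu)
        for _ in range(n_sims)
        for mu in expected
    )
    p_values = []
    for k, mu in zip(observed, expected):
        lam = lrt(k, mu)
        n_at_least = len(null) - bisect.bisect_left(null, lam)
        p_values.append(n_at_least / len(null))
    return p_values


if __name__ == "__main__":
    print(empirical_p_values([2, 12], [2.0, 2.0], n_sims=2000, seed=1))
```

Sorting the pooled null once makes each region's p-value a single binary search, which keeps the cost manageable when there are tens of thousands of regions.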

  20. Finding HARs • Note on methods: the vertebrates used to select the conserved regions (chimp, macaque, mouse, rat, rabbit) were omitted from the LRT analysis. • This ensured that the LRT is independent of the method used to select the conserved regions.

  21. Finding HARs • Result: 202 HARs were found in the human genome. Image source: http://www.3dscience.com

  22. Results for Conserved Elements • 80.4% of the 34,498 conserved regions are non-coding. • Of the non-coding regions, 45.4% are intronic and 31% are intergenic. • Non-coding regions are enriched near genes for transcription factors, DNA-binding proteins, and regulators of nucleic acid metabolism.

  23. Results for HARs • 202 HARs have p < 0.1; 49 of them have p < 0.05. • HAR1 through HAR5 have p < 4.5e-4 and are highly accelerated. • Most HARs are non-coding: 66.3% are intergenic, 31.7% are intronic, and only 1.5% are coding. • The results support the hypothesis of King and Wilson that most chimp-human differences are regulatory.

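  24. Results: Confirming Accelerated Selection in HARs [Figure: negative selection vs. positive selection] • Are the HARs just due to relaxation of negative selection? • No. Relaxed constraint can at most raise a region's substitution rate to the neutral rate, so compare HAR rates with the neutral rate measured at fourfold-degenerate (4D) sites. Image source: http://cs273a.stanford.edu [Bejerano Aut 08/09]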
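
Fourfold-degenerate (4D) sites are third codon positions where any base encodes the same amino acid, so substitutions there are commonly treated as neutral. A minimal Python sketch of counting 4D sites and differences between two aligned, gap-free coding sequences; the function name and input format are assumptions, and the result is a raw proportion with no correction for multiple hits, which real rate estimates apply.

```python
# Codon prefixes (first two bases) whose third position is fourfold degenerate
# in the standard genetic code: Ala GCN, Gly GGN, Pro CCN, Thr ACN, Val GTN,
# Arg CGN, Leu CTN, Ser TCN.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CG", "CT", "TC"}


def fourfold_divergence(seq1: str, seq2: str) -> tuple[int, int]:
    """Count 4D sites and substitutions at them between two aligned coding sequences.

    A third codon position is counted only when both sequences share the same
    fourfold-degenerate prefix, so any change there is synonymous.
    Returns (differences, sites).
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    seq1, seq2 = seq1.upper(), seq2.upper()
    sites = differences = 0
    for i in range(0, len(seq1), 3):
        prefix = seq1[i:i + 2]
        if prefix in FOURFOLD_PREFIXES and prefix == seq2[i:i + 2]:
            sites += 1
            differences += int(seq1[i + 2] != seq2[i + 2])
    return differences, sites


if __name__ == "__main__":
    diffs, sites = fourfold_divergence("GCTGCAATG", "GCCGCAATG")
    print(f"{diffs} differences at {sites} 4D sites")
```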

  25. [Figure: human and chimp substitution rates in five elements, compared with the genome-wide neutral rate at 4D sites in human and chimp and with the neutral rate in chromosome end bands] The chimp rates in all five elements fall well below the human rates, which exceed the background rates by as much as an order of magnitude. H, human; C, chimp. Image from: K.S. Pollard et al., Forces Shaping the Fastest Evolving Regions in the Human Genome.

  26. Results: W → S Bias in HARs [Figure: AT → GC vs. GC → AT substitutions for HAR1-HAR5, HAR6-HAR49, HAR50-HAR202, and the rest of the ~34,000 conserved elements] • A dramatic AT → GC substitution bias was observed in HARs. Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome.
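
The W → S tally behind this kind of comparison can be sketched in Python, assuming an inferred ancestral sequence aligned to the derived (for example, human) sequence is already available; the function name is hypothetical. W (weak) bases are A and T, S (strong) bases are G and C.

```python
WEAK, STRONG = set("AT"), set("GC")


def classify_substitutions(ancestral: str, derived: str) -> dict[str, int]:
    """Tally W->S, S->W and other substitutions between aligned sequences."""
    counts = {"W->S": 0, "S->W": 0, "other": 0}
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a == d or a not in "ACGT" or d not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        if a in WEAK and d in STRONG:
            counts["W->S"] += 1
        elif a in STRONG and d in WEAK:
            counts["S->W"] += 1
        else:
            counts["other"] += 1  # A<->T or G<->C: pairing strength unchanged
    return counts


if __name__ == "__main__":
    print(classify_substitutions("ATGCA", "GCATT"))
```

A strong excess of W → S over S → W changes, relative to the background conserved elements, is the signature the slide describes.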

  27. Results: W → S Bias in HARs • The top 49 HARs are 2.7 times as likely as the other conserved elements to be located near terminal chromosomal bands. • Interestingly, HAR1 and HAR5 also lie in terminal regions in other mammals, but are not accelerated there. Image source: http://www.intelihealth.com

  28. Results: W → S Bias in HARs • HARs tend to be located in regions of high recombination in humans. • All of this evidence points to biased gene conversion (BGC) as the likely driving force behind HARs.

  29. Genetic Recombination • Paired homologous chromosomes can exchange corresponding pieces of DNA. • This typically occurs during meiosis.

  30-34. Meiosis [Figure, built up over slides 30-34: a diploid germ cell carrying paternal and maternal copies of chromosome A undergoes DNA replication (producing sister chromatids joined at the centromere), then recombination, then segregation into haploid gametes.]

  35. Recombination [Figure: a recombination hotspot]

  36. Genetic Recombination [Figure: duplex 1 and duplex 2 form a Holliday junction intermediate, which is resolved either horizontally, with gene conversion through mismatch repair, or vertically, with crossover.] Image source: http://www.sanger.ac.uk

  37. Genetic Recombination: Chromosomal Crossover [Figure: homologous chromosomes and the resulting recombinant chromatids] • Chromosomal crossover results in a reciprocal exchange of DNA pieces. Image source: http://www.emc.maricopa.edu

  38. Genetic Recombination: Gene Conversion [Figure: recombinant chromatids; mismatch repair causes the DNA to revert to its original form] • Gene conversion results in a nonreciprocal transfer of DNA. Image source: http://www.emc.maricopa.edu

  39. Genetic Recombination: Gene Conversion [Figure: haploid gametes] • The result is a non-Mendelian ratio of alleles among the gametes, such as 3:1 instead of 2:2. • Over many generations, this homogenizes a species’ gene pool. Image source: http://www.emc.maricopa.edu
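
A toy Python illustration of how a single gene-conversion event turns the normal 2:2 segregation of a heterozygote into 3:1. The list-of-alleles tetrad and the `gene_convert` helper are hypothetical representations; the recipient should be a non-sister chromatid of the donor, since sister chromatids already carry the same allele.

```python
from collections import Counter


def gene_convert(tetrad: list[str], donor: int, recipient: int) -> list[str]:
    """Return a new tetrad in which the recipient chromatid's allele is
    overwritten by the donor's (a nonreciprocal transfer)."""
    converted = list(tetrad)
    converted[recipient] = tetrad[donor]
    return converted


# After DNA replication a heterozygous A/a germ cell carries four chromatids:
# indices 0-1 are sisters carrying A, indices 2-3 are sisters carrying a.
tetrad = ["A", "A", "a", "a"]
print(Counter(tetrad))                      # A:a = 2:2, Mendelian segregation
print(Counter(gene_convert(tetrad, 1, 2)))  # A:a = 3:1 after one conversion
```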

  40. Biased Gene Conversion • A–T is a weak pairing (two hydrogen bonds); G–C is a strong pairing (three hydrogen bonds). • During gene conversion, the DNA repair machinery tends to replace weak pairings with strong ones. Image source: http://commons.wikimedia.org

  41. Biased Gene Conversion [Figure: recombinant chromatids; A–T replaced by G–C during mismatch repair] • Biased gene conversion enriches a species’ gene pool for G–C pairs, in addition to homogenizing it.
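
Why a small repair bias can enrich a whole gene pool: a deterministic Python sketch assuming an infinite, randomly mating population with Hardy-Weinberg genotype frequencies and no selection. The parameters c (chance that a conversion tract covers the site in a given meiosis) and b (chance that a W/S mismatch is repaired toward S) are hypothetical inputs, not estimates from the slides. With b = 0.5 the allele frequency stays put; with b > 0.5 the S allele rises every generation, much like a weakly favored allele under selection.

```python
def transmission_of_strong(c: float, b: float) -> float:
    """Fraction of gametes from a W/S heterozygote that carry S.

    c: probability that a gene-conversion tract covers the site in a meiosis.
    b: probability that the mismatch is repaired toward S (0.5 = unbiased).
    Without conversion the tetrad is 2S:2W; a conversion gives 3S:1W with
    probability b and 1S:3W otherwise.
    """
    return (1 - c) * 0.5 + c * (b * 0.75 + (1 - b) * 0.25)


def next_generation(p: float, c: float, b: float) -> float:
    """S allele frequency after one generation of random mating.

    SS parents pass on only S; heterozygotes pass on S at the biased rate.
    Equivalent to p + 2*delta*p*(1 - p) with delta = c*(b - 0.5)/2.
    """
    het_s = transmission_of_strong(c, b)
    return p * p + 2 * p * (1 - p) * het_s


if __name__ == "__main__":
    p = 0.01  # hypothetical starting frequency of the S allele
    for generation in range(1, 5001):
        p = next_generation(p, c=0.01, b=0.7)
        if generation % 1000 == 0:
            print(generation, round(p, 3))
```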

  42. HARs and Recombination Hotspots • HARs tend to be located near recombination hotspots in humans.

  43. Recombination Hotspots • Still mysterious. • Their locations differ greatly between chimps and humans (they change rapidly during evolution). • They are not explained by the local DNA sequence, which is essentially the same in human and chimp.

  44. [Figure: recombination hotspots and some HARs, linked by a question mark]

  45. Possible Conclusion • BGC caused by recombination, often viewed negatively, may have played a major role in the evolution of our species.

  46. Alternative Explanation [Figure: HARs located within an isochore] • Isochore: a DNA region (~100 kb) with a high concentration of genes. • Isochores are stabilized by many strong (GC) pairings.

  47. Alternative Explanation • A theory (Bernardi et al.) holds that weakly deleterious changes drive an isochore to a critical point of destabilization. • At this critical point, GC content cannot decrease further without making the isochore unstable. • An AT → GC substitution in the isochore then suddenly gains a selective advantage and sweeps through the population.

  48. Alternative Explanation • Isochore selective sweep theory vs. BGC theory • An isochore sweep leaves a different DNA signature than BGC. [Figure: GC substitutions spread over ~100 kb under an isochore selective sweep, but concentrated within ~100 bases under biased gene conversion]
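
One way to see this difference in scale is to compute GC content in sliding windows of two sizes. The Python sketch below uses a hypothetical synthetic sequence: an AT-only background with a single 100-base GC block, mimicking a narrow BGC-like signal. The block is obvious in 100-base windows but diluted to under 5% in one window spanning the whole 2.1 kb sequence; an isochore-scale sweep would instead raise GC content in the large window as well. Real analyses would use windows on the ~100 kb scale; the function name is an assumption.

```python
def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC fraction in sliding windows, as (start, gc_fraction) pairs."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        out.append((start, sum(base in "GC" for base in chunk) / window))
    return out


if __name__ == "__main__":
    # 2,100 bases: AT background with a GC block at positions 1000-1099.
    seq = "AT" * 500 + "GC" * 50 + "AT" * 500
    fine = windowed_gc(seq, window=100, step=100)
    coarse = windowed_gc(seq, window=len(seq), step=len(seq))
    print(max(fine, key=lambda w: w[1]))  # the window containing the GC block
    print(coarse)                         # one window, GC fraction 100/2100
```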

  49. Alternative Explanation • Evidence so far favors the BGC explanation for HARs. • However, the results are not yet conclusive.

  50. “Dispensability of Mammalian DNA” by Gill Bejerano and Cory McLean
