
Inside the Genome

Inside the Genome. The club resident JD Watson back2back with DJ Venter. 2001: The Human Genome. International Human Genome Sequencing Consortium, Nature 409:860-921 (2001). Venter et al., Science 291:1304-1351 (2001). Prologue: RNA world, the dark matter of genomics.

corbett

Presentation Transcript


  1. Inside the Genome

  2. The club resident JD Watson back2back with DJ Venter • 2001: The Human Genome • International Human Genome Sequencing Consortium, Nature 409:860-921 (2001) • Venter et al., Science 291:1304-1351 (2001)

  3. Prologue: RNA world, the dark matter of genomics • How many coding genes are in the human genome? • The Bet of 2000: • Mean: 61,710 • Range: 30,000-150,000 • By the end of the genome project, the estimated number of human protein-coding genes had declined to only ~25,000 • What is the source of that discrepancy? • EST-based estimation vs. whole-genome annotation

  4. RNA revolution • The majority of the transcriptional output comes from non-coding RNA • An average of 10% of the human genome (compared with ~1.5% exonic sequences) is transcribed [Cheng et al. 2005] • Or even more: 62% of the mouse genome is transcribed [FANTOM3: Science 2005]

  5. Various RNAs: a partial list… • Messenger RNA (mRNA) • Ribosomal RNA (rRNA) • Transfer RNA (tRNA) • Small nuclear RNA (snRNA) • Small nucleolar RNA (snoRNA) • Short interfering RNA (siRNA) • MicroRNA (miRNA)

  6. The central dogma of molecular biology revisited • RNAs are not merely the intermediary cousins of proteins • [Diagram labels: Genome, Transcription, RNA, miRNA, Transcriptome, Translation, Protein, Proteome, Regulation by proteins, Regulation by RNA]

  7. Research in biology is complex… • Deciphering biological systems • The advantage (what makes this quest feasible) and the hindrance (what makes this quest inherently difficult) are both explained by evolution.

  8. The hindrance: topological entanglement of functional interconnections • The systems we study owe their complexity to their designer: natural selection. • What is it: a “robot” or a “UFO”? • The reason lies in the profound difference between systems “designed” by natural selection and those designed by intelligent engineers [Langton 1989, Artificial Life].

  9. Bottom line: we investigate an outrageously complex weave of interconnections • The “textbook networks” represent only the tip of the iceberg. • miRNAs and “regulomics” • microRNAs are expected to represent ~1% of predicted genes [Lim et al., 2003] • Lewis et al. (2003) estimate an average of five targets per miRNA • Many targets are transcription factors: miRNAs regulate the regulators

  10. The advantage: universal homology, which enables comparative biology. • Bottom line: research in biology advances through a reductionist approach, using simple model organisms to infer the function of homologous systems.

  11. Human genome statistics • 2.91 billion base pairs • 24,000 protein-coding genes (>30,000 non-coding genes ???) • 1.5% exons (127 nucleotides on average) • 24% introns (~3,000 nucleotides on average) • 75% intergenic (no genes) • Repetitive elements rule (~45% dispersed repeats) • The average gene is 27,894 bases long and contains 8.8 exons (titin contains 234 exons) • An average of 4 different proteins per gene (alternative splicing)

  12. Detecting genes in the human genome • Gene-finding methods: • Ab initio: use general knowledge of gene structure (rules and statistics). The challenge: small exons in a sea of introns • Homology-based. The problem: will not detect novel genes

  13. Genscan (ab initio) • Based on a probabilistic model of gene structure • Takes into account: promoters; gene composition (exons/introns); GC content; splice signals • Goes over all 6 reading frames • Burge and Karlin, 1997, Prediction of complete gene structure in human genomic DNA, J. Mol. Biol. 268
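To make "all 6 reading frames" concrete, here is a minimal Python sketch that splits a DNA string into codons for the three forward frames and the three frames of the reverse complement. It is only an illustration of the idea and is not Genscan's probabilistic model; the function names `reverse_complement` and `six_frames` are hypothetical.

```python
# Illustration only: enumerate the six reading frames of a DNA string.
# This is NOT Genscan's algorithm, just the frame bookkeeping it relies on.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Map frame names (+1, +2, +3, -1, -2, -3) to lists of complete codons."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand three bases at a time, dropping any
            # incomplete codon at the end.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[strand + str(offset + 1)] = codons
    return frames


frames = six_frames("ATGGCCTAA")
print(frames["+1"])  # ['ATG', 'GCC', 'TAA']
print(frames["-1"])  # ['TTA', 'GGC', 'CAT']
```

A gene finder then has to score candidate exons in each of these frames, since a coding region may lie on either strand and start at any offset.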

  14. Splicing

  15. Eukaryotic splice sites • [Figure label: polypyrimidine tract]

  16. CpG islands: another signal • CpG islands are regions of the genome with a higher frequency of CG dinucleotides (not base pairs!) than the rest of the genome • CpG islands often occur near the beginning of genes, which may be related to binding of the transcription factor Sp1
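One common way to quantify "a higher frequency of CG dinucleotides" is the observed/expected CpG ratio, computed together with GC content over a window. The slides do not name a specific measure, so the ratio used below is an assumption brought in for illustration, and `cpg_stats` is a hypothetical helper name.

```python
# Sketch under assumptions: the observed/expected CpG ratio is a widely used
# measure, but it is not stated on the slide.
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")  # CG dinucleotides on this strand
    expected = c * g / n        # expectation if C and G were independent
    gc_content = (c + g) / n
    ratio = observed / expected if expected else 0.0
    return gc_content, ratio


print(cpg_stats("CGCGCGCG"))  # (1.0, 2.0)
print(cpg_stats("ATATATAT"))  # (0.0, 0.0)
```

In practice such a function would be applied in a sliding window along the genome, with windows of high GC content and high ratio flagged as candidate islands; the exact thresholds vary between tools.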

  17. Gene Ontology (GO) • GO describes proteins in terms of: • biological process (e.g. induction of apoptosis by external signals) • cellular component (e.g. membrane fraction) • molecular function (e.g. protein kinase) • [Figure labels: cell, nucleus, nuclear chromosome]

  18. Comparative proteome analysis Functional categories based on GO

  19. Comparative proteome analysis • Humans have more proteins involved in cytoskeleton, immune defense, and transcription

  20. Evolutionary conservation of human proteins ???

  21. Horizontal (lateral) gene transfer • Lateral gene transfer (LGT) is any process in which an organism transfers genetic material to another organism that is not its offspring

  22. Mechanisms: • Transformation • Transduction (phages/viruses) • Conjugation

  23. Bacteria to vertebrate LGT detection • E-value of the bacterial homolog X9 better than that of the (non-vertebrate) eukaryal homolog • Human query hits (hit: e-value): Frog: 4e-180; Mouse: 1e-164; E. coli: 7e-124; Streptococcus: 9e-71; Worm: 0.1
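The comparison on slide 23 can be sketched as a small filter over a hit list. Two things here are assumptions, not statements from the slides: "X9" is read as "at least 9 orders of magnitude", and the eukaryal comparison is restricted to non-vertebrates (the worm in the example). The names `orders_better` and `is_lgt_candidate` are hypothetical.

```python
import math

# Sketch under assumptions: "X9 better" is interpreted as >= 9 orders of
# magnitude, compared against the best non-vertebrate eukaryotic hit.


def orders_better(e_a, e_b, floor=1e-300):
    """How many orders of magnitude smaller e_a is than e_b."""
    # The floor avoids log10(0) for hits reported with an E-value of 0.
    return math.log10(max(e_b, floor)) - math.log10(max(e_a, floor))


def is_lgt_candidate(hits, min_orders=9):
    """hits: list of (organism, group, e_value) tuples for one human query."""
    bacterial = [e for _, group, e in hits if group == "bacteria"]
    nonvert = [e for _, group, e in hits if group == "non-vertebrate"]
    if not bacterial:
        return False
    if not nonvert:
        return True  # bacterial hit with no non-vertebrate eukaryotic hit
    return orders_better(min(bacterial), min(nonvert)) >= min_orders


# Hit list from slide 23.
slide_hits = [
    ("Frog", "vertebrate", 4e-180),
    ("Mouse", "vertebrate", 1e-164),
    ("E. coli", "bacteria", 7e-124),
    ("Streptococcus", "bacteria", 9e-71),
    ("Worm", "non-vertebrate", 0.1),
]
print(is_lgt_candidate(slide_hits))  # True
```

A filter like this only nominates candidates: a gene lost in the few sequenced non-vertebrate lineages would pass the same test without any transfer having occurred.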

  24. Bacteria to vertebrate LGT • [Figure labels: non-vertebrates, bacteria, vertebrates]

  25. Bacteria to vertebrate LGT?? • Hundreds of sequenced bacterial genomes vs. a handful of eukaryotic ones • Gene finding in bacteria is much easier than in eukaryotes • On the practical side: rigid mechanical barriers to LGT in eukaryotes (nucleus, germ line)

  26. Repetitive Elements in the Human Genome

  27. Repeat statistics • The human genome is ~45% dispersed repeats • 20% LINEs (AT-rich) • 13% SINEs (11% Alu) (GC-rich) • 8% LTR elements (retrovirus-like) • 2% DNA transposons • Another 3% is tandem simple-sequence repeats (e.g. triplet repeats) • Another 3-5% is segmentally duplicated at high similarity (over 1 kb, over 90% identity) • Identifying and screening these out is essential to avoid spurious matches

  28. LINEs and SINEs • Highly successful elements in eukaryotes • LINE: Long Interspersed Nuclear Element (>5,000 bp) • SINE: Short Interspersed Nuclear Element (<500 bp) • SINEs are free riders on the backs of LINEs: they encode no proteins

  29. The C-value paradox • Genome size does not correlate with organism complexity

  30. Repetitive elements • The C-value mystery was partially resolved when it was found that large portions of genomes contain repetitive elements

  31. Are Alus functional?? • SINEs are transcribed under stress • SINE RNAs may bind a protein kinase → promote translation under stress • Need to be in regions which are highly transcribed • Role in alternative splicing

  32. Segmental duplications • 1,077 segmental duplications detected • Several genes in the duplicated regions are associated with diseases (may be related to homologous recombination) • Most are recent duplications (conservation of the entire segment, versus conservation of coding sequences only)

  33. Genome-wide studies

  34. Sequenced genomes

  35. 481 segments > 200 bp absolutely conserved (100% identity) between human, rat and mouse
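The criterion on slide 35 (long stretches of 100% identity across several species) can be sketched as a scan over a gapped multiple alignment. The sketch assumes the sequences are already aligned to equal length with "-" for gaps; `conserved_runs` and the `min_len` parameter are hypothetical names, and the toy example uses a tiny threshold instead of 200 bp.

```python
# Sketch under assumptions: input is a pre-computed multiple alignment
# (equal-length strings, "-" for gaps); this is not the published pipeline.
def conserved_runs(aligned, min_len=200):
    """Return (start, end) half-open column ranges where every sequence is
    identical and ungapped for at least min_len consecutive columns."""
    seqs = [s.upper() for s in aligned]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    for i in range(length + 1):  # one extra step to flush the last run
        same = (
            i < length
            and seqs[0][i] != "-"
            and all(s[i] == seqs[0][i] for s in seqs)
        )
        if same:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


human = "ACGTACGTAA"
mouse = "ACGTACGTCA"
rat = "ACGAACGTCA"
print(conserved_runs([human, mouse, rat], min_len=3))  # [(0, 3), (4, 8)]
```

With real data the same scan would run over whole-genome alignments, so the cost is linear in alignment length times the number of species.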

  36. Comparison with a neutral substitution rate • Compare with the substitution rate in any 1 Mb region • Probability of 10^-22 of obtaining 1 ultraconserved element (UE) by chance

  37. 481 UEs • 111 overlap a known mRNA (exonic UEs) • 256 have no overlap (non-exonic): 100 intronic, 156 intergenic • 114 inconclusive

  38. Who are the genes? • Type 1: exonic • Type 2: genes which are near non-exonic UEs (???)

  39. Intergenic UEs • Genes which flank intergenic UEs are enriched for early developmental genes • Are UEs distal enhancers of these genes?

  40. Gene enhancer • A short region of DNA, usually quite distant from a gene (due to chromatin complex folding), which binds an activator • An activator recruits transcription factors to the gene

  41. Experimental studies of UEs • Tested 167 UEs (both mouse-human UEs and fish-human UEs) for enhancer activity: each was cloned upstream of a reporter gene to test its activity • 45% functioned as enhancers

  42. A bioinformatic success • Ultraconservation can predict highly important function!

  43. BUT … [Ahituv et al., PLoS Biol. 2007 Sep;5(9):e234] • Chose 4 UEs which are near specific genes: genes which show a specific phenotype when knocked out • Performed complete deletion of these UEs • … the mice were viable and did not show any altered phenotype

  44. Conclusions… • Ultraconservation can be indicative of important function • … • And sometimes not: • gene redundancy • long-range phenotypes • laboratories cannot mimic life
