Inside the Genome

Presentation Transcript

  1. Inside the Genome

  2. The club resident JD Watson back2back with DJ Venter. 2001: The Human Genome. International Human Genome Sequencing Consortium, Nature 409:860-921 (2001); Venter et al., Science 291:1304-1351 (2001)

  3. Prologue: RNA world, the dark matter of genomics • How many coding genes are in the human genome? • The Bet of 2000: • Mean: 61,710 • Range: 30,000 to 150,000 • By the end of the genome project, the estimated number of human protein-coding genes had declined to only ~25,000 • What is the source of that discrepancy? • EST-based estimation vs. whole-genome annotation

  4. RNA revolution • The majority of the transcriptional output comes from non-coding RNA • An average of 10% of the human genome (compared with ~1.5% exonic sequences) is transcribed [Cheng et al. 2005] • Or even more... 62% of the mouse genome is transcribed [FANTOM3: Science 2005]

  5. Various RNAs – A partial list… • messenger RNA (mRNA) • Ribosomal RNA (rRNA) • Transfer RNA (tRNA) • Small nuclear RNA (snRNA) • Small nucleolar RNA (snoRNA) • Short interfering RNA (siRNA) • Micro RNA (miRNA)

  6. The central dogma of molecular biology revisited: RNAs are not merely the intermediary cousins of proteins [Diagram labels: Genome, Transcriptome (RNA, miRNA), Proteome (Protein); Transcription, Translation; Regulation by proteins; Regulation by RNA]

  7. Research in biology is complex… • Deciphering biological systems • The advantage (what makes this quest feasible) and the hindrance (what makes this quest inherently difficult) are both explained by evolution.

  8. The hindrance: topological entanglement of functional interconnections • The difficulties in our research fundamentally owe their complexity to the designer: natural selection. • What is it: a “Robot” or a “UFO”? • The reason lies in the profound difference between systems “designed” by natural selection and those designed by intelligent engineers [Langton 1989, Artificial Life].

  9. Bottom line: we investigate an outrageously complex weave of interconnections • The “textbook networks” represent only the tip of the iceberg. • miRNAs and “Regulomics” • microRNAs are expected to represent ~1% of predicted genes [Lim et al., 2003] • Lewis et al. (2003) estimate an average of five targets per miRNA • Many targets are transcription factors: miRNAs regulate the regulators

  10. The advantage: universal homology, which enables comparative biology. • Bottom line: research in biology advances through a reductionist approach, using simple model organisms to infer the functionality of homologous systems.

  11. Human genome statistics • 2.91 billion base pairs • 24,000 protein-coding genes (>30,000 non-coding genes ???) • 1.5% exons (127 nucleotides) • 24% introns (~3,000 nucleotides) • 75% intergenic (no genes) • Repetitive elements rule (~45% dispersed repeats) • Average size of a gene is 27,894 bases • A gene contains an average of 8.8 exons (*Titin contains 234 exons) • Average of 4 different proteins per gene (alternative splicing)

  12. Detecting genes in the human genome. Gene finding methods: • Ab initio: use general knowledge of gene structure (rules and statistics). The challenge: small exons in a sea of introns • Homology-based. The problem: will not detect novel genes

  13. Genscan (ab initio) • Based on a probabilistic model of gene structure • Takes into account: promoters; gene composition (exons/introns); GC content; splice signals • Goes over all 6 reading frames • Burge and Karlin (1997), Prediction of complete gene structure in human genomic DNA, J. Mol. Biol. 268
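
The "all 6 reading frames" idea on the Genscan slide can be illustrated with a minimal sketch. This is not Genscan's code or model; the helper names `reverse_complement` and `six_frames` are hypothetical, and the sketch only enumerates the codons of the three forward-strand and three reverse-strand frames.

```python
# Minimal sketch (not Genscan): enumerate the six reading frames of a DNA string.
# The helper names are hypothetical and exist only for this illustration.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to lists of complete codons."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand in triplets, dropping any incomplete tail.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


if __name__ == "__main__":
    for label, codons in six_frames("ATGAAAT").items():
        print(label, codons)
```

A real gene finder scores these frames with a probabilistic model; the sketch stops at frame enumeration.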

  14. Splicing

  15. Eukaryotic splice sites [Figure label: poly-pyrimidine tract]

  16. CpG islands: another signal • CpG islands are regions of the genome with a higher frequency of CG dinucleotides (not base pairs!) than the rest of the genome • CpG islands often occur near the beginning of genes, possibly related to binding of the transcription factor Sp1
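
To make "higher frequency of CG dinucleotides" concrete, here is a minimal sketch that computes two statistics commonly used to characterize CpG islands: GC content and the observed/expected CpG ratio. The formula (number of CpG × length / (number of C × number of G)) is a widely used convention and an assumption here; the slides do not give a formula or thresholds. The function name `cpg_stats` is hypothetical.

```python
# Minimal sketch: GC content and observed/expected CpG ratio for one sequence window.
# The obs/exp formula is a common convention, assumed here; it is not from the slides.

def cpg_stats(seq):
    """Return (gc_content, cpg_observed_over_expected) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides on one strand, not C-G base pairs
    gc_content = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0  # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp


if __name__ == "__main__":
    print(cpg_stats("ACGTTTCGCGGGCCAATCG"))
```

In practice the statistic is computed over sliding windows and compared against cutoffs chosen by the analyst.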

  17. Gene Ontology (GO) • GO describes proteins in terms of: biological process (e.g. induction of apoptosis by external signals); cellular component (e.g. membrane fraction); molecular function (e.g. protein kinase) [Figure labels: cell nucleus, nuclear chromosome]

  18. Comparative proteome analysis Functional categories based on GO

  19. Comparative proteome analysis • Humans have more proteins involved in cytoskeleton, immune defense, and transcription

  20. Evolutionary conservation of human proteins ???

  21. Horizontal (lateral) gene transfer • Lateral Gene Transfer (LGT) is any process in which an organism transfers genetic material to another organism that is not its offspring

  22. Mechanisms: • Transformation • Transduction (phages/viruses) • Conjugation

  23. Bacteria to vertebrate LGT detection • E-value of bacterial homolog X9 better than eukaryal homolog • Human query:
      Hit              E-value
      Frog             4e-180
      Mouse            1e-164
      E. coli          7e-124
      Streptococcus    9e-71
      Worm             0.1
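
The comparison on this slide can be sketched as a simple filter over BLAST-style hits. Several things here are assumptions, not statements from the slides: "X9" is read as a gap of nine orders of magnitude, "eukaryal homolog" is read as the best non-vertebrate eukaryote hit, the group labels are assigned by hand, and a query with no non-vertebrate hit at all is treated as a candidate. The function name `is_lgt_candidate` is hypothetical.

```python
import math

# Minimal sketch of the E-value comparison; the threshold and grouping are assumptions.
TINY = 1e-300  # floor so that an E-value reported as 0 does not break log10


def is_lgt_candidate(hits, min_orders=9.0):
    """hits: iterable of (organism, group, e_value) tuples, where group is
    'vertebrate', 'bacteria' or 'non_vertebrate' (labels assumed for this sketch)."""
    best = {}
    for _organism, group, e_value in hits:
        best[group] = min(e_value, best.get(group, float("inf")))
    if "bacteria" not in best:
        return False
    if "non_vertebrate" not in best:
        return True  # assumption: no non-vertebrate eukaryote homolog found at all
    gap = math.log10(max(best["non_vertebrate"], TINY)) - math.log10(max(best["bacteria"], TINY))
    return gap >= min_orders


# The hit list from the slide, with hand-assigned groups.
slide_hits = [
    ("Frog", "vertebrate", 4e-180),
    ("Mouse", "vertebrate", 1e-164),
    ("E. coli", "bacteria", 7e-124),
    ("Streptococcus", "bacteria", 9e-71),
    ("Worm", "non_vertebrate", 0.1),
]

if __name__ == "__main__":
    print(is_lgt_candidate(slide_hits))
```

A hit list that passes this filter is only a candidate; the caveats on the later "Bacteria to vertebrate LGT??" slide still apply.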

  24. Bacteria to vertebrate LGT [Figure labels: non-vertebrates, bacteria, vertebrates]

  25. Bacteria to vertebrate LGT?? • Hundreds of sequenced bacterial genomes vs. a handful of eukaryotes • Gene finding in bacteria is much easier than in eukaryotes • On the practical side: rigid mechanical barriers to LGT in eukaryotes (nucleus, germ line)

  26. Repetitive Elements in the Human Genome

  27. Repeat statistics • The human genome is ~45% dispersed repeats • 20% LINEs (AT rich) • 13% SINEs (11% Alu) (GC rich) • 8% LTR elements (retrovirus-like) • 2% DNA transposons • Another 3% is tandem simple sequence repeats (e.g. triplets) • Another 3-5% is segmentally duplicated at high similarity (over 1 kb, over 90% identity) • Identifying and screening these out is essential to avoid spurious matches

  28. LINEs and SINEs • Highly successful elements in eukaryotes • LINE: Long Interspersed Nuclear Element (>5,000 bp) • SINE: Short Interspersed Nuclear Element (<500 bp) • SINEs are free riders on the backs of LINEs: they encode no proteins

  29. The C-value paradox • Genome size does not correlate with organism complexity

  30. Repetitive elements • The C-value mystery was partially resolved when it was found that large portions of genomes contain repetitive elements

  31. Are Alus functional?? • SINEs are transcribed under stress • SINE RNAs may bind a protein kinase → promote translation under stress • They need to be in regions which are highly transcribed • Role in alternative splicing

  32. Segmental duplications • 1,077 segmental duplications detected • Several genes in the duplicated regions are associated with diseases (may be related to homologous recombination) • Most are recent duplications (conservation of the entire segment, versus conservation of coding sequences only)

  33. Genome-wide studies

  34. Sequenced genomes

  35. 481 segments > 200 bp absolutely conserved (100% identity) between human, rat and mouse
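
The notion of an absolutely conserved segment can be sketched as a scan over a multiple alignment for runs of columns where every sequence carries the same base. This is a minimal illustration, not the method used to find the 481 elements; it assumes the sequences are already aligned to equal length with `-` as the gap character, and the function name `conserved_runs` and its `longer_than` parameter are hypothetical.

```python
# Minimal sketch: find runs of 100% identical, gap-free columns in aligned sequences.
# Assumes pre-aligned, equal-length strings with '-' for gaps; comparison is case-sensitive.

def conserved_runs(aligned, longer_than=200):
    """Return (start, end) half-open column ranges whose length exceeds longer_than."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    runs = []
    start = None
    for i in range(length + 1):  # one extra step closes a run that reaches the end
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(s[i] == aligned[0][i] for s in aligned)
        )
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start > longer_than:
                runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    human = "ACGTACGTAA"
    mouse = "ACGTACCTAA"
    rat = "ACGTACGTAA"
    print(conserved_runs([human, mouse, rat], longer_than=3))
```

With the default `longer_than=200` the same scan corresponds to the "> 200 bp, 100% identity" criterion on the slide, applied to a real three-way alignment.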

  36. Comparison with a neutral substitution rate • Compare the substitution rate in any 1 Mb region • Probability of 10^-22 of obtaining 1 ultraconserved element (UE) by chance

  37. 481 UEs • 111 UEs overlap a known mRNA: exonic UEs • 256 with no overlap (non-exonic): 100 intronic, 156 intergenic • 114 inconclusive
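
The counts on this slide are internally consistent, which a short check makes explicit: the exonic, non-exonic and inconclusive classes add up to 481, and the intronic and intergenic counts add up to the non-exonic class. The variable and function names below are hypothetical; the numbers are the ones on the slide.

```python
# Minimal sketch: check that the UE categories on the slide add up, and report fractions.

UE_TOTAL = 481
UE_CLASSES = {"exonic": 111, "non_exonic": 256, "inconclusive": 114}
NON_EXONIC_SPLIT = {"intronic": 100, "intergenic": 156}


def ue_breakdown():
    """Return each top-level class as a fraction of all UEs, after sanity checks."""
    if sum(UE_CLASSES.values()) != UE_TOTAL:
        raise ValueError("top-level classes do not sum to the total")
    if sum(NON_EXONIC_SPLIT.values()) != UE_CLASSES["non_exonic"]:
        raise ValueError("intronic + intergenic does not match the non-exonic count")
    return {name: count / UE_TOTAL for name, count in UE_CLASSES.items()}


if __name__ == "__main__":
    for name, fraction in ue_breakdown().items():
        print(f"{name}: {fraction:.1%}")
```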

  38. Who are the genes? • Type 1: exonic • Type 2: genes which are near non-exonic UEs (???)

  39. Intergenic UEs • Genes which flank intergenic UEs are enriched for early developmental genes • Are UEs distal enhancers of these genes?

  40. Gene enhancer • A short region of DNA, usually quite distant from a gene (due to chromatin complex folding), which binds an activator • An activator recruits transcription factors to the gene

  41. Experimental studies of UEs • Tested 167 UEs (both mouse-human UEs and fish-human UEs) for enhancer activity: each was cloned upstream of a reporter gene • 45% functioned as enhancers

  42. A bioinformatic success • Ultraconservation can predict highly important function!

  43. BUT … Ahituv et al., PLoS Biol. 2007 Sep;5(9):e234 • Chose 4 UEs which are near specific genes: genes which show a specific phenotype when knocked out • Performed complete deletion of these UEs • … the mice were viable and did not show any different phenotype

  44. Conclusions… • Ultraconservation can be indicative of important function • … • And sometimes not: gene redundancy; long-range phenotypes; laboratories cannot mimic life