IB404 - 17 - Human genome 3 – Mar 26

1. Transposons, also called transposable elements or jumping genes, comprise about half of our genome, so we need to learn something about them. They come in two major flavors: those that move via an RNA intermediate, known as RNA transposons, retrotransposons, or Class I elements, and those that move as a DNA molecule, known as DNA transposons or Class II elements.

2. Retrotransposons are the major class in mammalian genomes. They in turn come in three major kinds. The first are the retroviral-like transposons, and as their name implies they have 6-10 kbp genomes resembling those of retroviruses such as HIV. Thus they encode a reverse transcriptase (RT) that copies their RNA genome into DNA, an integrase that integrates this DNA copy into our chromosomes, a Gag protein that forms a capsid around the complex of RNA, RT, and integrase, and a protease that cleaves these other three proteins from a long precursor protein. There are now anti-HIV drugs that target each of these proteins. Unlike true retroviruses, which can cross-infect between animals, these transposons do not encode the envelope protein that is part of the membrane surrounding an HIV virion. The coding region is flanked by Long Terminal Repeats (LTRs) of 500-2000 bp, and these LTRs help define these transposons. Similar transposons are found in most genomes, e.g. the Gypsy and Copia transposons and many others in Drosophila, and the Ty1 and Ty2 retrotransposons in yeast.

3. Non-LTR retrotransposons are the second major kind, and as their name implies, they do not have LTRs at their ends. They encode only a reverse transcriptase and an endonuclease. At their 3' ends they typically have a long A-rich stretch. Copies of these transposons in the genome often have truncated 5' ends, because integration into the genome is coupled to reverse transcription, which starts at the 3' end of the RNA. If reverse transcription is interrupted, the new copy lacks its 5' end. The figure is for a particular Drosophila element called R2 that specifically integrates into the 28S rDNA repeats.

4. Again, non-LTR retrotransposons are found in most animal genomes, but their prevalence in the human genome is extraordinary. A single type, known as LINE1 for Long Interspersed Nuclear Element 1, has a full length of about 8 kbp and makes up about 20% of our DNA. It is not hard to imagine how a few master copies of such an element, when abundantly transcribed, could lead to the integration of many copies throughout the genome. Our genome appears to have suffered numerous such events in the past couple of hundred million years, and movement of these LINEs is ongoing.

5. The third major kind are the SINEs, or Short Interspersed Nuclear Elements. In the human genome this kind is best represented by the Alu element, so called because the bacterial restriction enzyme AluI recognizes target sites near the ends of these elements, so that in raw digests of human DNA they can be seen as a ~300 bp band. Alus do not encode any proteins, but they do have an A-rich 3' end. There are more than a million copies in our genome. They appear to have been produced from master copies in bouts over the past hundred million years, and some are still actively moving in us each generation. Thus some copies are highly diverged from their consensus, because they were produced from a master copy long ago and have mutated individually since then, while others are very similar to their consensus, implying that they were formed relatively recently. As with the LINE1 element, this process is ongoing, and indeed polymorphic Alu insertions have been discovered in humans, sometimes defining particular lineages, e.g. native North Americans.

6. The Alu element is thought to have arisen as a duplication of two copies of the signal recognition particle component, the 7SL non-coding RNA (recall all the pseudogene copies of these and other non-coding RNAs that carry their RNA Pol III promoter sequences with them). Like these and other retropseudogenes, Alus are thought to result from the action of the reverse transcriptase of LINE1 elements in our genome, although if this is the case then Alu has achieved a particularly efficient way to "parasitize" the LINE1 system. Strangely, LINE1 elements usually integrate into AT-rich portions of our genome and comprise most of the long AT-rich deserts, while the relatively GC-rich Alu elements usually integrate into GC-rich regions of the genome. It is not understood how Alu elements manage to bias the integration preference of the LINE1 system this way, but Alus are clearly far more abundant in gene-rich regions of the genome (revisit slide 6, point 15 from last lecture and the associated figure to see this trend).

7. An example of a chromosomal region showing GC and AT isochores. This is the 4p22-4p15 region of the 4th chromosome. Note the isochores of high GC content enriched in SINEs (Alus) and genes, contrasted with isochores of low GC content enriched in LINEs but with only a few genes. The genes in the AT-rich deserts are often very large (not shown here).
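The GC and AT isochores in such a figure come from measuring GC content in windows along the chromosome. A minimal sketch of that calculation follows, assuming a plain DNA string as input. The function names, the toy sequence, and the window size are hypothetical choices for illustration, not the method used to draw the slide.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases (N and other symbols are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (window start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (made up): a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGGCCGC" + "ATATTAATAT"
profile = windowed_gc(toy, window=10, step=10)
print(profile)
```

On real chromosomes the windows would be tens to hundreds of kbp, and the resulting profile could be compared against the positions of Alu and LINE1 copies.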

8. The DNA-mediated elements are particularly diverse in the human genome, as they are elsewhere. Most transposons in bacterial genomes are of this kind, and famous ones in Drosophila include the P element used for transformation and mutagenesis (I did my postdoc in Madison on it, and then spent the first 15 years here working on mariner transposons). The various kinds all share a common structure, with a single gene encoding a transposase, flanked by inverted terminal repeat (ITR) sequences. Each kind encodes a different transposase, although many of these are distantly related to each other, and each has ITRs of different sequence and length. For example, the mariner family of transposons I studied are generally 1.3 kb long, with 30 bp ITRs and a ~1000 bp ORF encoding a ~300 amino acid transposase. This is one of the smaller kinds, with others reaching several kbp long. The basic mechanism of transposition is that the transposase protein encoded by each element recognizes the ITRs of a copy, brings them together, cleaves them from the flanking DNA, cleaves a suitable target elsewhere in the genome, and inserts the actual DNA molecule of the transposon into the target. Often the host is fooled into restoring the element at its original position by copying it from the sister chromatid.

9. Transposons are recognized as belonging to these various classes and kinds based on sequence similarity and structure; however, that only works for recently formed copies (from the past few million years). For older copies it is necessary to generate consensus sequences that represent what they once looked like (up to about 200 Myr ago), while eventually the sequences of individual copies change so much that it becomes impossible to recognize them as being derived from a transposon. Roughly 45% of our genome can currently be recognized as being transposons, but perhaps as much as another 20% consists of transposon copies so ancient that they can no longer be recognized as such.
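The idea of rebuilding a consensus from diverged copies, and then scoring each copy against it, can be sketched in a few lines. This is a toy illustration that assumes the copies are already aligned to equal length, with `-` marking gaps. The function names and the five example sequences are made up, and real analyses use dedicated alignment and repeat-annotation software rather than this simple majority rule.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned copies ('-' marks a gap)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned copies must all have the same length")
    out = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(base for base in column if base != "-")
        # Most frequent base in this column; all-gap columns stay as a gap.
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)


def percent_divergence(copy: str, cons: str) -> float:
    """Percent of compared (non-gap) positions at which copy differs from cons."""
    compared = mismatches = 0
    for a, b in zip(copy.upper(), cons.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        mismatches += a != b
    return 100.0 * mismatches / compared if compared else 0.0


# Made-up aligned copies of a hypothetical 10 bp element.
copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACCTAC", "TCGTACGTAC"]
cons = consensus(copies)
print(cons, [percent_divergence(c, cons) for c in copies])
```

Each copy carries its own private mutations, so the majority base at each position tends to recover the sequence of the master copy, which is why a consensus can still be recognized when individual old copies cannot.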

10. This is a summary table of the transposon content of our genome, taken from the public paper, which discusses these elements extensively. Celera's WGS assembly stumbled on a lot of these and did not represent them well, leaving gaps filled with NNNNNNNNNs instead, so the Celera paper did not treat them much. After all, they were only after the genes. While the non-autonomous versions of some transposons are simply internally deleted versions that sometimes outnumber the normal copies, remember that the SINEs are not simply versions of LINEs. Rather, they evolved separately, although they are apparently dependent on LINEs for activity. Amongst the DNA transposon fossils are one kind of P element and three kinds of mariners, plus many other kinds. But as far as we can tell, none of the DNA transposons are still active, so they truly are molecular fossils, remnants of horizontal transfers from other species.

11. One can break down the copies of transposons into age classes according to their percent divergence from their consensus, with roughly 4% divergence representing 25 Myr. The remarkable result is that the youngest DNA transposons in our genome are about 50 Myr old, while the youngest LTR retrotransposons are about 25 Myr old. LINEs appear to have been continuously active in our genome, while SINEs show an explosion of activity in the past 100 Myr, although both these classes have also become relatively quiescent in the past 10 Myr.
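The age calibration above amounts to simple arithmetic, sketched here using the figure quoted in the notes (roughly 4% divergence per 25 Myr). The constant and function names are hypothetical, and the linear scaling is an assumption that ignores multiple substitutions at the same site, so it becomes less reliable for highly diverged copies.

```python
# Calibration quoted in the notes: ~4% divergence from consensus ~ 25 Myr.
MYR_PER_PERCENT = 25.0 / 4.0


def approx_age_myr(percent_divergence: float) -> float:
    """Rough age of a transposon copy, assuming divergence accumulates linearly."""
    if percent_divergence < 0:
        raise ValueError("percent divergence cannot be negative")
    return percent_divergence * MYR_PER_PERCENT


for pct in (4.0, 8.0, 16.0):
    print(f"{pct:.0f}% divergence -> about {approx_age_myr(pct):.0f} Myr")
```

Under this calibration the youngest DNA transposons, at about 50 Myr old, would sit near 8% divergence from their consensus.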

12. There is no question that transposons are primarily selfish genetic elements making copies of themselves at the expense of the host, and most RNAi systems and other host genome protection systems, like the RIP system of Neurospora crassa, appear to have evolved as defenses against them. Nevertheless, transposons occasionally become useful to their hosts. A classic example is that flies do not have the normal short telomeric repeats generated by telomerase; instead their telomeres are maintained by the faithful transposition of two kinds of LINEs into them. In our genome the most famous example is the RAG genes, which encode the two recombinases that help generate the diversity of antibodies in B cells and receptors in T cells of our adaptive immune system. These appear to have been derived from a transposon perhaps 450 Myr ago, when we were cartilaginous fish.

13. The public paper recognized about 40 additional "domesticated" transposon copies that are now functional genes in our genome. We worked on several of these, including seven derived from various Tigger DNA transposons in our genome, but their functions are not known. The SETMAR gene is particularly interesting as it is a chimeric or fusion gene, resulting from the joining of an exon encoding a SET domain (which is involved in methylation of the lysines in histones and hence the "histone code" that controls which genes are available for transcription) with an exon encoding a transposase domain derived from a mariner transposon about 50 Myr ago.
