Junk DNA and DNA editing
Shai Carmi, Bar-Ilan, BU
Saturday night, 13 Iyar, 17/05/2008
Genome structure
• DNA has evolved mostly to store the code for the proteins its host cell uses.
• Thus, the main functional units of any genome are protein-coding genes.
• The central dogma of molecular biology: DNA → RNA → Protein.
[Figure: DNA (A, C, G, T; 5’ to 3’) → RNA (A, C, G, U) → protein (20 amino acids). Final product: proteins are the cellular machinery.]
Does man have no advantage over the beast?
• In humans, protein-coding sequences are only 2% of the genome.
• All animals have the same order of magnitude of genes (a few tens of thousands).
• Does non-coding DNA determine complexity?
• Is everything else junk?
Non-coding DNA
• The rest contains introns, promoters and enhancers (regulation of expression), structural sequences (e.g. telomeres), and non-coding RNAs such as rRNA and tRNA (translation), micro-RNA (silencing), and snRNA (splicing).
• But this is not all!
• Almost HALF of the human genome is made of mobile elements.
• Pieces of ~100-10k base pairs that move around the genome by cut&paste or copy&paste mechanisms.
DNA transposons
• DNA transposons: cut&paste using the enzyme transposase (3% of the genome).
• Sometimes also transfer host sequences.
• Increase the genome size only through repeats at the edges, or if transposition happens during S-phase.
Retrotransposons
• Retrotransposons: copy&paste mechanism through an RNA intermediate.
• Main classes:
  • LTR (retrovirus-like, 8.7% of the genome).
  • LINE (long interspersed nuclear elements, 21.3%).
  • SINE (short interspersed nuclear elements, 13.6%).
• Retrotransposons behave like retroviruses.
• What are retroviruses?
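As a quick arithmetic check on the "almost half" claim, the genome fractions quoted on these slides can simply be added up. The snippet below is a minimal sketch; the dictionary and variable names are illustrative, and the only data used are the percentages stated above.

```python
# Genome fractions (percent) as quoted on the slides; names are illustrative.
MOBILE_ELEMENT_PERCENT = {
    "DNA transposons": 3.0,
    "LTR": 8.7,
    "LINE": 21.3,
    "SINE": 13.6,
}
RETRO_CLASSES = ("LTR", "LINE", "SINE")

# Retrotransposons only (copy&paste elements).
retro_total = sum(MOBILE_ELEMENT_PERCENT[name] for name in RETRO_CLASSES)
# All mobile elements, including cut&paste DNA transposons.
mobile_total = sum(MOBILE_ELEMENT_PERCENT.values())

print(f"Retrotransposons: {retro_total:.1f}% of the genome")
print(f"All mobile elements: {mobile_total:.1f}% of the genome")
```

The retrotransposon classes alone sum to a little over 40%, and adding DNA transposons brings the total close to half of the genome.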
Retroviruses
• Retroviruses are pieces of single-stranded (ss) RNA (other viruses carry DNA) wrapped in a capsid and an envelope.
• They penetrate the cell and use the cell machinery to replicate, assemble a new virus, and infect another cell.
• Example: HIV.
[Figure label: a few thousand bases.]
Retroviral proteins (advanced)
• Pol: encodes a polyprotein with:
  • Protease (cleavage of the retrovirus proteins).
  • Reverse transcriptase (copies the RNA to DNA).
  • RNase H (degradation of the RNA after reverse transcription).
  • Integrase (integration of the DNA into the genome).
• Gag: codes for core and structural proteins of the virus.
• Env: glycoprotein that recognizes membrane receptors of the host cell and initiates the process of infection.
• Complex splicing pattern, with partial overlap and frameshifting.
Retrotransposons
• It is commonly believed that ancient retroviral infections of the germ line are the origin of today's retrotransposons.
• How did they come to occupy 40% of the genome?
• Transcription: genomic DNA → RNA.
• Translation of viral proteins (if possible).
• Reverse transcription: RNA → DNA by reverse transcriptase.
• Insertion into new genomic locations, increasing the number of genomic copies of the sequence.
• Mobile elements are like a double-edged sword.
RETRO: violating the central dogma!
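The copy&paste cycle listed above (transcription, reverse transcription, insertion) can be sketched as a toy string model. This is only an illustration under strong simplifications: sequences are written on the coding strand, so transcription is modeled as replacing T with U; translation of viral proteins is skipped; and the function names and the example "genome" are invented for this sketch.

```python
def transcribe(dna: str) -> str:
    """DNA -> RNA, written on the coding strand: T becomes U."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA (the 'retro' step): U becomes T."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy the element genome[start:end] to position `target` via an RNA intermediate.

    The original copy stays in place, so the copy number grows by one.
    """
    element = genome[start:end]
    rna = transcribe(element)
    cdna = reverse_transcribe(rna)
    return genome[:target] + cdna + genome[target:]


# Toy genome: flanking sequence + one made-up "element" + flanking sequence.
genome = "AAAA" + "GATTACA" + "CCCC"
new_genome = retrotranspose(genome, start=4, end=11, target=len(genome))
print(genome.count("GATTACA"), "->", new_genome.count("GATTACA"))  # prints "1 -> 2"
```

Each round adds one more copy and lengthens the genome, which is how a copy&paste element can accumulate many genomic copies, unlike a cut&paste DNA transposon.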
Why are retrotransposons good?
• Serve as a reservoir of sequences for genetic innovation.
• Retroviral proteins have DNA-binding capabilities which can be exploited by the host cell.
• Regulate expression levels of existing genes.
• Change gene regulation networks:
  • By copying a promoter, two sequences become controlled by the same transcription factors (or, in other cases, by RNA-binding proteins or miRNA).
Why are retrotransposons bad?
• Retroelements generate mutations, through direct insertion into genes or unequal homologous recombination.
• Responsible for 0.3-0.5% of all genetic disorders (e.g. hemophilia).
• Change the normal transcription of genes (altered promoter activity, anti-sense transcription, silencing via methylation or miRNA binding).
• Alternative splicing and protein isoforms.
Examples
How can we stop them?
Inhibition of retroelements
A few mechanisms exist:
• Accumulation of mutations results in non-autonomous elements.
• Methylation and heterochromatin formation attenuate transcription (LINE).
• RNA interference.
• DNA editing (more to come).
• Did we succeed?
• Probably we did:
  • Here we are, more complex than any other organism.
  • Most elements are inactive; only Alu and L1 are active, with one insertion per 100 births.
Basics of DNA editing
• The APOBEC3 family of proteins was found to restrict retroviral replication. One of its mechanisms of operation is “cytosine deamination of the (-) strand DNA after reverse transcription”. Meaning…
• APOBEC catalyzes a chemical modification of the DNA just before it is integrated into the genome, eventually generating G→A mutations (editing).
• (Localization varies: nucleus/cytoplasm.)
• Induces tens to hundreds of mutations (uracil excision?).
• Editing itself is not sufficient to stop replication; other mechanisms are also used.
Useful in the immune system to generate new antibodies!
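Why does cytosine deamination on the (-) strand show up as G→A? Deamination turns C into U on the (-) strand; when the (+) strand is synthesized, U pairs with A, so a position that was G on the (+) strand now reads A. The following is a minimal sketch of that bookkeeping. The function names and the per-base editing probability are assumptions made for illustration; they do not model APOBEC's real sequence preferences.

```python
import random

# Base pairing used when copying a strand; U (deaminated C) pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def edit_minus_strand(plus_strand: str, edit_probability: float, rng: random.Random) -> str:
    """Deaminate C -> U on the (-) strand and return the resulting (+) strand."""
    minus = reverse_complement(plus_strand)
    edited_minus = "".join(
        "U" if base == "C" and rng.random() < edit_probability else base
        for base in minus
    )
    # Synthesis of the (+) strand from the edited (-) strand: U templates an A.
    return reverse_complement(edited_minus)


original = "ATGGCG"
fully_edited = edit_minus_strand(original, edit_probability=1.0, rng=random.Random(0))
print(original, "->", fully_edited)  # prints "ATGGCG -> ATAACA"
```

With the probability set to 1.0, every G on the (+) strand becomes A, while the C on the (+) strand is untouched because it pairs with a G, not a C, on the (-) strand.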
Evolution of APOBEC
• APOBEC3G is one of the most positively selected genes (= changes the fastest).
• Ongoing arms race with HIV.
• In response to APOBEC, HIV developed the Vif protein, which can ubiquitinate APOBEC (= send it to “recycling” in the proteasome).
• Different APOBECs restrict retroviruses/transposons by different mechanisms (e.g., binding to RNA and blocking reverse transcription).
DNA editing in the genome
• Some retrotransposons were edited by APOBEC, yet still integrated into the genome.
• A new mechanism of mutagenesis.
• So far, almost neglected by geneticists.
• Together with Erez Levanon, HMS.
• Analyzed retroelements in mouse, human and chimp, applying a new statistical approach.
Main results
• Editing has left fingerprints in thousands of mouse IAP/MusD retroelements, with distinct motifs.
• Predicting hundreds of thousands of editing sites.
• Edited IAPs are transcribed more than non-edited ones.
• Some edited IAPs overlap with introns and exons.
• The phylogenetic tree can change when editing information is considered.
• Editing also in non-LTR (LINE) mouse elements.
• Editing in human and chimp HERV retroelements.
DNA editing demonstration
• Comparing two mouse IAPs:
  • chr9:114987516-114993954
  • chr8:28575443-28581824
• One cluster of 68 consecutive G→A!
• In total, 176/202 mismatches are G→A.
Can editing accelerate evolution?
Easily available raw material for the generation of new functions! (For example: any editing in TGG creates a premature stop codon.)
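The two kinds of numbers on this slide (how many mismatches are G→A, and how long the longest run of consecutive G→A mismatches is) can be computed from a pairwise alignment as sketched below. The sketch assumes two already-aligned, equal-length sequences without gaps; the toy sequences are invented for illustration and are not the IAPs listed above. The last part checks the TGG remark by enumerating every way of editing at least one G to A in the codon.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def mismatch_spectrum(source: str, edited: str) -> Counter:
    """Count mismatch types (source base, edited base) between aligned sequences."""
    if len(source) != len(edited):
        raise ValueError("sequences must be aligned to the same length")
    return Counter((a, b) for a, b in zip(source, edited) if a != b)


def longest_g_to_a_run(source: str, edited: str) -> int:
    """Longest run of consecutive mismatches that are all G->A.

    Matching positions are skipped; any other mismatch type breaks the run.
    """
    best = run = 0
    for a, b in zip(source, edited):
        if a == b:
            continue
        run = run + 1 if (a, b) == ("G", "A") else 0
        best = max(best, run)
    return best


def edited_variants(codon: str) -> set:
    """All codons obtained by changing at least one G to A."""
    g_positions = [i for i, base in enumerate(codon) if base == "G"]
    variants = set()
    for choice in product([False, True], repeat=len(g_positions)):
        if not any(choice):
            continue  # no edit at all: skip the unedited codon
        bases = list(codon)
        for pos, do_edit in zip(g_positions, choice):
            if do_edit:
                bases[pos] = "A"
        variants.add("".join(bases))
    return variants


# Invented toy alignment (not real IAP sequence).
toy_source = "GGATCGGTGACG"
toy_edited = "AGATCAATGATG"
spectrum = mismatch_spectrum(toy_source, toy_edited)
total = sum(spectrum.values())
print(f"{spectrum[('G', 'A')]}/{total} mismatches are G->A")
print("longest G->A run:", longest_g_to_a_run(toy_source, toy_edited))
print("edited TGG variants:", sorted(edited_variants("TGG")))
```

Applied to the slide's own figures, 176 of 202 mismatches corresponds to about 87% G→A. For TGG (tryptophan), the three possible edited codons are TAG, TGA and TAA, all of which are stop codons.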
DNA editing phylogenetics
[Figure: two panels, “Automatically generated genetic tree” and “Same tree, masking the editing”.]
• If two sequences are the same except for G→A mutations, the sequence with ‘G’ must precede the one with ‘A’.
• Thus we can build the tree of elements.
Editing affects phylogenetics!
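The ordering rule can be sketched as a pairwise test: if every difference between two aligned elements is a G in one and an A in the other, the G-carrying element is taken as the predecessor. This is a simplified illustration of the rule as stated on the slide, not the statistical approach used in the actual analysis; the function names and the toy elements are invented.

```python
def g_to_a_only(ancestor: str, descendant: str) -> bool:
    """True if the sequences differ and every mismatch is G (ancestor) -> A (descendant)."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = [(a, b) for a, b in zip(ancestor, descendant) if a != b]
    return bool(mismatches) and all(pair == ("G", "A") for pair in mismatches)


def editing_edges(elements: dict) -> list:
    """Directed (predecessor, successor) pairs implied by the G->A rule."""
    return [
        (parent, child)
        for parent in elements
        for child in elements
        if parent != child and g_to_a_only(elements[parent], elements[child])
    ]


# Invented toy elements: e2 and e3 look like edited copies of e1; e4 differs by G->T.
toy_elements = {
    "e1": "TGGCGT",
    "e2": "TAGCGT",
    "e3": "TAGCAT",
    "e4": "TGGCTT",
}
edges = editing_edges(toy_elements)
print(edges)  # prints [('e1', 'e2'), ('e1', 'e3'), ('e2', 'e3')]
```

Element e4 gets no edge because its difference from e1 is not G→A, so the rule says nothing about its order relative to the others.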
Summary
• A significant fraction of the DNA originates from infection by ancient RNA viruses, which spread through the genome by reverse transcription and replication.
• Some of them were ‘domesticated’ to benefit the host cell (not really junk!), but some induce deleterious mutations.
• One of the mechanisms that restrict retrotransposition is editing the elements before integration into the genome.
• Many genomic sites are ‘edited’ due to this restriction activity.
• A new mechanism of mutagenesis, potentially leading to the evolution of new molecules or functions (for example, HIV drug resistance).