Protein Motifs-RNA Binding Domains

Protein Motifs-RNA Binding Domains 21/02/2008

RNA Binding proteins • Typically cytoplasmic or nuclear • At least 9 families RNA binding proteins identified but few have had their structures experimentally determined (those that have are mainly ribosomal proteins) • Can bind to single-stranded RNA or double- stranded RNAs • The 2 most abundant and well defined RNA binding motifs are: 1) RNA-Recognition Motif (RRM) which binds single stranded RNA (AKA RDB) 2) Double stranded RNA binding motif (dsRBM) which binds double stranded RNA • Many regulate translation of RNA and post-translational events eg RNA splicing and editing • dsRBM plays an important role in regulatory interactions mediated by siRNAs and miRNAs • Other examples include:translation initiation factors, poly-A binding proteins, snRNPs, ADAR

RNA Binding proteins • Many RNA binding proteins contain a putative RNA-binding domain of ~90 amino acids whose amino acid sequence is conserved from yeast to humans. • The most highly conserved motif within the RNA binding domain is the ‘ribonucleoprotein consensus sequence’ (RNP-CS). This is the identifying feature of this group of proteins. • Often RNA binding proteins contain several similar (non-identical) RNP-CS-type RNA binding domains • Also contain at least one auxiliary domain which mediates protein-protein interactions (unique to each type of protein) • A large number of proteins also contain shorter sequence motifs, including the arginine-rich motif (ARM) and the Arg-Gly-Gly (RGG) box

RNA binding motif properties • Arginine rich motifs are abundant in RNA binding sites and secondly lysine rich motifs (and other positively charged amino acids). Together arginine and lysine account for 32% of interface residues • Positively charged amino acids are able to interact with both bases and the negatively charged phosphate backbone of RNA. • Histidine is also common, as it can be positively charged and can form stacking interactions through its imidazole ring • Phenylalanine and negatively charged amino acids (glutamic acid and aspartic acid) are under-represented at binding interfaces as are the hydrophobic residues (leucine, isoleucine, valine and alanine)

RNA binding motif properties (2) • There are significant biases in the types of amino acids that are sequence neighbours of interface residues, eg Glycine is preferred on either side of an interface residue (small in size, allowing flexibility in folding conformations) • In a study by Terribilini et al it was found 95% interface residues have at least one additional interface residue among the four amino acids on either side and 49% have at least four, ie interface residues tend to cluster in the primary amino acid sequence rather than being brought into close proximity by folding

ribonucleoprotein consensus sequence, RNP-CS • Initially identified in yeast mRNA poly(A)-binding protein (PABP) and mammalian heterogenous nuclear riboprotein particle (hnRNP) protein A1 • The RNP-CS is an octapeptide, consensus sequence Lys/Arg-Gly-Phe/Tyr-Gly/Ala-Phe-Val-X-Phe/Tyr • Most highly conserved segment in a well conserved region • Defines only one class of RNA binding protein –many others exist (eg ribosomal proteins) which do not contain this motif • Other amino acids in the RNP-CS-type RNA binding domain are also conserved eg. RNP- 2 (6 amino acids) located ~30 amino acids amino-terminal to the RNP-CS. RNP-2 is less well conserved than RNP-CS. • Similarities have been reported between eukaryotic RNP-CS and several prokaryotic ssDNA-binding proteins (including T4 gp32, M13 gp5 and E.coli ssb) and bacterial transcription factors.

Modular structure of RNP-CS-type proteins • Several RNP-CS-type RNA-binding domains often occur in an individual protein and are located in the amino terminal portion. • Comparison of RNA binding domains within a single protein typically show ~30% amino acid identity, while comparison of corresponding domains in the same protein from divergent species show strong amino acid identity, >60%. • Corresponding domains of yeast and human PABPs contain segments of 8-10 amino acids that are completely identical-indicative of functional specialization. • Evolutionary conservation among the domains suggest they evolved by duplication and diversification of a common ancestral RNA-binding protein gene. • The remainder of RNP-CS-type protein is characterised by auxiliary domains, (involved in protein-protein interactions) and sometimes also influence RNA binding activity. It is this combination of auxiliary domains that give RNP-CS-type proteins their modular structure • RNP-CS-type proteins form multiprotein complexes.

The structure of PTB-C198, a hnRNP protein, a homodimer. There are 2 RNP-CS domains and one of them has a novel fold (blue?!)

KH domains • The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. • An evolutionarily conserved sequence of around 70 amino acids, the KH domain is present in a wide variety of nucleic acid-binding proteins. The KH domain binds RNA, and can function in RNA recognition. It is found in multiple copies in several proteins, where they can function cooperatively or independently. For example, in the AU-rich element RNA-binding protein KSRP, which has 4 KH domains, KH domains 3 and 4 behave as independent binding modules to interact with different regions of the AU-rich RNA targets. The solution structure of the first KH domain of FMR1 and of the C-terminal KH domain of hnRNP K determined by nuclear magnetic resonance (NMR) revealed a beta-alpha-alpha-beta-beta-alpha structure. Proteins containing KH domains include: • Archaeal and eukaryotic exosome subunits. • Eukaryotic and prokaryotic RS3 ribosomal proteins. • Vertebrate fragile X mental retardation protein 1 (FMR1). • Vigilin, which has 14 KH domains. • AU-rich element RNA-binding protein KSRP. • hnRNP K, which contains 3 KH domains. • Human onconeural ventral antigen-1 (NOVA-1).

KH domain structure • A, schematic diagram of the domain structure of human PCBP2. Similar domain structures are observed in other members of the PCBP family. B, sequence alignment of the KH domains from the known PCBP proteins PCBP1-4 and hnRNP-K.

The FMR1 gene and its KH mutation • In most fragile X patients expansion of the CGG repeat tract causes silencing of the FMR1 gene and absence of the FMRP protein. However, over expression of the FMRP protein containing a missense mutation (p.I304N) has been observed in a patient with a severe fragileX phenotype • The mutation affected a highly conserved amino acid in the first turn of helix α2, in the second KH domain of FMRP. Amino acid I304 has a role in stabilising the protein fold. The mutation destabilised the fold resulting in the unfolding of the domain and impairment of RNA binding. • The mutant protein cannot associate with translating poly-ribosomes and becomes part of a smaller ribonucleoprotein complex. The I304N mutant also inhibits oligomerization of FMRP suggesting the second KH domain has a role in homo-association.

Initiation and the poly(A)-binding protein (PABP) • Initiation is a rate limiting step in translation (target for control). It is a multistage process which facilitates recruitment of ribosomes to the mRNA and involves translation factors and adapter proteins. • All eukaryotic mRNAs of nuclear origin contain a 5’ cap and most contain a polyA tail which are recognised by proteins and facilitate translation. • many copies of the poly(A)-binding protein (PABP) bind to the poly(A) tail and stabilize the mRNA (also involved in translation) PABP • 636 amino acid protein with four RRMs and a proline rich C-terminal domain -which mediates protein-protein interactions • Highly conserved protein (mammals, plants and yeast) and interacts with eIF4G to circularise mRNA and stimulate translation

Post-transcriptional suppression of microRNAs and small interfering RNAS Pri-mRNAs are first processed into ~70 nucleotide pre-mRNAs by DROSHA inside the nucleus. Pre-mRNAs are transported into the cytoplasm by exportin 5 and are processed into miRNA:miRNA duplexes by Dicer. Dicer also processes long dsRNA molecules into small interfering RNA (siRNA) duplexes. Only one strand of miRNA:miRNA duplex or siRNA duplex is preferentially assembled into the RNA-induced silencing comlex RISC, which acts on its target by translational repression or mRNA cleavage, depending on the level of complementarity between the small RNA and its target open reading frame.

RISC • RNA-induced silencing complex • Multiprotein siRNA complex which cleaves dsRNA and binds short antisense RNA strands which are then able to bind complementary strands. When it finds a complementary strand it activates RNAse activity and cleaves the RNA preventing its translation into protein • Important in gene regulation by micro-RNAs and in defense in viral infection when dsRNA is the infectious agent

Dicer (and the PAZ domain) • Dicer is a ribonuclease from the RNase III family • Cleaves dsRNA and pre-microRNA(miRNA) into short RNA fragments called siRNA (20-25 nucleotides long) • Initiates formation of RISC • Encoded by DICER1 gene • Contains two RNase III domains and one PAZ domain PAZ (Piwi/Argonaute/Zwille) domains consist of a left handed helix and a six stranded β-barrel capped at one end by two α helices and wrapped on one side by an appendage (a long ß hairpin and a short α-helix). The RNA binding surface is the open face of the ß-barrel (contains amino acids conserved within the PAZ domain family) • PAZ domains from different human Dicer and argonaute proteins show conserved function (and therefore structure)

Structural features of RNase III and Dicer

Dicer PAZ domain

Argonautes • RNase H-like enzymes that carry out the catalytic functions of RISC • Bind small interfering RNA (siRNA) generated from dsRNA or miRNA generated from endogenous non-coding RNA by Dicer • Contain a PAZ domain PIWI • P-element induced wimpy testis (in Drosophila) • Class of regulatory proteins which have a germ-line function by interacting with miRNAs involved in early development • Conserved basic binding site for the 5’ end of bound RNA

Drosha • RNase III enzyme responsible for processing of miRNAs or short RNA molecules by interacting with RISC to induce cleavage of complementary mRNA • Exists as part of a protein complex called the microprocessor complex which also contains the double stranded RNA binding protein Pasha

miRNAs and cancer • miRNAs located in genomic regions amplified in cancers (eg miR-17-92 cluster) function as oncogenes whereas miRNAs located in regions deleted in cancers (eg miR-15a-miR-16-1 cluster) act as tumour suppressors • miRNA expression studies correlate with clinical and biological characteristics of tumours, including tissue type, differentiation, aggression and response to therapy • Each miRNA has many targets therefore inherited variations in miRNA expression or proteins involved in their post-transcriptional suppression (exportin 5, Dicer, RISC), could have consequences for expression of protein-encoding genes involved in malignant transformation

Diseases caused by mutations in RNA-binding protein genes

references • Nature revies, microRNA collection, October 2007 • Terribilini et al, Bioinformatics, 2006, 12:1450-1462 • Yan et al, Nature vol 426, 27 November 2003 • Wikipedia • Brandziulis et al, Genes and development, 1989, 3:431-437 • Kahvejian et al, Genes and development, 2005, 19:104-113 • Ramos et al, RNA, 2003 9:293-298

Protein Motifs-RNA Binding Domains