What can (many) sequences tell us?

frachel

Presentation Transcript


  1. What can (many) sequences tell us?

  2. Nuclear receptor function

  3. Nuclear receptor family: NR2A2-HN4G, NR2B3-RRXG, NR2A5-HN4 d?, NR2B1-RRXA, NR2B2-RRXB, NR3C1-GCR, NR2A1-HNF4, NR3C4-ANDR, NR3A1-ESTR, NR2C2-TR4, NR3C3-PRGR, NR2C1-TR2-11, NR0B1-DAX1, NR2E1-TLX, NR3A2-ERBT, NR0B2-SHP, NR3C2-MCR, NR2E3-PNR, NR3B1-ERR1, NR6A1-GCNF, NR3B2-ERR2, NR2F6-EAR2, NR5A1-SF1, NR2F2-ARP1, NR5A2-FTF, NR2F1-COTF, NR4A1-NGFI, NR4A3-NOR1, NR1C1-PPAR, NR4A2-NOT, NR1C2-PPAS, NR1H4-FAR, NR1C3-PPAT, NR1H3-LXR, NR1D1-EAR1, NR1D2-BD73, NR1I1-VDR, NR1F3-RORG, NR1H2-NER, NR1A2-THB1, NR1F1-ROR1, NR1I2-PXR, NR1A1-THA1, NR1F2-RORB, NR1B3-RRG1, NR1B1-RRA1, NR1B2-RRB2, NR1I4-CAR1-MOUSE-, NR1I3-MB67

  4. Nuclear receptor structure: domains A-B (AF-1), C (DNA binding), D, E (LBD), F. C • DNA binding domain • highly conserved • > 90% similarity. E • Ligand binding domain • conserved protein fold • > 20% sequence similarity

  5. The questions: How do ligands relate to activity? What is the role of each amino acid in the NR LBD? Which data handling and bioinformatics methods are needed to answer these questions?

  6. 3D structure of the LBD (hER)

  7. Available NR data: 56 structures in the PDB (>200 now*); >500 sequences, scattered (>1500 now); >1000 mutations, very scattered; >10000 ligand-binding studies, secret; disease patterns, expression, >1000 SNPs, genetic localization, etc., etc., etc. This data must be integrated, sorted, combined, validated, understood, and used to answer our questions. *"Now" was in 2007…

  8. Step 1: The first important step is a common numbering scheme. All structures use different numbering schemes, and insertions and deletions between species confuse any numbering. Whoever solves that problem once and for all should get three Nobel prizes.
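A common partial workaround is to number every residue by its column in one shared multiple sequence alignment and to translate between proteins through that column. This is not the once-and-for-all solution asked for above. The Python sketch below assumes that each protein's residues are numbered consecutively from a known start. Real PDB files often break that assumption with insertion codes and missing residues. The function names and the toy alignment are hypothetical.

```python
from __future__ import annotations


def residue_to_column(aligned_seq: str, first_residue: int = 1) -> dict[int, int]:
    """Map each residue number of one aligned sequence to its alignment column.

    Assumes residues are numbered consecutively from `first_residue`
    (no insertion codes, no numbering gaps) and that '-' marks a gap.
    Alignment columns are numbered from 1.
    """
    mapping: dict[int, int] = {}
    residue = first_residue
    for column, char in enumerate(aligned_seq, start=1):
        if char == "-":
            continue  # a gap occupies a column but has no residue number
        mapping[residue] = column
        residue += 1
    return mapping


def equivalent_residue(aln_a: str, start_a: int, aln_b: str, start_b: int,
                       residue_a: int) -> int | None:
    """Translate a residue number in protein A to the aligned residue number in B.

    Returns None if residue_a does not exist in A or if B has a gap in that column.
    """
    column = residue_to_column(aln_a, start_a).get(residue_a)
    if column is None:
        return None
    column_to_b = {col: res for res, col in residue_to_column(aln_b, start_b).items()}
    return column_to_b.get(column)


# Hypothetical toy alignment: protein A starts at residue 10, protein B at 301.
print(residue_to_column("AC-DE", 10))                     # {10: 1, 11: 2, 12: 4, 13: 5}
print(equivalent_residue("AC-DE", 10, "ACGDE", 301, 12))   # 304
print(equivalent_residue("ACGDE", 301, "AC-DE", 10, 303))  # None (gap in A)
```

The numbering is only as stable as the reference alignment. If a new sequence forces the alignment to change, the column numbers change with it.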

  9. Large data volumes: Large data volumes allow us to develop new data analysis techniques. Entropy-variability analysis is a novel technique for looking at very large multiple sequence alignments. It requires ‘better’ alignments than those routinely obtained with ‘standard’ multiple sequence alignment programs.

  10. Part of the big alignment We see correlations between columns and between ‘things’.

  11. Vriend’s first rule of sequence analysis: If it is conserved, it is important.

  12. Vriend’s second rule of sequence analysis: If it is very conserved, it is very important.

  13. Consequence: If something is conserved in each sub-family, it is involved in a sub-family specific function.

  14. What is CMA (correlated mutation analysis)? A function is never just one residue.
      QWERTYASDFGRGH
      QWERTYASDTHRPM
      QWERTNMKDFGRKC
      QWERTNMKDTHRVW
      Red = conserved, Green = variable, Blue = correlated. Example: (chymo)trypsin
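The idea behind the toy alignment can be sketched in Python. This is an illustrative heuristic and not the CMA implementation behind these slides. The rules are:

- A column with a single residue type is conserved.
- Two non-conserved columns are correlated when they split the sequences into exactly the same groups.
- Everything else is variable.

Columns in which every sequence has a different residue are deliberately labelled variable. Otherwise any two such columns would trivially "correlate". The helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# The four toy sequences from the slide.
TOY = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]


def column_pattern(column: str) -> tuple[int, ...]:
    """Relabel residues by order of first appearance, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    labels: dict[str, int] = {}
    return tuple(labels.setdefault(aa, len(labels)) for aa in column)


def classify_columns(alignment: list[str]) -> list[str]:
    """Label each column 'conserved', 'correlated' or 'variable' (toy heuristic)."""
    n_seqs = len(alignment)
    columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
    patterns = [column_pattern(col) for col in columns]
    # Columns that split the sequences into identical groups share a pattern.
    pattern_count = Counter(patterns)
    result = []
    for col, pattern in zip(columns, patterns):
        n_types = len(set(col))
        if n_types == 1:
            result.append("conserved")
        elif n_types < n_seqs and pattern_count[pattern] > 1:
            result.append("correlated")
        else:
            result.append("variable")  # unique split, or every residue different
    return result


for position, label in enumerate(classify_columns(TOY), start=1):
    print(position, label)
```

On the four sequences, this labels positions 1-5, 9 and 12 as conserved, 6-8 and 10-11 as correlated, and 13-14 as variable. Real alignments are noisy and need a graded correlation score rather than exact matching.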

  15. Correlations: Residues can correlate with other residues, and when that happens we have found a function, regardless of conservation or variability. Residues that have a function correlate with that function.

  16. Correlations with wavelength: Residues can also correlate with something else. Example: optimal wavelength for opsin excitation.
      Wavelength   Loop 1   Loop 2
      UV           Gln      His
      Blue         Asn      Gln
      Red/Green    Leu      Gln
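One simple way to check such a correlation is to ask whether the residue at a position determines the property. In other words, does each residue type always occur with the same wavelength class? The sketch below applies this check to the three rows of the table. The `determines` helper is hypothetical, not the method used in the original work. With so few rows it only illustrates the bookkeeping; a real analysis would use many opsin sequences.

```python
OPSIN_TABLE = [
    # (optimal wavelength, Loop 1 residue, Loop 2 residue), as listed on the slide
    ("UV", "Gln", "His"),
    ("Blue", "Asn", "Gln"),
    ("Red/Green", "Leu", "Gln"),
]


def determines(features: list, properties: list) -> bool:
    """True if every feature value co-occurs with exactly one property value."""
    seen: dict = {}
    for feature, prop in zip(features, properties):
        if seen.setdefault(feature, prop) != prop:
            return False
    return True


wavelength = [row[0] for row in OPSIN_TABLE]
loop1 = [row[1] for row in OPSIN_TABLE]
loop2 = [row[2] for row in OPSIN_TABLE]

print(determines(loop1, wavelength))                    # True
print(determines(loop2, wavelength))                    # False: Gln goes with Blue and Red/Green
print(determines(list(zip(loop1, loop2)), wavelength))  # True
```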

  17. Correlations with drug binding (so no longer evolution-based…) (Wilma Kuipers, thesis)

  18. Correlation analysis • Correlate sequences with ligand binding affinities • Alignments showed a 100% correlation between affinity for pindolol and the presence or absence of Asn386 • Obviously, Asn386 plays an important role in ligand binding
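For a continuous affinity such as a pKi value, one way to quantify this kind of correlation is the point-biserial coefficient. It is the Pearson correlation between a 0/1 "residue present" indicator and the affinity. The sketch below is an assumption about how one might compute it, not the analysis from the thesis. The six data points are invented for illustration only.

```python
from __future__ import annotations

import math


def point_biserial(present: list[bool], values: list[float]) -> float:
    """Pearson correlation between a 0/1 indicator and a continuous measurement."""
    x = [1.0 if p else 0.0 for p in present]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(values) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, values))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in values))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical data (invented, not from the thesis): whether each receptor has
# Asn at the aligned position, and its pKi for one ligand.
has_asn = [True, True, True, False, False, False]
pki = [8.1, 7.9, 8.4, 6.0, 6.3, 5.8]
print(round(point_biserial(has_asn, pki), 3))  # 0.981
```

A coefficient of 1.0 would mean the affinity is fully explained by the presence or absence of the residue. Values near 1.0 only point at a candidate, which still has to be tested experimentally.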

  19. (Figure; Wilma Kuipers, thesis)

  20. Summary: correlation. If it's conserved, it's important; if it's important, it remains conserved. If residue positions correlate with ‘something’, they are involved in that ‘something’. ‘Something’ can be any of a very large number of functions. (Wilma Kuipers, thesis)

  21. Example correlation: Which cysteines form a pair in this protein family? Shown are aligned peptides from five different bacteria.
      ASDFGCHIKLMCNPQRSCTVW
      YSDYGCNIKLFCQPQRSCT--
      ATDYPVQIKLMCNPQKSCSMW
      YTDFGCHVKLLVQPNRSVTVW
      -TDFGVHVKLMCNPQKSCSFW
      (Wilma Kuipers, thesis)
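One heuristic reading of this question: two cysteines that form a disulfide bond tend to be gained or lost together. Their presence/absence patterns across the alignment should therefore be identical. The Python sketch below scans all Cys-containing columns of the five peptides for such pairs. The helper names are hypothetical. This is a correlation argument only, not proof of a bond.

```python
from __future__ import annotations

from itertools import combinations

# The five aligned bacterial peptides from the slide.
PEPTIDES = [
    "ASDFGCHIKLMCNPQRSCTVW",
    "YSDYGCNIKLFCQPQRSCT--",
    "ATDYPVQIKLMCNPQKSCSMW",
    "YTDFGCHVKLLVQPNRSVTVW",
    "-TDFGVHVKLMCNPQKSCSFW",
]


def cys_columns(alignment: list[str]) -> list[int]:
    """1-based alignment columns that contain Cys in at least one sequence."""
    return [i + 1 for i in range(len(alignment[0]))
            if any(seq[i] == "C" for seq in alignment)]


def cys_pair_candidates(alignment: list[str]) -> list[tuple[int, int]]:
    """Column pairs whose Cys are present together or absent together in every sequence."""
    pairs = []
    for a, b in combinations(cys_columns(alignment), 2):
        if all((seq[a - 1] == "C") == (seq[b - 1] == "C") for seq in alignment):
            pairs.append((a, b))
    return pairs


print(cys_columns(PEPTIDES))          # [6, 12, 18]
print(cys_pair_candidates(PEPTIDES))  # [(12, 18)]
```

Under this heuristic, the cysteines in columns 12 and 18 pair up. The Cys in column 6 occurs in the fourth peptide without either of the other two.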

  22. Conserved or very conserved? Recalcitrant.
      ASDFGCHIKLMCNPQRSCTVW
      YSDYGCNIKLFCNPQRSCT--
      ATDLPVQIKLMANPQKSCSVW
      LSDFGCHIKLMCNPQRSCTVW
      YTDFGCHVKLLVQPNRSVAFW
      -SDAGVHVKLMVQPNKSVSF-
      YTDFGCHVKLLVQPNRSVVFW
      -TDSGVHVKLMIQPNKSVSFW

  23. Conclusion from recalcitrance: The more exceptions you find in other (homologous) families, the less important the residue is in your family.

  24. Entropy and variability So far we saw that conservation and correlation can help us find functionally important residues. Can variability patterns also tell us something?

  25. Entropy: The sequence entropy $E_i$ at position $i$ is calculated from the frequencies $p_{i,a}$ of the twenty amino acid types $a$ at position $i$:
      $E_i = -\sum_{a=1}^{20} p_{i,a} \ln p_{i,a}$
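The entropy formula translates directly into code. In this minimal sketch, the gap character '-' is excluded and frequencies are computed over the non-gap residues only. How gaps are treated in the original analysis is not stated, so that choice is an assumption.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Sequence entropy E_i = -sum_a p_a ln(p_a) of one alignment column.

    `column` holds one residue per sequence. Gaps ('-') are ignored and the
    frequencies p_a are taken over the non-gap residues (an assumption).
    """
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log(p)
    return entropy


print(round(column_entropy("QQQQ"), 3))  # 0.0: fully conserved
print(round(column_entropy("YYNN"), 3))  # 0.693, i.e. ln 2
print(round(column_entropy("GPKV"), 3))  # 1.386, i.e. ln 4
```

The maximum, reached when all twenty amino acid types are equally frequent, is ln 20 ≈ 3.0.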

  26. Variability Sequence variability Vi is the number of amino acid types observed at position i in more than 0.5% of all sequences.
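Variability is a simple count with a frequency cut-off, so rare residues (often alignment or sequencing errors) do not inflate it. A minimal sketch follows. It assumes the gap character '-' is not counted as a residue type, but gapped sequences still count toward "all sequences".

```python
from collections import Counter


def column_variability(column: str, min_fraction: float = 0.005) -> int:
    """Count residue types found in more than `min_fraction` of all sequences.

    `column` holds one character per sequence; '-' (gap) is not counted as a
    residue type, but gapped sequences still count in the denominator.
    """
    n_sequences = len(column)
    counts = Counter(aa for aa in column if aa != "-")
    return sum(1 for n in counts.values() if n / n_sequences > min_fraction)


# Hypothetical column of 1000 sequences: 990 Ala, 6 Gly (0.6%), 4 Ser (0.4%).
column = "A" * 990 + "G" * 6 + "S" * 4
print(column_variability(column))  # 2: Ser falls below the 0.5% cut-off
```

In an alignment of fewer than 200 sequences, a single occurrence already exceeds 0.5%. The cut-off therefore only starts to filter in large alignments.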

  27. Intermezzo: Creating hypotheses is common practice in bioinformatics. But every hypothesis must be tested against real data from real experiments.

  28. Ras entropy-variability: 11 red, 12 orange, 22 yellow, 23 green, 33 blue

  29. GPCR entropy-variability; signalling path: 11 G protein, 12 support, 22 signalling, 23 ligand in, 33 ligand out

  30. NR LBD entropy-variability: 11 main function; 12 first shell around main function; 22 core residues (signal transduction); 23 modulator; 33 mainly surface
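Sorting residues into these boxes amounts to binning each position's entropy and variability. The slides give neither the bin boundaries nor which digit refers to which quantity. In the sketch below, the thresholds are therefore hypothetical. Reading the first digit as the entropy bin and the second as the variability bin is also an assumption. It is consistent with the boxes listed, because high entropy is hard to reach with low variability.

```python
from __future__ import annotations


def bin_index(value: float, low: float, high: float) -> int:
    """1 below `low`, 2 from `low` up to `high`, 3 at or above `high`."""
    if value < low:
        return 1
    if value < high:
        return 2
    return 3


def ev_box(entropy: float, variability: int,
           entropy_cuts: tuple[float, float] = (0.5, 1.5),  # hypothetical thresholds
           variability_cuts: tuple[int, int] = (4, 10)) -> str:  # hypothetical thresholds
    """Box label such as '12': first digit = entropy bin, second = variability bin (assumed)."""
    e_bin = bin_index(entropy, *entropy_cuts)
    v_bin = bin_index(variability, *variability_cuts)
    return f"{e_bin}{v_bin}"


NR_LBD_ROLES = {  # box descriptions as listed on the NR LBD slide
    "11": "main function",
    "12": "first shell around main function",
    "22": "core residues (signal transduction)",
    "23": "modulator",
    "33": "mainly surface",
}

box = ev_box(entropy=0.2, variability=6)
print(box, NR_LBD_ROLES.get(box, "box not described on the slide"))  # 12 first shell around main function
```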

  31. Example: role of Asp351. EV and correlation, but the correlation would never have been found from sequence analyses. (Figure: agonist vs. antagonist)

  32. Summary variability analysis Variability patterns hold information. Entropy and Variability are two (of the) ways to measure variability patterns. Entropy and Variability patterns can say something about the type of function, and thus add detail to correlation studies.

  33. Conclusions: Data is difficult, but we need it (sic); life would be so nice if we could do without it. PDB files are the worst. Nomenclature is not homogeneous. Ontologies…. Much data has been carefully hidden in the literature, where it can be retrieved only with great difficulty. Residue numbering is difficult but very necessary. Variability-entropy analysis is powerful, but requires very ‘good’ alignments.

  34. A short break for a word from our sponsors: Laerte Oliveira, Florence Horn. Our industrial sponsor: Wilma Kuipers (Weesp). Bob Bywater (Copenhagen), Nora vd Wenden (The Hague), Mike Singer (New Haven), Ad IJzerman (Leiden), Margot Beukers (Leiden), Fabien Campagne (New York), Øyvind Edvardsen (Tromsø), Simon Folkertsma (Frisia), Henk-Jan Joosten (Wageningen), Joost van Durma (Brussels), David Lutje Hulsik (Utrecht), Tim Hulsen (Goffert), Manu Bettler (Lyon), Elmar Krieger
