
Comparative Genomics. Human Genome Polymorphism. FBB, 4th year

Vasily Evgenievich Ramensky, Institute of Molecular Biology, Russian Academy of Sciences. Comparative Genomics. Human Genome Polymorphism. FBB, 4th year. People are different…


Presentation Transcript


  1. Vasily Evgenievich Ramensky, Institute of Molecular Biology, Russian Academy of Sciences. Comparative Genomics. Human Genome Polymorphism. FBB, 4th year

  2. People are different…

  3. …caccagctcctgtgGggggaggccctgct…
     …caccagctcctgtgGggggaggccctgct…
     …caccagctcctgtgGggggaggccctgct…
     …caccagctcctgtgCggggaggccctgct…
     …caccagctcctgtgCggggaggccctgct…
     …and so are their genomes

  4. Allele A (count Na):
       5'---------------A---------------3'
         |||||||||||||||||||||||||||||||
       3'---------------T---------------5'
     Allele G (count Ng):
       5'---------------G---------------3'
         |||||||||||||||||||||||||||||||
       3'---------------C---------------5'
     Definition of SNP (single nucleotide polymorphism): the existence in a population, at one and the same position of genomic DNA, of two nucleotide variants, with the frequency of the rarer variant (allele) ≥ 1%.
     Na + Ng = N, Na/N ≥ 0.01, Ng/N ≥ 0.01
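The frequency criterion in this definition can be written as a short check. This is only a sketch: the function name `is_snp` and the example counts are illustrative assumptions, and the counts are taken to be numbers of sampled chromosomes carrying each allele.

```python
def is_snp(n_a: int, n_g: int, threshold: float = 0.01) -> bool:
    """Return True if the rarer of two alleles reaches the frequency threshold.

    n_a, n_g: observed counts of the two alleles (Na and Ng in the definition).
    threshold: minimum frequency of the rarer allele (1% in the definition).
    """
    n = n_a + n_g
    if n == 0:
        raise ValueError("no observations")
    minor_frequency = min(n_a, n_g) / n
    return minor_frequency >= threshold


# Illustrative counts only: 3 copies of one allele and 2 of the other
print(is_snp(3, 2))
# Rarer allele at 0.1%, below the 1% cut-off
print(is_snp(999, 1))
```

With only a handful of sequenced chromosomes the estimate of the rarer-allele frequency is of course very rough, which is the practical difficulty with measuring population frequencies reliably.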

  5. Comments on the definition
     • the comparison is between sequences of a single biological species
     • the word "полиморфизм" (polymorphism) has no plural form in Russian (N. Lyapunova, personal communication)
     • in everyday speech "polymorphism" most often means the variant nucleotide itself (i.e., it is used as a synonym for "mutation")
     • the definition implies reliable measurement of allele frequencies in the population(s), which is still rare in current practice

  6. Types of polymorphism in the genome
     * single nucleotide (SNP)
     * short insertion/deletion
     * microsatellite repeat of variable length (VNTR, variable number tandem repeat)
     * insertion of an element
     * multiple nucleotide (MNP)

  7. Some properties of SNPs
     • Comprise ~90% of human genetic variation
     • Occur with an average density of ~1/1000 bp
     • The transition C↔T (G↔A) accounts for ~2/3 of all cases; the three transversions C↔A (G↔T), C↔G (G↔C), T↔A (A↔T) together account for the remaining ~1/3
     • Most of them (~85%) are common to all populations (with differing allele frequencies)
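The transition/transversion distinction used above follows from the chemical class of the two bases: purine↔purine or pyrimidine↔pyrimidine is a transition, anything else is a transversion. A minimal sketch (the function name `substitution_class` is an assumption made for illustration):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Both bases in the same chemical class means a transition
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("C", "T"), ("G", "A"), ("C", "A"), ("G", "C"), ("T", "A")]:
    print(f"{ref}<->{alt}: {substitution_class(ref, alt)}")
```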

  8. Why are SNPs important?
     • Convenient genetic markers
     • Responsible for the existence of various phenotypes, with primary interest in disease phenotypes
     • Pharmacogenomics: individual response to drugs
     • Clues to understanding human evolution

  9. SNPs in the human genome

  10. dbSNP build statistics
      Build   Date       # rs's, ×10^6
      10?     Feb. 01     1.42
      106     Aug. 02     2.81
      110     Jan. 03     3.05
      119     Jan. 04     7.23
      124     Jan. 05    10.0

  11. Estimates of SNP density in the human genome
      • Li and Sadler (1991), Genetics: ~1/1000 bp
      • Zhao et al. (2003), Gene: ~1/1200 bp
      • dbSNP, build 124 (2005): ~1/300 bp (?)
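The dbSNP-based figure can be reproduced by dividing the genome length by the number of rs entries. The sketch below assumes a haploid genome length of about 3×10^9 bp (a round value not given on the slides) and uses the build counts from the dbSNP statistics slide; `bp_per_snp` is an illustrative helper name.

```python
GENOME_LENGTH_BP = 3.0e9  # assumption: approximate haploid human genome length


def bp_per_snp(n_snps: float, genome_length_bp: float = GENOME_LENGTH_BP) -> float:
    """Average spacing between SNPs, in base pairs."""
    return genome_length_bp / n_snps


# Numbers of rs entries per dbSNP build, as listed in the build statistics
builds = {"106": 2.81e6, "110": 3.05e6, "119": 7.23e6, "124": 10.0e6}
for build, n_rs in builds.items():
    print(f"build {build}: ~1 SNP per {bp_per_snp(n_rs):.0f} bp")
```

Since rs entries are database records rather than variants with measured population frequencies, this density is best read as an upper bound relative to the strict ≥1% definition, which may be what the question mark signals.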

  12. Classification of SNPs by position in the genome
      1. genes
         1.1 UTRs
         1.2 exons (cSNP)
             1.2.1 synonymous (sSNP)
             1.2.2 non-synonymous (nsSNP)
         1.3 introns
         1.4 splice sites
      2. regulatory regions of genes (rSNP)
      3. intergenic regions

  13. Synonymous vs. non-synonymous SNPs
      Example: Lysosomal alpha-glucosidase precursor (SwissProt P10253)
      Reference:  … H   Q   L   L   W   G   E   A   L …
                  …CAC CAG CTC CTG TGG GGG GAG GCC CTG CT…
      Variant:    …CAC CAG CTC CTG TGC GGG GAG GCT CTG CT…
                  … H   Q   L   L   C   G   E   A   L …
      nsSNP Trp746Cys: G → C (HGVBase ID: SNP000003023)
      sSNP Ala749Ala: C → T (hypothetical SNP)
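Whether a coding SNP is synonymous can be decided by translating the reference and variant codons with the standard genetic code. The sketch below builds the standard codon table and applies it to the two codon changes in the alpha-glucosidase example; the helper names `translate` and `classify` are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; a trailing incomplete codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def classify(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
    return "synonymous" if same else "non-synonymous"


print(translate("CACCAGCTCCTGTGGGGGGAGGCCCTG"))  # reference fragment
print(classify("TGG", "TGC"))  # Trp -> Cys
print(classify("GCC", "GCT"))  # Ala -> Ala
```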

  14. Exercise
      One database contains ~11,000 nsSNPs in ~6,000 proteins. Another database contains ~47,000 protein sequences with a total length of ~19.5×10^6 residues. Estimate
      (a) the mean protein length
      (b) the mean number of nsSNPs per protein
      (c) the mean number of nsSNPs per unit of protein length
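One possible way to work through the arithmetic is sketched below. It assumes that the mean protein length in the second database is also representative of the proteins in the first one, which the exercise does not state; the variable names are illustrative.

```python
n_nssnps = 11_000            # nsSNPs in the first database
n_proteins_with_snps = 6_000 # proteins carrying those nsSNPs
n_sequences = 47_000         # protein sequences in the second database
total_residues = 19.5e6      # their total length in residues

mean_length = total_residues / n_sequences            # (a) residues per protein
snps_per_protein = n_nssnps / n_proteins_with_snps    # (b) nsSNPs per protein
snps_per_residue = snps_per_protein / mean_length     # (c) nsSNPs per residue

print(f"(a) mean protein length: ~{mean_length:.0f} residues")
print(f"(b) nsSNPs per protein:  ~{snps_per_protein:.2f}")
print(f"(c) nsSNPs per residue:  ~{snps_per_residue:.4f}")
```

Under these assumptions the estimate for (c) corresponds to roughly one nsSNP per couple of hundred residues among proteins that have at least one recorded nsSNP.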

  15. Life cycle of a SNP (after Miller & Kwok, 2001)
      • Appearance of a new allelic variant by mutation (~100 mutations per individual)
      • "Survival" until homozygotes for this allele appear
      • Slow increase of its frequency in the population
      • Fixation of the new allele (0 vs. 100%), turning it into a between-species difference

  16. Exercise
      The SNP life cycle described above takes ~0.3 million years. Assuming that human and chimpanzee diverged ~5 million years ago, and that H. sapiens left Africa and the various populations separated ~0.1-0.2 million years ago, argue for the possible existence of
      (a) identical SNPs in human and other species
      (b) "private" SNPs, i.e., SNPs confined to a single human population

  17. Why are polymorphisms maintained in the population?
      • Selectionists: because heterozygotes have higher fitness
      • Neutralists: because all observed polymorphisms are selectively neutral
      Reality: is always somewhat more complicated

  18. Why are SNPs important?
      • Convenient genetic markers
      • Responsible for the existence of various phenotypes, with primary interest in disease phenotypes
      • Pharmacogenomics: individual response to drugs
      • Clues to understanding human evolution

  19. nsSNPs vs. disease mutations
      • Disease mutations are rare (<<1%) and usually cause monogenic diseases (e.g., cystic fibrosis)
      • nsSNPs are frequent (>1%) and can modify the risk of major common (multigenic, complex) diseases (e.g., cancer, cardiovascular disease, mental illness, autoimmune states, diabetes)
      In some cases, however, it is difficult to make a distinction

  20. Some common nsSNPs are known to affect critical structural features
      The frequency of the haemochromatosis allelic variant Cys260Tyr of the HLA-H protein (with a destroyed disulphide bond) is up to 6% in Northern Europe

  21. Identifying SNPs responsible for specific phenotypes
      • whole genome scan: hypothesis-free approach; extraordinary number of candidate SNPs
      • candidate gene studies: require a priori models; nevertheless, large numbers of candidate SNPs have to be tested
      Both methods, however, require huge amounts of expensive experimental data and are statistically unreliable. Therefore, in silico expertise is required

  22. Methods for predicting the effect of nsSNPs
      * Sequence-based methods: analysis of a multiple alignment with homologs. Ng, Henikoff [2002]
      * Structure-based methods: analysis of various structural parameters. Wang, Moult [2001]; Chasman, Adams [2001]
      * Combined methods: sequence and structure analysis. Sunyaev, Ramensky, Bork [2000, 2001, 2002]

  23. PolyPhen: prediction of the effect of an amino acid substitution on protein function
      • Data sources:
        • Sequence annotation of the query protein
        • PSIC profile matrix values derived from a multiple alignment with homologous proteins
        • Structural parameters and contacts of the query protein structure or of its >50% homolog
      • Prediction: benign (neutral), damaging (deleterious)

  24. PolyPhen query processing flowchart
      INPUT:
      • Sequence: …IMAGLQQTNSE…
      • Position: 133
      • Var1: Q
      • Var2: P
      • ACC/ID (if known protein): DMD_HUMAN
      Analysis steps: sequence annotation; PSIC profile scores for two amino acid variants; structural parameters and contacts
      Prediction rules
      PREDICTION:
      • damaging
      • benign
      • unknown

  25. I. Sequence annotation
      Hereditary hemochromatosis protein precursor (HLA-H, Q30201)
      Features checked:
      * bond: DISULFID, THIOLEST, THIOETH
      * site: BINDING, ACT_SITE, LIPID, METAL, SITE, MOD_RES, SE_CYS
      * region: TRANSMEM, SIGNAL, PROPEP

  26. II. PSIC: profile analysis of homologous sequences
      1. Align with homologous proteins (sequence identity 30..94%)

  27. II. PSIC: profile analysis of homologous sequences
      2. Calculate the profile matrix with the PSIC algorithm
      Profile matrix: $S_{a,j} = \ln\left[ p_{a,j} / q_a \right]$, $a = 1, \dots, 20$, $j = 1, \dots, N$, $N$ = alignment length
      [Figure labels: $S_{\mathrm{Asn},4}$, $S_{\mathrm{Cys},4}$]

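  28. II. PSIC: profile analysis of homologous sequences
      3. Analyse the difference between the profile scores for the two amino acid variants:
      Asn → Cys: $\Delta = \left| S_{\mathrm{Asn},4} - S_{\mathrm{Cys},4} \right| = 1.591$
      [Figure labels: $S_{\mathrm{Asn},4}$, $S_{\mathrm{Cys},4}$]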
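The score difference can be illustrated with a small log-odds calculation. This is only a sketch under stated assumptions: p is read as the estimated probability of an amino acid at the alignment position and q as its background frequency, the numeric values are invented, and the position-specific independent counts weighting that PSIC uses to estimate p is not implemented here.

```python
import math


def profile_score(p: float, q: float) -> float:
    """Log-odds profile score S = ln(p / q) for one amino acid at one position."""
    return math.log(p / q)


def score_difference(p_ref: float, q_ref: float, p_alt: float, q_alt: float) -> float:
    """Absolute difference between the profile scores of the two variants."""
    return abs(profile_score(p_ref, q_ref) - profile_score(p_alt, q_alt))


# Invented values: the reference residue is strongly preferred at this position,
# the substituted residue is seen less often than its background frequency.
delta = score_difference(p_ref=0.20, q_ref=0.04, p_alt=0.01, q_alt=0.02)
print(f"score difference = {delta:.3f}")
```

A large difference means the two variants are treated very differently by the family of homologs at that position, which is what the prediction rules later threshold on.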

  29. III. 3D structure analysis
      1. Residues that are in spatial contact with a ligand or other "critical" residues
      [Figure: Bos taurus trypsin (PDB ID: 1ql7); residues in 5 Å contact with Zen 999]
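A distance-based contact test of this kind can be sketched as follows. The coordinates and residue labels are invented for illustration, the function name `residues_in_contact` is an assumption, and no PDB parsing is attempted.

```python
import math


def residues_in_contact(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted labels of residues with at least one atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z)) pairs
    ligand_atoms:  iterable of (x, y, z) coordinates
    """
    ligand_atoms = list(ligand_atoms)  # allow repeated iteration
    contacts = set()
    for residue_label, xyz in protein_atoms:
        if any(math.dist(xyz, ligand_xyz) <= cutoff for ligand_xyz in ligand_atoms):
            contacts.add(residue_label)
    return sorted(contacts)


# Invented coordinates (in Å); labels A, B, C stand for three residues
protein = [
    ("A", (1.0, 0.0, 0.0)),
    ("A", (8.0, 0.0, 0.0)),
    ("B", (4.0, 3.0, 0.0)),   # exactly 5 Å from the ligand atom
    ("C", (10.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 0.0)]
print(residues_in_contact(protein, ligand))
```

The same function with a 3 Å cutoff would correspond to a stricter contact criterion; a real implementation would typically use a spatial index rather than the all-pairs loop shown here.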

  30. III. 3D structure analysis
      2. Residues that form the hydrophobic core of the protein (buried residues)
      [Figure: Bos taurus trypsin (PDB ID: 1ql7), with surface residues and buried residues marked]

  31. Structural parameters and contacts
      • Secondary structure
      • Phi-psi dihedral angles
      • Solvent accessible surface area, normed s.a.s.a.
      • Change in accessible surface propensity
      • Change in residue side chain volume
      • Contacts with heteroatoms
      • Interchain contacts
      • Contacts with functional sites (BINDING, ACT_SITE, LIPID, and METAL)
      • Region of the phi-psi map (Ramachandran map)
      • Normalised B-factor (temperature factor)

  32. RULES (conditions within a rule are connected with logical AND)
      PSIC score difference Δ | Substitution site properties | Substitution type properties | PREDICTION
      arbitrary | annotated as a functional* or bond formation** site | arbitrary | probably damaging
      not considered | in a region annotated or predicted as transmembrane | PHAT matrix difference resulting from substitution is negative | possibly damaging
      Δ ≤ 0.5 | arbitrary | arbitrary | benign
      Δ > 1.0 | atoms are closer than 3.0 Å to atoms of a ligand or of a residue annotated as BINDING, ACT_SITE, LIPID, METAL | arbitrary | probably damaging
      0.5 < Δ ≤ 1.5 | normed accessibility ACC ≤ 15% | absolute change of accessible surface propensity is ≥ 0.75 or absolute change of side chain volume is ≥ 60 | possibly damaging
      0.5 < Δ ≤ 1.5 | normed accessibility ACC ≤ 5% | absolute change of accessible surface propensity is ≥ 1.0 or absolute change of side chain volume is ≥ 80 | probably damaging
      1.5 < Δ ≤ 2.0 | arbitrary | arbitrary | possibly damaging
      Δ > 2.0 | arbitrary | arbitrary | probably damaging
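The rule table can be read as a small decision function. The sketch below is not the PolyPhen implementation. It rests on several assumptions: comparison operators that did not survive in the slide text are taken to be ≤ for score ranges and accessibility and ≥ for the propensity and volume changes; when several rules match, the more severe prediction wins; a missing PSIC score gives "unknown"; and a substitution matching no damaging rule is called benign. All parameter names are hypothetical.

```python
from typing import Optional


def predict(psic_diff: Optional[float],
            functional_or_bond_site: bool = False,
            transmembrane: bool = False,
            phat_diff: Optional[float] = None,
            ligand_contact: bool = False,
            norm_acc: Optional[float] = None,
            d_propensity: float = 0.0,
            d_volume: float = 0.0) -> str:
    """Apply the slide's rules in order of decreasing priority (ordering is an assumption)."""
    if functional_or_bond_site:
        return "probably damaging"
    if transmembrane and phat_diff is not None and phat_diff < 0:
        return "possibly damaging"
    if psic_diff is None:
        return "unknown"  # assumption: no alignment-based score available
    if psic_diff <= 0.5:
        return "benign"
    if psic_diff > 2.0 or (psic_diff > 1.0 and ligand_contact):
        return "probably damaging"
    if psic_diff > 1.5:
        return "possibly damaging"
    # Remaining case: 0.5 < psic_diff <= 1.5, decided by burial and substitution type
    if norm_acc is not None:
        if norm_acc <= 5 and (abs(d_propensity) >= 1.0 or abs(d_volume) >= 80):
            return "probably damaging"
        if norm_acc <= 15 and (abs(d_propensity) >= 0.75 or abs(d_volume) >= 60):
            return "possibly damaging"
    return "benign"  # assumption: default when no damaging rule fires


print(predict(1.591))                               # score difference alone
print(predict(1.2, norm_acc=3.0, d_volume=90.0))    # buried site, large volume change
print(predict(0.3))                                 # small score difference
```

Encoding the rules as ordered early returns keeps the priority between overlapping rows explicit, which the flat table leaves open.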

  33. Control sets
      Set                                      all     dam   unknown   dam/(dam+ben)
      Disease mutations, strict set            444     366         3           82.9%
      Disease mutations, total               2,782   2,047        70           75.4%
      Between-species substitutions, total     671      58         5            8.7%
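The last column can be recomputed from the counts if one assumes that the benign count equals all minus damaging minus unknown (the benign column itself is not shown on the slide). A minimal sketch with the slide's numbers; `damaging_fraction` is an illustrative name.

```python
def damaging_fraction(total: int, damaging: int, unknown: int) -> float:
    """dam / (dam + ben), with ben assumed to be total - damaging - unknown."""
    benign = total - damaging - unknown
    return damaging / (damaging + benign)


control_sets = {
    "Disease mutations, strict set": (444, 366, 3),
    "Disease mutations, total": (2782, 2047, 70),
    "Between-species substitutions, total": (671, 58, 5),
}
for name, (total, damaging, unknown) in control_sets.items():
    print(f"{name}: {100 * damaging_fraction(total, damaging, unknown):.2f}%")
```

Read this way, the first two rows estimate how often known disease mutations are called damaging, and the last row estimates how often presumably neutral substitutions are called damaging.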

  34. PolyPhen: predictions for nsSNPs
      All SNPs from HGVBase, rel. 12:      983,589
        synonymous:                          9,310 (5,378 proteins)
        non-synonymous:                     11,152 (6,124 proteins)
      Predictions for nsSNPs:
        unknown:                             1,987
        benign:                              6,317
        possibly damaging:                   1,591
        probably damaging:                   1,257
      Prediction basis:
        multiple alignment:                  2,654
        sequence annotation:                   118
        structure:                              76

  35. PolyPhen predictions for dbSNP b.121 [Ivan Adzhubei, 2004]
      All:
         9,502  unknown
        27,991  benign               67.6%
         7,905  possibly damaging    19.1%
         5,521  probably damaging    13.3%
        50,919  total (44,005 unique rs's)
      With structure:
            42  unknown
         2,142  benign               57.1%
           531  possibly damaging    14.2%
         1,076  probably damaging    28.7%
         3,791  total (,167 unique rs's)
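The percentages in these tables appear to be taken over the predicted substitutions only, i.e. with "unknown" excluded from the denominator. That reading is an inference from the numbers, and the sketch below checks it; the helper name `shares` is illustrative.

```python
def shares(counts: dict) -> dict:
    """Percentage of each prediction class among predicted (non-'unknown') substitutions."""
    predicted = {k: v for k, v in counts.items() if k != "unknown"}
    total = sum(predicted.values())
    return {k: round(100 * v / total, 1) for k, v in predicted.items()}


all_nssnps = {
    "unknown": 9502,
    "benign": 27991,
    "possibly damaging": 7905,
    "probably damaging": 5521,
}
with_structure = {
    "unknown": 42,
    "benign": 2142,
    "possibly damaging": 531,
    "probably damaging": 1076,
}
print(shares(all_nssnps))
print(shares(with_structure))
```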

  36. PolyPhen predictions for dbSNP b.121 [Ivan Adzhubei, 2004]
      All (filtered: ≥5 seq. in multiple alignment):
        16,813  benign               64.2%
         5,195  possibly damaging    19.8%
         4,168  probably damaging    15.9%
        26,176  total (21,677 unique rs's)
      With structure (filtered: ≥5 seq. in multiple alignment):
         2,021  benign               56.6%
           499  possibly damaging    14.0%
         1,050  probably damaging    29.4%
         3,570  total (2,983 unique rs's)

  37. Hydrophobic core stability parameters are the best predictors
      Ramensky et al., Nucleic Acids Res. (2002) 30:3894-3900

  38. PolyPhen http://www.bork.embl.de/PolyPhen
      PolyPhen input:
      • Protein identifier OR sequence
      • Substitution position
      • Substitution type

  39. PolyPhen http://www.bork.embl.de/PolyPhen

  40. PolyPhen: nsSNPs data collection

  41. DAMAGING nsSNPs
      Transthyretin (PDB: 1tyr, SNP000012365)
      Thr118 → Asn occurs at the ligand (REA) binding site
      [Figure labels: Thr 118, REA 130]

  42. DAMAGING nsSNPs
      Trypsin (PDB: 1trn, SNP000012965)
      Ser142 → Phe results in a strong side chain volume change at a buried position
      [Figure label: Ser 142]

  43. PolyPhen: the child of seven nannies
      THE CYCLOPS POLYPHEMUS WAS A UNIQUE SUBSPECIES OF DWARF ELEPHANT
      Izvestia-Nauka, 18 November 2003
      By driving a sharpened log into the single eye of the ferocious cyclops Polyphemus, the legendary Odysseus was exterminating a unique species of dwarf elephants that lived on the island of Sicily. The ancient myth of one-eyed humanoid giants was dispelled by Italian paleontologists at the scientific exhibition "Polyphemus in Modena". The exhibition presents skulls found by researchers in Sicily that have a single frontal eye socket. At first glance it looks very much like an eye in the forehead. The bones found next to the skulls do indeed belong to a sizeable mammal, about as big as a large bear. The owner of these remains was not a cyclops but a dwarf elephant. The "eye" in the forehead is the opening for the airways, that is, for the trunk.

  44. Polyphenism: the ability of a single genome to produce two or more alternative morphologies within a single population in response to an environmental cue (such as temperature, photoperiod, or nutrition). [Dr. Ehab Abouheif, McGill University, Montréal, Québec]
      The seasonal morphs of the buckeye butterfly, Precis coenia (Nymphalidae). The ventral surfaces are shown. The Summer morph ("linea") is on the left; the Fall morph ("rosa") is on the right. [Scott F. Gilbert, A Companion to Developmental Biology. Chapter 22, Seasonal Polyphenism in Butterfly Wings]

  45. Damaging nsSNPs
      • We estimate that ~20% of non-synonymous cSNPs from databases are damaging
      • The average allele frequency of non-synonymous cSNPs predicted to be damaging is half that of benign non-synonymous cSNPs
      • We propose to use these predictions for prioritisation of candidates for association studies
