  1. Gene Variations: single nucleotide polymorphism & copy number variation. 刘戈飞 (Liu Gefei), Department of Genetics and Cell Biology, Shantou University Medical College. 075488900497 13502932022

  2. Genetic Variations (ordered from sizable, structural changes to minor, sequence-level changes): chromosome numbers; segmental duplications, copy number variation; translocations; inversions; sequence repeats; transposable elements; short deletions and insertions; tandem repeats; nucleotide insertions and deletions (indels); single nucleotide polymorphisms (SNPs); mutations

  3. Genetic Markers • Morphological markers • Cytological markers • Biochemical and physiological markers • Molecular markers • 1980, RFLPs (restriction fragment length polymorphisms) • 1985, STRs (short tandem repeats, mini-satellites) • 1990s, SNPs (single nucleotide polymorphisms) • 2000s, CNV (copy number variation)

  4. SNP. Individual 1: maternal chromosome A C G T G T C G G T C T T A A A, paternal chromosome A C G T G T C C G T C T T A A A. Individual 2: maternal chromosome A C G T G T C G G T C T T A A A, paternal chromosome A C G T G T C G G T C T T A A A. Individual 3: maternal chromosome A C G T G T C C G T C T T A A A, paternal chromosome A C G T G T C C T A C T T A A A. The position of the SNP is indicated by the box. Individual 1 is heterozygous, while individuals 2 and 3 are homozygous. Single nucleotide polymorphism (SNP): a difference at a single base in the genomic DNA sequence between individuals is called a single nucleotide polymorphism.
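
The heterozygous/homozygous calls on this slide can be checked with a minimal Python sketch. The box marking the SNP is lost in this transcript, so the SNP position is an assumption here: it is taken to be the 8th base, the only site where individuals 1 and 2 differ. The function name `genotype_at` is hypothetical.

```python
def genotype_at(maternal: str, paternal: str, pos: int) -> str:
    """Classify one individual at a 0-based position as homozygous or heterozygous."""
    return "homozygous" if maternal[pos] == paternal[pos] else "heterozygous"


# Sequences copied from the slide with spaces removed.
# Assumption: the boxed SNP is the 8th base (index 7).
SNP_POS = 7
individuals = {
    "Individual 1": ("ACGTGTCGGTCTTAAA", "ACGTGTCCGTCTTAAA"),
    "Individual 2": ("ACGTGTCGGTCTTAAA", "ACGTGTCGGTCTTAAA"),
    "Individual 3": ("ACGTGTCCGTCTTAAA", "ACGTGTCCTACTTAAA"),
}

calls = {
    name: genotype_at(maternal, paternal, SNP_POS)
    for name, (maternal, paternal) in individuals.items()
}

for name, call in calls.items():
    print(name, call)
```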

  5. Single nucleotide polymorphism (SNP): about 1 per 1,000 bases; an estimated 3 million differences between any 2 individuals; about 10 million in the whole population
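
The 3 million figure follows from the 1/1000 rate if one assumes a genome of roughly 3 billion base pairs. The genome size is not stated on the slide and is an assumption in this short sketch.

```python
# Assumption (not on the slide): approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3_000_000_000

# From the slide: roughly one SNP difference per 1,000 bases between two individuals.
BASES_PER_SNP = 1000

expected_differences = GENOME_SIZE_BP // BASES_PER_SNP
print(expected_differences)
```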

  6. SNP Effects • SNPs in genes • In coding regions (possible protein structure changes): synonymous substitutions, missense substitutions, nonsense substitutions (stop) • In coding and non-coding regions: change of gene expression (through altered binding of various factors): yield, timing, alternative splicing • SNPs in regulatory regions • Change of gene expression • SNPs in non-regulatory intergenic regions • Can be used as genetic markers
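
The three coding-region categories can be illustrated with a small sketch that compares the amino acids encoded by a reference codon and a variant codon under the standard genetic code. The function name `classify_coding_snp` and the example codons are illustrative choices, not part of the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous', 'missense', or 'nonsense' for a single-codon change.

    Simplification: a change from a stop codon to an amino acid is reported
    as 'missense' here, although it is usually described as stop-loss.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snp("TAC", "TAA"))  # Tyr -> stop
```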

  7. HapMap: the International HapMap Project (a haplotype map of the human genome). Towards genome variations

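  8. Across all human populations there are about ten million SNP sites at which the rarer allele has a frequency of at least 1%. • Alleles of adjacent SNPs tend to be passed on to offspring as a unit. A set of associated SNP alleles located in one region of a chromosome is called a haplotype.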

  9. Most chromosome regions have only a few common haplotypes (each with a frequency of at least 5%), and these account for most of the polymorphism between individuals in a population. A chromosome region may contain many SNP sites, but a few tag SNPs are enough to capture most of the pattern of genetic variation in that region.
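
The 5% cutoff for "common" haplotypes can be sketched as a simple frequency filter. The haplotype strings and counts below are made-up illustration data, and `common_haplotypes` is a hypothetical helper name.

```python
from collections import Counter


def common_haplotypes(haplotypes, min_freq=0.05):
    """Return {haplotype: frequency} for haplotypes at or above min_freq."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {h: n / total for h, n in counts.items() if n / total >= min_freq}


# Hypothetical sample of 100 chromosomes typed at four SNP sites.
sample = ["ACGT"] * 50 + ["ATGT"] * 30 + ["ACGA"] * 17 + ["TTGA"] * 3

common = common_haplotypes(sample)
for haplotype, freq in common.items():
    print(haplotype, freq)
```

In this toy sample, three haplotypes pass the cutoff and together cover 97% of the chromosomes, which is the sense in which a few common haplotypes represent most of the variation.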

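  10. HapMap is constructed in three steps: (a) identify single nucleotide polymorphisms (SNPs) in DNA samples from multiple individuals; (b) group adjacent SNPs that are inherited together, and whose frequency in the population is greater than 1%, into haplotypes; (c) find the tag SNPs within the haplotypes that identify those haplotypes. By genotyping the three tag SNPs shown in the figure, researchers can determine which of the four haplotypes shown each individual carries.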
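
Step (c) can be sketched as a brute-force search for the smallest set of SNP sites whose alleles distinguish every haplotype. This is only an illustration, not the HapMap project's actual tag-SNP selection method; the four six-site haplotypes are hypothetical (the slide's figure is not in this transcript), and both function names are made up.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of 0-based sites that tells all haplotypes apart."""
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for k in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), k):
            signatures = {tuple(h[i] for i in sites) for h in haplotypes}
            if len(signatures) == n_distinct:
                return sites
    return tuple(range(n_sites))


def identify_haplotype(tag_alleles, sites, haplotypes):
    """Look up which haplotype matches the alleles observed at the tag sites."""
    for h in haplotypes:
        if tuple(h[i] for i in sites) == tuple(tag_alleles):
            return h
    return None


# Hypothetical haplotypes: several sites carry redundant (linked) information.
haplotypes = ["ACGTAC", "GTGTAC", "ACATAT", "ACGCGC"]

tags = find_tag_snps(haplotypes)
print(tags)
print(identify_haplotype(("A", "A", "T"), tags, haplotypes))
```

With this toy data no pair of sites separates all four haplotypes, so three tag SNPs are needed, which mirrors the three-tags-for-four-haplotypes situation described on the slide.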

  11. The human genome is composed of “blocks”. We are so young! • A limited number of ancestors: human evolution passed through a major bottleneck (about 60,000 to 150,000 years ago), and the ancestral population that passed through it was small (only around ten thousand people) • Only a few thousand generations (about 3,000 to 5,600) • Only a limited number of recombination events

  12. The origin of haplotypes

  13. Methods and technologies in SNP studies • Discovery (Find SNPs) • Validation (A common one or rare one) • Genotyping (Frequency in population) • Consideration: • Call rates • Flexibility • Throughput • Cost

  14. Fundamental approaches • Large-scale sequencing based: genomic alignment (GA), reduced representation shotgun (RRS) • PCR based: common PCR • Hybridization based: DNA chips

  15. How to discover SNPs: sequence sources are genomic DNA (RRS (reduced representation shotgun) library or sampling; BAC library) and mRNA (cDNA library). Overlaps between sequences (BAC overlap, shotgun overlap, EST overlap) give sequence overlap, which leads to SNP discovery. Example of overlapping reads: GTTTAAATAATACTGATCA, GTTTAAATAATACTGATCA, GTTTAAATAGTACTGATCA, GTTTAAATAGTACTGATCA
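
The idea of finding SNPs in overlapping sequence can be sketched by scanning the columns of already-aligned reads for positions with more than one allele. This simplified example uses the four reads from the slide, assumes the reads are equal-length and gap-free, and ignores sequencing errors; `find_snp_columns` is a hypothetical name.

```python
def find_snp_columns(aligned_reads):
    """Return {0-based position: sorted alleles} for columns with more than one base."""
    snps = {}
    for pos, column in enumerate(zip(*aligned_reads)):
        alleles = set(column)
        if len(alleles) > 1:
            snps[pos] = sorted(alleles)
    return snps


# The four overlapping reads shown on the slide.
reads = [
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAGTACTGATCA",
    "GTTTAAATAGTACTGATCA",
]

snps = find_snp_columns(reads)
print(snps)
```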

  16. Discovering SNPs by sequencing: amplify DNA → sequence → Phred (base-calling, quality determination) → Phrap (contig assembly) → PolyPhred (polymorphism detection: homozygotes, heterozygote) → Consed (sequence viewing, polymorphism tagging) → analysis (polymorphism reporting, individual genotyping, phylogenetic analysis)
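
The quality determination step reports Phred quality scores, conventionally defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A minimal sketch of the conversion in both directions follows; the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```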

  17. SNP genotyping. Goals: sensitive, accurate, simple, high-throughput, low-cost. Platforms (arranged by throughput on the slide): SNaPshot (ABI), SNuPe (GE), TaqMan (ABI), Pyrosequencing, Fluorescence Polarization (PE), MassArray (Sequenom), Invader (Third Wave), SNPlex (ABI), ParAllele, BeadArray (Illumina)

  18. SNP screening of certain genes: genes, samples, phenotypes → primer design and PCR (regulatory region -1000 to -1, 5’UTR, exons, 3’UTR) → direct DNA sequencing → statistical analysis

  19. SNPs raise the resolution of genetic analysis • Pharmacogenomics • Personalized medicine

  20. 2|JANUARY 2007|VOLUME

  21. Science, 23 July 2004, 305:525

  22. Forty-three authors used the DNA from 270 individuals from the 4 HapMap populations.

  23. Overall, the authors found 1,447 discrete, heterogeneously distributed, copy number variable regions (CNVRs), which cover 12% of the human genome. They found that 24% of CNVRs are associated with segmental duplications.

  24. CNVRs contain different classes of functional elements. • Many CNVs preferentially lie outside genes. • Genes that are involved in cell-adhesion functions, sensory perception of smell and response to chemical stimuli are enriched within CNVs. • Conversely, cell signalling and proliferation, as well as kinase- and phosphorylation-related categories, were underrepresented among CNVs. • Interestingly, ultraconserved elements are strongly excluded from these regions.

  25. CNVs affect SNP genotype patterns, and SNPs can be used to identify linked CNVs. • Both types of variation will need to be collected and analysed systematically if we are to understand the genetic basis of human disease.

  26. The authors call for standard assessment of CNV in all studies of the genetic basis of phenotypic variation, and for an international effort to continue to characterize and catalogue structural genomic variation.

  27. 26,628 clones; 534,500 SNPs

  28. Effects of CNV • Phenotype: modify drug response • Predispose to or cause disease • Polymorphism: population genetics • Genome-wide variation in gene regulation

  29. Methods to identify CNV • Genome-wide • Array-based • Array-CGH: clone-based (1 Mb), oligonucleotide-based (30 kb) • SNP array (signal intensity, genotyping) • Sequence-assembly comparison • Targeted • PCR-based • MAPH, MLPA, QMPSF: multiplex, up to 40 regions at a time • Real-time qPCR • Hybridization-based • FISH, Southern blotting • Computational approaches

  30. Methods to identify CNV: array-CGH (array-based comparative genomic hybridization)
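
Array-CGH compares the hybridization signal of a test sample with that of a reference sample at each probe, usually as a log2 ratio. The sketch below shows how such a ratio could be turned into a gain/loss call. The thresholds of ±0.3 are illustrative assumptions, not values from the slides, and the function name is hypothetical.

```python
import math


def call_copy_number_state(test_intensity, ref_intensity,
                           gain_threshold=0.3, loss_threshold=-0.3):
    """Call 'gain', 'loss', or 'normal' from the test/reference log2 ratio.

    The thresholds are assumed values for illustration; real analyses tune
    them to the noise of the platform and usually segment neighbouring probes.
    """
    log2_ratio = math.log2(test_intensity / ref_intensity)
    if log2_ratio >= gain_threshold:
        return "gain"
    if log2_ratio <= loss_threshold:
        return "loss"
    return "normal"


print(call_copy_number_state(300, 200))  # 3 copies vs 2: log2(1.5) is about 0.58
print(call_copy_number_state(100, 200))  # 1 copy vs 2: log2(0.5) = -1
print(call_copy_number_state(200, 200))  # equal signal
```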

  31. Methods to identify CNV: array-CGH. Representational oligonucleotide microarray analysis (ROMA)

  32. Methods to identify CNV: targeted, PCR-based. Multiplex amplifiable probe hybridization (MAPH)

  33. Methods to identify CNV: targeted, PCR-based. Multiplex ligation-dependent probe amplification (MLPA)

  34. Methods to identify CNV: targeted, PCR-based. Quantitative multiplex PCR of short fluorescent fragments (QMPSF)

  35. Methods to identify CNV: computational

  36. Validation of CNV (from submicroscopic to microscopic) • Mass spectrometry: MALDI-TOF • Real-time quantitative PCR • Southern blotting • FISH
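
For the real-time quantitative PCR option, one common way to estimate copy number is the comparative Ct (ΔΔCt) calculation. The slides do not specify a calculation, so this is a sketch under stated assumptions: amplification efficiency is taken as 100% (a doubling per cycle), the calibrator sample carries two copies of the target, and all Ct values are hypothetical.

```python
def copy_number_from_ct(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator,
                        calibrator_copies=2):
    """Estimate target copy number with the comparative Ct method.

    Assumes perfect doubling per cycle and a calibrator sample with a
    known copy number (two copies by default).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold one cycle earlier than in the calibrator.
print(copy_number_from_ct(24.0, 25.0, 25.0, 25.0))
# Hypothetical Ct values: the target crosses threshold one cycle later.
print(copy_number_from_ct(26.0, 25.0, 25.0, 25.0))
```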

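  37. MassArray (1): DNA extraction; amplification of the target sequence; first purification and primer extension reaction at the SNP site; spotting; MALDI-TOF mass spectrometry; automated sequence analysis and allele calling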

  38. MassArray (2): an unlabeled primer (23-mer) is extended in the presence of enzyme, ddATP and dCTP/dGTP/dTTP. Allele 1 and allele 2 give extended primers of different lengths (24-mer and 26-mer), which appear as separate peaks.
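
The length difference between the two extended primers can be modelled with a short sketch. With ddATP as the only terminator, extension stops as soon as an A is incorporated, so the product length depends on where the first A falls after the primer. The downstream sequences below are hypothetical (the bases in the slide figure cannot be recovered reliably from this transcript), as is the function name.

```python
def extended_primer_length(primer_length, bases_to_incorporate, terminator="A"):
    """Length of the primer after extension that stops at the first terminator base.

    bases_to_incorporate is the sequence that would be added to the primer,
    read 5' to 3'. If no terminator base occurs, the whole stretch is added.
    """
    for added, base in enumerate(bases_to_incorporate, start=1):
        if base == terminator:
            return primer_length + added
    return primer_length + len(bases_to_incorporate)


# Hypothetical alleles: one has A at the SNP site, the other has C there and
# the next A two bases further on.
print(extended_primer_length(23, "AGCT"))  # stops at the SNP base
print(extended_primer_length(23, "CTAG"))  # reads through C and T, stops at A
```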

  39. Identify CNV of certain genes or regions: Southern (small samples); FISH (optional); real-time qPCR; QMPSF, MAPH, MLPA (large samples)