SNP Resources: Finding SNPs, Databases and Data Extraction
Debbie Nickerson, debnick@u.washington.edu, SeattleSNPs
Complex inheritance/disease
A variant gene acts together with many other genes and the environment to produce disease. Examples: diabetes, heart disease, schizophrenia, obesity, multiple sclerosis, celiac disease, cancer, asthma, autism.
Two hypotheses:
1. Common disease/common variant?
2. Common disease/many rare variants?
Human genetic variation
Variant types: single nucleotide polymorphisms (1 bp), small indels, copy-number variants, and other structural variation (duplications, deletions, inversions, insertions). Sizes range from 1 bp to a whole chromosome.
[Figure: frequency of each variant type vs. size, from 1 bp to 1 chromosome. Annotations: gene-rich, e.g. immune response, drug metabolism; abundant; cytogenetic.]
Total sequence variation in humans
• Population size: 6×10^9 (diploid)
• Mutation rate: 2×10^-8 per bp per generation
• Expected "hits": 240 for each bp (6×10^9 people × 2 chromosomes × 2×10^-8)
• Every variant compatible with life exists in the population, BUT most are vanishingly rare
• Comparing 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409:928-933 (2001)
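The "240 hits per bp" figure is simple arithmetic: every copy of the genome in the population is an independent chance for a new mutation at a given base. The sketch below reproduces it and also turns "1 SNP per 1331 bp" into an expected count of differences between two haploid genomes. The function names are hypothetical, and the ~3×10^9 bp genome size is an assumption that does not appear on the slide.

```python
def expected_hits_per_bp(population: float, mutation_rate: float, ploidy: int = 2) -> float:
    """Expected number of new mutations at one base pair per generation.

    Each of the population * ploidy genome copies independently mutates
    at a given bp with probability mutation_rate.
    """
    return population * ploidy * mutation_rate


def expected_pairwise_snps(genome_bp: float, bp_per_snp: float = 1331) -> float:
    """Expected SNPs between two haploid genomes at 1 SNP per bp_per_snp bp."""
    return genome_bp / bp_per_snp


if __name__ == "__main__":
    hits = expected_hits_per_bp(population=6e9, mutation_rate=2e-8)
    print(f"Expected hits per bp: {hits:.0f}")  # Expected hits per bp: 240
    # ASSUMPTION: ~3e9 bp haploid genome; this size is not given on the slide.
    print(f"SNPs between two haploid genomes: {expected_pairwise_snps(3e9):,.0f}")
```

At about 2.25 million differences per pair of haploid genomes, even a single comparison yields many SNPs. This is consistent with the later point that most variants in the population are individually rare.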
Building maps of single nucleotide polymorphisms (SNPs)
Example: ATTCGGCATGAA vs. ATTCGGGATGAA (a C/G difference)
SNP maps were developed in two overlapping phases: SNP discovery and SNP genotyping.
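Finding SNPs: sequence-based SNP mining
Overlapping sequences from several sources are aligned to the reference sequence, and mismatches are reported as candidate SNPs:
• EST overlap: mRNA → cDNA library
• BAC overlap: BAC library
• Shotgun overlap: genomic RRS library
• Random shotgun DNA sequencing aligned to the reference (random sequence overlap)
Example of a G/C SNP found in overlapping sequences:
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
Result: > 11 million SNPs discovered, 5.6 million validated.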
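Once two overlapping reads are aligned, candidate SNPs are simply the columns where they disagree. The sketch below shows that comparison using the two example reads from the slide. It assumes the reads are already aligned to the same length and skips ambiguous bases (N) and gaps (-). Real discovery pipelines also weigh base-quality scores and read depth, which this sketch ignores. The function name `find_snps` is hypothetical.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, allele in seq_a, allele in seq_b) for each
    mismatch between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}  # ambiguous bases and alignment gaps are not SNP calls
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a not in skip and b not in skip
    ]


read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"
print(find_snps(read_1, read_2))  # [(16, 'G', 'C')]
```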
Increasing sample size improves SNP discovery
[Figure: fraction of SNPs discovered (0.0 to 1.0) vs. minor allele frequency (MAF, 0.0 to 0.5) for samples of 2, 8, 16, 24, 48 and 96 chromosomes. Curves are labeled candidate gene sequencing (16 to 96 chromosomes) and HapMap (based on ~6-8 random chromosomes). New: the 1000 Genomes Project.]
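The trend in the figure can be approximated with a simple binomial model. A biallelic SNP is only detectable when both alleles appear among the n sampled chromosomes, which happens with probability 1 - p^n - (1 - p)^n, where p is the MAF. This is a textbook idealization and is not the model used to draw the slide's curves. It ignores sequencing error and assumes chromosomes are sampled independently. The function name is hypothetical.

```python
def detection_probability(maf: float, n_chromosomes: int) -> float:
    """Probability that both alleles of a biallelic SNP with minor allele
    frequency `maf` appear among `n_chromosomes` independently sampled
    chromosomes (simple binomial model)."""
    p, q = maf, 1.0 - maf
    return 1.0 - p ** n_chromosomes - q ** n_chromosomes


mafs = (0.01, 0.05, 0.1, 0.3, 0.5)
print("MAF:           " + "  ".join(f"{m:.2f}" for m in mafs))
for n in (2, 8, 16, 24, 48, 96):
    row = "  ".join(f"{detection_probability(m, n):.2f}" for m in mafs)
    print(f"{n:>3} chromosomes: {row}")
```

Under this model, rare SNPs (MAF around 1 to 5%) are mostly missed with 2 to 8 chromosomes and mostly found with 48 to 96. This matches the slide's message that larger samples recover more of the low-frequency spectrum.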
Genotype-Phenotype Studies
You have a candidate gene/region/pathway of interest and samples ready to study:
• What SNPs are available?
• How do I find the common SNPs?
• What is the validation/quality of the SNPs?
• Are these SNPs informative in my population/samples?
• How can I download the information?
• How do I pick the "best" SNPs?
(Dana Crawford)
Minimal SNP information for genotyping/characterization
• What is the SNP? Flanking sequence and alleles, e.g. in FASTA format:
  >snp_name
  ACCGAGTAGCCAG[A/G]ACTGGGATAGAAC
• dbSNP reference SNP number (rs #)
• Where is the SNP mapped? Exon, promoter, UTR, etc.
• How was it discovered? Method
• What assurances do you have that it is real? Validated how?
• What population: African, European, etc.?
• What is the allele frequency of each SNP? Common (>5%) or rare?
• Are other SNPs associated (redundant)?
• Is genotyping data available for control populations?
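The bracketed FASTA convention above packs the flanking sequence and both alleles into one record, which many assay-design tools can take as input. The sketch below writes and parses such a record and applies the slide's "common (>5%)" cutoff. The helper names and the exact regular expression are assumptions for illustration. Individual genotyping vendors may expect slightly different layouts.

```python
import re

# Left flank, [allele1/allele2], right flank; "-" allows indel alleles.
SNP_PATTERN = re.compile(r"([ACGTN]*)\[([ACGT-]+)/([ACGT-]+)\]([ACGTN]*)")


def to_fasta(name: str, left: str, allele1: str, allele2: str, right: str) -> str:
    """Write a SNP as FASTA with the alleles in brackets between the flanks."""
    return f">{name}\n{left}[{allele1}/{allele2}]{right}\n"


def from_fasta(text: str) -> dict:
    """Parse a bracketed-SNP FASTA record (sequence may span several lines)."""
    lines = text.strip().splitlines()
    name = lines[0].lstrip(">")
    seq = "".join(line.strip() for line in lines[1:]).upper()
    match = SNP_PATTERN.fullmatch(seq)
    if match is None:
        raise ValueError(f"no [allele/allele] site found in {name}")
    left, allele1, allele2, right = match.groups()
    return {"name": name, "left": left, "alleles": (allele1, allele2), "right": right}


def frequency_class(maf: float, threshold: float = 0.05) -> str:
    """Label a SNP 'common' if its minor allele frequency exceeds 5%, else 'rare'."""
    return "common" if maf > threshold else "rare"


record = to_fasta("snp_name", "ACCGAGTAGCCAG", "A", "G", "ACTGGGATAGAAC")
print(record, from_fasta(record)["alleles"], frequency_class(0.12))
```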
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs: candidate gene website
2. Other web applications
   • GVS
   • HapMap Genome Browser
3. Entrez Gene
   • dbSNP
   • Entrez SNP
Finding SNPs: SeattleSNPs Candidate Genes, pga.gs.washington.edu
Finding SNPs: SeattleSNPs Candidate Genes Example - PCSK9
Genotype download format (prettybase): tab-delimited, one line per individual per SNP:
SNP_pos <tab> Ind_ID <tab> allele1 <tab> allele2
Repeat for all individuals, then repeat for the next SNP.
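Because each prettybase row is one individual at one SNP, the file can be regrouped by SNP and summarized directly. The sketch below reads rows in the column order shown above and computes allele frequencies per SNP. It assumes that "N" marks a missing allele, which is a hypothetical convention; check the actual download. The SNP positions and individual IDs in the example are made up.

```python
from collections import Counter, defaultdict


def read_prettybase(lines):
    """Group prettybase rows (SNP_pos, Ind_ID, allele1, allele2) by SNP.

    Returns {snp_pos: {ind_id: (allele1, allele2)}}.
    """
    genotypes = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        snp_pos, ind_id, allele1, allele2 = line.split("\t")
        genotypes[snp_pos][ind_id] = (allele1, allele2)
    return genotypes


def allele_frequencies(genotypes_for_snp, missing="N"):
    """Allele frequencies at one SNP; `missing` (assumed "N") is ignored."""
    counts = Counter(
        allele
        for pair in genotypes_for_snp.values()
        for allele in pair
        if allele != missing
    )
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


rows = ["1234\tIND01\tA\tG", "1234\tIND02\tA\tA", "5678\tIND01\tC\tN"]
data = read_prettybase(rows)
print(allele_frequencies(data["1234"]))  # {'A': 0.75, 'G': 0.25}
```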
PolyPhen (Polymorphism Phenotyping): structural protein characteristics and evolutionary comparison
SIFT (Sorting Intolerant From Tolerant): evolutionary comparison
Both predict the effect of non-synonymous SNPs.
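SIFT and PolyPhen only apply to coding SNPs that change the amino acid, so the first step is to classify each coding SNP. The sketch below does that with the standard genetic code. It does not reproduce either tool's scoring. The function name and the "nonsense" label for new stop codons are illustrative choices.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a SNP at 0-based `position` within a reference `codon`."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # creates a stop codon
    return "non-synonymous"  # amino acid change: a SIFT/PolyPhen candidate


print(classify_coding_snp("GAA", 2, "G"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAA", 0, "A"))  # non-synonymous (Glu -> Lys)
```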
GVS: Genome Variation Server http://gvs.gs.washington.edu/GVS/ • Provides rapid analysis of 4.5 million genotyped SNPs from dbSNP and the HapMap • Mapped to human genome build 36 (hg18) • Displays genotype data in text and image formats • Displays tagSNPs or clusters of informative SNPs in text and image formats • Displays linkage disequilibrium (LD) in text and image formats • Online tutorial provided at OpenHelix.com
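The LD displays that GVS provides summarize how strongly alleles at two SNPs co-occur. The sketch below computes the usual pairwise measures D, D' and r² from phased two-SNP haplotypes. It assumes phase is already known. GVS works from genotype data, and this sketch does not claim to reproduce its estimation method. Each haplotype is a hypothetical two-character string: the allele at SNP 1 followed by the allele at SNP 2.

```python
def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotypes."""
    n = len(haplotypes)
    a1, b1 = haplotypes[0][0], haplotypes[0][1]  # reference allele at each SNP
    p_a = sum(h[0] == a1 for h in haplotypes) / n
    p_b = sum(h[1] == b1 for h in haplotypes) / n
    p_ab = sum(h[0] == a1 and h[1] == b1 for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, r2


print(ld_stats(["AC", "GT", "AC", "GT"]))  # (0.25, 1.0, 1.0): perfect LD
print(ld_stats(["AC", "AT", "GC", "GT"]))  # (0.0, 0.0, 0.0): no LD
```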
GVS: Genome Variation Server, example: LDLR
http://gvs.gs.washington.edu/GVS/
GVS: Genome Variation Server • Table of genotypes • Image of visual genotypes
GVS: Genome Variation Server Genotypes displayed in prettybase table and visual genotype graphic
GVS: Genome Variation Server
Dense genotypes around a candidate gene can be integrated with broader HapMap genotypes.
[Figure: high-density genic coverage from SeattleSNPs discovery (1 SNP per 200 bp) vs. low-density genome coverage from HapMap SNPs (~1 SNP per 1000 bp).]
GVS: Genome Variation Server Dense genotypes around a candidate gene can be integrated with lower-density HapMap genotypes
GVS: Genome Variation Server
Three ways to combine data sets:
A. Common samples, combined variations
B. Combined samples, common variations
C. Combined samples, combined variations
GVS: Genome Variation Server
A. Common samples, combined variations
GVS: Genome Variation Server
B. Combined samples, common variations (SeattleSNPs and HapMap samples combined)
GVS: Genome Variation Server
C. Combined samples, combined variations
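One way to read the three GVS options is as set operations on two genotype tables. "Common" means the intersection, so only samples or SNPs present in both data sets are kept. "Combined" means the union, with missing calls filled in. The sketch below encodes that interpretation. It is an assumption about what the options mean, not GVS's implementation. The "NN" missing-genotype code, the preference for dataset A's call when both data sets have one, and all IDs in the example are hypothetical.

```python
Genotypes = dict[str, dict[str, str]]  # sample ID -> SNP ID -> genotype

PICK = {"common": set.intersection, "combined": set.union}


def merge(dataset_a: Genotypes, dataset_b: Genotypes,
          samples: str = "common", snps: str = "combined",
          missing: str = "NN") -> Genotypes:
    """Merge two genotype tables; `samples`/`snps` are 'common' or 'combined'."""
    sample_ids = PICK[samples](set(dataset_a), set(dataset_b))
    snp_ids = PICK[snps](
        {snp for calls in dataset_a.values() for snp in calls},
        {snp for calls in dataset_b.values() for snp in calls},
    )
    merged: Genotypes = {}
    for sample in sorted(sample_ids):
        row = {}
        for snp in sorted(snp_ids):
            call = dataset_a.get(sample, {}).get(snp)  # prefer dataset A's call
            if call is None:
                call = dataset_b.get(sample, {}).get(snp)
            row[snp] = call if call is not None else missing
        merged[sample] = row
    return merged


seattle = {"S1": {"rs1": "AG"}, "S2": {"rs1": "AA"}}
hapmap = {"S1": {"rs2": "CT"}, "S3": {"rs1": "GG", "rs2": "CC"}}
print(merge(seattle, hapmap, samples="common", snps="combined"))    # option A
print(merge(seattle, hapmap, samples="combined", snps="common"))    # option B
print(merge(seattle, hapmap, samples="combined", snps="combined"))  # option C
```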
Finding SNPs: HapMap Browser
• HapMap data sets are useful because individual genotype data in deeply sampled populations can be used to determine optimal genotyping strategies (tagSNPs) or to perform population genetic analyses (linkage disequilibrium)
• Data are specific to the HapMap project (not all of dbSNP)
• HapMap data are also available in dbSNP
• The browser provides visualization of the data and direct access to SNP data, individual genotypes, and LD analysis; data can be saved in formats for Haploview
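Choosing tagSNPs from deeply genotyped data usually means picking a small set of SNPs so that every other SNP is in high LD with at least one chosen tag. The sketch below is a simple greedy r²-threshold binning under that assumption. At each step it picks the SNP that covers the most still-untagged SNPs at r² ≥ 0.8. It is not the exact algorithm used by GVS, the HapMap browser, or Haploview. The input format, the 0.8 default, and the SNP IDs and r² values in the example are hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy tagSNP bins.

    `r2` maps frozenset({snp_x, snp_y}) -> pairwise r^2 (missing pairs = 0).
    Returns a list of (tag SNP, sorted list of SNPs it covers).
    """
    def in_ld(x, y):
        return x == y or r2.get(frozenset((x, y)), 0.0) >= threshold

    untagged = set(snps)
    bins = []
    while untagged:
        # SNP covering the most untagged SNPs; ties go to the first ID in sort order.
        best = max(sorted(untagged), key=lambda s: sum(in_ld(s, t) for t in untagged))
        covered = {t for t in untagged if in_ld(best, t)}
        bins.append((best, sorted(covered)))
        untagged -= covered
    return bins


pairs = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs1", "rs3")): 0.50,
}
print(greedy_tag_snps(["rs1", "rs2", "rs3", "rs4"], pairs))
# [('rs2', ['rs1', 'rs2', 'rs3']), ('rs4', ['rs4'])]
```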
NCBI: Database Resources, example: PCSK9
www.ncbi.nlm.nih.gov
Finding SNPs using NCBI databases http://www.ncbi.nlm.nih.gov/
Default view: cSNPs (coding SNPs)
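The same NCBI searches can also be scripted through NCBI's E-utilities instead of the web pages. The sketch below builds an ESearch request against the `snp` database and, if asked, fetches the matching record IDs as JSON. It uses only the Python standard library. The plain "PCSK9" query is a hypothetical example and should be refined with the field tags in NCBI's Entrez help. NCBI also asks heavy users to follow its usage guidelines (rate limits, API keys). Only the URL-building part runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "snp", retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL returning JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_ids(term: str, db: str = "snp", retmax: int = 20) -> list:
    """Run the search over the network and return the list of matching IDs."""
    with urlopen(esearch_url(term, db, retmax), timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    print(esearch_url("PCSK9"))  # hypothetical query; network call not made here
```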