CS177 Lecture 10: SNPs and Human Genetic Variation Tom Madej 11.21.05
Lecture overview • Human genetic variation, HapMap project. • Experimental methods: PCR, X-ray crystallography, microarrays.
Motivations to study human genetic variation • The evolution of our species and its history. • Understand the genetics of diseases, esp. the more common complex ones such as diabetes, cancer, cardiovascular, and neurodegenerative. • To allow pharmaceutical treatments to be tailored to individuals (adverse reactions based on genetics).
Genetic variation • The human genome has approximately 10 million polymorphisms, i.e. genetic variants that occur at a frequency of about 1% or more in the population. • Many of these polymorphisms are SNPs, single nucleotide polymorphisms. • These polymorphisms contribute to our individuality, and also influence our susceptibility to various diseases.
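As a small illustration of the frequency threshold in this definition, the sketch below computes the minor allele frequency at one site and checks it against a cutoff. The function names, the input format (one allele per sampled chromosome), and the sample counts are hypothetical, chosen only for the example.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele at a single site.

    `alleles` holds one observed allele per sampled chromosome
    (two entries per diploid individual). Hypothetical input format.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # zero or one allele observed: no variation in this sample
    # most_common(2) is sorted by count, descending; the second entry
    # is the minor allele at a biallelic site.
    minor_count = counts.most_common(2)[1][1]
    return minor_count / len(alleles)


def is_polymorphism(alleles, threshold=0.01):
    """True if the minor allele reaches the given frequency threshold."""
    return minor_allele_frequency(alleles) >= threshold


# Made-up sample: 200 chromosomes, 6 of them carrying the variant allele "T".
sample = ["C"] * 194 + ["T"] * 6
maf = minor_allele_frequency(sample)
print(maf, is_polymorphism(sample), is_polymorphism(sample, threshold=0.05))
```

With these made-up counts the variant passes the 1% cutoff but not a 5% cutoff.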
Mendelian and non-Mendelian diseases • Geneticists have been very successful in discovering the variations underlying Mendelian disorders. These disorders are characterized by following the Mendelian rules of inheritance. • The study of particular families using linkage analysis has been successful for the Mendelian diseases. • However, the more common complex (i.e. non-Mendelian) disorders have been much more difficult to investigate, even though there are clearly genetic components to many of these diseases.
Sources of genetic variation (during meiosis) • Chromosomal reassortment; a human has 23 pairs of chromosomes, and one of each pair is inherited from the father, the other from the mother. • Mutation; errors in DNA copying. These may produce SNPs, or larger portions of DNA may be duplicated or copied incorrectly. • Genetic recombination; shuffling of segments between partner chromosomes of a pair.
Reassortment of genetic material during meiosis Molecular Biology of the Cell, Alberts et al. Garland Publishing 2002 (Fig. 20-8)
Single Nucleotide Polymorphisms (SNPs) • Major source of genetic variation. • Estimated approx. 7 million SNPs that occur with frequencies at least 5% in the human population; approx. 11 million with frequencies at least 1%. • Can we determine the associations between these variants and diseases?
International HapMap project • Haplotype – set of variants on a chromosome that tend to be inherited as a block. • Provide a collection of SNPs spanning the genome and serving as genetic markers. • Study correlations (linkage disequilibrium, LD) between the SNPs. • Provide a guide for whole genome association studies.
HapMap project • Project was launched in Oct 2002. • The first phase genotyped 1.1 million SNPs in 269 individuals from four ethnic origins. • Second phase will genotype another 4.6 million SNPs. • Goal was to find most SNPs that occur with frequencies of at least 5% in the human population.
Statistics digression: here is an example of a commonly used correlation measure…
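One correlation measure widely used for LD between two biallelic SNPs is r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)), where D = p_AB - p_A p_B. Whether this is the measure the slide had in mind is an assumption. The sketch below computes it from phased haplotypes coded as (0/1, 0/1) pairs, which is a hypothetical input format chosen for the example.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    `haplotypes` is a list of (a, b) pairs, one per chromosome, where
    a and b are 0 or 1 for the allele carried at SNP A and SNP B.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Made-up sample of 10 chromosomes: mostly the 1-1 and 0-0 haplotypes.
haps = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
r2 = ld_r_squared(haps)
print(round(r2, 4))
```

Values near 1 mean that knowing the allele at one SNP almost determines the allele at the other; values near 0 mean the two SNPs are close to independent in the sample.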
Correlated (LD) SNPs and tag SNPs Nature Genetics: published online Oct 30, 2005; doi:10.1038/ng1688
Haplotype diversity Nature, v. 437 Oct 27, 2005, p.1306
LD summary • The human genome consists of regions of low polymorphism (i.e. low sequence variation) of sizes from 10-100 kb, interspersed with regions of high polymorphism. • This seems to be due to “recombination hotspots” in the chromosomes. • The inheritance of chromosomal regions without recombination (haplotypes) means that certain combinations of genes are widespread across the human population.
Exercise! • Go to www.hapmap.org, and select “Browse Project Data” (link on the left). • In the “Landmark or Region” box enter: DTNBP1, then click “Search”. • Select the NM_032122 link (isoform a). • Take a look at the Overview and Details. • Go down to “Tracks”, select “Analysis All on”, and then “Update Image”. • Take a look at the LD map, phased haplotypes, and list of tag SNPs.
Whole genome association study • Given a sample of people, some with and some without a certain trait/phenotype (e.g. a certain disease). • Call the two sets cases and controls. • Investigate the genetic factors shared by the cases, but absent from the controls; i.e. find the associations between the genetic factors and the disease. • The most straightforward way: genotype all the individuals. • But this is far too expensive with current technology!
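For a single SNP, the comparison of cases and controls can be sketched as a 2x2 allele-count table tested with a chi-square statistic. This is one simple, commonly taught test and is an assumption here, not necessarily the analysis used in any particular study; the function name and the counts are made up for illustration.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Each argument is a pair (count of allele 1, count of allele 2)
    observed among the chromosomes of cases or of controls.
    Returns (chi-square statistic, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail of the chi-square
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: allele 1 is seen on 60 of 100 case chromosomes
# and on 40 of 100 control chromosomes.
chi2, p_value = allelic_chi_square((60, 40), (40, 60))
print(chi2, p_value)
```

In a genome-wide setting the same test would be repeated for every genotyped SNP, so the p-value threshold has to be corrected for the number of tests performed.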
The HapMap data is useful for whole genome association studies… • The collection of SNPs gives us common genetic markers. • By using tag SNPs we can reduce the number of SNPs that need to be genotyped in the study. • It is even possible to produce SNP chips with a few hundred thousand tag SNPs that can be used for the genotyping. • But statistical studies need to be done!
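The idea of tag SNPs can be sketched as a greedy selection: repeatedly pick the SNP that is in strong LD with the most SNPs not yet covered. This is one simple heuristic, not the specific method used by the HapMap project; the r^2 >= 0.8 cutoff, the SNP names, and the pairwise r^2 values are assumptions made up for the example.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedy choice of tag SNPs from pairwise r^2 values.

    `snps` is a list of SNP names; `r2` maps (name, name) pairs to r^2.
    Pairs missing from `r2` are treated as unlinked (r^2 = 0).
    """
    def linked(x, y):
        if x == y:
            return True  # a SNP always covers itself
        return r2.get((x, y), r2.get((y, x), 0.0)) >= threshold

    uncovered = set(snps)
    tags = []
    while uncovered:
        # Choose the SNP covering the most still-uncovered SNPs; ties go to
        # the SNP that comes first in `snps`.
        best = max(snps, key=lambda s: sum(linked(s, u) for u in uncovered))
        tags.append(best)
        uncovered -= {u for u in uncovered if linked(best, u)}
    return tags


# Hypothetical region with two LD blocks: {snp1, snp2, snp3} and {snp4, snp5}.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
r2 = {
    ("snp1", "snp2"): 0.95,
    ("snp1", "snp3"): 0.90,
    ("snp2", "snp3"): 0.85,
    ("snp3", "snp4"): 0.30,
    ("snp4", "snp5"): 0.82,
}
tags = pick_tag_snps(snps, r2)
print(tags)
```

Here two genotyped SNPs stand in for all five, which is the kind of reduction that makes a tag-SNP chip cheaper than genotyping every marker.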