MESA Family Genetics Committee

MESA Family Genetics Committee Steve Rich, PhD February 6th, 2006

Candidate Gene Genotyping Study • Conduct an association study of 2,880 MESA participants (parent study) • 720 randomly selected from each ethnic group of Caucasian-, African-, Mexican-, and Chinese-Americans • Include the well-phenotyped “MESA 1000” • Genotype 1536 SNPs in candidate genes proposed by MESA Investigators

Candidate Gene Selection • 267 candidate genes were initially proposed by MESA investigators, collated by DCC, and then ranked by MESA investigators • Further input by MESA laboratory and MESA Family Study Genetics Committee (294 genes) • List of 294 genes sent to Illumina, list of 60,293 SNPs returned with feasibility scores • List became final when SNP selection was complete

Selection of SNPs • SNPs proposed by MESA investigators • SNPs with feasibility of 80% or greater • SNPs with minor allele frequency >0.05 • tagSNPs for Caucasian-American and African-American haplotype blocks as defined by Tagger for each candidate gene • List of SNPs sent to Illumina for evaluation • Candidate gene was re-analyzed for tagSNPs if necessary after feasibility scores from Illumina
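The selection criteria above can be sketched as a simple filter. This is only an illustration: the slide does not say how the criteria were combined, so the sketch assumes a SNP must be either investigator-proposed or a tagSNP and must also meet the feasibility and MAF thresholds. The record type, field names, and SNP identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateSnp:
    """One SNP from the vendor's list (all field names are hypothetical)."""
    rsid: str
    feasibility: float           # feasibility score on a 0-1 scale (0.80 = 80%)
    maf: float                   # minor allele frequency
    investigator_proposed: bool  # proposed by a MESA investigator
    is_tag_snp: bool             # tagSNP chosen for a haplotype block


def select_snps(snps, min_feasibility=0.80, min_maf=0.05):
    """Keep proposed or tag SNPs with feasibility >= 80% and MAF > 0.05.

    Assumption: the criteria are combined with AND; the slide does not
    state this explicitly.
    """
    return [
        s for s in snps
        if (s.investigator_proposed or s.is_tag_snp)
        and s.feasibility >= min_feasibility
        and s.maf > min_maf
    ]


# Made-up records, for illustration only.
example = [
    CandidateSnp("snp_a", 0.90, 0.10, False, True),   # passes every filter
    CandidateSnp("snp_b", 0.70, 0.30, False, True),   # feasibility too low
    CandidateSnp("snp_c", 0.85, 0.05, True, False),   # MAF not above 0.05
    CandidateSnp("snp_d", 0.80, 0.06, False, False),  # neither proposed nor tag
]
selected_ids = [s.rsid for s in select_snps(example)]
print(selected_ids)
```

Raising `min_maf` to 0.20 would mimic the stricter threshold that was applied to some genes to reduce their SNP count.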

Caveats • To reduce the total number of SNPs for some genes, the MAF threshold was raised to > 0.20, and/or the SNPs were limited to Caucasians • So not all “eggs in one basket” • Some genes are represented by a few SNPs that were proposed by investigators only • cSNPs tended to fail Illumina designability • This program will not look at APOE very well because the most important SNPs failed designability

Illumina Marker Panel
Marker Set                               # SNPs
Cardiovascular Candidate Genes           1,439
Ancestry Informative Markers (AIMs)      97
Mean SNPs/gene                           11.3
Max SNPs/gene                            49 (VWF)
Min SNPs/gene                            1 (VEGFB, MRPL10, KCNH2, FLJ3116, DMGDH, C4orf9)
TOTAL SNPs ASSAYED BY ILLUMINA           1,536

Data Overview • 1440/1536 SNPs were successfully genotyped • 119 genes represented • 97 ethnic-specific markers • Only 6 DNA samples failed genotyping

Summary of Genotyping Results • Illumina reports: • Data quality was very high • DNA success rate was unprecedented (for such a large project) • Illumina was pleased with the locus conversion rate (higher than predicted) • Excellent DNA quality (aided in achieving the high locus success rate)

Illumina Marker Panel #2: Results of Genotyping
                            # SNPs
Gene/Marker Set             Picked    Succeed    Fail
APOE                        7         2          5
ADORA2A                     14        10         4
3 SNP FAIL (6 Genes)                             3
2 SNP FAIL (17 Genes)                            2
1 SNP FAIL (35 Genes)                            1
TOTAL FAIL                                       96
# SUCCESSFUL SNPS: 1440 (93.75%)

AIMs • 97 SNPs selected genome-wide • Selected to provide maximal pairwise allele frequency differences • Only 1 AIM failed • 96 AIMs for ancestry analysis and cross-reference with MESA self-reported ethnicity • [Figure: allele frequencies from HapMap ancestral populations: CAU (CEPH), CHN (Han), AFR (YRI); http://www.hapmap.org]

DNA Samples Sent to Illumina
Unique Samples                      2,880
Illumina Non-Blinded Duplicates     2 per plate x 32 plates = 64
Illumina Blinded Duplicates         4 (ethnicity) x 23 = 92 (plate 33)
Total Duplicates                    156
TOTAL SENT TO ILLUMINA              3,036
TOTAL DNA PLATES                    33

Sample Failures @ Illumina
6 samples failed to type @ Illumina:
• 2 Samples* JHU AFA (Fem African Am)
• 1 Sample JHU EUA (Male Eur Am)
• 1 Sample* COL EUA (Fem Eur Am)
• 1 Sample* UMN EUA (Fem Eur Am)
• 1 Sample* UCLA CHN (Fem Chin Am)
Sample Success Rate: 3030 / 3036 = 99.8%
None of these samples were duplicated
* 5 / 6 of the samples were discordant for MESA database sex versus genetically inferred sex; MESA database sex shown

The Illumina Data Set
                                   Samples    SNPs     Total
Total Genotypes Attempted          3,036      1,536    4,663,296
Specific Non-called Genotypes*                         184
Total Genotypes Returned           3,030      1,440    3,030 x 1,440 - 184 = 4,363,016
* Not due to sample failure
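As a cross-check, the data set totals can be recomputed from the counts reported on these slides. The sketch below is plain arithmetic; the constant and function names are illustrative only.

```python
# Counts taken from the slides.
UNIQUE_SAMPLES = 2880
NON_BLINDED_DUPLICATES = 2 * 32   # 2 duplicates on each of 32 plates
BLINDED_DUPLICATES = 4 * 23       # 4 ethnic groups x 23 duplicates (plate 33)
SNPS_ASSAYED = 1536
SNPS_SUCCEEDED = 1440
FAILED_SAMPLES = 6
NON_CALLED_GENOTYPES = 184        # non-calls not due to sample failure


def dataset_totals():
    """Recompute sample and genotype totals from the reported counts."""
    samples_sent = UNIQUE_SAMPLES + NON_BLINDED_DUPLICATES + BLINDED_DUPLICATES
    samples_returned = samples_sent - FAILED_SAMPLES
    return {
        "samples_sent": samples_sent,
        "samples_returned": samples_returned,
        "genotypes_attempted": samples_sent * SNPS_ASSAYED,
        "genotypes_returned": samples_returned * SNPS_SUCCEEDED - NON_CALLED_GENOTYPES,
        "snp_success_rate": SNPS_SUCCEEDED / SNPS_ASSAYED,
        "sample_success_rate": samples_returned / samples_sent,
    }


totals = dataset_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```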

Analysis of Duplicates: Genotyping Quality • Spike each of 32 (96-well) DNA plates with 2 duplicate samples = 64 in total • Illumina notified of duplicates (non-blinded) • Add an extra plate (#33) of 4 ethnicities x 23 duplicates = 92 duplicates • Illumina not notified that these were duplicates (blinded) • Total: 64 + 92 = 156 duplicate pairs of DNAs

Analysis of Duplicates: Genotyping Quality Results
156 x 1440 (SNP markers succeeded) = 224,640 total duplicate genotype pairs
= 224,414 genotype pairs fully typed
+ 163 (missing genotype in one sample)
+ 63 (missing genotypes in both samples of a duplicate)

Analysis of Duplicates: Illumina Non-Blinded
64 x 1440 - 45 - 31 = 92,084 genotype pairs (less those missing 1 or 2 genotypes)
Remove and do not count: 1 pair of clearly non-identical samples, mismatched at 628 / 1439 SNPs typed (lab error?).
One other single genotype pair does not match.
Non-blinded pair concordance rate = (92,084 - 1,439* - 1) / (92,084 - 1,439*) > 99.998%
* 1,439 rather than 1,440 because 1 SNP genotype was missing for this pair.

Analysis of Duplicates: Illumina Blinded
92 x 1440 - 138 - 32 = 132,310 genotype pairs (less those missing 1 or 2 genotypes)
Remove and do not count: 2 pairs of clearly non-identical samples, mismatched at 717 / 1439 and 717 / 1438 SNPs (lab error?).
4 other genotype pairs do not match.
Blinded pair concordance rate = (132,310 - 1,438* - 1,439 - 4) / (132,310 - 1,438* - 1,439) > 99.996%
* 1,438 rather than 1,440 because 2 SNP genotypes were missing for this pair.
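The two concordance calculations follow the same pattern: drop the genotype pairs that belong to clearly non-identical sample pairs, then divide the matching pairs by the pairs compared. A minimal sketch using the counts from the slides (the function and argument names are illustrative):

```python
def concordance_rate(pairs_typed, excluded_pair_sizes, mismatches):
    """Fraction of compared duplicate genotype pairs that agree.

    pairs_typed         -- genotype pairs with both genotypes called
    excluded_pair_sizes -- typed SNP counts of sample pairs removed as
                           clearly non-identical
    mismatches          -- discordant genotype pairs among those compared
    """
    compared = pairs_typed - sum(excluded_pair_sizes)
    return (compared - mismatches) / compared


# Non-blinded: 64 pairs x 1440 SNPs, less 45 + 31 pairs with missing genotypes.
non_blinded = concordance_rate(64 * 1440 - 45 - 31, [1439], 1)

# Blinded: 92 pairs x 1440 SNPs, less 138 + 32 pairs with missing genotypes.
blinded = concordance_rate(92 * 1440 - 138 - 32, [1438, 1439], 4)

print(f"non-blinded: {non_blinded:.5%}")
print(f"blinded:     {blinded:.5%}")
```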

Analysis of Duplicates: Genotyping Quality Summary
1. Very high quality genotyping based on non-blinded and blinded concordance rates. For samples that were successful, the overall concordance rate is > 99.996%.
2. 3 duplicate pairs are suspect and require further investigation. These are not samples for which there is a reported versus genetic sex discrepancy.
3. Duplicate data also allow 163 missing genotypes to be resolved.

Initial Processing • Tests for Hardy-Weinberg Equilibrium (HWE) for each SNP; all SNPs are maintained in analyses • Deviations from HWE can reflect genotyping errors, deviations from the underlying assumptions (no mutation, migration, or selection; no genetic drift from small sample size), or true association with phenotype • Estimates of Linkage Disequilibrium (LD): D’ and r² • Examination of LD block structure
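The slides do not say which HWE test was used. One common choice is the 1-df chi-square goodness-of-fit test on the three genotype counts, sketched below with made-up counts. The function name and the handling of monomorphic SNPs are assumptions of this sketch.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes (AA, AB, BB) and
    returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        # Monomorphic SNP: the test is undefined, report no deviation.
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: the first set is in exact HWE, the second has no heterozygotes.
in_equilibrium = hwe_chi_square(25, 50, 25)
no_heterozygotes = hwe_chi_square(50, 0, 50)
print(in_equilibrium, no_heterozygotes)
```

For SNPs with rare genotypes an exact test is usually preferred over the chi-square approximation.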

Data Analyses • Identify key phenotypes for analysis • Transformations provided by Coordinating Center • Transfer of data from Coordinating Center to WFU • Initial data analyses using pre-specified models • Ethnic-specific (phenotype=SNP + age + sex + center) {4 ethnic groups} • Total MESA (phenotype=SNP + age + sex + center/ethnic) • Within each of the 5 sets of analyses for each SNP (and each phenotype), perform generalized test of association (2 df) and model SNP as dominant, additive and recessive
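The genetic models above differ only in how a genotype enters the model as a covariate. A minimal sketch of one possible coding, assuming genotypes are stored as the number of minor alleles (0, 1, or 2); the slides do not specify the reference allele or the software used, and the names below are illustrative.

```python
from itertools import product

DATA_SETS = (
    "Total MESA",
    "Caucasian-American",
    "African-American",
    "Mexican-American",
    "Chinese-American",
)
MODELS = ("general", "dominant", "additive", "recessive")


def code_genotype(minor_allele_count, model):
    """Return the SNP covariate(s) for one genotype under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1, or 2")
    if model == "additive":    # one term, linear in the allele count
        return (minor_allele_count,)
    if model == "dominant":    # carriers of at least one minor allele
        return (1 if minor_allele_count >= 1 else 0,)
    if model == "recessive":   # minor-allele homozygotes only
        return (1 if minor_allele_count == 2 else 0,)
    if model == "general":     # two indicator terms, hence the 2-df test
        return (1 if minor_allele_count == 1 else 0,
                1 if minor_allele_count == 2 else 0)
    raise ValueError(f"unknown model: {model}")


def analysis_grid():
    """All (data set, model) combinations run per SNP and per phenotype."""
    return list(product(DATA_SETS, MODELS))


for model in MODELS:
    print(model, [code_genotype(g, model) for g in (0, 1, 2)])
print(len(analysis_grid()), "analyses per SNP and phenotype")
```

The coded term(s) would then replace `SNP` in `phenotype = SNP + age + sex + center`.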

Candidate Gene Data Use and Publication Approach • Moratorium on candidate gene manuscript proposals (~3 months) • MESA and MESA Family Study investigators surveyed concerning gene and phenotype interests • If several interested in same gene and phenotype, P&P Committee will encourage the development of writing groups: • To delineate the hypotheses and models • To recommend number of manuscripts • To recommend authors • Should a candidate gene have no ‘champion’, it will be assigned to Investigator who recommended it

Visualization of Results • Summarized tables/display for initial results • For each SNP • 4 genetic models (generalized, dom, add, rec) • 5 sets of data (all MESA, 4 ethnic-specific) • All priority phenotypes • For multiple SNPs (haplotypes) • Summarized tables/displays for establishing writing priorities and formation of writing groups • Site for revised models/analyses • Links established by Coordinating Center, based upon approved manuscript proposals (P&P) through ‘Gene Pages’ that are password protected

Gene Pages • Establish a ‘portal of gene pages’ • Hosted by Coordinating Center & WFU & CSMC • Candidate gene on each ‘page’ • Gene symbol (e.g., OLR1) • Gene name (Low Density Lipoprotein, Oxidized, Receptor 1) • Genome location (12p13-p12) • Description of function (the OLR1 gene encodes a cell-surface endocytosis receptor for oxidized low density lipoprotein (OxLDL). LDL is oxidized in vascular endothelial cells to a highly injurious product that results in endothelial cell injury, which is implicated in the development of atherosclerosis. Vascular endothelial cells also internalize and degrade OxLDL through the OLR1 receptor)

Links to Gene Pages • SNPs genotyped for the candidate • Basic information • SNP allele frequency • HWE • LD panel • D’ and r² for SNPs in gene by ethnicity • Haploview representation • Link to permit data download • Link to initial data analyses
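For reference, the two LD measures on the LD panel can be computed from the haplotype counts of a pair of biallelic SNPs. The sketch below assumes phased haplotype counts are already available. With unphased genotype data the haplotype frequencies must first be estimated, and the slides do not describe that step. The function name, dictionary keys, and counts are made up for illustration.

```python
def ld_measures(hap_counts):
    """Return (|D'|, r^2) for two biallelic SNPs from haplotype counts.

    hap_counts maps the four haplotypes "AB", "Ab", "aB", "ab" to counts,
    where A/a are the alleles of SNP 1 and B/b those of SNP 2.
    """
    n = sum(hap_counts.values())
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_ab = hap_counts["AB"] / n
    p_a = (hap_counts["AB"] + hap_counts["Ab"]) / n   # frequency of allele A
    p_b = (hap_counts["AB"] + hap_counts["aB"]) / n   # frequency of allele B
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0, 0.0   # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / denom
    return d_prime, r_squared


complete_ld = ld_measures({"AB": 50, "Ab": 0, "aB": 0, "ab": 50})
no_ld = ld_measures({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})
partial = ld_measures({"AB": 60, "Ab": 0, "aB": 20, "ab": 20})
print(complete_ld, no_ld, partial)
```

The third example shows why both measures are reported: D’ reaches 1 whenever one of the four haplotypes is absent, while r² stays below 1 unless the two SNPs have matching allele frequencies.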
