Fine recombination mapping in S. cerevisiae using tiling microarrays

Richard Bourgon, 20 September 2007, bourgon@ebi.ac.uk

Meiotic recombination Meiosis… • Two divisions, yielding 4 haploid daughter cells. • Double-stranded breaks (DSBs) initiate recombination. • Both crossover and non-crossover resolutions are possible. Based on B de Massy, TRENDS in Genetics 19(9), 2003

Research questions • In S. cerevisiae, where do hotspots occur and what are the local recombination rates? • Do hotspots (binary) or recombination rates (quantitative) correlate with features of DNA sequence or chromatin structure? • How large are conversion regions? Can we identify the various resolution patterns? • What is the relative frequency of the various DSB-repair pathways? • Does the observed pattern among recombination events concur with current models for “interference”? Do mutations impact interference?

Saccharomyces cerevisiae microarray data • Two strains • S96, isogenic to the common laboratory strain S288c. • YJM789, isogenic to the clinical isolate YJM145. • In alignable regions ≈ 56,000 SNPs, 30,000 insertions and deletions. • Tiling microarrays • ≈ 6.5 M 5µ features, tiling non-repetitive S96 every 4 bases. • ≈ 4% of probes are specific to YJM789 sequence. • Data • 25 parental genomic DNA hybridizations. • 208 wildtype offspring hybridizations. • 20 msh4 mutant offspring hybridizations.

“Single feature polymorphisms” • Hybridization efficiency depends on the number and position of mismatches. • Differential hybridization provides a means of detecting polymorphisms, even when only the reference genome sequence is known: SFPs. • Winzeler et al., Science 281(5380), 1998. • Brem et al., Science 296(5568), 2002. • Steinmetz et al., Nature 416(6878), 2002. • Borevitz et al., Genome Research 13(3), 2003. • Given parental behavior, genotype segregants via supervised classification.

Single-probe methods • Polymorphism detection • Winzeler et al. (and others): ANOVA testing $\mu_1 = \mu_2$. • Borevitz et al.: moderated t-test using the SAM adjustment. • Brem et al.: moderated t-test. Then cluster all data (parental and segregant) and discard SFPs for which clusters don’t separate the parental data. • Segregant genotyping • ANOVA and t-test methods use the estimated posterior probability of class membership, with a uniform prior on the classes: $\hat{P}(Y = k \mid x) = \hat{f}_k(x) / (\hat{f}_1(x) + \hat{f}_2(x))$, where $\hat{f}_k$ is the estimated density of class $k$. • Brem et al. augment this with estimates obtained from clustered data.
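The uniform-prior posterior used for segregant genotyping can be sketched as follows. This is a minimal illustration, assuming each probe's log intensity is univariate normal within a parental class; the function names and the toy intensities are hypothetical and not taken from the talk.

```python
import math
from statistics import mean, stdev

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_class1(x, parent1, parent2):
    """Posterior probability that x belongs to class 1 under a uniform prior.

    Class densities are estimated from the parental hybridizations only,
    which is what makes this a supervised classifier.
    """
    f1 = normal_pdf(x, mean(parent1), stdev(parent1))
    f2 = normal_pdf(x, mean(parent2), stdev(parent2))
    return f1 / (f1 + f2)

# Toy log intensities for one probe (illustrative values only).
s96_parent = [10.1, 9.9, 10.0, 10.2, 9.8]
yjm_parent = [8.0, 8.2, 7.9, 8.1, 7.8]

p_high = posterior_class1(9.7, s96_parent, yjm_parent)  # close to the S96 parents
p_mid = posterior_class1(9.0, s96_parent, yjm_parent)   # halfway between the classes
```

A segregant whose intensity sits midway between the parental means gets a posterior near 1/2, which is the kind of call a later filtering step would drop.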

Probe sets: SNP interrogation
Probe sets: groups of probes which each exactly map to a unique location, and which interrogate a common polymorphism.
S96:    CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA
YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA
1: GGAGGACTGGCCCTAACTTCACTAT
2: GACTGGCCCTAACTTCACTATTTGT
3: GACTGGCCCTAACTTGACTATTTGT
4: GGCCCTAACTTCACTATTTGTACAG
5: CTAACTTCACTATTTGTACAGATCG
6: CTTCACTATTTGTACAGATCGCAAT
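The probes listed above can be reproduced from the two strain sequences. The sketch below assumes 25-mer probes written as the base-wise complement of the target (in the same left-to-right order as on the slide) and a tiling step of 4 bases; the function name `tile_probes` is hypothetical.

```python
# Base-wise complement, kept in the same left-to-right order as the target.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

S96 = "CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA"
YJM789 = "CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA"

def tile_probes(target, length=25, step=4):
    """Return complementary probes of the given length, one every `step` bases."""
    comp = target.translate(COMPLEMENT)
    return [comp[i:i + length] for i in range(0, len(comp) - length + 1, step)]

# Positions (0-based) at which the two strains differ.
snp_positions = [i for i, (a, b) in enumerate(zip(S96, YJM789)) if a != b]

s96_probes = tile_probes(S96)     # probes 1, 2, 4, 5 and 6 in the list above
yjm_probes = tile_probes(YJM789)  # the second of these is probe 3
```

In this window every tiled probe overlaps the single SNP, so the five S96 probes plus the one YJM789-specific probe form one probe set.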

Marginal probe behavior

A multi-probe method: SNPScanner Gresham et al., Science 311(5769), 2006: • Model the decrease in a given probe’s intensity in the presence of a single SNP, as a function of • Position within the probe, • Probe response to reference sequence, • Probe GC content, and • Nucleotides surrounding the SNP position. • Fit model parameters using two sequenced strains with known SNPs. • To genotype a segregant or new strain at a given base, assume probes in a probe set are independent and compute a likelihood ratio comparing the reference and SNP genotypes, with the variance assumed to be common for both genotypes.
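A likelihood ratio of this general form can be sketched as below. This is not the SNPScanner implementation: it assumes independent normal probe intensities with a shared standard deviation, and the predicted means, which a fitted intensity model would supply, are replaced by made-up numbers.

```python
def log_likelihood_ratio(observed, mean_ref, mean_snp, sigma):
    """Log likelihood ratio, SNP genotype vs. reference genotype.

    Probes are treated as independent normals with a common sigma, so the
    normalizing constants cancel and only squared residuals remain.
    """
    llr = 0.0
    for x, m_ref, m_snp in zip(observed, mean_ref, mean_snp):
        llr += ((x - m_ref) ** 2 - (x - m_snp) ** 2) / (2.0 * sigma ** 2)
    return llr

# Hypothetical predicted log intensities for a three-probe set.
mean_ref = [10.0, 10.0, 10.0]
mean_snp = [8.5, 8.0, 8.5]

llr_snp_like = log_likelihood_ratio([8.6, 8.1, 8.4], mean_ref, mean_snp, sigma=0.3)
llr_ref_like = log_likelihood_ratio([10.1, 9.9, 10.0], mean_ref, mean_snp, sigma=0.3)
```

Positive values favor the SNP genotype and negative values favor the reference. Because the probes are summed independently, any correlation between probes in a set is ignored.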

An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified. • Parental arrays are informative, but do not always provide an ideal model. Supervised classification of offspring (i) wastes information, and (ii) may be misleading. • Clear division into two distributions is necessary but not sufficient. Quantitative aspects of the inferred clusters are useful.
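The residual-correlation check in the first bullet can be sketched as follows, assuming NumPy is available. The simulated data, including the shared per-array effect that induces the correlation, are illustrative assumptions and not the talk's data.

```python
import numpy as np

def residual_correlation(log_intensity, genotype):
    """Correlation between probes after centering within each genotype class.

    log_intensity: (n_arrays, n_probes) array; genotype: (n_arrays,) labels.
    """
    x = np.asarray(log_intensity, dtype=float)
    g = np.asarray(genotype)
    resid = x.copy()
    for k in np.unique(g):
        resid[g == k] -= x[g == k].mean(axis=0)  # remove the class mean per probe
    return np.corrcoef(resid, rowvar=False)

# Simulated probe set: three probes, two genotype classes, and a per-array
# effect shared by all probes on the same array.
rng = np.random.default_rng(0)
n = 200
genotype = rng.integers(0, 2, size=n)
class_means = np.array([[10.0, 10.0, 10.0], [8.0, 8.5, 9.0]])
array_effect = rng.normal(0.0, 0.3, size=n)
noise = rng.normal(0.0, 0.1, size=(n, 3))
x = class_means[genotype] + array_effect[:, None] + noise

r = residual_correlation(x, genotype)
```

Large off-diagonal entries in `r` indicate that probes within a set are not independent given genotype, which is the motivation for a multivariate model.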

Semi-supervised, model-based clustering (SSC) Semi-supervised clustering via EM algorithm: • Assume a two-component mixture, with $\pi_1 = \pi_2 = 1/2$. • $(X_i, Y_i)$ with latent class variable $Y$. $Y$ is known for parental arrays. • Assume $X \mid Y$ multivariate normal. • Begin with E-step: initialize the unknown $Y$ with any simple clustering scheme: k-means, hierarchical agglomeration, etc. • Iteratively estimate parameters, $E(Y_i \mid X_i)$, parameters, etc. • Classify segregant $i$ based on final estimated $E(Y_i \mid X_i)$. For diagnostic purposes only: • Multivariate Gaussian fit to dimension-reduced parental data. • Unsupervised clustering of offspring data, by EM algorithm, with $k \in \{2, 3\}$.
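The EM scheme above can be sketched as follows, assuming NumPy. This is an illustration under stated assumptions rather than the talk's implementation: the function names, the nearest-parental-mean initialization (a stand-in for k-means or hierarchical clustering), the small ridge added to each covariance, and the simulated data are all hypothetical.

```python
import numpy as np

def log_mvn(x, mu, cov):
    """Log density of a multivariate normal at each row of x."""
    diff = x - mu
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    d = x.shape[1]
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2.0 * np.pi))

def ssc_em(x, labels, n_iter=50, ridge=1e-6):
    """Semi-supervised two-class Gaussian mixture with equal mixing weights.

    labels: 0 or 1 for parental arrays, -1 for offspring.
    Returns the estimated P(class 1 | x); parental rows stay at their labels.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    known = labels >= 0
    # Initialize offspring by the nearer parental mean.
    d0 = np.sum((x - x[labels == 0].mean(axis=0)) ** 2, axis=1)
    d1 = np.sum((x - x[labels == 1].mean(axis=0)) ** 2, axis=1)
    r1 = np.where(known, labels, d1 < d0).astype(float)
    for _ in range(n_iter):
        # M-step: weighted mean and covariance for each class.
        params = []
        for w in (1.0 - r1, r1):
            mu = (w[:, None] * x).sum(axis=0) / w.sum()
            diff = x - mu
            cov = (w[:, None] * diff).T @ diff / w.sum()
            params.append((mu, cov + ridge * np.eye(x.shape[1])))
        # E-step: with pi_1 = pi_2 = 1/2 the mixing weights cancel.
        delta = np.clip(log_mvn(x, *params[0]) - log_mvn(x, *params[1]), -700, 700)
        r1 = np.where(known, labels, 1.0 / (1.0 + np.exp(delta)))
    return r1

# Simulated probe set: 5 parental arrays per strain and 80 offspring.
rng = np.random.default_rng(1)
mean0, mean1 = np.array([10.0, 10.0, 10.0]), np.array([8.0, 8.5, 9.0])
truth = rng.integers(0, 2, size=80)
offspring = np.where(truth[:, None] == 0, mean0, mean1) + rng.normal(0, 0.2, (80, 3))
parents = np.vstack([mean0 + rng.normal(0, 0.2, (5, 3)),
                     mean1 + rng.normal(0, 0.2, (5, 3))])
x = np.vstack([parents, offspring])
labels = np.concatenate([[0] * 5, [1] * 5, [-1] * 80])

post = ssc_em(x, labels)
calls = (post[10:] > 0.5).astype(int)
```

Both parental and offspring arrays contribute to the class means and covariances here, whereas a purely supervised classifier would estimate them from the parental arrays alone.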

Examples of SSC probe set results

Chromosome-level SSC results

Filtering • Array level • Excess “genotype switching”. • Large RMS residual (Mahalanobis). • Probe set level • High estimated misclassification rate. • Aberrant cluster behavior. • Very unusual genotype ratio. • Call level • Intermediate posterior probability of class membership.
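Two of the filters above, call-level masking of intermediate posteriors and array-level counting of genotype switches, can be sketched as below. The thresholds and function names are hypothetical, and the talk does not state the cutoffs it used.

```python
def call_from_posterior(p_s96, low=0.05, high=0.95):
    """Return 'S96', 'YJM789', or None when the posterior is intermediate."""
    if p_s96 >= high:
        return "S96"
    if p_s96 <= low:
        return "YJM789"
    return None  # call-level filter: no call

def count_switches(calls):
    """Number of genotype changes between consecutive called markers."""
    called = [c for c in calls if c is not None]
    return sum(a != b for a, b in zip(called, called[1:]))

# Posterior P(S96) at consecutive markers along one chromosome (toy values).
posteriors = [0.99, 0.98, 0.50, 0.01, 0.02, 0.97]
calls = [call_from_posterior(p) for p in posteriors]
n_switches = count_switches(calls)
```

An array whose switch count is far above that of the other arrays would be a candidate for removal under the "excess genotype switching" criterion.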

Aberrant probe sets: non-response

Aberrant probe sets: possible cross-hybridization?

Filtering

Chromosome-level SSC results (unfiltered)

Chromosome-level SSC results (filtered)

Genotyping accuracy • 82 usable forward sequencing runs. (Reverse similar.) • 16 spores sequenced. • Sequenced regions include 322 array-interrogated SNPs. • Sequenced samples had a range of array qualities. • Sequenced regions focused on single-marker conversions with a range of probe set quality scores.

SNPScanner: approximately correct distributions

SNPScanner: wrong distributions, right calls

SNPScanner: inaccurate covariance estimation

Genotype call comparison: SNPScanner vs. SSC • Filter for both methods: • Remove bad arrays and aberrant probe sets. • Remove probe sets with poorly separated clusters. • Drop calls falling between two observed clusters. • Only consider polymorphisms with at least one S288c-specific probe. • Compute concordance rate between the two methods.
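The concordance rate in the last bullet can be computed along these lines. The sketch assumes filtered calls are represented as None and that only positions called by both methods are compared; the function name and toy calls are illustrative.

```python
def concordance_rate(calls_a, calls_b):
    """Fraction of positions with identical calls, among positions called by both."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no positions were called by both methods")
    return sum(a == b for a, b in pairs) / len(pairs)

# Toy calls at five polymorphisms; None marks a call dropped by filtering.
ssc_calls = ["S96", "S96", None, "YJM789", "YJM789"]
snpscanner_calls = ["S96", "YJM789", "S96", "YJM789", None]

rate = concordance_rate(ssc_calls, snpscanner_calls)
```

Restricting to jointly called positions keeps the rate from being driven by differences in how many calls each method's filters remove.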

SSC calls

SNPScanner calls

Disagreement

PC plots for probe sets with strong disagreement

Usable probes per polymorphism

Tetrad-level results

High resolution in crossover regions

Crossovers accompanied by events on other strands

Double crossovers

Msh4 mutant

Recombination rates

Summary and future work • Summary • Semi-supervised clustering out-performs supervised classification: • Parental data are often not a faithful indicator of offspring behavior. • Offspring clusters contain a lot of information. • Filtering is important for small event detection: • Aberrant or error-prone probe sets create spurious small “events”. • Correct distribution estimates are required to detect error-prone probe sets. • Future work • Exploration and recovery of aberrant probe sets. • Unanticipated polymorphism detection. • Application to a single sequenced genome. • Rate/count adjustments given varying marker spacing. • Hotspots, conversion/crossover ratio, sizes, spacing and interference. • New mms4 mutant…

Acknowledgements • EMBL Heidelberg • Eugenio Mancera Ramos • Lars Steinmetz • Julien Gagneur • Zhenyu Xu • EBI • Wolfgang Huber • EBI, and Istituto Europeo di Oncologia, Milan • Alessandro Brozzi • EBI, and Higgins Lab, University College, Dublin • Paul McGettigan
