
Bayesian Functional Mapping of Complex Dynamic Traits

Bayesian Functional Mapping of Complex Dynamic Traits. Tian Liu Genome Institute of Singapore Email: liut2@gis.a-star.edu.sg. Outline. Genetic Mapping of Quantitative Traits Functional Mapping of Dynamic Traits A General Bayesian Framework for Functional mapping


Presentation Transcript


  1. Bayesian Functional Mapping of Complex Dynamic Traits Tian Liu Genome Institute of Singapore Email: liut2@gis.a-star.edu.sg

  2. Outline • Genetic Mapping of Quantitative Traits • Functional Mapping of Dynamic Traits • A General Bayesian Framework for Functional Mapping • Bayesian Functional Mapping of Epistatic Quantitative Trait Loci • Simulation Studies • Summary and Prospects

  3. Many Traits Vary in a Quantitative Way • Morphology (height, size, body weight …) • Diseases (cancer, AIDS, …)

  4. Detected genes

  5. Quantitative Trait Loci (QTL) [Diagram labels: Marker 1, Marker 2, QTL, Marker 3, …, Marker k; r1, r2] • QTL are specific genetic loci that affect quantitative traits. • A QTL can be detected by markers that are linked with it. Two major tasks of QTL mapping: • Identify the location of the QTL • Estimate the genetic effects of the QTL

  6. Statistical Approaches For QTL Mapping • Single marker analysis (regression approach). • Interval mapping (as in a hallmark paper by Lander and Botstein, 1989). • Composite interval mapping – combination of interval mapping and partial regression (Zeng, 1994). • Bayesian QTL mapping (Satagopan and Yandell, 1996; Heath, 1997; Uimari and Hoeschele, 1997; Stephens and Fisch, 1998).

  7. Many Quantitative Traits have Dynamic Patterns

  8. Functional Mapping • Assessing the interplay between gene action and development [Figure labels: QQ, Qq]

  9. Functional Mapping • A mathematical and statistical framework for detecting genetic variants that control dynamic traits (Ma et al., 2002; Ma et al., 2004; Wu et al., 2004). • Statistically, functional mapping is a problem of jointly modeling the mean-covariance structures in longitudinal studies. Advantages: • Functional mapping integrates biological principles into the estimation of QTL parameters, so the results are closer to biological reality. • Functional mapping models the mean-covariance structures, leading to increased power for QTL detection.

  10. Interval Mapping and Functional Mapping • Interval mapping H0: μQQ = μQq = μqq • Functional mapping H0: μQQ = μQq = μqq

  11. Data Structure: An F2 Population in Mice • phenotypic values yi • marker genotypes

  12. Mixture Model for Functional Mapping • The likelihood is formulated as a mixture model. Its ingredients are: the QTL genotype frequencies; y, the observed phenotypic values at T time points; the parameters specific to QTL genotype j; the parameters common to all QTL genotypes; and the probability density specific to QTL genotype j. • The EM algorithm is implemented to estimate the parameters.

  13. Limitations of Maximum Likelihood-Based Functional Mapping • There may be many local maxima of the likelihood. • Performing significance tests and obtaining confidence interval estimators carries a substantial computational load. • Uncertainty about the number of QTL causes extra difficulty in model fitting and selection. • Parameter estimation can be intractable when nonlinear equations are used to model the mean vectors.

  14. From Frequentist to Bayesian • Bayes' theorem (1763): P(A | B) = P(B | A) P(A) / P(B) • Intuitively, Bayes' theorem in this form describes how one's beliefs about observing A are updated by having observed B. • Posterior distribution: posterior ∝ likelihood × prior. http://en.wikipedia.org/wiki/Thomas_Bayes

  15. From Frequentist to Bayesian (Thomas and Cortessis, 1992; Satagopan and Yandell, 1996; Heath, 1997; Uimari and Hoeschele, 1997; Stephens and Fisch, 1998; Sillanpaa and Arjas, 1998, 1999; …) • Informative priors for some of the unknowns can be given to improve parameter estimation. • Inference about the parameters is made directly from the marginal posterior, and it is straightforward to obtain confidence interval estimators for the unknowns. • The number of QTL can be determined by comparing the Bayes factors from separate models.

  16. A General Bayesian Framework for Functional Mapping • Trait of interest: growth curve. • Derive a procedure for Bayesian estimation of biologically meaningful parameters that model the mean-covariance structures. • Develop a general approach for the genome-wide enumeration of QTL. • Extend the model to understand epistatic interactions of QTL.

  17. A Linear Model • If we knew the QTL genotypes: yi(t) = Σj ξij uj(t) + εi(t) (2) A Mixture Model • But we don't know the QTL genotypes (2′) Notation: yi(t): phenotypic value of individual i at time point t; ξij: indicator variable for individual i carrying QTL genotype j (j = 1, …, 3^s for an F2 population with s QTL); uj(t): expected phenotypic value for QTL genotype j at time point t; εi(t) ~ iid N(0, σ²(t))

  18. Parametric Modeling of the Mean Vector • Logistic growth curve (von Bertalanffy et al., 1957) • The growth curve for QTL genotype j: uj(t) = αj / (1 + βj exp(−γj t)) (3) • α: limiting value of growth • α / (1 + β): initial value of growth • γ: relative growth rate
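As a minimal sketch of the logistic growth curve, assuming the common form α / (1 + β·exp(−γt)), which matches the stated interpretations (initial value α / (1 + β) at t = 0, limiting value α as t grows). The function name and the parameter values below are illustrative assumptions, not taken from the slides.

```python
import math

def logistic_growth(t, alpha, beta, gamma):
    """Expected phenotype at time t under a logistic growth curve.

    alpha: limiting value of growth
    alpha / (1 + beta): initial value of growth (t = 0)
    gamma: relative growth rate
    """
    return alpha / (1.0 + beta * math.exp(-gamma * t))

# Hypothetical parameter values, chosen only for illustration.
alpha, beta, gamma = 30.0, 10.0, 0.6
time_points = list(range(1, 11))  # e.g. ten weekly measurements
mean_vector = [logistic_growth(t, alpha, beta, gamma) for t in time_points]
```

Each QTL genotype j would carry its own triple (αj, βj, γj), so comparing genotypes amounts to comparing three parameters rather than one mean per time point.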

  19. Likelihood (4) (5) • Observed variables: y = (y1, …, yn) and the marker genotypes {Mi}. • Unknown parameters: λ, Ω = (Ω1, …, Ω_{3^s}), and Σ. λ: QTL locations. Ωj: unknown parameters that determine the mean vector for QTL genotype j. yi | Qi = j, Ω, Σ ~ MVN(μj, Σ). π(Qi = j | λ): conditional probability of QTL genotype j given the marker genotypes for individual i.
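The likelihood equations on this slide did not survive extraction. A hedged reconstruction from the definitions given (a mixture over the 3^s QTL genotypes, weighted by the conditional genotype probabilities given the markers) would read as follows; the symbol M_i for the marker genotypes of individual i is borrowed from the next slide, and the exact form is an assumption rather than the author's equation.

```latex
L(\lambda, \Omega, \Sigma \mid y, M)
  = \prod_{i=1}^{n} \sum_{j=1}^{3^{s}}
    \pi(Q_i = j \mid M_i, \lambda)\,
    \pi(y_i \mid Q_i = j, \Omega_j, \Sigma),
\qquad
y_i \mid Q_i = j \;\sim\; \mathrm{MVN}(\mu_j, \Sigma).
```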

  20. Parameter Estimation • Posterior density of λ, Ω = (Ω1, …, Ω_{3^s}), Q = {Qi}, and Σ: π(λ, Ω, Q, Σ | y) ∝ π(y | Q, Ω, Σ) π(Q | λ) π(λ, Ω, Σ) (6) where: π(y | Q, Ω, Σ) = ∏i π(yi | Qi = qi, Ωqi, Σ); π(Q | λ) = ∏i π(Qi | λ), which depends only on {Mi} and λ; and π(λ, Ω, Σ) is the prior for the unknowns.

  21. Choices of Priors • λ: a noninformative prior should be chosen, π(λk) = Uniform(0, Dm), k = 1, 2, …, s. • Ωj: information about its prior can be obtained from previous studies, and an informative prior can improve the estimation, π(Ωj) = MVN(η, Λ). • Σ⁻¹: a Wishart distribution with low degrees of freedom is often regarded as a noninformative prior for Σ⁻¹, π(Σ⁻¹) = Wishart(R, T).

  22. Parameter Estimation (cont'd) • Marginal posteriors of the unknowns can be obtained from the joint posterior by integrating over the other unknowns. • Evaluation of such a high-dimensional integral is intractable! • MCMC techniques can be used to draw samples from the joint posterior. • Σ⁻¹: updated by Gibbs sampling • λ and Ωj: updated by Metropolis-Hastings (M-H) algorithms

  23. Gibbs Sampling • Idea: a joint distribution may be hard to sample from, but it may be easy to sample from the conditional distributions in which all variables are fixed except one. • To sample from p(x1, x2, …, xn), let each state of the Markov chain represent (x1, x2, …, xn); the probability of moving to a state that differs only in xi is p(xi | x1, …, xi−1, xi+1, …, xn). • Then detailed balance is satisfied.
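A small illustration of the idea, not taken from the slides: a Gibbs sampler for a standard bivariate normal with correlation ρ, where each full conditional is itself a univariate normal, x1 | x2 ~ N(ρ·x2, 1 − ρ²) and symmetrically for x2. The target distribution, function name, and settings are assumptions chosen because the conditionals are known in closed form.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    x1, x2 = 0.0, 0.0                     # arbitrary starting state
    samples = []
    for _ in range(n_samples):
        # Update one coordinate at a time, holding the other fixed.
        x1 = rng.gauss(rho * x2, cond_sd)
        x2 = rng.gauss(rho * x1, cond_sd)
        samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
mean_x1 = sum(s[0] for s in samples) / len(samples)
mean_product = sum(s[0] * s[1] for s in samples) / len(samples)  # estimates rho
```

No accept/reject step is needed here, because every draw comes directly from an exact conditional distribution.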

  24. Metropolis Algorithm • Goal: draw from a distribution π with support S. • Transitions have two parts: • proposal distribution: q(x(t+1) | x(t)) • acceptance: take proposals with probability α(x(t), x(t+1)) = min(1, [π(x(t+1)) q(x(t) | x(t+1))] / [π(x(t)) q(x(t+1) | x(t))])
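The proposal and acceptance steps can be sketched generically as follows. This is an illustrative implementation working on the log scale; the function names, the standard normal target, and the random-walk proposal are assumptions made for the example and are not the sampler used in the talk.

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q, x0, n_samples, seed=0):
    """Generic M-H sampler.

    log_target(x): log of the (unnormalised) target density pi(x)
    propose(x, rng): draws a candidate from q(. | x)
    log_q(to, frm): log density of proposing `to` from `frm`
    """
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_samples):
        x_new = propose(x, rng)
        # log of [pi(x_new) q(x | x_new)] / [pi(x) q(x_new | x)]
        log_alpha = (log_target(x_new) + log_q(x, x_new)
                     - log_target(x) - log_q(x_new, x))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < min(0.0, log_alpha):
            x = x_new          # accept the proposal
        chain.append(x)        # on rejection the current value is repeated
    return chain

# Example: standard normal target with a symmetric random-walk proposal,
# for which the q terms cancel (log_q returns a constant).
chain = metropolis_hastings(
    log_target=lambda x: -0.5 * x * x,
    propose=lambda x, rng: rng.gauss(x, 2.4),
    log_q=lambda to, frm: 0.0,
    x0=0.0,
    n_samples=20000,
)
chain_mean = sum(chain) / len(chain)
chain_var = sum((v - chain_mean) ** 2 for v in chain) / len(chain)
```

With a symmetric proposal the ratio reduces to π(x(t+1)) / π(x(t)), which is the original Metropolis rule.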

  25. Full Conditional Posteriors (7) (8)

  26. Construction of a Markov Chain (Geman and Geman, 1984; Hastings, 1970) • We make inference about the unknowns from a random sequence of Markov chain samples. • Algorithm implementation: Step 1: Initiate the chain at any state that has positive probability. Step 2: Modify the four blocks of unknowns at the current state, and move to a new state.

  27. Updating λ by an M-H Algorithm • Step 2-1: 1. Generate λk* ~ Uniform(max(λ_{k−1}, λk − δ), min(λk + δ, λ_{k+1})), the proposal distribution for λk (k = 1, 2, …, s), with density q(λk, λk*). 2. Accept λk* with probability min(αλ,k, 1) (9). λk is kept at its current value if the proposal is rejected.
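The bounded uniform proposal for a QTL location can be sketched as below. Because the proposal window is clipped by the neighbouring QTL positions, its width can differ between the forward and reverse moves, so the proposal densities do not cancel in general. The function names are hypothetical, and since equation (9) was lost in extraction, how this ratio enters αλ,k is an assumption based on the standard M-H rule.

```python
import random

def proposal_interval(lam, left, right, delta):
    """Window Uniform(max(left, lam - delta), min(lam + delta, right))."""
    return max(left, lam - delta), min(lam + delta, right)

def propose_location(lam, left, right, delta, rng):
    """Draw a candidate location and the ratio q(new -> old) / q(old -> new).

    left, right: positions of the neighbouring QTL (or chromosome ends).
    """
    lo, hi = proposal_interval(lam, left, right, delta)
    lam_new = rng.uniform(lo, hi)
    # A uniform density is 1 / width, so the ratio of reverse to forward
    # proposal densities is (forward width) / (reverse width).
    lo_back, hi_back = proposal_interval(lam_new, left, right, delta)
    hastings_ratio = (hi - lo) / (hi_back - lo_back)
    return lam_new, hastings_ratio

rng = random.Random(1)
# Window clipped on the left by a neighbour at 4.5: the ratio may differ from 1.
clipped_new, clipped_ratio = propose_location(5.0, 4.5, 20.0, 2.0, rng)
# Window far from both neighbours: symmetric proposal, ratio equal to 1.
free_new, free_ratio = propose_location(10.0, 0.0, 20.0, 1.0, rng)
```

Under the standard rule the acceptance probability would be min(1, posterior ratio × hastings_ratio), with the posterior ratio evaluated at the candidate and current locations.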

  28. Updating QTL Genotypes • Step 2-2: Q is updated by separately updating each Qi. • Update Qi by directly sampling from its full conditional posterior (10).
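Equation (10) was lost in extraction. In a mixture model of this kind, the full conditional of a genotype indicator is usually proportional to the conditional genotype probability given the markers times the likelihood of that individual's data under each genotype; the sketch below assumes that form. The function names and the numbers are illustrative only.

```python
import math
import random

def genotype_posterior(prior_probs, log_liks):
    """Normalise prior(j) * likelihood(j) over the candidate genotypes j.

    prior_probs: conditional genotype probabilities given the markers
    log_liks: log-likelihood of the individual's data under each genotype
    """
    peak = max(log_liks)  # subtract the maximum to avoid overflow
    weights = [p * math.exp(ll - peak) for p, ll in zip(prior_probs, log_liks)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_genotype(prior_probs, log_liks, rng):
    """Draw one genotype index from its full conditional posterior."""
    probs = genotype_posterior(prior_probs, log_liks)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)
# Hypothetical single-QTL F2 example with genotypes QQ, Qq, qq.
prior = [0.25, 0.5, 0.25]
post = genotype_posterior(prior, [-12.0, -10.5, -15.0])
draw = sample_genotype(prior, [-12.0, -10.5, -15.0], rng)
```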

  29. Updating Ωj by a Metropolis Algorithm • Step 2-3: 1. Generate Ωj* ~ MVN(Ωj, Ψ), the proposal distribution for Ωj (j = 1, 2, …, 3^s), with density q(Ωj, Ωj*). 2. Accept Ωj* with probability min(αΩj, 1) (11). Ωj is kept at its current value if the proposal is rejected.

  30. Updating Σ⁻¹ by Gibbs Sampling • Step 2-4: The full conditional posterior of Σ⁻¹ has an explicit form, so we can update Σ⁻¹ by sampling from it directly.

  31. Structuring the Covariance Matrix • Structured antedependence (SAD) model (Nunez-Anton & Zimmerman, 2000) • The analytical forms of the variance and covariance functions among equally spaced time-dependent measurements under SAD(1) (Jaffrezic et al., 2003)
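The SAD equations on this slide were lost in extraction. As a hedged sketch, a first-order antedependence structure with a constant innovation variance ν² and antedependence parameter ϕ can be built from the recursion e(t) = ϕ·e(t−1) + ε(t), with e(0) = 0 and ε(t) ~ N(0, ν²). This parameterisation is an assumption; the slides may use a different one.

```python
def sad1_covariance(n_times, phi, nu2):
    """Covariance matrix of e(1..n_times) under e(t) = phi * e(t-1) + eps(t)."""
    variances = []
    prev = 0.0  # var(e(0)) = 0
    for _ in range(n_times):
        prev = phi * phi * prev + nu2  # var(e(t)) = phi^2 var(e(t-1)) + nu2
        variances.append(prev)
    cov = [[0.0] * n_times for _ in range(n_times)]
    for s in range(n_times):
        for t in range(s, n_times):
            # cov(e(s), e(t)) = phi^(t - s) * var(e(s)) for s <= t
            cov[s][t] = cov[t][s] = phi ** (t - s) * variances[s]
    return cov

# Hypothetical values for illustration.
sigma = sad1_covariance(n_times=4, phi=0.5, nu2=2.0)
```

Under this recursion the variance at time t has the closed form ν²(1 − ϕ^(2t)) / (1 − ϕ²), so the whole T × T matrix is governed by just two parameters.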

  32. Structuring the Covariance Matrix by the SAD Model • Priors for the innovation parameter ν² and the antedependence parameter ϕ • Full conditional posterior distributions

  33. Updating ν² and ϕ by the M-H Algorithm • Updating ν²: a new value ν²* is generated from its proposal distribution and is accepted with probability min(1, αν²). • Updating ϕ: a new value ϕ* is generated from its proposal distribution and is accepted with probability min(1, αϕ).

  34. Estimation Issues • Kernel density estimators (Fan and Gijbels, 1996) can be used to estimate the marginal posterior densities of the parameters. • According to Tierney (1994), empirical averages of the corresponding MCMC samples may be regarded as consistent estimators of the unknown parameters. • Confidence intervals can be obtained from the highest posterior density (HPD) regions (Box and Tiao, 1973).
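A minimal sketch of two of these summaries computed from MCMC draws: the empirical average, and an HPD interval taken as the shortest interval containing a given fraction of the sorted samples. The shortest-interval rule assumes a unimodal marginal posterior, and the function names are illustrative.

```python
import math

def posterior_mean(samples):
    """Empirical average of the MCMC samples."""
    return sum(samples) / len(samples)

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive order statistics; keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Toy draws with one extreme value; the HPD interval excludes it.
draws = list(range(100)) + [1000]
interval = hpd_interval(draws, mass=0.95)
```

Unlike an equal-tailed interval, the shortest interval shifts toward the region of highest density when the posterior is skewed.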

  35. How Many QTL? • The number of QTL can be determined by first fitting different models (s = 0, 1, 2, …) and then comparing them with Bayes factors. • The Bayes factor (BF) is defined as the ratio of the marginal probabilities of y given the two models.

  36. Rule of Thumb • With this setup, if we interpret model 1 as the null model, then: • If B(x) ≥ 1, model 1 is supported. • If 1 > B(x) ≥ 10^(−1/2), there is minimal evidence against model 1. • If 10^(−1/2) > B(x) ≥ 10^(−1), there is substantial evidence against model 1. • If 10^(−1) > B(x) ≥ 10^(−2), there is strong evidence against model 1. • If 10^(−2) > B(x), there is decisive evidence against model 1.
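The rule of thumb maps directly onto a small lookup. The sketch below assumes the comparison symbols lost from the slide are "greater than or equal to" at each lower threshold; the function name and the returned labels are illustrative.

```python
def evidence_against_model1(bayes_factor):
    """Classify a Bayes factor B(x) with model 1 as the null model."""
    if bayes_factor >= 1:
        return "model 1 supported"
    if bayes_factor >= 10 ** -0.5:   # about 0.316
        return "minimal"
    if bayes_factor >= 1e-1:
        return "substantial"
    if bayes_factor >= 1e-2:
        return "strong"
    return "decisive"

labels = [evidence_against_model1(b) for b in (2.0, 0.5, 0.2, 0.05, 0.001)]
```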

  37. Bayesian Hypothesis Testing and Bayes Factors • Bayesian p-values • Bayes factors for model comparison • Easy-to-implement alternatives for model comparison

  38. How Many QTL? (cont'd) • The harmonic mean of the likelihood values (Newton and Raftery, 1994) can be used to estimate the marginal probability of y (14). • The LOD score can be interpreted in terms of the Bayes factor (Kass and Raftery, 1995) (15).
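The harmonic mean estimator can be sketched on the log scale as follows: the marginal probability of y is estimated by the harmonic mean of the likelihood values evaluated at the posterior samples. Equations (14) and (15) were lost in extraction, so this follows the standard form of the estimator; the function names and the toy numbers are assumptions.

```python
import math

def log_marginal_harmonic_mean(log_liks):
    """log of [ (1/m) * sum_i 1 / L_i ]^(-1), computed stably from log L_i."""
    m = len(log_liks)
    neg = [-ll for ll in log_liks]
    peak = max(neg)
    # log-sum-exp of the negated log-likelihoods
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in neg))
    return math.log(m) - log_sum

def log10_bayes_factor(log_liks_a, log_liks_b):
    """log10 Bayes factor of model A over model B from each model's samples."""
    diff = (log_marginal_harmonic_mean(log_liks_a)
            - log_marginal_harmonic_mean(log_liks_b))
    return diff / math.log(10.0)

# Toy log-likelihood values standing in for two fitted models.
lbf = log10_bayes_factor([-100.2, -99.8, -100.5], [-104.9, -105.3, -105.1])
```

Working with log-likelihoods avoids underflow, since likelihoods of longitudinal data are typically far below the smallest representable floating-point number.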

  39. A Worked Example

  40. Study Design • Cheverud et al. (1996) and Vaughn et al. (1999) constructed a linkage map with 96 microsatellite markers for 1043 F2 mice, initiated with the Large and Small strains. • Body weights of the F2 progeny were measured weekly for 10 consecutive weeks, starting at 7 days of age.

  41. Prior Distributions • λk: a noninformative prior should be chosen, π(λk) = Uniform(0, Dm). • Ωj: information about its prior can be obtained from previous studies, and an informative prior for Ωj is given by π(Ωj) = MVN(η, Λ), where η = (30, 10, 0.6)′ and Λ = diag(10, 4, 1). • Σ⁻¹: π(Σ⁻¹) = Wishart(R, T) = Wishart(S⁻¹, 10)

  42. Results by Fitting a Single-QTL Model • Burn-in length: 10,000 • Every 60th MCMC sample was retained from the next 60,000 samples, resulting in a working set of 1,000 MCMC samples. • QTL were detected on chromosomes 6, 7, and 10.
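The burn-in and thinning arithmetic can be checked with a short sketch: discarding 10,000 draws and keeping every 60th of the next 60,000 leaves 1,000. The function name is illustrative, and the chain below is a stand-in list of iteration indices rather than real MCMC output.

```python
def thin_chain(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    kept = chain[burn_in:]
    return kept[thin - 1::thin]

# Stand-in chain: 10,000 burn-in iterations plus 60,000 further iterations.
chain = list(range(70000))
working_set = thin_chain(chain, burn_in=10000, thin=60)
```

Thinning reduces the autocorrelation between retained draws at the cost of discarding most of the chain.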

  43. Estimated Marginal Posteriors of QTL Locations (Zhao et al., 2005)

  44. Results by Fitting a Single-QTL Model

  45. Results by Fitting a Single-QTL Model • Bayesian estimates of genotype-specific growth curves for three QTL detected on chromosomes 6, 7, and 10

  46. Bayesian Functional Mapping of Epistatic Quantitative Trait Loci

  47. Epistasis • Epistasis is the genetic interaction between alleles at different genes. • Epistasis has been recognized to play a more important role in trait formation and development than previously appreciated. • Because epistasis is difficult to estimate, many earlier studies assume that there is no epistasis, thus producing biased inference about genetic control.

  48. Quantifying Epistasis • Assume two QTL, Q (with two alleles Q and q) and W (with two alleles W and w). • The two QTL generate 9 genotypes: QQWW, QQWw, QQww, QqWW, QqWw, Qqww, qqWW, qqWw, qqww. • Key issue: how to estimate iaa, iad, ida, and idd?

  49. Estimate Dynamic Changes of the Additive, Dominance, and Epistatic Effects between Two QTL • overall mean • additive effect of QTL #1 • additive effect of QTL #2 • dominant effect of QTL #1 • dominant effect of QTL #2 • additive × additive interaction • additive × dominant interaction • dominant × additive interaction • dominant × dominant interaction
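The equations linking the nine genotype values to these effects were not preserved. As one hedged illustration, the sketch below uses a common coding in which each locus contributes an additive score x (1, 0, −1 for the two homozygotes and the heterozygote in between) and a dominance score z (1 for the heterozygote, 0 otherwise). This coding, the effect names, and the numbers are assumptions and may differ from the parameterisation used in the talk.

```python
def coding(locus_genotype):
    """Return (x, z): additive and dominance scores for e.g. 'QQ', 'Qq', 'qq'."""
    n_upper = sum(ch.isupper() for ch in locus_genotype)
    return n_upper - 1, 1 if n_upper == 1 else 0

def genotype_value(e, genotype):
    """Genotypic value of a two-locus genotype such as 'QqWW' given effects e."""
    x1, z1 = coding(genotype[:2])
    x2, z2 = coding(genotype[2:])
    return (e["mu"] + e["a1"] * x1 + e["d1"] * z1 + e["a2"] * x2 + e["d2"] * z2
            + e["iaa"] * x1 * x2 + e["iad"] * x1 * z2
            + e["ida"] * z1 * x2 + e["idd"] * z1 * z2)

def genetic_effects(g):
    """Solve for the nine effects from the nine genotypic values g[genotype]."""
    mu = (g["QQWW"] + g["QQww"] + g["qqWW"] + g["qqww"]) / 4
    a1 = (g["QQWW"] + g["QQww"] - g["qqWW"] - g["qqww"]) / 4
    a2 = (g["QQWW"] - g["QQww"] + g["qqWW"] - g["qqww"]) / 4
    iaa = (g["QQWW"] - g["QQww"] - g["qqWW"] + g["qqww"]) / 4
    d1 = (g["QqWW"] + g["Qqww"]) / 2 - mu
    d2 = (g["QQWw"] + g["qqWw"]) / 2 - mu
    ida = (g["QqWW"] - g["Qqww"]) / 2 - a2
    iad = (g["QQWw"] - g["qqWw"]) / 2 - a1
    idd = g["QqWw"] - mu - d1 - d2
    return {"mu": mu, "a1": a1, "a2": a2, "d1": d1, "d2": d2,
            "iaa": iaa, "iad": iad, "ida": ida, "idd": idd}

GENOTYPES = [q + w for q in ("QQ", "Qq", "qq") for w in ("WW", "Ww", "ww")]

# Hypothetical effects at a single time point, used for a round-trip check.
true_effects = {"mu": 10.0, "a1": 2.0, "a2": 1.0, "d1": 0.5, "d2": -0.5,
                "iaa": 0.3, "iad": 0.2, "ida": -0.1, "idd": 0.4}
values = {g: genotype_value(true_effects, g) for g in GENOTYPES}
recovered = genetic_effects(values)
```

Applying `genetic_effects` to the nine fitted growth curves at each time point would trace how each effect changes over the course of development.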

  50. Model Selection • Estimated Bayes factors on a logarithmic scale between two different models.
