This joint work presents high-resolution computational models of genome binding events, decoding accurate information from noisy ChIP-chip data to reveal protein-DNA binding patterns. The study utilizes Joint Binding Deconvolution with Bayesian Inference, Message Passing Algorithms, and Approximate Inference to improve predictive accuracy and identify binding sites with enhanced spatial resolution. Experimental results demonstrate the efficacy of the approach in motif discovery and resolving proximal binding events.
Yuan (Alan) Qi • Joint work with Gifford and Young labs • High-resolution computational models of genome binding events • Dana-Farber Cancer Institute • Jan 2007
ChIP-chip Experiments • ChIP-chip data: • Encode valuable information about protein-DNA binding events. • Goal: • Decode accurate binding information from the noisy data. • Challenges: • Noise • Joint influence of multiple binding events
Joint Binding Deconvolution Data Likelihood Prior Distributions: Hyper Prior Distributions: JBD: generative probabilistic graphical model.
Shear Distribution (a) The distribution of DNA fragment sizes produced in the ChIP protocol was experimentally measured and statistically modeled. (b) An influence function is derived from the measured fragment size distribution.
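One way an influence function could be derived from a fragment size distribution is sketched below. This is an illustration under stated assumptions, not the formula used in JBD: it assumes fragment endpoints fall uniformly, so a fragment of length L that contains the binding site has L possible offsets, of which L - d also cover a probe d bases away. The function name `influence_function` and the dictionary input format are hypothetical.

```python
def influence_function(length_probs, max_distance):
    """Relative signal at each distance 0..max_distance from a binding event.

    length_probs: dict mapping fragment length (bases) -> probability.
    Assumption (not from the slides): fragments are placed uniformly, so the
    number of fragments of length L covering both the site and a probe at
    distance d is proportional to p(L) * (L - d) for d < L.
    """
    # Normaliser: expected count of fragments covering the site itself (d = 0).
    norm = sum(p * length for length, p in length_probs.items())
    influence = []
    for d in range(max_distance + 1):
        covered = sum(p * (length - d)
                      for length, p in length_probs.items() if length > d)
        influence.append(covered / norm)
    return influence


# Hypothetical two-point fragment size distribution, for illustration only.
toy_lengths = {200: 0.5, 400: 0.5}
toy_influence = influence_function(toy_lengths, max_distance=400)
```

Under these assumptions the influence is 1 at the binding site, decreases with distance, and reaches 0 once the probe is farther away than the longest fragment.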
Approximate Bayesian Inference • Exact Bayesian posterior of binding events: • Non-conjugate models with thousands of variables make the exact posterior distribution intractable to compute. • Message passing algorithm (expectation propagation, EP): EP iteratively refines the factor approximations (i.e., messages) to improve the posterior approximation.
EP in a Nutshell • Approximate a probability distribution by simpler parametric terms: • Each approximation term lives in an exponential family (e.g., Gaussian or Gamma distributions).
EP in a Nutshell Three key steps: • Deletion: Approximate the “leave-one-out” posterior distribution for the i-th factor. • Minimization: Minimize the following KL divergence by moment matching. • Inclusion:
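The three EP steps can be illustrated on a toy problem. The sketch below is not the JBD model: it assumes a single scalar unknown with a Gaussian prior and logistic (sigmoid) factors, chosen only because they are non-conjugate. Moment matching is done numerically on a grid, and all names (`run_ep`, `moments_on_grid`, `log_sigmoid`) are hypothetical.

```python
import math


def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    return -math.log1p(math.exp(-t)) if t >= 0 else t - math.log1p(math.exp(t))


def moments_on_grid(log_density, lo=-10.0, hi=10.0, n=2001):
    """Mean and variance of an unnormalised 1-D density, by a grid sum."""
    step = (hi - lo) / (n - 1)
    xs = [lo + k * step for k in range(n)]
    logs = [log_density(x) for x in xs]
    top = max(logs)
    ws = [math.exp(v - top) for v in logs]
    z = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / z
    return mean, var


def run_ep(labels, prior_var=4.0, sweeps=10):
    """EP for x ~ N(0, prior_var) with factors sigmoid(y_i * x), y_i in {+1, -1}.

    Each factor is approximated by a Gaussian term stored in natural
    parameters: precision (tau) and precision times mean (nu).
    """
    site_tau = [0.0] * len(labels)
    site_nu = [0.0] * len(labels)
    post_tau, post_nu = 1.0 / prior_var, 0.0
    for _ in range(sweeps):
        for i, y in enumerate(labels):
            # Deletion: remove term i to get the leave-one-out (cavity) Gaussian.
            cav_tau = post_tau - site_tau[i]
            cav_nu = post_nu - site_nu[i]
            if cav_tau <= 0:
                continue  # skip an improper cavity
            cav_mean, cav_var = cav_nu / cav_tau, 1.0 / cav_tau

            # Minimization: match the mean and variance of cavity * exact factor.
            def log_tilted(x):
                return -0.5 * (x - cav_mean) ** 2 / cav_var + log_sigmoid(y * x)

            mean, var = moments_on_grid(log_tilted)

            # Inclusion: the new term is the matched Gaussian divided by the cavity.
            post_tau, post_nu = 1.0 / var, mean / var
            site_tau[i] = post_tau - cav_tau
            site_nu[i] = post_nu - cav_nu
    return post_nu / post_tau, 1.0 / post_tau


ep_mean, ep_var = run_ep([1, 1, 1, -1, 1])
```

Working in natural parameters makes deletion and inclusion simple subtractions and additions, which is why the Gaussian terms are stored that way here.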
Spatial resolution comparison between JBD and other methods • The average distance of JBD’s Gcn4 binding predictions to motif sites is smaller than for other methods, and JBD identifies more known Gcn4 targets.
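The spatial-resolution measure described above, the average distance from predicted binding positions to motif sites, could be computed along the following lines. This is a minimal sketch that assumes each prediction is scored against its nearest motif site; the function name and the coordinates are made up for illustration and are not data from the study.

```python
def mean_distance_to_nearest_motif(predicted_positions, motif_positions):
    """Average, over predictions, of the distance to the closest motif site."""
    distances = [min(abs(p - m) for m in motif_positions)
                 for p in predicted_positions]
    return sum(distances) / len(distances)


# Hypothetical coordinates (base pairs), for illustration only.
toy_predictions = [100, 510]
toy_motif_sites = [90, 500]
toy_score = mean_distance_to_nearest_motif(toy_predictions, toy_motif_sites)
```

A smaller value means the predictions sit closer to the motif sites.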
JBD better resolves proximal binding events than do other methods. Shown here is performance of the JBD, MPeak and Ratio methods on 200 simulated DNA regions each containing two binding events.
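A simulated region with two proximal binding events might be generated as in the sketch below. It is not the simulation protocol used for the comparison above: it assumes each probe's intensity is the sum of the events' strengths scaled by an influence function, plus Gaussian noise. The triangular influence shape, its 500-base reach, the noise level and all names are assumptions for illustration.

```python
import random


def triangular_influence(distance, reach=500):
    """Assumed influence shape: 1 at the event, falling linearly to 0 at `reach`."""
    return max(0.0, 1.0 - abs(distance) / reach)


def simulate_probe_intensities(probe_positions, events, noise_sd=0.1, seed=0):
    """Simulate one region.

    events: list of (position, strength) pairs.
    Each probe sees the summed contribution of every event (the joint
    influence of multiple binding events) plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    intensities = []
    for probe in probe_positions:
        signal = sum(strength * triangular_influence(probe - position)
                     for position, strength in events)
        intensities.append(signal + rng.gauss(0.0, noise_sd))
    return intensities


# Two events 300 bases apart, probes every 100 bases (hypothetical layout).
probes = list(range(0, 2001, 100))
two_events = [(900, 1.0), (1200, 1.0)]
region = simulate_probe_intensities(probes, two_events)
```

Because the two contributions overlap, probes lying between the events receive signal from both, which is what makes proximal events hard to separate from the raw intensities.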
Using binding posterior to guide motif discovery Approach: • Use binding posterior probabilities derived from the ChIP-chip data to weight sequence regions differently for motif discovery. Results: • Found the Mig2 motif where a standard motif discovery algorithm (e.g., MEME) failed. • Note that the correct motif for Mig2 was not recovered when using the Ratio method to analyze the ChIP-chip data.
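The idea of weighting sequence regions by binding posterior can be sketched with weighted k-mer counting. This is a deliberately simple stand-in, not MEME and not the motif discovery procedure used in the study: each k-mer occurrence is assumed to count in proportion to the mean posterior weight of the positions it covers. The sequences, the weights and the planted word `CCGGGG` are made up and are not the real Mig2 motif.

```python
from collections import defaultdict


def weighted_kmer_counts(sequences, position_weights, k):
    """Sum, for every k-mer, the mean per-position weight of each occurrence."""
    counts = defaultdict(float)
    for seq, weights in zip(sequences, position_weights):
        for start in range(len(seq) - k + 1):
            window_weight = sum(weights[start:start + k]) / k
            counts[seq[start:start + k]] += window_weight
    return dict(counts)


def top_kmer(counts):
    """Return the k-mer with the largest (weighted) count."""
    return max(counts, key=counts.get)


# A repetitive background with a planted word in the middle (hypothetical data).
flank = "ATATATATATAT"
region = flank + "CCGGGG" + flank
regions = [region, region]

# Hypothetical binding posterior: high over the planted word, low elsewhere.
posterior = [0.01] * len(flank) + [1.0] * 6 + [0.01] * len(flank)
uniform = [1.0] * len(region)

weighted_best = top_kmer(weighted_kmer_counts(regions, [posterior, posterior], 6))
unweighted_best = top_kmer(weighted_kmer_counts(regions, [uniform, uniform], 6))
```

With uniform weights the repetitive background dominates the counts; with the posterior weights the planted word ranks first, which is the effect the approach relies on.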
Positional priors for motif discovery improve robustness to false input DNA sequence regions.