
Multiple Species Gene Finding using Gibbs Sampling

Sourav Chatterji and Lior Pachter, University of California, Berkeley





Presentation Transcript


  1. Multiple Species Gene Finding using Gibbs Sampling • Sourav Chatterji, Lior Pachter • University of California, Berkeley

  2. Multiple Species Comparative Gene Finding (with Alignment) • McAuliffe et al. (2004), Siepel et al. (2004)

  3. Multiple Species Comparative Gene Finding (with Alignment) • McAuliffe et al. (2004), Siepel et al. (2004)

  4. Multiple Species Comparative Gene Finding (without Alignment)

  5. Gibbs Sampling for Biological Sequence Analysis • Introduced by Lawrence et al. 1993 • Motif Detection • Extensions • Multiple Motifs in a Sequence • Multiple Types of Motifs • Applications • Alignment • Linkage Analysis

  6. Gibbs Sampling • Aim: To sample from the joint distribution $p(x_1, x_2, \ldots, x_n)$ when it is easy to sample from the conditional distributions $p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ but not from the joint distribution. • Method: Iteratively sample $x_i^t$ from the conditional distribution $p(x_i \mid x_1^t, \ldots, x_{i-1}^t, x_{i+1}^{t-1}, \ldots, x_n^{t-1})$. • Theorem: For discrete distributions, the distribution of $(x_1^t, x_2^t, \ldots, x_n^t)$ converges to $p(x_1, x_2, \ldots, x_n)$.
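The update rule on this slide can be illustrated on a toy problem. The sketch below is not from the talk: the joint table `JOINT`, the helper names, and the burn-in length are all assumptions chosen for illustration. It resamples each variable in turn from its conditional distribution and checks that the empirical frequencies approach the joint distribution.

```python
import random
from collections import Counter

# Hypothetical joint distribution p(x1, x2) over two binary variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def sample_conditional(i, state, rng):
    """Sample x_i from p(x_i | all other variables), read off the joint table."""
    weights = []
    for value in (0, 1):
        candidate = list(state)
        candidate[i] = value
        # Unnormalized conditional: the joint with the other variables fixed.
        weights.append(JOINT[tuple(candidate)])
    return rng.choices((0, 1), weights=weights)[0]


def gibbs(n_sweeps, burn_in=1000, seed=0):
    """Run the sampler and return the empirical frequency of each state."""
    rng = random.Random(seed)
    state = [0, 0]
    counts = Counter()
    for t in range(n_sweeps + burn_in):
        # One sweep: update every coordinate using the latest values of the others.
        for i in range(len(state)):
            state[i] = sample_conditional(i, state, rng)
        if t >= burn_in:
            counts[tuple(state)] += 1
    return {key: counts[key] / n_sweeps for key in JOINT}


estimate = gibbs(50_000)
for key in sorted(JOINT):
    print(key, JOINT[key], round(estimate[key], 3))
```

Here the joint is small enough to sample directly, so the sampler is only a demonstration; the method pays off when only the conditionals are tractable.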

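  7. Connection to HMMs [Figure: HMM with hidden states Z1, Z2, …, Zm linked by transition probabilities q_s, each emitting an observation Y1, Y2, …, Ym with output probabilities q_t] • q_t = output probabilities • q_s = transition probabilities • Difficult to sample from P(q, Z | Y) • Easy to sample q from P(q | Z, Y) • Easy to sample Z from P(Z | q, Y)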
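The two easy conditionals on this slide can be alternated to form a Gibbs sampler for an HMM. The sketch below is one common way to do this and is not taken from the talk: it assumes symmetric Dirichlet priors on the rows of q, a fixed uniform initial state distribution, and forward filtering with backward sampling for Z. All function names and the toy observation sequence are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_states(y, init, trans, emit, rng):
    """Draw Z from P(Z | q, Y) by forward filtering, backward sampling."""
    k = len(init)
    alpha = []  # alpha[t][s] is proportional to P(Z_t = s | Y_1..Y_t)
    for t, obs in enumerate(y):
        if t == 0:
            row = [init[s] * emit[s][obs] for s in range(k)]
        else:
            prev = alpha[-1]
            row = [emit[s][obs] * sum(prev[r] * trans[r][s] for r in range(k))
                   for s in range(k)]
        norm = sum(row)
        alpha.append([v / norm for v in row])
    z = [0] * len(y)
    z[-1] = rng.choices(range(k), weights=alpha[-1])[0]
    for t in range(len(y) - 2, -1, -1):
        # P(Z_t = s | Z_{t+1}, Y) is proportional to alpha[t][s] * trans[s][Z_{t+1}].
        weights = [alpha[t][s] * trans[s][z[t + 1]] for s in range(k)]
        z[t] = rng.choices(range(k), weights=weights)[0]
    return z


def sample_parameters(y, z, k, n_symbols, rng, prior=1.0):
    """Draw q from P(q | Z, Y): Dirichlet posteriors built from counts."""
    trans_counts = [[prior] * k for _ in range(k)]
    emit_counts = [[prior] * n_symbols for _ in range(k)]
    for t, (state, obs) in enumerate(zip(z, y)):
        emit_counts[state][obs] += 1
        if t > 0:
            trans_counts[z[t - 1]][state] += 1
    trans = [sample_dirichlet(row, rng) for row in trans_counts]
    emit = [sample_dirichlet(row, rng) for row in emit_counts]
    return trans, emit


def gibbs_hmm(y, k, n_symbols, n_iter=50, seed=0):
    """Alternate between sampling Z given q and q given Z."""
    rng = random.Random(seed)
    init = [1.0 / k] * k  # assumption: fixed uniform initial distribution
    trans = [sample_dirichlet([1.0] * k, rng) for _ in range(k)]
    emit = [sample_dirichlet([1.0] * n_symbols, rng) for _ in range(k)]
    for _ in range(n_iter):
        z = sample_states(y, init, trans, emit, rng)
        trans, emit = sample_parameters(y, z, k, n_symbols, rng)
    return z, trans, emit


# Toy observation sequence over a 4-letter alphabet encoded as 0..3.
y = [0, 0, 1, 0, 0, 3, 3, 2, 3, 3, 0, 1, 0, 0]
z, trans, emit = gibbs_hmm(y, k=2, n_symbols=4)
print(z)
```

A real gene finder would use a much richer state space (exons, introns, splice sites) and many sequences; the structure of the alternation is the point here.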

  8. Gibbs Sampling for Gene Finding

  9. Gibbs Sampling for Gene Finding Initial Predictions

  10. Gibbs Sampling for Gene Finding Sample Z1 from P(Z1 | Z[-1] , Y)

  11. Gibbs Sampling for Gene Finding Sample Z2 from P(Z2 | Z[-2] , Y)

  12. Additional Details • Issues in the Gibbs Sampling Method • Gibbs sampling assumes sequences are independently generated by an HMM: need to generalize the method to a tree topology. • Learn parameters from a subset of sequences roughly equidistant from each other: human, mouse, dog and cow. • Things get messy when there are multiple genes; need to handle multiple sets of parameters. • Make use of an approximate alignment • Boost scores using a phyloHMM model

  13. Results • 2060 exons predicted • Exon level Sensitivity : 23.2% • Exon level Specificity : 46.7% • 28.5% of predicted exons partially overlap with true exons. • Nucleotide Level Sensitivity : 42.8% • Nucleotide Level Specificity : 82.1%

  14. Results • Nucleotide level results much better than exon level results • Need for better splice site models, probably multiple species splice site models. • Low Sensitivity • Is it the alignment?

  15. Analysis of results (novel genes) • Statistics of transcripts overlapping with novel VEGA genes • 223 exons predicted • Exon level Sensitivity : 24.8% (78 of 315 true exons are predicted correctly) • Exon level Specificity : 35.0% (78 of the 223 predicted exons are correct) • Additionally, 24.7% of predicted exons partially overlap with the true exons. • Nucleotide level Sensitivity : 56.6% • Nucleotide level Specificity : 62.9%
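The exon-level percentages on this slide follow directly from the three counts it reports. The short sketch below reproduces them; the function name is hypothetical, and "specificity" is used as on the slide, meaning the fraction of predicted exons that are correct.

```python
def exon_accuracy(n_correct, n_true, n_predicted):
    """Return (sensitivity, specificity) as defined on the slide."""
    sensitivity = n_correct / n_true        # correct exons / true exons
    specificity = n_correct / n_predicted   # correct exons / predicted exons
    return sensitivity, specificity


# Counts reported for transcripts overlapping novel VEGA genes.
sn, sp = exon_accuracy(n_correct=78, n_true=315, n_predicted=223)
print(f"Exon sensitivity: {sn:.1%}")
print(f"Exon specificity: {sp:.1%}")
```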
