
Probabilistic modelling in computational biology

Dirk Husmeier. Probabilistic modelling in computational biology. Biomathematics & Statistics Scotland. James Watson & Francis Crick, 1953. Frederick Sanger, 1980. Network reconstruction from postgenomic data. Model Parameters θ. Marriage between


Presentation Transcript


  1. Probabilistic modelling in computational biology. Dirk Husmeier, Biomathematics & Statistics Scotland

  2. James Watson & Francis Crick, 1953

  3. Frederick Sanger, 1980

  4. Network reconstruction from postgenomic data

  5. Model parameters θ

  6. Marriage between graph theory and probability theory. Friedman et al. (2000), J. Comp. Biol. 7, 601-620

  7. Bayes net ODE model

  8. Model parameters θ. Probability theory → Likelihood

  9. Model parameters θ. Bayesian networks: integral analytically tractable!
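
The integral on this slide is presumably the marginal likelihood of a network structure, with the model parameters integrated out. A sketch in standard notation follows; the symbols M (network structure) and D (data) are assumptions introduced here, and θ denotes the model parameters:

```latex
P(D \mid M) = \int P(D \mid M, \theta)\, P(\theta \mid M)\, d\theta
```

When the prior on θ is conjugate to the likelihood, this integral has a closed form, so candidate structures can be scored without numerical integration.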

  10. UAI 1994

  11. Identify the best network structure. Ideal scenario: large data sets, low noise

  12. Uncertainty about the best network structure. Limited number of experimental replications, high noise

  13. Sample of high-scoring networks

  14. Sample of high-scoring networks. Feature extraction, e.g. marginal posterior probabilities of the edges. Figure labels: uncertainty about edges; high-confidence edge; high-confidence non-edge
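
The feature extraction described on this slide can be sketched as averaging sampled structures. This is a minimal illustration, assuming each sampled network is stored as a 0/1 adjacency matrix; the function name `edge_posteriors` and the three example networks are hypothetical and do not come from the talk:

```python
def edge_posteriors(sampled_networks):
    """Estimate marginal edge probabilities from sampled 0/1 adjacency matrices.

    Entry [i][j] of the result is the fraction of samples containing edge i -> j.
    """
    n_samples = len(sampled_networks)
    n_nodes = len(sampled_networks[0])
    return [
        [sum(net[i][j] for net in sampled_networks) / n_samples
         for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


# Three hypothetical sampled structures over 3 nodes (illustrative only).
samples = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
]
posteriors = edge_posteriors(samples)
```

Values near 1 correspond to high-confidence edges, values near 0 to high-confidence non-edges, and intermediate values to edges the data leave uncertain.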

  15. Sampling with MCMC. Plot: number of structures against number of nodes
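
To give a sense of why exhaustive enumeration is infeasible, the growth in the number of structures can be computed directly. The sketch below assumes the plot counts labelled directed acyclic graphs and uses Robinson's recurrence; the function name `count_dags` is introduced here for illustration:

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 11):
    print(n, count_dags(n))
```

The count grows super-exponentially with the number of nodes, which is the motivation for sampling structures with MCMC rather than scoring all of them.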

  16. Madigan & York (1995), Giudici & Castelo (2003)

  17. Overview: • Introduction • Limitations • Methodology • Application to morphogenesis • Application to synthetic biology

  18. Homogeneity assumption: interactions don’t change with time

  19. Limitations of the homogeneity assumption

  20. Example: 4 genes, 10 time points

  21. Supervised learning. Here: 2 components

  22. Changepoint model: parameters can change with time

  23. Changepoint model: parameters can change with time

  24. Unsupervised learning. Here: 3 components

  25. Extension of the model. Parameters θ

  26. Extension of the model. Parameters θ

  27. Extension of the model. Parameters θ; allocation vector h; k: number of components (here: 3)

  28. Analytically integrate out the parameters θ. Allocation vector h; k: number of components (here: 3)

  29. RJMCMC within Gibbs: P(network structure | changepoints, data); P(changepoints | network structure, data). Birth, death, and relocation moves

  30. Dynamic programming, complexity N²

  31. Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group). Circadian rhythms in Arabidopsis thaliana. Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3. Transcriptional profiles at 4*13 time points in 2h intervals under constant light for 4 experimental conditions

  32. Comparison with the literature. Precision: proportion of identified interactions that are correct. Recall = Sensitivity: proportion of true interactions that we successfully recovered. Specificity: proportion of non-interactions that are successfully avoided

  33. Which interactions from the literature are found? True positives (TP) = 8, false negatives (FN) = 5, Recall = 8/13 = 62%. Network figure over ELF3, CCA1, LHY, PRR9, GI, TOC1, PRR5, PRR3, ELF4 with true positives and false negatives marked. Blue: activations; red: inhibitions

  34. Which proportion of predicted interactions are confirmed by the literature? True positives (TP) = 8, false positives (FP) = 13, Precision = 8/21 = 38%. Network figure with true positives and false positives marked. Blue: activations; red: inhibitions

  35. Precision = 38%, Recall = 62%. Network figure over ELF3, CCA1, LHY, PRR9, GI, TOC1, PRR5, PRR3, ELF4

  36. True positives (TP) = 8, false positives (FP) = 13, false negatives (FN) = 5, true negatives (TN) = 9² - 8 - 13 - 5 = 55. Sensitivity (recall) = TP/[TP+FN] = 62%. Specificity (proportion of avoided non-interactions) = TN/[TN+FP] = 81%
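
The figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes, as the slide's TN formula does, that all 9² ordered gene pairs count as candidate interactions; the function name `confusion_metrics` is introduced here for illustration:

```python
def confusion_metrics(tp, fp, fn, n_genes):
    """Precision, recall and specificity for a predicted gene network.

    Assumes n_genes**2 candidate interactions in total, as on the slide.
    """
    tn = n_genes ** 2 - tp - fp - fn
    return {
        "tn": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


metrics = confusion_metrics(tp=8, fp=13, fn=5, n_genes=9)
for name, value in metrics.items():
    print(name, value)
```

Rounded to whole percentages, these values match the 38% precision, 62% recall and 81% specificity quoted in the talk.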

  37. Model extension. So far: non-stationarity in the regulatory process

  38. Non-stationarity in the network structure

  39. Flexible network structure

  40. Model parameters θ

  41. Model parameters θ. Use prior knowledge!

  42. Flexible network structure

  43. Flexible network structure with regularization. Hyperparameter; normalization factor

  44. Flexible network structure with regularization. Exponential prior versus binomial prior with conjugate beta hyperprior
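
The transcript does not preserve the formula behind the "hyperparameter" and "normalization factor" labels. One form an exponential prior of this kind could take is sketched below. It is an assumption, not the slide's own equation: M_h is taken to be the network structure in segment h, β the hyperparameter, Z(β) the normalization factor, and |M_h - M_{h-1}| the number of edges that differ between adjacent segments:

```latex
P(M_h \mid M_{h-1}, \beta) = \frac{\exp\left(-\beta \, |M_h - M_{h-1}|\right)}{Z(\beta)}
```

Under this reading, larger β penalizes structural changes between segments more strongly, and β = 0 leaves the segment structures uncoupled.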
