Identifying co-regulation using Probabilistic Relational Models - PowerPoint PPT Presentation

Presentation Transcript

  1. Identifying co-regulation using Probabilistic Relational Models, by Christoforos Anagnostopoulos (BA Mathematics, Cambridge University; MSc Informatics, Edinburgh University), supervised by Dirk Husmeier

  2. The General Problem: bringing together disparate data sources. Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC...

  3. The General Problem: bringing together disparate data sources. Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC... Gene expression data (mRNA): gene 1: overexpressed; gene 2: overexpressed; ...

  4. The General Problem: bringing together disparate data sources. Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC... Gene expression data (mRNA): gene 1: overexpressed; gene 2: overexpressed; ... Protein interaction data (protein 1 / protein 2, with ORF 1 / ORF 2): AAC1 / TIM10 (YMR056C / YHR005CA); AAD6 / YNL201C (YFL056C / YNL201C)

  5. Our data. Promoter sequence data: ...ACGTTAAGCCAT... ...GGCATGAATCCC... Gene expression data (mRNA): gene 1: overexpressed; gene 2: overexpressed; ...

  6. Bayesian Modelling Framework Bayesian Networks

  7. Bayesian Modelling Framework: Bayesian Networks (Conditional Independence Assumptions; Factorisation of the Joint Probability Distribution; UNIFIED TRAINING)

  8. Bayesian Modelling Framework Probabilistic Relational Models Bayesian Networks

  9. Aims for this presentation: briefly present the Segal model and the main criticisms offered in the thesis; briefly introduce PRMs; outline directions for future work

  10. The Segal Model: cluster genes into transcriptional modules (Module 1, Module 2, ...). To which module does a given gene belong?

  11. The Segal Model Module 1 Module 2 P(M = 1) P(M = 2) gene

  12. The Segal Model Module 1 How to determine P(M = 1)? P(M = 1) gene

  13. The Segal Model Motif Profile motif 3: active motif 4: very active motif 16: very active motif 29: slightly active Module 1 How to determine P(M = 1)? gene

  14. The Segal Model Predicted Expression Levels Array 1: overexpressed Array 2: overexpressed Array 3: underexpressed ... Motif Profile motif 3: active motif 4: very active motif 16: very active motif 29: slightly active Module 1 How to determine P(M = 1)? gene

  15. The Segal Model Predicted Expression Levels Array 1: overexpressed Array 2: overexpressed Array 3: underexpressed ... Motif Profile motif 3: active motif 4: very active motif 16: very active motif 29: slightly active Module 1 How to determine P(M = 1)? P(M = 1) gene

  16. The Segal model PROMOTER SEQUENCE

  17. The Segal model PROMOTER SEQUENCE MOTIF PRESENCE

  18. The Segal model PROMOTER SEQUENCE MOTIF MODEL MOTIF PRESENCE

  19. The Segal model MOTIF PRESENCE MODULE ASSIGNMENT

  20. The Segal model MOTIF PRESENCE REGULATION MODEL MODULE ASSIGNMENT

  21. The Segal model MODULE ASSIGNMENT EXPRESSION DATA

  22. The Segal model MODULE ASSIGNMENT EXPRESSION MODEL EXPRESSION DATA

  23. Learning via hard EM HIDDEN

  24. Learning via hard EM Initialise hidden variables

  25. Learning via hard EM Initialise hidden variables Set parameters to Maximum Likelihood

  26. Learning via hard EM Initialise hidden variables Set parameters to Maximum Likelihood Set hidden values to their most probable value given the parameters (hard EM)

  27. Learning via hard EM Initialise hidden variables Set parameters to Maximum Likelihood Set hidden values to their most probable value given the parameters (hard EM)
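
The loop on the last few slides can be sketched in code. This is a toy version, assuming a one-dimensional clustering problem with hypothetical data and cluster means as the only parameters; the actual model's hidden variables are module assignments and regulation variables, and its likelihood is far richer:

```python
import random

def hard_em(data, k, n_iter=20, seed=0):
    """Toy hard EM: alternate Maximum Likelihood parameter fits with
    hard (most probable) settings of the hidden cluster labels."""
    rng = random.Random(seed)
    # 1. Initialise the hidden variables (randomly)
    z = [rng.randrange(k) for _ in data]
    means = [0.0] * k
    for _ in range(n_iter):
        # 2. Set parameters to their Maximum Likelihood values
        for j in range(k):
            members = [x for x, zj in zip(data, z) if zj == j]
            if members:
                means[j] = sum(members) / len(members)
        # 3. Set hidden values to their most probable value
        #    given the parameters (the "hard" E-step)
        z = [min(range(k), key=lambda j, x=x: (x - means[j]) ** 2)
             for x in data]
    return z, means

data = [0.1, 0.2, 0.0, 5.0, 5.1, 4.9]   # illustrative data
z, means = hard_em(data, k=2)
```

Replacing step 3 with a soft posterior over the hidden values would give standard EM; hard EM trades some statistical efficiency for speed and simplicity.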

  28. Motif Model. OBJECTIVE: learn a motif so as to discriminate between genes for which the Regulation variable is "on" (r = 1) and genes for which it is "off" (r = 0).

  29. Motif Model – scoring scheme high score: ...CATTCC... low score: ...TGACAA...

  30. Motif Model – scoring scheme high score: ...CATTCC... low score: ...TGACAA... high scoring subsequences ...AGTCCATTCCGCCTCAAG...

  31. Motif Model – scoring scheme high score: ...CATTCC... low score: ...TGACAA... high scoring subsequences ...AGTCCATTCCGCCTCAAG... low scoring (background) subsequences

  32. Motif Model – scoring scheme high score: ...CATTCC... low score: ...TGACAA... high scoring subsequences ...AGTCCATTCCGCCTCAAG... promoter sequence scoring low scoring (background) subsequences
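
The scoring scheme on these slides can be illustrated with a toy position-specific scoring matrix (PSSM). The log-odds values and the helper names (`make_column`, `pssm_score`, `best_window`) are illustrative, not taken from the thesis:

```python
def make_column(favoured, bases="ACGT"):
    """Illustrative log-odds column: the favoured base scores high,
    background bases score low."""
    return {b: (1.2 if b == favoured else -0.5) for b in bases}

# Hypothetical PSSM for the motif CATTCC from the slides.
PSSM = [make_column(b) for b in "CATTCC"]

def pssm_score(subseq, pssm):
    """Score of a window: sum of per-position log-odds versus background."""
    return sum(col[base] for base, col in zip(subseq, pssm))

def best_window(promoter, pssm):
    """Slide the motif along the promoter and return the top-scoring window."""
    w = len(pssm)
    return max((promoter[i:i + w] for i in range(len(promoter) - w + 1)),
               key=lambda s: pssm_score(s, pssm))

promoter = "AGTCCATTCCGCCTCAAG"    # promoter sequence from the slide
hit = best_window(promoter, PSSM)  # finds the high-scoring subsequence CATTCC
```

Windows scoring well against the PSSM are the "high scoring subsequences"; everything else falls to the background model.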

  33. Motif Model SCORING SCHEME: P(g.r = true | g.S, w). The parameter set w can be taken to represent motifs.

  34. Motif Model SCORING SCHEME: P(g.r = true | g.S, w). The parameter set w can be taken to represent motifs, so the Maximum Likelihood setting of w yields the most discriminatory motif.

  35. Motif Model – overfitting TRUE PSSM

  36. Motif Model – overfitting typical motif: ...TTT.CATTCC... TRUE PSSM high score

  37. Motif Model – overfitting typical motif: ...TTT.CATTCC... TRUE PSSM high score INFERRED PSSM Can triple the score!

  38. Regulation Model. For each module m and each motif i, we estimate the association u_mi. P(g.M = m | g.R) is proportional to exp(Σ_i u_mi · g.r_i).

  39. Regulation Model: Geometrical Interpretation. The (u_mi)_i define separating hyperplanes. The classification criterion is the inner product u_m · g.r: each datapoint is given the label of the hyperplane it is the furthest away from, on its positive side.
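
Under this reading, the module posterior is a softmax over the inner products u_m · g.r. A minimal sketch, with a hypothetical association matrix U:

```python
import math

def module_posterior(r, U):
    """P(g.M = m | g.R = r): exp of the inner product u_m . r,
    normalised across modules (a softmax)."""
    scores = [sum(u_mi * r_i for u_mi, r_i in zip(u_m, r)) for u_m in U]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical association matrix: 2 modules x 3 motifs
U = [[2.0, 0.0, 0.0],   # module 1 is associated with motif 1
     [0.0, 2.0, 0.0]]   # module 2 is associated with motif 2
r = [1, 0, 0]           # only motif 1 is present in this gene
probs = module_posterior(r, U)   # module 1 is the more probable label
```

The normalisation across modules is exactly why the parameter interpretations criticised on the next slides break down: only differences between the u_m matter, not their absolute values.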

  40. Regulation Model: Divergence and Overfitting. Pairwise linear separability leads to overconfident classification. Method A: dampen the parameters (e.g. with a Gaussian prior). Method B: make the dataset linearly inseparable by augmentation.

  41. Erroneous interpretation of the parameters. Segal et al claim that: when u_mi = 0, motif i is inactive in module m; when u_mi > 0 for all i, m, then only the presence of motifs is significant, not their absence.

  42. Erroneous interpretation of the parameters. Segal et al claim that: when u_mi = 0, motif i is inactive in module m; when u_mi > 0 for all i, m, then only the presence of motifs is significant, not their absence. These claims contradict the normalisation conditions!

  43. Sparsity INFERRED PROCESS TRUE PROCESS

  44. Sparsity. Reconceptualise the problem: sparsity can be understood as pruning. Pruning can improve generalisation performance (it deals with overfitting both by damping and by decreasing the degrees of freedom). Pruning ought not to be seen as a combinatorial problem, but can be dealt with by appropriate prior distributions.

  45. Sparsity: the Laplacian. How to prune using a prior: choose a prior with a simple discontinuity at the origin, so that the penalty term does not vanish near the origin; every time a parameter crosses the origin, establish whether it will escape the origin or is trapped in Brownian motion around it; if trapped, force both its gradient and value to 0 and freeze it. One can also actively look for nearby zeros to accelerate the pruning rate.
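
This recipe can be sketched as a single gradient-ascent update under an L1 (Laplacian) penalty. The trapping test used here, comparing the likelihood gradient against the penalty slope, is a simplification of the Brownian-motion criterion on the slide, and all names and values are illustrative:

```python
def prune_step(u, grad, lam, lr=0.1):
    """One gradient-ascent step on log L(u) - lam * sum(|u_i|).
    A parameter whose update would cross the origin is frozen at
    exactly zero; a frozen parameter stays pruned while its
    likelihood gradient is too weak (|g| <= lam) to escape."""
    new = []
    for ui, gi in zip(u, grad):
        if ui == 0.0 and abs(gi) <= lam:
            new.append(0.0)  # trapped at the origin: keep it pruned
            continue
        # subgradient of the Laplacian penalty at ui
        sign = 1.0 if ui > 0 else -1.0 if ui < 0 else (1.0 if gi > 0 else -1.0)
        step = ui + lr * (gi - lam * sign)
        # crossing the origin: trap the parameter there
        new.append(0.0 if step * ui < 0 else step)
    return new

# u1 keeps growing, u2 is pulled across zero and pruned,
# u3 is already pruned and its gradient cannot beat the penalty
res = prune_step([0.5, 0.01, 0.0], [1.0, -1.0, 0.2], lam=0.5)
```

The discontinuity of |u| at the origin is what makes exact zeros attainable; a Gaussian prior's penalty slope vanishes at 0 and merely shrinks parameters without ever pruning them.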

  46. Results: generalisation performance. Synthetic dataset with 49 motifs, 20 modules and 1800 datapoints.

  47. Results: interpretability. [Figure: learnt weights of the default model vs. the true module structure vs. learnt weights of the Laplacian prior model]

  48. Regrets: BIOLOGICAL DATA

  49. Aims for this presentation: briefly present the Segal model and the main criticisms offered in the thesis; briefly introduce PRMs; outline directions for future work

  50. Probabilistic Relational Models. How to model context-specific regulation? Need to cluster the experiments...