Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms

Jan Komorowski and Astrid Lägreid. Joint work with Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik.


Presentation Transcript


  1. Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms. Jan Komorowski and Astrid Lägreid

  2. Joint work with Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik

  3. Selected Challenges in Gene-expression Analysis
     • Function similarity corresponds to expression similarity, but:
       • Functionally correlated genes may be expression-wise dissimilar (e.g. anti-coregulated)
       • Genes usually have multiple functions
       • Measurements may be approximate and contradictory
     • Can we obtain clusters of biologically related genes?
     • Can we build models that classify unknown genes into functional classes, that are human-legible, and that handle approximate and often contradictory data?
     • How can we re-use biological knowledge?

  4. Data
     • Data material
       • Serum-starved fibroblasts, 8,613 genes
       • Added serum to medium at time = 0
       • Used starved fibroblasts as reference
       • Measured gene activity at various time points
       • 493 genes found to be differentially expressed
     • Results
       • 278 genes known (3 repeats)
       • 212 genes unknown (uncharacterized)
       • 211 genes given hypothetical function with 88% quality

  5. Fibroblast serum response: samples for microarray analysis [Figure: timeline after serum addition with time points 0, 1, 4, 8, 24; cell states: quiescent, non-proliferating, proliferating]

  6. Processes [Figure: same timeline (0, 1, 4, 8, 24; quiescent, non-proliferating, proliferating) annotated with: re-entry cell cycle, stress response, protein synthesis, organelle biogenesis, transcription, cell motility, lipid synthesis]

  7. Dynamic processes [Figure: same timeline (0, 1, 4, 8, 24; quiescent, non-proliferating, proliferating) annotated with response classes: delayed immediate early, late, immediate early, intermediate; and with: primary, secondary, tertiary]

  8. Protein appears after the transcript [Figure: same timeline (0, 1, 4, 8, 24; quiescent, non-proliferating, proliferating) with labels: primary, secondary, tertiary]

  9. Protein dynamics are not always similar to transcript dynamics [Figure: time points 0, 1, 4, 8, 24; labels: gene, transcript, protein]

  10. Molecular mechanisms of transcriptional response [Figure labels: serum = signal; effectors = cellular response; secondary transcription factors; immediate early response factors; intermediate/late response genes; delayed immediate early response genes; immediate early response genes]

  11. The dynamics of cellular processes [Figure: time points 1, 4, 8, 24; cell states: quiescent, non-proliferating, proliferating; process labels: stress response, cell motility, cell adhesion, DNA synthesis, energy metabolism, protein synthesis, cell cycle regulation, DNA synthesis, cell motility, lipid synthesis, cell proliferation, negative regulation]

  12. Methodology
     1. Mining functional classes from an ontology
     2. Extracting features for learning
     3. Inducing minimal decision rules using rough sets, e.g.
        0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation)
     4. Predicting the function of unknown genes using the rules
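
The rule format on the Methodology slide can be sketched in a few lines of Python. This is only an illustration: the transcript does not include the template definitions, so the trend test (`classify_trend` with a net-change tolerance `tol`), the reading of interval endpoints as positions in the profile, and the sample profile are all assumptions, not the authors' method.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end) positions in the profile, inclusive


def classify_trend(values: Sequence[float], tol: float = 0.5) -> str:
    """Label a sub-profile by its net change (hypothetical template definition)."""
    change = values[-1] - values[0]
    if change > tol:
        return "Increasing"
    if change < -tol:
        return "Decreasing"
    return "Constant"


def rule_fires(profile: Sequence[float],
               conditions: Dict[Interval, str],
               tol: float = 0.5) -> bool:
    """A rule fires when every interval of the profile shows the required trend."""
    return all(
        classify_trend(profile[start:end + 1], tol) == label
        for (start, end), label in conditions.items()
    )


# The rule shown on the slide; its conclusion is GO(cell proliferation).
rule = {(0, 4): "Increasing", (6, 10): "Decreasing", (14, 18): "Constant"}

# Made-up expression profile with 19 positions (not real data).
profile = [0.0, 0.3, 0.6, 0.9, 1.2, 1.2, 1.2, 0.9, 0.6, 0.3,
           0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.1]

if rule_fires(profile, rule):
    print("GO(cell proliferation)")
```

A gene whose profile satisfies all three conditions receives the rule's conclusion as a candidate annotation; a flat profile would fail the first condition and the rule would not fire.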

  13. Gene Ontology

  14. Biological processes from GO
     • Amino acid and derivative metabolism
     • Protein targeting
     • Energy pathways
     • DNA metabolism
     • Lipid metabolism
     • Transport
     • Ion homeostasis
     • Intracellular traffic
     • Organelle organization and biogenesis
     • Cell death
     • Cell motility
     • Stress response
     • Cell surface receptor linked signal transduction
     • Oncogenesis
     • Cell cycle
     • Cell adhesion
     • Intracellular signaling cascade
     • Developmental processes
     • Blood coagulation
     • Circulation

  15. Hierarchical Clustering of the Fibroblast Data: It’s not a cluster!

  16. Gene Ontology vs. Clusters found by Iyer et al.

  17. Template-based feature synthesis: 12 measurement points, 55 possible intervals of length >2
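
The figure of 55 intervals can be checked with a short enumeration. The sketch below assumes that "length >2" means an interval must span more than two consecutive measurement points (end index minus start index of at least 2); under that reading, 12 points give exactly 55 intervals. The function name `candidate_intervals` is made up for this illustration.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_intervals(n_points: int, min_span: int = 2) -> List[Tuple[int, int]]:
    """All (start, end) index pairs whose end is at least min_span steps after start."""
    return [
        (start, end)
        for start, end in combinations(range(n_points), 2)
        if end - start >= min_span
    ]


intervals = candidate_intervals(12)
print(len(intervals))  # 66 pairs in total, minus 11 adjacent pairs
```

Each interval is a candidate window over which a template such as Increasing, Decreasing or Constant can be evaluated to produce one feature.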

  18. Examples of template definitions

  19. Rule example 1

  20. Rule example 2

  21. Classification using template-based rules [Figure: a list of IF … THEN … rules, one of which is highlighted and marked +4: IF 0 - 4(Constant) AND 0 - 10(Increasing) THEN GO(prot. met. and mod.) OR …]
     Votes are normalized, and processes with vote fractions higher than a selection threshold are chosen as predictions
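
The voting step described on the classification slide might look roughly like the following. It is a sketch under stated assumptions: each fired rule is taken to cast a number of votes for every class in its conclusion (the "+4" on the slide is read as such a vote count), and the function name `predict`, the rule list and the threshold value are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A fired rule: (classes in its THEN part, number of votes it casts).
FiredRule = Tuple[Sequence[str], int]


def predict(fired_rules: List[FiredRule], threshold: float) -> Dict[str, float]:
    """Sum votes per class, normalize to fractions, keep classes above the threshold."""
    votes: Counter = Counter()
    for classes, n_votes in fired_rules:
        for go_class in classes:
            votes[go_class] += n_votes
    total = sum(votes.values())
    if total == 0:
        return {}  # no rule fired: no prediction for this gene
    return {
        go_class: count / total
        for go_class, count in votes.items()
        if count / total > threshold
    }


# Hypothetical rules that fired for one gene.
fired = [
    (["protein metabolism and modification"], 4),
    (["cell cycle"], 1),
    (["protein metabolism and modification", "transport"], 3),
]
predictions = predict(fired, threshold=0.25)
print(sorted(predictions))
```

Because every class above the threshold is kept, a gene can receive several predicted processes, which matches the multiple-function setting described in the talk. Raising the threshold trades coverage for precision.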

  22. Cross validation estimates, Iyer et al.
     A: Coverage: 84%, Precision: 50%
     B: Coverage: 71%, Precision: 60%
     C: Coverage: 39%, Precision: 90%
     Coverage = TP/(TP+FN), Precision = TP/(TP+FP)
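
The two measures defined on this slide can be computed directly from sets of (gene, process) pairs. The example below is a minimal sketch with invented annotations and predictions; it only illustrates the formulas and does not reproduce the reported estimates.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (gene, process)


def coverage(annotated: Set[Pair], predicted: Set[Pair]) -> float:
    """Coverage = TP / (TP + FN): share of known annotations that were predicted."""
    tp = len(annotated & predicted)
    fn = len(annotated - predicted)
    return tp / (tp + fn)


def precision(annotated: Set[Pair], predicted: Set[Pair]) -> float:
    """Precision = TP / (TP + FP): share of predictions that match an annotation."""
    tp = len(annotated & predicted)
    fp = len(predicted - annotated)
    return tp / (tp + fp)


# Invented example data (not from the study).
annotated = {("g1", "cell cycle"), ("g1", "transport"),
             ("g2", "cell death"), ("g3", "cell adhesion")}
predicted = {("g1", "cell cycle"), ("g2", "cell death"),
             ("g2", "transport"), ("g3", "cell adhesion")}

print(coverage(annotated, predicted), precision(annotated, predicted))
```

Counting pairs instead of genes lets a gene with several annotated processes contribute one true positive, false negative or false positive per process.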

  23. Cross validation estimates, Cho et al.
     Coverage: 58%, Precision: 61%
     Coverage = TP/(TP+FN), Precision = TP/(TP+FP)

  24. Protein Metabolism and Modification [Figure legend: A: annotations; B: false negatives; C: false positives; D: true positives; E: predicted unknown gene]

  25. Re-classification of the Known Genes

  26. Co-classifications for the Unknown Genes

  27. Conclusions
     • Our methodology
       • Incorporates background biological knowledge
       • Handles noise and incompleteness in the microarray data well
       • Can be objectively evaluated
       • Predicts multiple functions per gene
       • Can reclassify known genes and suggest possible new functions for them
       • Can provide hypotheses about the function of unknown genes
     • Experimental work needs to be done to confirm our predictions

  28. Genomic ROSETTA: http://www.idi.ntnu.no/~aleks/rosetta
