EMBL Systems Microscopy Meeting
250 likes | 389 Vues
EMBL Systems Microscopy Meeting. Bernd Fischer. rhdf5. R-package rhdf5 will appear tonight on http://www.bioconductor.org Easy to install on all platforms (windows, linux, MacOS) No need to install additional hdf5 libraries Ability to read (and write) HDF5 files from cellcognition in R.
EMBL Systems Microscopy Meeting
E N D
Presentation Transcript
EMBL Systems Microscopy Meeting Bernd Fischer
rhdf5 • R-package rhdf5 will appear tonight on http://www.bioconductor.org • Easy to install on all platforms (windows, linux, MacOS) • No need to install additional hdf5 libraries • Ability to read (and write) HDF5 files from cellcognition in R
What makes us different? • Genome-wide association studies (GWAS) have been enormously successful to identify many new causal players • ... but have not produced models that can explain / predict complex phenotypes at the level of individuals • - Rare polymorphisms. • - Genetic interactions, epistasis: “system is more than the sum of its parts”.
Design of Combinatorial knock-down Screen • 1367 x 72 Drosophila genes • each targeted by two independent dsRNA designs • 1456 plates (=559.104 wells) • ~100.000 distinct gene pairs
Imaging • Cells grow in incubator for five days • Fixate and stain with DAPI, pH3, and a-Tubulin • Image with automated fluorescence microscope Joint Work with Thomas Horn, Thomas Sandmann, Michael Boutros, DKFZ
Image Processing • Segmentation of Nuclei (in DAPI and pH3 channel) • Propagation of Segmentation from nucleus to cell body (region growing in a-Tubulin channel) • Extraction of features (intensity, area, texture, …) per cell, summary per well (mean, sd, quantiles, local cell density).
Estimating Genetic Interactions • For many phenotypes, the main effects (single gene) are multiplicative for non interacting genes i, j: • Additive on logarithmic scale • Estimation of main effects (assume that interactions are rare) • Detect Genetic Interactions: Compare to (t-test) effect of control main effect of dsRNA j error term interaction term 0, for non interacting genes ≠0, for interacting genes measurement (nr cells, growth rate, …) main effect of dsRNA i 8 23/10/2014
Interaction Matrix query genes x phenotypes template genes
Correlation of Interaction profiles preservedover phenotypes Correlation of interaction profiles Interaction
Classification of GO categories The interaction profile of each gene is used as a feature vector for machine learning Query genes are regarded as perturbations Trained classifier for 3311 categories (GO, Reactome, Interpro, …) 284 classifier showed good performance by cross validation (recall > 0.5 at precision 0.5)
Multiple Classification of genes blue: predicted, but not annotated yellow: annotated and predicted gray: annotation, but not predicted GO terms genes
Example APC ana phase promoter complex (APC) Direct inhibition of APC
Matrix Factorization predicted class probability (W) 284 classes genes PI: n template genes x m interaction features W: n template genes x k classes H: k classes x m interaction features n = 1367 template genes m = 720 (= 72 query genes * 10 phenotypes) k = 284 classes
Matrix Factorization class specific interaction profiles (H) 284 classes 72 query genes x 10 phenotypes Matrix factorization
Residuals (Semi-Supervised bi-clustering) Further matrix factorization to extract novel complexes/pathways
Modeling Genetic Interaction Data • View 1: Query genes are perturbations. => Cluster template genes to obtain functional information • View 2: phenotypic measurement in 100 000 different combinatorial genetic backgrounds => genotype to phenotype prediction
Network learningidentify the underlying molecular modules and their relation area number of cells phenotype (observed) activity of core modules (e.g. complexes, ‘path-ways’, mRNA or protein expression) (hidden) genetic perturbation state of each single experiment (observed)
EM algorithm E-Step: Compute expected activity levels on hidden nodes, given network structure and parameter => (loopy) belief propagation M-Step: Estimation network structure and regression parameter by maximizing expected likelihood area number of cells phenotype (observed) activity level (hidden) genetic perturbation (observed)
Summary • rhdf5 is out • Semi-supervised bi-clustering for genetic interaction data • Network learning to obtain causal network • General applicability to high dimensional phenotype data • Acknowledgement: Thomas Horn, Thomas Sandmann, Maximilian Billmann, Michael Boutros, Wolfgang Huber Huber group