Discovery of (new) phenotypes in a genome-wide RNAi HeLa cell imaging screen

Gregoire Pau, Oleg Sklyar, Wolfgang Huber (EMBL-EBI, Cambridge)
Florian Fuchs, Michael Boutros (DKFZ, Heidelberg)

Experimental setup
• Genome-wide cell array screen with HeLa cells
• Cells seeded, incubated for ~48 h and stained with 3 markers:
  • Actin (TRITC)
  • Tubulin (Alexa 488)
  • DNA (Hoechst)
• ~18,000 genes knocked down (one gene per well)

Gene phenotypes
• Gene phenotype: the phenotype expressed by a population of cells
• A gene phenotype is not a cell phenotype!
• Examples:
  • No phenotype (observed on negative-control empty wells)

Gene phenotypes
• Examples:
  • Apoptotic phenotype (observed on a COPB well)
  • Elongated phenotype (well LOC51693)

Cell phenotypes
• Frequently observed cell phenotypes: interphase, mitotic and dead cells
• But there are many more!

Goal
• Find new gene phenotypes
• "Given an input phenotype, how close is it to a known gene phenotype?"
[Figure: an input well image compared with the known gene phenotypes: no phenotype, apoptotic, elongated]

Probabilistic point of view
• Let $X_i \in \mathbb{R}^p$ denote the feature vector of cell $i$
• Each cell has $p$ features, e.g.:
  • Cell size, nucleus-to-cell size ratio, nucleus eccentricity…
  • Actin Haralick moment, total tubulin, nucleus-to-cell actin ratio…
• A gene phenotype is then characterized by a multivariate distribution $Z$
• A realization of the phenotype is a set of $n$ cells $(X_1, \ldots, X_n)$ drawn from $Z$
[Figure: cells plotted by cell feature 1 ($X_{\cdot 1}$) vs. cell feature 2 ($X_{\cdot 2}$), with the distribution $Z$]

Models
• Outlier detection problem:
  • Given $n$ cells $(X_1, \ldots, X_n)$ with $X_i \in \mathbb{R}^p$
  • How well do they fit a phenotype distribution (model) $Z$?
• This requires estimating the density of $Z$ (few samples, $n \approx p$: hard!)
• It also requires a multivariate goodness-of-fit test (hard?!)
• Overall: hard!
• Different workarounds:
  • Shrink the space $\mathbb{R}^p$ (by binning) by defining $K$ cell classes
  • Model $Z$ by a simpler, tractable distribution

Defining cell classes
• Define $K$ classes (here, $K = 3$: interphase, mitotic, dead cells)
• Count the number of cells belonging to each class
• Classical, robust approach
• Needs good a priori biological knowledge
• Well suited to clustering, but maybe not to novelty detection
[Figure: two scatter plots of cell feature 1 ($X_{\cdot 1}$) vs. cell feature 2 ($X_{\cdot 2}$), cells colored as interphase, mitotic or dead]
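A minimal sketch of the class-counting approach, assuming each cell is assigned to the nearest of $K$ class centroids (a stand-in for whatever classifier the screen actually uses). The well is then summarized by the fraction of its cells in each class. The function name `class_profile` and the centroid values are hypothetical, chosen only for illustration.

```python
import numpy as np

def class_profile(cells, centroids):
    """Fraction of a well's cells assigned to each of K classes.

    cells: (n, p) array of cell feature vectors X_i.
    centroids: (K, p) array, one hypothetical reference point per class.
    Nearest-centroid assignment is an assumption, not the original method.
    """
    # Squared Euclidean distance from every cell to every class centroid: shape (n, K)
    d2 = ((cells[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                       # closest class for each cell
    counts = np.bincount(labels, minlength=len(centroids))
    return counts / len(cells)                       # class proportions, summing to 1

# Hypothetical centroids for K = 3 classes (interphase, mitotic, dead) in p = 2 features
centroids = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
cells = np.array([[0.1, -0.2], [2.9, 0.3], [0.2, 0.1], [-0.1, 3.2]])
print(class_profile(cells, centroids))  # 2, 1 and 1 of the 4 cells fall in classes 0, 1, 2
```

The resulting proportion vector is what gets compared across wells; a phenotype that is not well described by any of the $K$ predefined classes leaves no trace in it, which is why this approach may miss novel phenotypes.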

Modeling Z
• Assume the phenotype distribution $Z$ is known
• Given a set of $N$ cells $(x_1, \ldots, x_N)$, $P(X_1 = x_1, \ldots, X_N = x_N)$ can be computed
• Assumption: the cell feature vectors $X_i$ are independent across cells
• Two models:
  • $Z$ is a normal distribution
  • $Z$ is a mixture of 3 normal distributions

First model: Z is normal
• Under the independence and normality assumptions:
  $A = \log P(X_1 = x_1, \ldots, X_N = x_N) = \sum_{i=1}^{N} \log p(X_i = x_i)$
• $A$ is the log-probability of the observed cell features under $Z$
• Here $Z$ is the distribution of the 'no phenotype' phenotype
• Goal: find phenotypes far away from 'no phenotype'
• $p(X = x) = \mathcal{N}(x; \mu_X, \Sigma_X)$, where $\mu_X$ and $\Sigma_X$ can easily be estimated on a training set of wells showing no phenotype
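A minimal sketch of the first model, assuming the cell features are available as NumPy arrays: $\mu_X$ and $\Sigma_X$ are estimated on 'no phenotype' training cells, and $A$ is the sum of per-cell Gaussian log-densities. The function names and the simulated wells are hypothetical; real features would come from the image analysis.

```python
import numpy as np

def fit_normal(train_cells):
    """Estimate mean and covariance of Z from cells of 'no phenotype' wells."""
    mu = train_cells.mean(axis=0)
    sigma = np.cov(train_cells, rowvar=False)  # (p, p) sample covariance
    return mu, sigma

def log_density(cells, mu, sigma):
    """Row-wise log N(x; mu, sigma) for an (n, p) array of cells."""
    diff = cells - mu
    # Mahalanobis term diff^T sigma^{-1} diff, via a linear solve instead of an inverse
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (mu.shape[0] * np.log(2 * np.pi) + logdet + maha)

def score_A(cells, mu, sigma):
    """A = sum_i log p(X_i = x_i), assuming cells are independent."""
    return float(log_density(cells, mu, sigma).sum())

# Simulated data (hypothetical): 5 features, as in the results slide
rng = np.random.default_rng(0)
n_features = 5
train = rng.normal(size=(2000, n_features))                  # 'no phenotype' training cells
mu, sigma = fit_normal(train)
normal_well = rng.normal(size=(300, n_features))              # well without phenotype
shifted_well = rng.normal(loc=1.5, size=(300, n_features))    # simulated aberrant well
print(score_A(normal_well, mu, sigma))   # higher (less negative)
print(score_A(shifted_well, mu, sigma))  # much lower: far from 'no phenotype'
```

Wells are then ranked by ascending $A$. Because $A$ is a plain sum, a handful of cells with very low density can dominate it.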

Result
• Using $p = 5$ cell features:
  • Geometric: nucleus-to-cell size ratio, cell size, cell eccentricity
  • Protein: nucleus-to-cell actin ratio, nucleus-to-cell intensity ratio
• The log-probability $A$ can be computed for every well (~17,000)
• Sorting wells by lowest $A_i$:
  • Returns wells with a few bluish dead cells whose very low $p(X = x)$ 'spoils' the sum of log-probabilities
  • Boring phenotypes: too close to 'no phenotype'

Workarounds
• Naïve solutions?
  • Trimming: $A'$, keeping only the $\log p(X_i = x_i)$ values within the 50% interquantile range (25th to 75th percentile)
  • Median: $A'' = \operatorname{median}_i \log p(X_i = x_i)$
• Sorting wells by lowest $A''$ reveals 5 new phenotypic classes:
  • Condensed phenotype
  • Elongated phenotype
  • Bi-nucleated phenotype
  • 'Large cells' phenotype
  • 'Densely packed small cells' phenotype
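The two robust scores can be sketched as follows, assuming the per-cell log-densities $\log p(X_i = x_i)$ of each well are already computed (for instance under the first model). The well names and log-density values are made up for illustration only.

```python
import numpy as np

def trimmed_score(logp):
    """A': sum of per-cell log-densities inside the 25%-75% interquantile range."""
    lo, hi = np.quantile(logp, [0.25, 0.75])
    kept = logp[(logp >= lo) & (logp <= hi)]
    return float(kept.sum())

def median_score(logp):
    """A'': median of the per-cell log-densities log p(X_i = x_i)."""
    return float(np.median(logp))

# Hypothetical per-cell log-densities for three wells
wells = {
    "well_a": np.array([-6.1, -5.9, -6.3, -6.0, -40.0]),   # one very unlikely (e.g. dead) cell
    "well_b": np.array([-9.5, -10.2, -9.8, -10.0, -9.9]),  # whole population shifted
    "well_c": np.array([-6.0, -6.2, -5.8, -6.1, -6.0]),    # close to 'no phenotype'
}
# Lowest A'' first: candidate new phenotypes
ranking = sorted(wells, key=lambda w: median_score(wells[w]))
print(ranking)  # ['well_b', 'well_a', 'well_c']
```

With the plain sum $A$, `well_a` would have the lowest score of the three solely because of its single outlying cell; the median instead ranks the uniformly shifted `well_b` first.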

Results
• Condensed phenotype
• Elongated phenotype
[Figure: example well images labeled STK39, TENC1, LOC51693 and KCNT1; one panel annotated 'curly shaped cells']

Results
• Bi-nucleated phenotype
• Large cells phenotype
[Figure: example well images labeled KIAA0363 and ADRB2]

Results
• Densely packed cells phenotype (empty spot): artefact?
[Figure: example well image labeled AFAR3]

Note
• Cell features:
  • $A = \log P(X_1 = x_1, \ldots, X_N = x_N) = \sum_{i} \log p(X_i = x_i)$
  • $A$ is the log-probability that the cell features are distributed in the same way as in the model phenotype
• Cell numbers:
  • The number of cells $N$ can also be a discriminating factor!
  • Example: an apoptotic phenotype
  • $B = \log P(N = n)$ is easy to compute
• But how can $A$ and $B$ be combined into a 'global outlier' score?
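The slides do not say how $P(N = n)$ is modeled. One simple possibility, assumed here purely for illustration, is a normal approximation to the cell-count distribution of control wells; the function names and counts are hypothetical. The sketch only computes $B$; how to combine it with $A$ remains open.

```python
import math
import statistics

def fit_count_model(control_counts):
    """Hypothetical count model: N ~ Normal(mean, sd), fitted on control-well cell counts."""
    return statistics.fmean(control_counts), statistics.stdev(control_counts)

def score_B(n, mean, sd):
    """B = log P(N = n), approximated by the normal log-density evaluated at n."""
    z = (n - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

control_counts = [410, 395, 430, 402, 388, 421]  # hypothetical cell counts of control wells
mean, sd = fit_count_model(control_counts)
print(score_B(405, mean, sd))  # typical count: close to the maximum of B
print(score_B(120, mean, sd))  # strongly depleted well: far lower B
```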

Second model: Z is a mixture of 3 normals
• The previous model was a coarse approximation
• The normal assumption is too simple: cells of a 'no phenotype' population exhibit at least 3 different cell phenotypes (mitotic, interphase and dead cells)
• New model: $Z$ is a mixture of 3 normal distributions
[Figure: cells colored as interphase, mitotic or dead]

Model
• Density of a cell feature vector $X$:
  $p(X = x) = (1 - \pi_M - \pi_D) f_I(x) + \pi_M f_M(x) + \pi_D f_D(x)$
• where $\pi_M$, $\pi_D$ are the mixture weights of the mitotic and dead-cell components
• and $f_I$, $f_M$, $f_D$ are the normal densities of the interphase, mitotic and dead-cell components
• Fitting this model on a phenotype:
  • Gives $A$ and $B$, but also the mixture weights $\pi_M$, $\pi_D$
  • Can they be used as discriminative parameters?
  • Is this approach similar to defining cell classes?
  • How can $A$, $B$, $\pi_M$ and $\pi_D$ be combined into a global 'outlier' score?
• Ongoing work… not yet!
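One way to obtain $\pi_M$ and $\pi_D$ for a single well, sketched here as an assumption rather than the authors' procedure: keep the three normal components $f_I$, $f_M$, $f_D$ fixed (estimated beforehand, e.g. on control wells) and run expectation-maximization (EM) over the mixture weights only. The component parameters and the simulated well are hypothetical.

```python
import numpy as np

def mvn_logpdf(x, mu, sigma):
    """Row-wise log N(x; mu, sigma) for an (n, p) array."""
    diff = x - mu
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + maha)

def fit_weights(x, components, n_iter=200):
    """EM over the weights (pi_I, pi_M, pi_D), with the normal components held fixed.

    components: list of (mu, sigma) for interphase, mitotic and dead cells.
    Returns the fitted weights and A, the well's log-likelihood under the mixture.
    """
    logf = np.column_stack([mvn_logpdf(x, mu, s) for mu, s in components])  # (n, 3)
    w = np.full(len(components), 1.0 / len(components))  # start from equal weights
    for _ in range(n_iter):
        log_joint = logf + np.log(w)                      # log(pi_k f_k(x_i))
        log_mix = np.logaddexp.reduce(log_joint, axis=1)  # log p(X_i = x_i)
        resp = np.exp(log_joint - log_mix[:, None])       # E-step: responsibilities
        w = resp.mean(axis=0)                             # M-step: updated weights
    A = float(np.logaddexp.reduce(logf + np.log(w), axis=1).sum())
    return w, A

# Hypothetical components in p = 2 features: interphase, mitotic, dead
eye = np.eye(2)
components = [(np.array([0.0, 0.0]), eye),
              (np.array([4.0, 0.0]), eye),
              (np.array([0.0, 4.0]), eye)]
rng = np.random.default_rng(1)
well = np.vstack([rng.normal([0, 0], 1, size=(700, 2)),   # simulated interphase cells
                  rng.normal([4, 0], 1, size=(200, 2)),   # simulated mitotic cells
                  rng.normal([0, 4], 1, size=(100, 2))])  # simulated dead cells
w, A = fit_weights(well, components)
pi_M, pi_D = w[1], w[2]
print(round(float(pi_M), 2), round(float(pi_D), 2))  # close to the simulated 0.2 and 0.1
```

Holding the components fixed keeps the fit cheap and makes $\pi_M$ and $\pi_D$ directly comparable across wells; refitting all means and covariances per well would be more flexible but harder to compare.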

Conclusion
• Probabilistic approach:
  • Suitable for novelty detection
  • Even the normal model led to several phenotype discoveries
  • May not extend to a clustering approach
• Ongoing work:
  • Results using the 3-component mixture model should be promising
  • … not ready yet!
