Ju Han Imaging Informatics Lab Life Sciences Division http:

1. The title is "Molecular predictors of 3D morphogenesis by breast cancer cells in 3D culture." This work focuses on quantitative analysis of the 3D morphology of breast cancer cell lines.

2. Outline: Motivation, Experimental design, Previous work, Approach, Results, Summary. In this presentation, we will start with the motivation, experimental design, and previous work, followed by the details of the technical approach. We will discuss the experimental results and the summary at the end.

3. Motivation: a panel of cell lines for analysis. In this project, we use a panel of cell lines for analysis. Why a panel of cell lines? First, it introduces the necessary molecular diversity. Second, it generates heterogeneous responses. Third, it offers an improved model system for high-content screening, comparative analysis, and cell systems biology. In this presentation, we focus on morphometric subtyping of a panel of breast cancer cell lines: identifying subpopulations with similar morphometric properties, and identifying molecular predictors for each subpopulation.

4. Experimental design. In this experiment, we have a panel of 24 breast cancer cell lines. All 3D cell cultures were maintained for 4 days with a media change every 2 days, and the samples were then imaged with phase contrast microscopy. The computational pipeline includes colony segmentation and representation, followed by phenotypic clustering. Furthermore, we find the molecular predictors of the morphometric clusters, or the molecular predictors of morphometric features such as colony size.

5. Previous work (Kenny et al., Gene Ontology, 2007). Here is the previous work on the same data by Kenny and others in the Bissell Lab. The phase and fluorescence images of different cell lines were visually examined and manually divided into 4 distinct classes … However, manual analysis is labor-intensive and subject to user bias, so we have developed a computational protocol for quantitative analysis and automatic subtyping of these cell lines.

6. Automatic subtyping of a panel of breast cancer cell lines in 3D culture. Now we show the subtyping of a panel of breast cancer cell lines in 3D culture. Phase images were obtained from cell lines cultured in 3D. Colonies were segmented from the background, and morphogenesis indices were computed. We then use consensus clustering for subtyping. At the same time, gene expression data were obtained for each cell line. Then, molecular predictors for phenotypic subpopulations and morphogenesis were discovered. We use phase images as an example here; this work could be extended to fluorescence images as well.

7. Colony segmentation and representation (phase images). Colonies are separated from the background based on texture features; morphometric features (size and shape) are extracted for each colony. The images on the left are colony images from different cell lines. We can see that the intensity and gradient are heterogeneous across colonies, so simple thresholding methods may not work here. Instead, we segment the colony from the background through a clustering method based on texture features. The images on the right show the segmented colonies. Once the colonies are segmented, morphometric features, including size and shape, are extracted for each segmented colony.
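
To make the point about thresholding concrete, here is a minimal sketch of texture-based segmentation. The talk does not say which texture features or which clustering method were used, so everything here is an assumption for illustration: the feature is the local standard deviation in a 3x3 window, the clustering is a two-cluster 1-D k-means, and the image is synthetic. The "colony" has the same mean intensity as the background, so an intensity threshold cannot find it, while the texture feature can.

```python
import statistics

def local_std(image, radius=1):
    """Texture feature (assumed): std. dev. of intensities in a square window."""
    h, w = len(image), len(image[0])
    feat = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            feat[r][c] = statistics.pstdev(window)
    return feat

def two_means(values, iters=20):
    """1-D k-means with k=2; returns the (low, high) cluster centers."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to split
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return lo, hi

def segment(image):
    """Boolean mask: True where the texture feature is nearer the high center."""
    feat = local_std(image)
    lo, hi = two_means([v for row in feat for v in row])
    return [[abs(v - hi) < abs(v - lo) for v in row] for row in feat]

# Synthetic image: flat background (100) with a textured 4x4 "colony"
# (alternating 60/140), so both regions have the same mean intensity.
image = [[100] * 12 for _ in range(12)]
for r in range(4, 8):
    for c in range(4, 8):
        image[r][c] = 60 if (r + c) % 2 == 0 else 140

mask = segment(image)
```

Because the feature is computed over a window, the mask extends slightly past the textured block; a real pipeline would typically clean this up before measuring size and shape.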

8. Clustering of morphometric features. The next step is the clustering of colony morphometric features for the different cell lines. There are some challenges. First, morphometric features are heterogeneous within the same cell line. Second, the number of objects varies from one cell line to another. Third, there is no prior knowledge of the number of clusters. We use consensus clustering to address these issues. Consensus clustering is a proven method in analyzing gene expression data (Monti et al., Machine Learning 2003). It addresses the heterogeneity issue through repeated random resampling, and it determines the number of clusters by evaluating the consensus distribution for different cluster numbers, which will be discussed in the next slide. This is the flowchart …
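
The resampling idea can be sketched as follows. This is an illustration under stated assumptions, not the pipeline from the talk: each item is a single feature vector (for example, one summary vector per cell line), the base clusterer is a plain k-means with a farthest-point initialization, and each resample draws 80% of the items. The consensus value for a pair is the fraction of resamples containing both items in which they land in the same cluster.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means on tuples; farthest-point init (an assumed choice)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def consensus_matrix(points, k, n_resamples=50, fraction=0.8, seed=0):
    """Entry (i, j): share of resamples containing both i and j in which
    the two items were assigned to the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(fraction * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Toy data (hypothetical): two well-separated groups of (size, shape) vectors.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
M = consensus_matrix(points, k=2)
```

With groups this well separated, the matrix contains only 0's and 1's, which is the "clean" pattern that indicates a stable clustering. Running the same function for several values of `k` yields one matrix per candidate cluster number.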

9. Consensus clustering on a panel of 24 breast cancer cell lines in 3D. Here is the visualization of the consensus matrices for different numbers of clusters. Each matrix is a similarity matrix, and each row or column represents a cell line. The rows and columns are ordered through hierarchical clustering so that similar cell lines are close to each other. The values represent the similarities between cell lines and are normalized between 0 and 1; the darker the entry, the greater the similarity. All the diagonal elements are black, since each cell line is identical to itself. We can visually examine the clustering quality for different numbers of clusters: the cleaner the matrix (that is, the more 0's and 1's), the better. We can also use a systematic way to find the optimal number of clusters, based on the so-called concentration histogram in the literature. First, the CDF of the elements in each consensus matrix is computed. Then we compute the area under each CDF for n = 2, …, 6. The percentage change of the CDF area with respect to the number of clusters is plotted here; this is the so-called concentration histogram. For n=2, the CDF is the red curve, and the area under it is just above 0.2. For n=3, the CDF is the green curve; since the area under the green CDF is almost double that under the red one, the percentage change is ~1. There is very little change in CDF area from n=3 to n=4, so the percentage change for n=4 is very small. The literature suggests that the peak of the concentration histogram indicates the optimal number of clusters, which is 3 in this case. More details can be found in the literature.
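
The CDF-area computation described on this slide can be sketched in a few lines. This is a sketch under assumptions: the CDF is taken over the upper-triangle entries of the consensus matrix, the value plotted for the smallest n is the area itself, and for larger n it is the change relative to the previous n, as the slide describes. The areas in the example are hypothetical numbers chosen to echo the narration, not data from the talk.

```python
def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def cdf_area(values):
    """Area under the empirical CDF of values in [0, 1], integrated over [0, 1]."""
    xs = sorted(values)
    m = len(xs)
    area = 0.0
    for i, x in enumerate(xs):
        nxt = xs[i + 1] if i + 1 < m else 1.0
        area += (nxt - x) * (i + 1) / m  # the CDF equals (i + 1) / m on [x, nxt)
    return area

def relative_changes(areas):
    """areas maps cluster count n -> CDF area; returns n -> plotted value."""
    ns = sorted(areas)
    changes = {ns[0]: areas[ns[0]]}
    for prev, n in zip(ns, ns[1:]):
        changes[n] = (areas[n] - areas[prev]) / areas[prev]
    return changes

# Hypothetical areas: just above 0.2 for n=2, nearly double for n=3,
# and almost no further change for n=4.
areas = {2: 0.21, 3: 0.41, 4: 0.42}
changes = relative_changes(areas)
best_n = max(changes, key=changes.get)
```

In practice, `areas[n]` would come from `cdf_area(upper_triangle(M_n))`, where `M_n` is the consensus matrix obtained with n clusters.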

10. Results with three clusters. These images show the three clusters from the consensus clustering: round, grape-like, and stellate. This result is consistent with the previous manual clustering of the same data. In that earlier work, the round subpopulation is further divided into round and round-mass groups; however, the difference is observable only through fluorescence microscopy, not from these phase images. One observation is that 8 out of 9 grape-like cell lines are ERBB2 positive, and all stellate cell lines are triple-negative.

11. Molecular predictors of morphometric clusters. To find molecular predictors of the morphometric clusters, we ranked the genes based on the bootstrap cross-validation error of an SVM classifier. These are the heatmaps of the top selected genes that best predict each morphometric cluster (round, grape-like, and stellate) with the same cut-off. They show that significantly more genes were selected as predictors of the stellate cluster.
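
The ranking idea can be illustrated with a small, dependency-free sketch. Note the substitution: a one-gene nearest-centroid classifier stands in for the SVM, and the error is estimated on the out-of-bag samples of each bootstrap resample. The gene names, expression values, and labels are hypothetical, and the number of resamples is an arbitrary choice.

```python
import random

def oob_error(values, labels, n_boot=200, seed=0):
    """Mean out-of-bag error of a one-gene nearest-centroid classifier
    (a stand-in for the SVM used in the talk)."""
    rng = random.Random(seed)
    n = len(values)
    errors = []
    for _ in range(n_boot):
        train = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        train_set = set(train)
        held_out = [i for i in range(n) if i not in train_set]
        classes = sorted({labels[i] for i in train})
        if len(classes) < 2 or not held_out:
            continue  # resample has one class only, or nothing left to test on
        centroid = {
            c: sum(values[i] for i in train if labels[i] == c)
            / sum(1 for i in train if labels[i] == c)
            for c in classes
        }
        wrong = sum(
            min(classes, key=lambda c: abs(values[i] - centroid[c])) != labels[i]
            for i in held_out
        )
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors) if errors else float("nan")

def rank_genes_by_error(expression, labels):
    """Genes sorted from lowest to highest estimated error."""
    errors = {gene: oob_error(vals, labels) for gene, vals in expression.items()}
    return sorted(errors, key=errors.get), errors

# Hypothetical data: 1 = stellate, 0 = other; one value per cell line.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expression = {
    "gene_a": [5.0, 5.1, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1],  # separates the classes
    "gene_b": [1.0, 5.0, 1.1, 5.2, 5.1, 0.9, 4.9, 1.2],  # unrelated to the classes
}
ranking, errors = rank_genes_by_error(expression, labels)
```

Applying the same cut-off on the estimated error to each cluster-versus-rest problem would give one gene list per cluster, which is how the lists for the three clusters can differ in length.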

12. Best genes for predicting the stellate cluster. This table shows the best genes for predicting the stellate cluster. There are several interesting genes in this list.

13. Molecular predictors of morphometric features (colony size). To find the molecular predictors of morphometric features, we rank the genes by nonlinear correlation. It is interesting to see that PPARG also appears as the top gene in this list.
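
The talk does not name the nonlinear correlation measure, so the sketch below uses Spearman rank correlation as an assumed stand-in: it scores any monotonic relationship, linear or not, between a gene's expression and colony size. Gene names and values are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rank_genes(expression, trait):
    """Genes sorted by absolute Spearman correlation with the trait."""
    scores = {gene: spearman(vals, trait) for gene, vals in expression.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True), scores

# Hypothetical data: one value per cell line.
colony_size = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "gene_a": [1, 8, 27, 64, 125],  # nonlinear but monotonic in colony size
    "gene_b": [3, 1, 4, 1, 5],
    "gene_c": [2, 1, 4, 3, 5],
}
order, scores = rank_genes(expression, colony_size)
```

Here `gene_a` gets a score of 1 even though its relationship to colony size is cubic, which a plain linear correlation would understate.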

14. Discussion. According to these results, the gene expression profiles of the stellate colonies are the most distinct from the other two morphometric classes, and PPARgamma appears as the top gene on both lists. Currently, we are working on the biological validation for PPARgamma.

15. Validation 1: In vitro experiment on PPARG. MDAMB231 was assayed in 3D cell cultures maintained in H14 medium with 1% fetal bovine serum. The 3D cultures were prepared in triplicate by seeding single cells on top of a thin layer of Matrigel at a density of 2200 cells/cm² and overlaid by 5% final Matrigel diluted in culture medium. GW9662, a PPARG inhibitor, was dissolved in DMSO and added to the 3D cultures at a final concentration of 10 µM at the time of seeding. The vehicle control was pure DMSO. The culture medium and the drug were changed every other day. Five images per well were collected after five full days in 3D culture.

16. In vitro validation results

17. Validation 2: In vivo experiment on PPARG. These are the in vivo experiment results. Normal tissues (A, on the left) and triple-negative tissues (B, on the right) are stained for PPARgamma expression (shown in red). The amount of PPARgamma was quantified on a cell-by-cell basis. C shows that more cells express PPARgamma in triple-negative tissues than in normal tissues. D shows the histogram of PPARgamma per cell for both tissues.

18. Summary. A system for identifying subpopulations in a panel of breast cancer cell lines. The content of this work is going to be published in PLoS Computational Biology.

19. Acknowledgement. LBNL. Parvin Lab: Hang Chang, Gerald Fontenay, Bahram Parvin. Bissell Lab: Genee Lee, Paraic Kenny, Mina Bissell. Joe Gray. Finally, I would like to thank all the people who contributed to this work. Hang is a computer scientist working on image analysis and algorithm development. Gerald helped with database management. I would like to thank Genee and Paraic for the cell culture and data acquisition. Currently, Genee is with Genentech, and Paraic is with the Albert Einstein College of Medicine. Orsi is a postdoc in the Kenny Lab; she worked on the validation for treatment of MDAMB231 with the PPARG inhibitor. Frederick is a pathologist at UCSF; he did the experiment on PPARG expression in triple-negative breast cancer tissue. Finally, I would like to thank Mina and Joe for their help with this work, and the NIH ICBP program for the funding.
