
Towards Whole-Transcriptome Deconvolution with Single-cell Data

James Lindsay (1), Ion Mandoiu (1), Craig Nelson (2). University of Connecticut: (1) Department of Computer Science and Engineering; (2) Department of Molecular and Cell Biology.



Presentation Transcript


  1. Towards Whole-Transcriptome Deconvolution with Single-cell Data. James Lindsay (1), Ion Mandoiu (1), Craig Nelson (2). University of Connecticut: (1) Department of Computer Science and Engineering; (2) Department of Molecular and Cell Biology

  2. Mouse Embryo [Figure labels: anterior/head, neural tube, somites, node, primitive streak, posterior/tail]

  3. Unknown Mesoderm Progenitor. What is the expression profile of the progenitor cell type? Abbreviations: NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm

  4. Characterizing Cell-types • Goal: whole-transcriptome expression profiles of individual cell types • Technically challenging to measure whole-transcriptome expression from single cells • Approach: computational deconvolution of cell mixtures • Assisted by single-cell qPCR expression data for a small number of genes

  5. Modeling Cell Mixtures. Mixtures (X) are a linear combination of the signature matrix (S) and the concentration matrix (C): $X = S C$, where X is genes × mixtures, S is genes × cell types, and C is cell types × mixtures.
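
As a small illustration of the linear model on this slide, the sketch below builds a toy mixture matrix from a signature matrix and a concentration matrix. All numbers, and the helper name `matmul`, are made up for illustration and do not come from the presentation.

```python
# Toy illustration of the linear mixing model X = S C (all numbers are made up).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# S: genes x cell types (expression of each gene in each pure cell type).
S = [[10.0, 0.0],
     [2.0, 8.0],
     [0.0, 4.0]]

# C: cell types x mixtures (each column holds one mixture's proportions, summing to 1).
C = [[0.5, 0.25],
     [0.5, 0.75]]

# X: genes x mixtures (what would be measured on the mixed samples).
X = matmul(S, C)
```

Each column of X is the proportion-weighted sum of the cell-type signatures for that mixture.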

  6. Previous Work • Coupled Deconvolution • Given: X; Infer: S, C • NMF: Repsilber, BMC Bioinformatics, 2010 • Minimum polytope: Schwartz, BMC Bioinformatics, 2010 • Estimation of Mixing Proportions • Given: X, S; Infer: C • Quadratic programming: Gong, PLoS One, 2012 • LDA: Qiao, PLoS Comp Bio, 2012 • Estimation of Expression Signatures • Given: X, C; Infer: S • csSAM: Shen-Orr, Nature Brief Com, 2010

  7. Single-cell Assisted Deconvolution. Given: X and single-cell qPCR data. Infer: S, C. Approach: • Identify cell types and estimate a reduced signature matrix using single-cell qPCR data • Outlier removal • K-means clustering followed by averaging • Estimate mixing proportions C using • Quadratic programming, 1 mixture at a time • Estimate full expression signature matrix S using C • Quadratic programming, 1 gene at a time

  8. Step 1: Outlier Removal + Clustering. Remove cells whose maximum Pearson correlation to any other cell is below 0.95. [Figure panels: unfiltered, filtered]
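
The filtering rule on this slide can be sketched as follows. This is a minimal single-pass version; whether the authors applied the filter once or repeatedly is not stated, and the function names and example profiles are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # assumption: treat constant profiles as uncorrelated
    return cov / (su * sv)

def filter_outliers(profiles, threshold=0.95):
    """Return indices of cells whose best correlation to any other cell is >= threshold.

    Needs at least two profiles.
    """
    kept = []
    for i, p in enumerate(profiles):
        best = max(pearson(p, q) for j, q in enumerate(profiles) if j != i)
        if best >= threshold:
            kept.append(i)
    return kept

# Made-up single-cell profiles (one list of gene values per cell).
cells = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 1.0, 3.5, 0.5],   # deliberately unlike the others
    [2.0, 4.1, 6.0, 7.9],
]
kept = filter_outliers(cells)
```

In this toy input the third cell correlates negatively with every other cell, so it is the only one dropped.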

  9. Step 1: PCA of Clustering

  10. Step 2: Estimate Mixture Proportions. For a given mixture i, estimate its mixing proportions by quadratic programming. Reduced signature matrix: centroids of the k-means clusters.
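
The exact program solved in this step is not preserved in the transcript. The sketch below assumes a common formulation: minimize the squared error between the observed mixture x and S c, with proportions c constrained to be non-negative and to sum to 1. Instead of a dedicated QP solver it uses projected gradient descent, a swapped-in technique chosen only to keep the example dependency-free. All names and numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # the last index satisfying the condition sets the shift
    return [max(x - theta, 0.0) for x in v]

def estimate_proportions(S, x, iterations=2000):
    """Assumed objective: min ||x - S c||^2 subject to c >= 0 and sum(c) = 1.

    S: reduced signature matrix (genes x cell types); x: one mixture (one value per gene).
    """
    n_genes, k = len(S), len(S[0])
    # Conservative step size: 1 / (2 * squared Frobenius norm of S).
    step = 1.0 / (2.0 * sum(s * s for row in S for s in row))
    c = [1.0 / k] * k
    for _ in range(iterations):
        residual = [sum(S[g][t] * c[t] for t in range(k)) - x[g]
                    for g in range(n_genes)]
        grad = [2.0 * sum(S[g][t] * residual[g] for g in range(n_genes))
                for t in range(k)]
        c = project_to_simplex([c[t] - step * grad[t] for t in range(k)])
    return c

# Made-up centroids (3 genes x 2 cell types) and a mixture built with proportions 0.25 / 0.75.
S = [[10.0, 0.0], [2.0, 8.0], [0.0, 4.0]]
x = [2.5, 6.5, 3.0]
c = estimate_proportions(S, x)
```

Running this once per mixture column fills in C one mixture at a time, matching the outline on slide 7.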

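  11. Step 3: Estimating Full Expression Signatures. For each new gene, the model is x = sC, where C (cell types × mixtures) is known from step 2, x (1 × mixtures) holds the observed signals for the new gene, and s (1 × cell types) is the new gene's signature to estimate. Now solve for s by quadratic programming, one gene at a time.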
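
The program for this step is also missing from the transcript. A minimal sketch, assuming the objective is least squares between x and s C with s constrained to be non-negative, is shown below. It again substitutes projected gradient descent for a QP solver, and the names and numbers are hypothetical.

```python
def estimate_signature(C, x, iterations=5000):
    """Assumed objective: min ||x - s C||^2 subject to s >= 0.

    C: concentration matrix (cell types x mixtures), known from step 2.
    x: observed values of one gene across the mixtures.
    """
    k, m = len(C), len(C[0])
    # Conservative step size: 1 / (2 * squared Frobenius norm of C).
    step = 1.0 / (2.0 * sum(v * v for row in C for v in row))
    s = [0.0] * k
    for _ in range(iterations):
        residual = [sum(s[t] * C[t][j] for t in range(k)) - x[j] for j in range(m)]
        grad = [2.0 * sum(C[t][j] * residual[j] for j in range(m)) for t in range(k)]
        # Gradient step followed by clamping at zero (projection onto s >= 0).
        s = [max(s[t] - step * grad[t], 0.0) for t in range(k)]
    return s

# Made-up proportions for 3 mixtures of 2 cell types, and one gene whose
# true signature is [2.0, 8.0].
C = [[0.5, 0.25, 0.8],
     [0.5, 0.75, 0.2]]
x = [5.0, 6.5, 3.2]
s = estimate_signature(C, x)
```

With more mixtures than cell types the system is overdetermined, which is why the number of mixtures matters in the later leave-one-out slides.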

  12. Experimental Design • Single Cell Profiles • 92 profiles • 31 genes • Simulated Concentrations • Sample uniformly at random from [0,1] • Scale each column to sum to 1 • Simulated Mixtures • Choose single cells randomly with replacement from each cluster • Sum to generate mixture
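
The simulation outlined on this slide could look roughly like the sketch below. The slide does not say how many cells are drawn per cluster or how they are weighted, so this version assumes one cell per cluster per mixture, weighted by that cell type's concentration. The function name and toy clusters are hypothetical.

```python
import random

def simulate_mixtures(clusters, n_mixtures, seed=0):
    """clusters[t] is a list of single-cell profiles (lists of gene values) for cell type t."""
    rng = random.Random(seed)
    k = len(clusters)
    n_genes = len(clusters[0][0])
    C = [[0.0] * n_mixtures for _ in range(k)]        # cell types x mixtures
    X = [[0.0] * n_mixtures for _ in range(n_genes)]  # genes x mixtures
    for j in range(n_mixtures):
        # Sample concentrations uniformly, then scale the column to sum to 1.
        raw = [rng.random() for _ in range(k)]
        total = sum(raw)
        for t in range(k):
            C[t][j] = raw[t] / total
        # Assumption: one cell drawn per cluster, weighted by its concentration.
        for t in range(k):
            cell = rng.choice(clusters[t])  # the same cell may be reused across mixtures
            for g in range(n_genes):
                X[g][j] += C[t][j] * cell[g]
    return X, C

# Two made-up clusters with two cells each, measured on three genes.
clusters = [
    [[10.0, 2.0, 0.0], [12.0, 1.0, 0.5]],   # hypothetical cell type A
    [[0.0, 8.0, 4.0], [1.0, 9.0, 3.0]],     # hypothetical cell type B
]
X, C = simulate_mixtures(clusters, n_mixtures=5)
```

Because the true C is known here, estimates from steps 2 and 3 can be scored directly against it.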

  13. Data: RT-qPCR • CT values are the cycle at which a gene was detected • Relative normalization to housekeeping genes • Housekeeping genes • gapdh, bactin1 • geometric mean • Vandesompele, 2002 • dCT(x) = geometric mean - CT(x) • expression(x) = 2^dCT(x)
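
The normalization formulas on this slide translate directly into code. In the sketch below the CT values and the gene names `gene_a` and `gene_b` are made up; only `gapdh` and `bactin1` come from the slide.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_expression(ct, housekeeping=("gapdh", "bactin1")):
    """Map each non-housekeeping gene to 2^dCT, with dCT = geometric mean - CT."""
    reference = geometric_mean([ct[g] for g in housekeeping])
    return {gene: 2.0 ** (reference - value)
            for gene, value in ct.items() if gene not in housekeeping}

# Made-up CT values for one sample; the housekeeping geometric mean is 20.
sample = {"gapdh": 16.0, "bactin1": 25.0, "gene_a": 22.0, "gene_b": 18.0}
expr = relative_expression(sample)
```

A gene detected two cycles later than the housekeeping reference gets a relative expression of about 0.25, and one detected two cycles earlier gets about 4.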

  14. Accuracy of Inferred Mixing Proportions

  15. Concentration Matrix: Concordance

  16. Concentration by # Genes: Random

  17. Concentration by # Genes: Ranked

  18. Leave-one-out: Concentration, 50 mixtures [Plot axes: 2^dCT RMSE vs. missing gene]

  19. Leave-one-out: Signature, 10 mixtures [Plot axes: 2^dCT RMSE vs. missing gene]

  20. Leave-one-out: Signature, 50 mixtures [Plot axes: 2^dCT RMSE vs. missing gene]
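
The leave-one-out plots report RMSE, but the transcript does not say how it was aggregated. As a reference point, a plain root-mean-square error between a true and an estimated matrix can be sketched as below; the function name and inputs are illustrative only.

```python
import math

def rmse(true, estimated):
    """Root-mean-square error between two equally shaped matrices (lists of rows)."""
    squared = [(t - e) ** 2
               for true_row, est_row in zip(true, estimated)
               for t, e in zip(true_row, est_row)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up example: only two entries differ, by 3 and 4.
error = rmse([[0.0, 0.0], [0.0, 0.0]], [[3.0, 4.0], [0.0, 0.0]])
```

In a leave-one-out run, one gene would be withheld, the matrices re-estimated, and this error computed against the known values.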

  21. Future Work • Bootstrapping to report a confidence interval for each estimated concentration and signature • Show correlation between large CI and poor accuracy • Mixing of heterogeneous technologies • qPCR for single cells, RNA-seq for mixtures • Normalization (needs to be linear) • Whole-genome scale • # genes to estimate 10,000+ signatures • Data!

  22. Conclusion. Special thanks to: • Ion Mandoiu • Craig Nelson • Caroline Jakuba • Mathew Gajdosik. Contact: James.Lindsay@engr.uconn.edu
