Towards Whole-Transcriptome Deconvolution with Single-cell Data

James Lindsay (1), Ion Mandoiu (1), Craig Nelson (2)
University of Connecticut
(1) Department of Computer Science and Engineering
(2) Department of Molecular and Cell Biology

Mouse Embryo
(Figure labels: anterior/head, neural tube, somites, node, primitive streak, posterior/tail)

Unknown Mesoderm Progenitor
What is the expression profile of the progenitor cell type?
NSB = node-streak border; PSM = presomitic mesoderm; S = somite; NT = neural tube/neurectoderm; EN = endoderm

Characterizing Cell-types
• Goal: whole-transcriptome expression profiles of individual cell types
• Measuring whole-transcriptome expression from single cells is technically challenging
• Approach: computational deconvolution of cell mixtures
  • Assisted by single-cell qPCR expression data for a small number of genes

Modeling Cell Mixtures
Mixtures (X) are a linear combination of a signature matrix (S) and a concentration matrix (C): X = S · C, where X is genes × mixtures, S is genes × cell types, and C is cell types × mixtures.
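As a small illustration of the linear model, the sketch below builds X from S and C. The numbers and the `matmul` helper are made up for this example and do not come from the talk; plain Python lists stand in for a matrix library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical signature matrix S: 3 genes x 2 cell types.
S = [[10.0, 0.0],
     [2.0, 8.0],
     [0.0, 4.0]]

# Hypothetical concentration matrix C: 2 cell types x 2 mixtures.
# Each column holds the proportions of one mixture and sums to 1.
C = [[0.25, 0.5],
     [0.75, 0.5]]

# Mixture matrix X: 3 genes x 2 mixtures.
X = matmul(S, C)
```

Each column of X is the expression a bulk sample would show if it contained the cell types in the proportions given by the matching column of C.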

Previous Work
• Coupled deconvolution (given X; infer S, C)
  • NMF: Repsilber, BMC Bioinformatics, 2010
  • Minimum polytope: Schwartz, BMC Bioinformatics, 2010
• Estimation of mixing proportions (given X, S; infer C)
  • Quadratic programming: Gong, PLoS One, 2012
  • LDA: Qiao, PLoS Comp Bio, 2012
• Estimation of expression signatures (given X, C; infer S)
  • csSAM: Shen-Orr, Nature Brief Com, 2010

Single-cell Assisted Deconvolution
Given: X and single-cell qPCR data
Infer: S, C
Approach:
• Identify cell types and estimate a reduced signature matrix from the single-cell qPCR data
  • Outlier removal
  • K-means clustering followed by averaging
• Estimate the mixing proportions C using the reduced signature matrix
  • Quadratic programming, one mixture at a time
• Estimate the full expression signature matrix S using C
  • Quadratic programming, one gene at a time
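The clustering-and-averaging part of the first step can be sketched as a plain Lloyd's k-means whose final centroids serve as the columns of the reduced signature matrix. The slides do not say how k was chosen or how the clusters were initialised, so the explicit `init_centroids` argument, the helper names, and the toy profiles are all assumptions made for illustration.

```python
def mean_profile(profiles):
    """Gene-wise average of a list of expression profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def sq_dist(u, v):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(cells, init_centroids, n_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(n_iter):
        # Assign every cell to its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda k: sq_dist(cell, centroids[k]))
                      for cell in cells]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Averaging step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = mean_profile(members)
    return labels, centroids

# Hypothetical single-cell profiles (4 cells x 2 genes).
cells = [[1.0, 0.0], [3.0, 0.0], [0.0, 9.0], [0.0, 11.0]]
labels, centroids = kmeans(cells, init_centroids=[cells[0], cells[2]])

# Reduced signature matrix (genes x cell types): centroids as columns.
S_reduced = [list(row) for row in zip(*centroids)]
```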

Step 1: Outlier Removal + Clustering
Remove cells whose maximum Pearson correlation to any other cell is below 0.95.
(Figure panels: unfiltered, filtered)
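The filtering rule can be sketched as follows. This version assumes a single pass in which every cell is compared against all other cells of the unfiltered set; the slides do not say whether the filter was applied once or repeated. The function names and the toy profiles are hypothetical.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def filter_outliers(cells, threshold=0.95):
    """Return indices of cells whose best correlation to any other
    cell is at least `threshold`."""
    kept = []
    for i, cell in enumerate(cells):
        best = max(pearson(cell, other)
                   for j, other in enumerate(cells) if j != i)
        if best >= threshold:
            kept.append(i)
    return kept

# Hypothetical profiles: cells 0 and 1 are similar, cell 2 is not.
cells = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 4.0, 6.0, 8.5],
         [4.0, 1.0, 3.0, 2.0]]
kept = filter_outliers(cells)
```

With these toy values, cells 0 and 1 support each other and are kept, while cell 2 has no close partner and is dropped.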

Step 1: PCA of Clustering

Step 2: Estimate Mixture Proportions
For a given mixture i, estimate its column of C from the reduced signature matrix, which is formed from the centroids of the k-means clusters.
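One way to pose this step is as a constrained least-squares problem for a single mixture x: minimise ||S c - x||^2 subject to c >= 0 and sum(c) = 1. The exact objective and constraints on the slide were not recoverable, so this formulation is an assumption, chosen to match the simulated concentrations whose columns sum to 1. The sketch also uses projected gradient descent in place of a quadratic-programming solver, purely to stay dependency-free; all names and numbers are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1}."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j that passes gives the final shift
    return [max(x - theta, 0.0) for x in v]

def estimate_proportions(S, x, n_iter=500):
    """Projected gradient descent for min ||S c - x||^2 on the simplex."""
    n_genes, n_types = len(S), len(S[0])
    # Safe step size: 2 * ||S||_F^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(s * s for row in S for s in row))
    c = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        resid = [sum(S[g][k] * c[k] for k in range(n_types)) - x[g]
                 for g in range(n_genes)]
        grad = [2.0 * sum(S[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        c = project_to_simplex([c[k] - step * grad[k]
                                for k in range(n_types)])
    return c

# Hypothetical reduced signature matrix (3 genes x 2 cell types) and a
# mixture generated from the proportions (0.25, 0.75).
S = [[10.0, 0.0], [2.0, 8.0], [0.0, 4.0]]
x = [2.5, 6.5, 3.0]
c_hat = estimate_proportions(S, x)
```

Because each mixture is handled independently, the full matrix C is obtained by calling `estimate_proportions` once per column of X.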

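Step 3: Estimating Full Expression Signatures
For each new gene:
• C: known from step 2 (cell types × mixtures)
• x: observed signals of the new gene across the mixtures
• s: signatures of the new gene to estimate, one value per cell type
Now solve for s by quadratic programming, one gene at a time.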
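For a single new gene, this step can be sketched as a non-negative least-squares problem: minimise ||s C - x||^2 subject to s >= 0. The non-negativity constraint is an assumption (expression levels cannot be negative), since the slide's formula was not recoverable, and projected gradient descent again stands in for a quadratic-programming solver. Names and numbers are hypothetical.

```python
def estimate_signature(C, x, n_iter=2000):
    """Projected gradient descent for min ||s C - x||^2 with s >= 0.

    C is cell types x mixtures; x holds the gene's signal per mixture.
    """
    n_types, n_mix = len(C), len(C[0])
    # Safe step size: 2 * ||C||_F^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(v * v for row in C for v in row))
    s = [0.0] * n_types
    for _ in range(n_iter):
        resid = [sum(s[k] * C[k][j] for k in range(n_types)) - x[j]
                 for j in range(n_mix)]
        grad = [2.0 * sum(resid[j] * C[k][j] for j in range(n_mix))
                for k in range(n_types)]
        # Gradient step followed by clipping to the non-negative orthant.
        s = [max(s[k] - step * grad[k], 0.0) for k in range(n_types)]
    return s

# Hypothetical concentrations (2 cell types x 3 mixtures) and the signal
# of a gene whose true signature is (4, 8).
C = [[0.25, 0.5, 0.75],
     [0.75, 0.5, 0.25]]
x = [7.0, 6.0, 5.0]
s_hat = estimate_signature(C, x)
```

The estimate is only well determined when there are at least as many mixtures as cell types and the columns of C are sufficiently varied.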

Experimental Design
• Single-cell profiles
  • 92 profiles
  • 31 genes
• Simulated concentrations
  • Sample uniformly at random from [0, 1]
  • Scale each column to sum to 1
• Simulated mixtures
  • Choose single cells randomly with replacement from each cluster
  • Sum to generate the mixture
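The simulation described above can be sketched as follows. The slides do not state how the sampled cells are combined with the concentrations, so this sketch assumes one cell is drawn per cluster for each mixture and the drawn profiles are summed with the concentrations as weights. The cluster contents, the seed, and the function names are hypothetical.

```python
import random

def simulate_concentrations(n_types, n_mixtures, rng):
    """Uniform random entries, each column rescaled to sum to 1."""
    C = [[rng.random() for _ in range(n_mixtures)] for _ in range(n_types)]
    for j in range(n_mixtures):
        total = sum(C[k][j] for k in range(n_types))
        for k in range(n_types):
            C[k][j] /= total
    return C

def simulate_mixture(clusters, column, rng):
    """Draw one cell per cluster and sum the profiles, weighted by the
    cluster's concentration (the weighting scheme is an assumption)."""
    n_genes = len(clusters[0][0])
    mixture = [0.0] * n_genes
    for cluster_cells, weight in zip(clusters, column):
        # Independent draws across mixtures amount to sampling
        # with replacement from each cluster.
        cell = rng.choice(cluster_cells)
        for g in range(n_genes):
            mixture[g] += weight * cell[g]
    return mixture

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical clusters: 2 cell types, 2 cells each, 3 genes per cell.
clusters = [[[10.0, 2.0, 0.0], [9.0, 3.0, 0.5]],
            [[0.0, 8.0, 4.0], [0.5, 7.0, 5.0]]]

C = simulate_concentrations(n_types=2, n_mixtures=5, rng=rng)
X_columns = [simulate_mixture(clusters, [C[k][j] for k in range(2)], rng)
             for j in range(5)]
```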

Data: RT-qPCR
• CT values are the cycle at which the gene was detected
• Relative normalization to housekeeping genes
  • Housekeeping genes: gapdh, bactin1
  • Reference: geometric mean of the housekeeping genes (Vandesompele, 2002)
• dCT(x) = geometric mean - CT(x)
• expression(x) = 2^dCT(x)
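The two formulas above translate directly into code. The CT values below are invented for illustration, and `relative_expression` is a hypothetical helper name.

```python
from statistics import geometric_mean

def relative_expression(ct, housekeeping_cts):
    """Convert a CT value to relative expression.

    dCT = geometric mean of the housekeeping CTs - CT of the gene,
    expression = 2 ** dCT.
    """
    d_ct = geometric_mean(housekeeping_cts) - ct
    return 2.0 ** d_ct

# Hypothetical CT values for the two housekeeping genes in one cell.
housekeeping = [16.0, 25.0]  # gapdh, bactin1; geometric mean is 20

# A gene detected two cycles after the reference has dCT = -2,
# which corresponds to one quarter of the reference level.
expr = relative_expression(22.0, housekeeping)
```

Working in 2^dCT rather than raw CT matters for the deconvolution, because the mixture model is linear in expression and CT is on a logarithmic scale.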

Accuracy of Inferred Mixing Proportions

Concentration Matrix: Concordance

Concentration by # Genes: Random

Concentration by # Genes: Ranked

Leave-one-out: Concentration, 50 mixtures (figure axes: 2^dCT RMSE, missing gene)

Leave-one-out: Signature, 10 mixtures (figure axes: 2^dCT RMSE, missing gene)

Leave-one-out: Signature, 50 mixtures (figure axes: 2^dCT RMSE, missing gene)

Future Work
• Bootstrapping to report a confidence interval (CI) for each estimated concentration and signature
  • Show the correlation between a large CI and poor accuracy
• Mixing of heterogeneous technologies
  • qPCR for single cells, RNA-seq for mixtures
  • Normalization (needs to be linear)
• Whole-genome scale
  • Number of genes to estimate 10,000+ signatures
  • Data!

Conclusion
Special thanks to:
• Ion Mandoiu
• Craig Nelson
• Caroline Jakuba
• Mathew Gajdosik
Contact: James.Lindsay@engr.uconn.edu
