In most single-cell transcriptomics (scRNA-seq) studies, the data are extremely sparse and noisy, which hinders downstream analyses. To address this problem, we have developed Single-cell Analysis Via Expression Recovery harnessing eXternal data (SAVER-X), a method for gene expression denoising and imputation.
Although single-cell RNA sequencing technologies have shed light on the role of cellular diversity in human pathophysiology, the resulting data remain noisy and sparse, making reliable quantification of gene expression challenging. In SAVER-X, we couple a deep autoencoder to a Bayesian model to improve the quality of UMI-based scRNA-seq data through transfer learning across datasets. SAVER-X outperforms existing state-of-the-art tools because its deep learning model extracts gene expression features that transfer across data from different labs, generated by different technologies, and obtained from divergent species. As publicly available data accumulate, SAVER-X will increase in generalization accuracy and in tissue- and cell-type specificity. A technology like SAVER-X changes scRNA-seq data analysis from a process of study-specific quality control and statistical modeling into an automated process of cross-study data integration and information sharing.
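To make the coupling of an autoencoder with a Bayesian model concrete, the following is a minimal sketch and not the SAVER-X implementation. The abstract does not specify the Bayesian model, so the sketch assumes a Gamma-Poisson model, in which each UMI count is Poisson with rate equal to a cell size factor times the true expression, and the true expression has a gamma prior centered on a predicted mean. The function name `gamma_poisson_shrink`, its parameters, and the toy values are hypothetical; `prior_mean` stands in for the output of a pretrained autoencoder, which is not implemented here.

```python
import numpy as np


def gamma_poisson_shrink(counts, size_factors, prior_mean, dispersion):
    """Posterior mean expression under an assumed Gamma-Poisson model.

    counts:       (cells, genes) observed UMI counts
    size_factors: (cells,) per-cell scaling factors
    prior_mean:   (cells, genes) predicted expression; a stand-in for
                  autoencoder output (hypothetical input, not computed here)
    dispersion:   (genes,) prior dispersion phi, so that the prior
                  variance is phi * prior_mean**2
    """
    counts = np.asarray(counts, dtype=float)
    s = np.asarray(size_factors, dtype=float)[:, None]
    mu = np.asarray(prior_mean, dtype=float)
    phi = np.asarray(dispersion, dtype=float)[None, :]

    shape = 1.0 / phi          # gamma shape; prior mean = shape / rate = mu
    rate = 1.0 / (phi * mu)    # gamma rate

    # Conjugacy: the posterior is Gamma(shape + counts, rate + s),
    # so its mean blends the observed count with the prediction.
    return (counts + shape) / (s + rate)


# Toy example: one cell, two genes, both with predicted expression 2.
denoised = gamma_poisson_shrink(
    counts=[[0, 10]],
    size_factors=[1.0],
    prior_mean=[[2.0, 2.0]],
    dispersion=[0.5, 0.5],
)
```

Under these assumptions, a small dispersion means the prediction is trusted and the estimate stays close to `prior_mean`, while a large dispersion lets the observed count dominate. This is one way a prediction learned from external data could be weighted against the data of the study at hand.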