In most single-cell transcriptomics (scRNA-seq) studies, the data are extremely sparse and noisy, which hinders downstream analyses. To address this problem, we have developed Single-cell Analysis Via Expression Recovery via harnessing eXternal data (SAVER-X), a method for gene expression denoising and imputation. We leverage publicly available datasets and employ transfer learning in deep learning for high-quality scRNA-seq data denoising across a variety of settings. Here, you can see the network architecture of the auto-encoder used in our model, which also enables transfer learning across the human and mouse species through shared genes. After running the auto-encoder, we perform a gene filtering step followed by Bayesian shrinkage.
We provide demo datasets that users can use to test our gateway:
If you haven't already created an account, you should see a page like the screenshot on the right.
Simply follow the link and register for an account. Notifications about the completion of your job, as well as the link to access the denoised data, will be sent to the email address you use to create your account.
Once you have logged in successfully, you can find the SAVER-X application in your dashboard. Each SAVER-X run launches a new experiment, and you can organize experiments into projects.
The interface to launch a new experiment should look like the screenshot on the right. Researchers can upload their UMI count data directly to the web portal as a (gene x cell) matrix stored in a .csv, .txt or .rds file. Our cloud service does not store the user’s data or use it for any purpose other than denoising.
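As an illustration of the (gene x cell) layout, the following Python sketch writes a small count matrix as CSV text with one row per gene and one column per cell. The exact header convention the gateway expects is not stated here, so the empty top-left cell and the gene names in the first column are assumptions, and write_gene_by_cell_csv is a hypothetical helper, not part of SAVER-X.

```python
import csv
import io


def write_gene_by_cell_csv(handle, genes, cells, counts):
    """Write a (gene x cell) count matrix to an open text handle.

    Assumed layout (not confirmed by the gateway documentation):
    header row = empty cell followed by cell IDs,
    each later row = gene name followed by that gene's counts.
    """
    if len(counts) != len(genes):
        raise ValueError("counts must have exactly one row per gene")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([""] + list(cells))
    for gene, row in zip(genes, counts):
        if len(row) != len(cells):
            raise ValueError(f"row for {gene} must have one value per cell")
        writer.writerow([gene] + list(row))


# Toy data for illustration only: 3 genes x 2 cells.
genes = ["Gad1", "Slc17a7", "Gfap"]
cells = ["cell_1", "cell_2"]
counts = [[0, 3], [5, 0], [0, 0]]

buffer = io.StringIO()
write_gene_by_cell_csv(buffer, genes, cells, counts)
print(buffer.getvalue())
```

To produce a file for upload, pass a handle such as open("my_counts.csv", "w", newline="") instead of the in-memory buffer.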
Currently, SAVER-X provides deep count auto-encoder models trained across a range of tissue types (retina, brain, blood, etc.), cell types, and species. SAVER-X is designed for UMI counts, and the input matrix MUST BE a matrix of UMI counts. Users can choose a specific model involving any combination of these three variables against which they would like to denoise their data. Many of these choices will be context-dependent, though they can certainly also be exploratory. For instance, if a researcher has generated single-cell transcriptomics data from the mouse hippocampus, selecting "Mouse" and "Brain" would be the logical first choices. On the other hand, if you have generated data from the human retina and want to see whether denoising it with a model trained on mouse retina leads to new discoveries, SAVER-X lets you explore such transfer learning questions. If you are unsure how to select a model, "No Pretraining" is the default model choice.
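Because the input must be UMI counts, a quick local sanity check before uploading can catch matrices that were accidentally normalized or log-transformed. The sketch below is a heuristic only: it tests that every entry is a non-negative whole number, which is necessary but not sufficient for raw UMI counts. The function name looks_like_umi_counts is hypothetical, and the gateway's own validation, if any, is not described here.

```python
def looks_like_umi_counts(matrix):
    """Return True if every entry is a finite, non-negative whole number.

    Accepts numbers or numeric strings (as read from a CSV file).
    Heuristic only: rounded normalized data would also pass.
    """
    for row in matrix:
        for value in row:
            try:
                number = float(value)
            except (TypeError, ValueError):
                return False  # non-numeric entry such as "NA"
            # is_integer() is False for NaN and infinity as well.
            if number < 0 or not number.is_integer():
                return False
    return True


# Raw counts pass; log-normalized values do not.
print(looks_like_umi_counts([[0, 3], [5, 0]]))
print(looks_like_umi_counts([[0.0, 1.39], [1.79, 0.0]]))
```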
Our SAVER-X gateway supports parallelization to speed up computation. Click "Setting for queue RM-shared" to customize the number of requested cores and the wall time limit.