Methods like fluorescence-activated cell sorting (FACS) or immunohistochemistry (IHC) staining have been used as a gold standard to estimate the immune cell content of a sample. These methods, however, are limited in their scalability and by the availability of good antibodies against the cell type markers. High-throughput transcriptomic methods capture the transcriptional landscape of a sample from a relatively small amount of material, which can be extremely limited in clinical settings (e.g. tumor biopsies). This has led to the widespread use of methods like RNA-seq and microarrays to characterize patient tumor samples. However, bulk RNA-seq does not provide detailed information on the cellular composition of a sample, which therefore has to be inferred using computational techniques.
Such methods can, in general, be classified into two categories:
Marker gene-based approaches (a) rely on a list of genes (a signature) that are characteristic of a cell type. Based on the expression values of the signature genes, every cell type is quantified independently, either by using the gene expression values directly (MCP-counter) or by performing a statistical test for enrichment of the signature (xCell).
Deconvolution methods (b) formulate the problem as a system of equations that describes the gene expression of a sample as the weighted sum of the contributions of the different cell types. By solving the inverse problem, cell type fractions can be inferred given a signature matrix and the mixed gene expression. This can be accomplished using \(\nu\)-support vector regression (SVR) (CIBERSORT), constrained least squares regression (quanTIseq, EPIC), or linear least squares regression (TIMER).
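To make approach (a) concrete, here is a minimal base R sketch of a marker gene-based score: each cell type is scored independently as the mean expression of its signature genes in every sample. The toy matrix, the two signatures and the helper marker_score() are hypothetical; real tools differ in how markers are selected and how their expression is aggregated.

```r
# Hypothetical toy expression matrix: genes in rows, samples in columns (values are made up)
expr <- matrix(
  c(10, 200, 5,    # CD8A
    12, 180, 7,    # CD8B
    30,  40, 3,    # MS4A1
    25,  35, 9),   # CD19
  nrow = 4, byrow = TRUE,
  dimnames = list(c("CD8A", "CD8B", "MS4A1", "CD19"),
                  c("sample_A", "sample_B", "sample_C"))
)

# Hypothetical signatures: marker genes per cell type
signatures <- list(
  "T cell CD8+" = c("CD8A", "CD8B"),
  "B cell"      = c("MS4A1", "CD19")
)

# Score every cell type independently as the mean expression of its markers.
# Returns a matrix with cell types in rows and samples in columns.
marker_score <- function(expr, signatures) {
  t(sapply(signatures, function(genes) {
    genes <- intersect(genes, rownames(expr))  # ignore markers absent from the data
    colMeans(expr[genes, , drop = FALSE])
  }))
}

marker_score(expr, signatures)
# T cell CD8+: 11, 190, 6; B cell: 27.5, 37.5, 6
```

Because each cell type is scored on its own scale, such scores are better suited for comparing one cell type between samples than for comparing different cell types.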
For more information, check out the review by Finotello and Trajanoski (2018).
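For the deconvolution approach (b), the model for one sample can be written as \(m = S f\), where \(m\) is the mixed expression vector (one entry per gene), \(S\) is the signature matrix (genes × cell types) and \(f\) holds the unknown cell type contributions. The base R sketch below uses made-up numbers: it builds a noise-free mixture and recovers \(f\) with ordinary linear least squares via qr.solve(). This is closest in spirit to the unconstrained regression attributed to TIMER above; it does not implement the constraints used by quanTIseq and EPIC, nor CIBERSORT's \(\nu\)-SVR.

```r
# Hypothetical signature matrix S: 6 genes x 3 cell types (reference expression profiles)
S <- matrix(
  c(50,  2,  1,
    40,  5,  3,
     3, 60,  2,
     1, 45,  4,
     2,  3, 70,
     4,  1, 55),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("T cell", "B cell", "Macrophage"))
)

true_f <- c(0.5, 0.3, 0.2)   # hypothetical cell type fractions

# Forward model: the mixture is the weighted sum of the cell type profiles
m <- drop(S %*% true_f)

# Inverse problem: solve the overdetermined system S f = m in the least-squares sense
f_hat <- qr.solve(S, m)
round(f_hat, 3)  # recovers 0.5, 0.3, 0.2 because the mixture is noise-free
```

With real, noisy data the unconstrained solution can contain negative values, which is one motivation for the constrained variants mentioned above.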
The input data is a gene × sample gene expression matrix. In general, values should be TPM-normalized and not log-transformed.
For xCell and MCP-counter this is less important: xCell works only on the ranks of the gene expression values, and MCP-counter sums up the gene expression values.
Rownames are expected to be HGNC gene symbols. Instead of a matrix, immunedeconv also supports ExpressionSets (see below).
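If you start from raw read counts, a conversion to TPM might look like the following base R sketch. counts_to_tpm() is a hypothetical helper, and the gene lengths are assumed to be effective transcript lengths in bases; quantification tools such as Salmon or RSEM can also report TPM directly.

```r
# Hypothetical helper: convert raw read counts (genes x samples) to TPM.
# gene_length: one length (in bases) per gene, in the same order as the rows of counts.
counts_to_tpm <- function(counts, gene_length) {
  stopifnot(nrow(counts) == length(gene_length))
  rate <- counts / gene_length          # reads per base; divides row i by gene_length[i]
  t(t(rate) / colSums(rate)) * 1e6      # rescale each sample (column) to sum to one million
}

counts <- matrix(
  c(100, 200, 300,    # sample s1
     50,  50, 100),   # sample s2
  ncol = 2,
  dimnames = list(c("CD3E", "CD8A", "MS4A1"), c("s1", "s2"))
)
tpm <- counts_to_tpm(counts, gene_length = c(1000, 2000, 1500))
colSums(tpm)  # 1e6 for every sample
tpm[, "s1"]   # CD3E 250000, CD8A 250000, MS4A1 500000
```

If your data are already log-transformed, e.g. as log2(TPM + 1), the transformation can be undone with 2^x - 1.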
This package gives you easy access to these methods. To run a method with default options, simply invoke
deconvolute(gene_expression_matrix, method)
where gene_expression_matrix is a matrix with genes in rows and samples in columns. The row names must be HGNC symbols and the column names must be sample names. method can be one of
quantiseq, timer, cibersort, cibersort_abs, mcp_counter, xcell, or epic.
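Expression data often come with probe or transcript identifiers, several of which can map to the same gene symbol. The base R sketch below shows one possible way to bring such data into the expected layout. prepare_expression_matrix() is hypothetical, and summing rows that share a symbol is an assumption that suits additive units such as TPM (averaging would be another option).

```r
# Hypothetical helper: use gene symbols as row names and collapse duplicated symbols.
# expr: numeric matrix (features x samples) with sample names as column names
# symbols: one HGNC symbol per row of expr (NA or "" if unknown)
prepare_expression_matrix <- function(expr, symbols) {
  stopifnot(is.matrix(expr), nrow(expr) == length(symbols), !is.null(colnames(expr)))
  keep <- !is.na(symbols) & symbols != ""        # drop features without a symbol
  # rowsum() adds up all rows that share a symbol; the result is sorted by symbol
  rowsum(expr[keep, , drop = FALSE], group = symbols[keep])
}

expr <- matrix(1:8, ncol = 2, dimnames = list(NULL, c("s1", "s2")))
symbols <- c("CD8A", "CD8A", NA, "CD19")
prepare_expression_matrix(expr, symbols)
# CD19: s1 = 4, s2 = 8; CD8A: s1 = 3, s2 = 11
```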
For this example, we use a dataset of four melanoma patients from Racle et al. (2017).
library(immunedeconv)
res = deconvolute(immunedeconv::dataset_racle$expr_mat, "quantiseq")
knitr::kable(res, digits=2)
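| cell type | sample 1 | sample 2 | sample 3 | sample 4 |
|---|---|---|---|---|
| T cell CD4+ (non-regulatory) | 0.01 | 0.44 | 0.00 | 0.38 |
| T cell CD8+ | 0.00 | 0.03 | 0.09 | 0.05 |
| T cell regulatory (Tregs) | 0.02 | 0.10 | 0.06 | 0.06 |
| Myeloid dendritic cell | 0.00 | 0.00 | 0.00 | 0.00 |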
CIBERSORT is only freely available for academic users and could not be directly included in this package. To use CIBERSORT with this package, you need to register on the CIBERSORT website, obtain a license, and download the CIBERSORT source code.
The source code package contains two required files: CIBERSORT.R and LM22.txt.
Note the storage location of these files. When using immunedeconv, you need to tell the package where it can find them:
library(immunedeconv)
set_cibersort_binary("/path/to/CIBERSORT.R")
set_cibersort_mat("/path/to/LM22.txt")
Afterwards, you can call
deconvolute(your_mixture_matrix, "cibersort") # or 'cibersort_abs'
as for any other method.
TIMER uses indication-specific reference profiles. Therefore, you must specify the tumor type when running TIMER:
deconvolute(your_mixture_matrix, "timer", indications = c("skcm", "skcm", "blca"))  # example for a matrix with three samples
indications needs to be a vector that specifies an indication for each sample (= column) in the mixture matrix. The indications supported by TIMER are:
kich, blca, brca, cesc, gbm, hnsc, kirp, lgg, lihc, luad, lusc, prad, sarc, pcpg, paad, tgct, ucec, ov, skcm, dlbc, kirc, acc, meso, thca, uvm, ucs, thym, esca, stad, read, coad, chol
What the abbreviations stand for is documented on the TCGA wiki.
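Because TIMER needs exactly one supported indication per sample, it can help to validate the indications vector before calling deconvolute. check_indications() below is a hypothetical base R helper, not part of immunedeconv; it uses the lowercase codes listed above and, as an assumption, compares them case-sensitively.

```r
# Indications listed above as supported by TIMER
timer_indications <- c(
  "kich", "blca", "brca", "cesc", "gbm", "hnsc", "kirp", "lgg", "lihc", "luad",
  "lusc", "prad", "sarc", "pcpg", "paad", "tgct", "ucec", "ov", "skcm", "dlbc",
  "kirc", "acc", "meso", "thca", "uvm", "ucs", "thym", "esca", "stad", "read",
  "coad", "chol"
)

# Hypothetical helper: one supported indication per sample (= column) of the mixture matrix
check_indications <- function(indications, mixture_matrix) {
  if (length(indications) != ncol(mixture_matrix)) {
    stop("expected ", ncol(mixture_matrix), " indications (one per sample), got ",
         length(indications))
  }
  unknown <- setdiff(indications, timer_indications)
  if (length(unknown) > 0) {
    stop("indications not supported by TIMER: ", paste(unknown, collapse = ", "))
  }
  invisible(indications)
}

mixture <- matrix(0, nrow = 2, ncol = 3)               # toy matrix with three samples
check_indications(c("skcm", "skcm", "blca"), mixture)  # passes silently
```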
immunedeconv supports the use of an ExpressionSet instead of a gene expression matrix. In that case, the feature data (fData) requires a column that contains gene symbols. The name of that column needs to be specified in the deconvolute call:
deconvolute(my_expression_set, "quantiseq", column = "<column name>")
To provide consistently named results independent of the method, we defined a controlled vocabulary (CV) of cell-types and arranged them in a tree.
For each method, each cell-type is mapped to a node in the tree. If you are curious, it’s all defined in this Excel sheet.
This tree can be used to summarize scores along the tree. For instance, quanTIseq provides scores for regulatory and non-regulatory CD4+ T cells independently, but you may be interested in the fraction of overall CD4+ T cells. In that case, you can use map_result_to_celltypes to sum up the scores:
library(immunedeconv)
library(dplyr)  # provides the %>% pipe
res = deconvolute(immunedeconv::dataset_racle$expr_mat, "quantiseq") %>%
  map_result_to_celltypes(c("T cell CD4+"), "quantiseq")
| cell type | sample 1 | sample 2 | sample 3 | sample 4 |
|---|---|---|---|---|
| T cell CD4+ | 0.03 | 0.54 | 0.06 | 0.44 |
The algorithm is explained in detail in the methods section of Sturm et al. (2019).
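The core idea of summing scores along the tree can be illustrated in base R with the rounded quanTIseq values from the example output: adding the non-regulatory CD4+ T cell and Treg rows reproduces the T cell CD4+ row (0.03, 0.54, 0.06, 0.44). The tree excerpt and the helper sum_children() are hypothetical simplifications; the actual mapping is the one described in Sturm et al. (2019).

```r
# Rounded quanTIseq scores for the four patients from the example output
quantiseq <- rbind(
  "T cell CD4+ (non-regulatory)" = c(0.01, 0.44, 0.00, 0.38),
  "T cell regulatory (Tregs)"    = c(0.02, 0.10, 0.06, 0.06)
)

# Hypothetical excerpt of the cell-type tree: parent node -> its child nodes
tree <- list(
  "T cell CD4+" = c("T cell CD4+ (non-regulatory)", "T cell regulatory (Tregs)")
)

# The score of a parent node is the sum of the scores of its children
sum_children <- function(scores, tree) {
  t(sapply(tree, function(children) colSums(scores[children, , drop = FALSE])))
}

sum_children(quantiseq, tree)  # T cell CD4+: 0.03 0.54 0.06 0.44
```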
In general, cell-type scores allow for comparisons (1) between samples, (2) between cell-types, or (3) both. Between-sample comparisons allow statements such as “In patient A, there are more CD8+ T cells than in patient B”. Between-cell-type comparisons allow statements such as “In a certain patient, there are more B cells than T cells”. For more information, see our benchmark paper (Sturm et al. (2019)).
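The two kinds of comparison correspond to reading a score table along a row or down a column. The base R sketch below uses the rounded quanTIseq values from the example output; the patient labels p1 to p4 are hypothetical. Comparing down a column is only meaningful for methods whose scores also support between-cell-type comparisons, such as the absolute scores of quanTIseq.

```r
# Rounded quanTIseq scores from the example output; p1-p4 are placeholder patient labels
scores <- rbind(
  "T cell CD8+"               = c(p1 = 0.00, p2 = 0.03, p3 = 0.09, p4 = 0.05),
  "T cell regulatory (Tregs)" = c(p1 = 0.02, p2 = 0.10, p3 = 0.06, p4 = 0.06)
)

# (1) Between samples: compare one cell type (row) across patients (columns)
names(which.max(scores["T cell CD8+", ]))  # "p3"

# (2) Between cell types: compare rows within one patient (column)
names(which.max(scores[, "p2"]))           # "T cell regulatory (Tregs)"
```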
EPIC and quanTIseq are currently the only methods providing an absolute score, i.e. a score that can be interpreted as a cell fraction. These methods also provide an estimate of the fraction of uncharacterized cells, i.e. cells for which no signature exists. This estimate often corresponds to the fraction of cancer cells in the sample.
CIBERSORT in absolute mode (cibersort_abs), while allowing both between- and within-sample comparisons, generates a score in arbitrary units.
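For absolute methods, the reported fractions, including the estimate for uncharacterized cells, can be read as parts of a whole. Under the simplifying assumption that they sum to one for each sample, the uncharacterized fraction is what remains after subtracting the characterized cell types, as in this base R sketch with made-up numbers. EPIC and quanTIseq estimate this quantity within their own models; the sketch only illustrates the relationship.

```r
# Made-up absolute fractions of characterized cell types for two samples
fractions <- rbind(
  "B cell"      = c(s1 = 0.05, s2 = 0.10),
  "T cell CD8+" = c(s1 = 0.10, s2 = 0.25),
  "Macrophage"  = c(s1 = 0.15, s2 = 0.05)
)

# Assuming all fractions of a sample sum to one, the remainder is uncharacterized
uncharacterized <- 1 - colSums(fractions)
uncharacterized  # s1 0.7, s2 0.6
```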
Using custom signatures through the common deconvolute interface is currently not supported. The reason is that the methods are conceptually different: some are marker gene-based and others deconvolution-based. CIBERSORT performs feature selection on the signature matrix, while EPIC and quanTIseq don’t. EPIC uses all genes to estimate the inter-sample variance, while quanTIseq uses marker genes only. This is also being discussed in #15.
You can, however, provide custom signatures for most individual methods (see next question).
You can access each method individually through its deconvolute_xxx function. These functions give you access to all native features of the methods. See the function reference for details.
If you believe that a feature is available across multiple methods and should be added to the deconvolute interface, feel free to open an issue or pull request.
Finotello, Francesca, and Zlatko Trajanoski. 2018. “Quantifying tumor-infiltrating immune cells from transcriptomics data.” Cancer Immunology, Immunotherapy 0. https://doi.org/10.1007/s00262-018-2150-z.
Racle, Julien, Kaat de Jonge, Petra Baumgaertner, Daniel E Speiser, and David Gfeller. 2017. “Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data.” eLife 6 (November): e26476. https://doi.org/10.7554/eLife.26476.
Sturm, Gregor, Francesca Finotello, Florent Petitprez, Jitao David Zhang, Jan Baumbach, Wolf H Fridman, Markus List, and Tatsiana Aneichyk. 2019. “Comprehensive Evaluation of Transcriptome-Based Cell-Type Quantification Methods for Immuno-Oncology.” Bioinformatics 35 (14): i436–i445.