5 Validation with real data
We use datasets which estimate the immune cell proportions using flow cytometry as an additional validation for our simulation benchmark. We use the following three validation datasets (see section 2):
We use the following cell types, which are available in (some of) the datasets
use_cell_types = c("T cell", "T cell CD8+", "T cell CD4+",
"Monocyte", "B cell",
"Dendritic cell", "NK cell")
Here, we combine the predictions with the ‘gold standard’ reference data.
all_results_ref = inner_join(all_results, all_refs,
by = c("sample" = "sample",
"dataset" = "dataset",
"cell_type" = "cell_type"))
res_validation$all_results = all_results_ref
The scores of the methods have different properties and not all of them are directly comparable. We distinguish between three types of comparisons:
- absolute scores: allow to compare within and between samples.
- scores relative to the total immune cell content: allow to compare within a sample
- scores in arbitrary units, that only allow to compare between samples.
5.1 Between- and within-sample comparisons
Only works for methods providing an absolute score (cibersort_abs, quantiseq, epic). All other methods are included for reference.
5.1.1 Absolute scores
Both EPIC and quanTIseq provide scores that can be interpreted as absolute cell fractions. Both methods account for the different mRNA contents of the different cell types by applying a scaling factor. As the expression values of the single cells we use for the simulation have already been normalized, these scaling factors are not appropriate. While this has no effect on the correlations discussed in the previous section, it is crucial for deriving correct absolute values. For this reason, assessing the absolute quantification performance is fair on the validation datasets only, which provide geniune bulk RNA-seq data.
We use two measures to assess the absolute deviation: (1) the slope of a linear model fitted to the estimated versus known fractions and (2) the root mean square error (RMSE). The slope quickly allows to evaluate whether a method over- or under-predicts a certain cell type while RMSE gives insight about the true deviations.
5.2 Within-sample comparison
only works with methods that provide an absolute score, or a score that is relative to total immune cell content (cibersort, cibersort_abs, quantiseq, epic).
5.3 Between-sample comparisons
For this, we need to look at every cell type independently. Works for all methods except CIBERSORT.
Compute the average over all cell types. We only use cell types with at least 5 samples to obtain reasonable correlation estimates. We also exclude the T-cell supercategory to avoid redundancies.
Note that, although the correlation for CD8+ T cells looks bad, it does not necessarily mean the predictions are bad. There is just little variance between the samples and CD8+ T cell abundance is generally low, which is correlctly predicted by the methods.