Quality control (QC) and filtering

The purpose of this Notebook is to

Input data and configuration

Quality Metrics

Quality control follows the "best practices" tutorial for single-cell analysis (Luecken & Theis 2019) and its accompanying case-study notebook.
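The QC metrics used in this section are count depth, number of detected genes, and fraction of mitochondrial reads. This minimal sketch computes them from a raw count matrix under three assumptions: the counts are a dense cells x genes NumPy array, gene names are human symbols, and mitochondrial genes start with `MT-`. The function name `qc_metrics` is hypothetical and is not the notebook's own code.

```python
import numpy as np

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix.

    Assumption: mitochondrial genes are identified by the "MT-" prefix
    (human gene symbols); adjust `mito_prefix` for other species.
    """
    counts = np.asarray(counts)
    gene_names = np.asarray(gene_names).astype(str)
    total_counts = counts.sum(axis=1)        # count depth per cell
    n_genes = (counts > 0).sum(axis=1)       # number of detected genes per cell
    is_mito = np.char.startswith(gene_names, mito_prefix)
    mito_counts = counts[:, is_mito].sum(axis=1)
    # Fraction of counts from mitochondrial genes; 0 for empty cells (avoids division by zero)
    frac_mito = np.divide(mito_counts, total_counts,
                          out=np.zeros(len(total_counts), dtype=float),
                          where=total_counts > 0)
    return {"total_counts": total_counts, "n_genes": n_genes, "frac_mito": frac_mito}
```

For example, `qc_metrics([[5, 0, 5], [0, 2, 0]], ["CD3E", "MT-CO1", "MT-ND1"])` gives count depths 10 and 2 and mitochondrial fractions 0.5 and 1.0.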

Coarse filtering to reduce the amount of data:
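A coarse pre-filter removes (nearly) empty cells and genes that no retained cell expresses. Only the stricter cut-offs are applied later. This is a minimal sketch on a dense NumPy count matrix. The notebook's actual coarse thresholds are not given here, so the defaults below are hypothetical placeholders.

```python
import numpy as np

def coarse_filter(counts, min_genes_per_cell=1, min_cells_per_gene=1):
    """Drop (nearly) empty cells and never-detected genes.

    Default thresholds are hypothetical placeholders, not the notebook's values.
    Returns the filtered matrix plus the boolean cell and gene masks applied.
    """
    counts = np.asarray(counts)
    detected = counts > 0
    keep_cells = detected.sum(axis=1) >= min_genes_per_cell
    # Gene detection is evaluated only on the cells that were kept
    keep_genes = detected[keep_cells].sum(axis=0) >= min_cells_per_gene
    return counts[keep_cells][:, keep_genes], keep_cells, keep_genes
```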

Sample quality is consistent enough across samples that we apply global filtering cut-offs rather than per-sample filtering.

Gene-level filtering

Filter out genes detected in fewer than MIN_CELLS cells. This threshold relates to the minimal expected cluster size: given the number of cells, we expect the smallest cluster of interest to contain at least 50 cells.
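The MIN_CELLS gene filter can be sketched as a column-wise detection count. The reasoning is that a marker gene specific to the smallest cluster of interest can be detected in at most that cluster's cells. A MIN_CELLS value at or below the expected smallest cluster size therefore keeps such markers. This sketch assumes a dense cells x genes NumPy array. The function name is hypothetical, and no particular MIN_CELLS value is implied.

```python
import numpy as np

def filter_genes_min_cells(counts, gene_names, min_cells):
    """Keep genes detected (count > 0) in at least `min_cells` cells."""
    counts = np.asarray(counts)
    n_cells_detected = (counts > 0).sum(axis=0)   # cells expressing each gene
    keep = n_cells_detected >= min_cells
    return counts[:, keep], np.asarray(gene_names)[keep]
```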

Ratio of counts to number of mitochondrial genes

Count depth and detected genes

Mitochondrial reads

Apply filtering by quality metrics

Apply MIN_GENES threshold:

Apply MIN_COUNTS threshold:

Apply MAX_MITO threshold:
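The three cell-level thresholds can be sketched as a single boolean mask. Each cell's metrics depend only on that cell, so applying MIN_GENES, MIN_COUNTS and MAX_MITO one after another keeps the same cells as applying the combined mask once, provided the gene set does not change in between. This sketch assumes two things: MAX_MITO is a fraction in [0, 1] (not a percentage), and all cut-offs are inclusive. The function name is hypothetical.

```python
import numpy as np

def filter_cells_by_qc(counts, is_mito, min_genes, min_counts, max_mito):
    """Boolean mask of cells passing MIN_GENES, MIN_COUNTS and MAX_MITO.

    Assumptions: max_mito is a fraction (0-1); thresholds are inclusive.
    """
    counts = np.asarray(counts)
    is_mito = np.asarray(is_mito, dtype=bool)
    total = counts.sum(axis=1)                   # count depth
    n_genes = (counts > 0).sum(axis=1)           # detected genes
    mito = counts[:, is_mito].sum(axis=1)
    frac_mito = np.divide(mito, total, out=np.zeros(len(total)), where=total > 0)
    return (n_genes >= min_genes) & (total >= min_counts) & (frac_mito <= max_mito)
```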

Exclude ribosomal and mitochondrial genes

The list of ribosomal genes was downloaded from HGNC: https://www.genenames.org/data/genegroup/#!/group/1054
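Excluding these genes amounts to a column mask built from two criteria. Ribosomal genes are matched against a symbol list, and mitochondrial genes are matched by name prefix. This minimal sketch assumes the HGNC download has already been parsed into a collection of approved symbols; the file format is not shown here. It also assumes mitochondrial genes carry the human `MT-` prefix. The function name is hypothetical.

```python
import numpy as np

def exclude_genes(counts, gene_names, ribosomal_symbols, mito_prefix="MT-"):
    """Drop ribosomal genes (by symbol list) and mitochondrial genes (by prefix)."""
    gene_names = np.asarray(gene_names).astype(str)
    # np.isin needs a list/array; a Python set would not be matched element-wise
    is_ribo = np.isin(gene_names, list(ribosomal_symbols))
    is_mito = np.char.startswith(gene_names, mito_prefix)
    keep = ~(is_ribo | is_mito)
    return np.asarray(counts)[:, keep], gene_names[keep]
```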

QC plots after filtering

Immune-receptor-based filtering

We use T-cell receptor (TCR) sequencing data to identify putative doublets. Cells with more than one TCR-beta chain or more than two TCR-alpha chains are removed.

Remove multichain cells and cells with an extra VDJ (beta) chain. An extra VJ (alpha) chain is acceptable, as T cells can express two alpha chains.
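The doublet rule stated in this section can be sketched in plain Python: flag a cell if it has more than one beta (VDJ) chain or more than two alpha (VJ) chains. This sketch assumes a hypothetical per-chain table given as `(cell_barcode, locus)` pairs, with locus `"TRA"` for alpha and `"TRB"` for beta. It illustrates the rule only and is not the tool the notebook uses.

```python
from collections import Counter

def tcr_doublet_cells(chains, max_alpha=2, max_beta=1):
    """Return barcodes of cells flagged as putative doublets.

    `chains`: iterable of (cell_barcode, locus) pairs, one per chain,
    locus "TRA" (VJ, alpha) or "TRB" (VDJ, beta). Input format is an assumption.
    """
    chains = list(chains)  # allow generators; we iterate twice
    n_alpha = Counter(cell for cell, locus in chains if locus == "TRA")
    n_beta = Counter(cell for cell, locus in chains if locus == "TRB")
    cells = set(n_alpha) | set(n_beta)
    # Counter returns 0 for cells without chains of that locus
    return {c for c in cells if n_alpha[c] > max_alpha or n_beta[c] > max_beta}
```

A cell with two alpha chains and one beta chain is kept; a cell with two beta chains is flagged.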

UMAP plot by covariates

Save result