# Get default parameters, either the papermill or the rmarkdown way.
try:
    input_file = r.params["input_file"]
    output_file = r.params["output_file"]
except Exception:
    # `r` only exists when the notebook is run through rmarkdown/reticulate.
    print("Could not access params from `r` object. Don't worry if you are running papermill.")
    input_file = "results/03_correct_data/adata.h5ad"
    output_file = "results/04_annotate_cell-types/adata.h5ad"
We use the Leiden algorithm (Traag et al.) to determine cell-type clusters.
The algorithm depends on a resolution parameter: the higher the resolution, the more clusters are found. We perform a grid search over a range of resolutions and look for an interval in which the number of clusters stays stable, which would indicate a biologically meaningful clustering.
## running Leiden clustering
## finished
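The grid search described above can be sketched roughly as follows. This is a minimal illustration, not the code that produced the output above: `count_clusters_per_resolution`, `cluster_fn`, and `toy_labels` are hypothetical names, and the toy clustering function only stands in for a real Leiden call so that the sketch runs without data.

```python
from typing import Callable, Dict, Iterable, Sequence


def count_clusters_per_resolution(
    cluster_fn: Callable[[float], Sequence],
    resolutions: Iterable[float],
) -> Dict[float, int]:
    """Run `cluster_fn` at each resolution and count the distinct labels."""
    counts = {}
    for res in resolutions:
        labels = cluster_fn(res)
        counts[res] = len(set(labels))
    return counts


# With scanpy, `cluster_fn` could look like this (assumption, not run here):
#
#     def leiden_labels(res):
#         sc.tl.leiden(adata, resolution=res, key_added="leiden_tmp")
#         return adata.obs["leiden_tmp"]


# Toy stand-in (hypothetical): yields more clusters at higher resolution.
def toy_labels(res: float) -> list:
    n = max(1, round(res * 10))
    return [i % n for i in range(100)]


resolutions = [round(0.1 * i, 1) for i in range(1, 21)]  # 0.1, 0.2, ..., 2.0
n_clusters = count_clusters_per_resolution(toy_labels, resolutions)
```

Passing the clustering step in as a callable keeps the loop independent of the data object; plotting `n_clusters` against resolution then gives the curve in which one looks for a plateau.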
There does not seem to be a clear plateau in the curve, except for an (arguably small) one around r ≈ 1.0. The clustering with r=1.0 looks reasonable for assigning cell types, so we stick with that for this task.
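Judging a plateau by eye can be complemented by a small helper that finds the longest run of consecutive resolutions with the same cluster count. The sketch below is an assumption about how one might do this, not part of the original analysis; `longest_plateau` is a hypothetical name and `example_counts` contains made-up numbers for illustration only.

```python
def longest_plateau(counts):
    """Return (start_res, end_res, n_clusters) for the longest run of
    consecutive resolutions that yield the same number of clusters."""
    items = sorted(counts.items())
    if not items:
        raise ValueError("counts must not be empty")
    best = (0, 0)  # inclusive (start index, end index) of the best run so far
    start = 0
    for i in range(1, len(items) + 1):
        # A run ends at the end of the list or when the cluster count changes.
        if i == len(items) or items[i][1] != items[start][1]:
            if (i - 1) - start > best[1] - best[0]:
                best = (start, i - 1)
            start = i
    lo, hi = best
    return items[lo][0], items[hi][0], items[lo][1]


# Hypothetical counts, for illustration only (not the results of this analysis).
example_counts = {0.6: 9, 0.8: 12, 0.9: 14, 1.0: 14, 1.1: 14, 1.2: 17, 1.4: 21}
plateau = longest_plateau(example_counts)
print(plateau)  # (0.9, 1.1, 14)
```

A short plateau such as this one is weak evidence on its own, so the final choice of resolution should still be checked against the UMAP and known marker genes.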
fig, ax = plt.subplots(figsize=(14, 10))
sc.pl.umap(adata, color="leiden", ax=ax, legend_loc="on data")
Perform final clustering with resolution=1:
# Plot one UMAP panel per marker gene for each cell type.
for ct in cell_types:
    marker_genes = markers.loc[markers["cell_type"] == ct, "gene_identifier"].tolist()
    sc.pl.umap(adata, color=marker_genes, title=[f"{ct}: {g}" for g in marker_genes])