Benchmarking SignacX and SingleR with synovial flow cytometry data

In Figure 3 of the pre-print, we validated Signac with flow cytometry and compared Signac to SingleR. We reproduced that analysis using Seurat in this vignette, and provide interactive access to the data here. We start with raw counts.

Load data

Read the CEL-seq2 data.

ReadCelseq <- function(counts.file, meta.file) {
    E = suppressWarnings(readr::read_tsv(counts.file))
    gns <- E$gene
    E = E[, -1]
    E = Matrix::Matrix(as.matrix(E), sparse = TRUE)
    rownames(E) <- gns
    E
}

counts.file = "./fls/celseq_matrix_ru10_molecules.tsv.gz"
meta.file = "./fls/celseq_meta.immport.723957.tsv"

E = ReadCelseq(counts.file = counts.file, meta.file = meta.file)
M = suppressWarnings(readr::read_tsv(meta.file))

# filter data based on depth and number of genes detected
kmu = Matrix::colSums(E != 0)
kmu2 = Matrix::colSums(E)
E = E[, kmu > 200 & kmu2 > 500]

# filter by mitochondrial percentage
logik = grepl("^MT-", rownames(E))
MitoFrac = Matrix::colSums(E[logik, ])/Matrix::colSums(E) * 100
E = E[, MitoFrac < 20]

SingleR

require(SingleR)
data("hpca")
Q = SingleR(sc_data = E, ref_data = hpca$data, types = hpca$main_types, fine.tune = F, numCores = 4)
save(file = "fls/SingleR_Results.rda", Q)
True_labels = M$type[match(colnames(E), M$cell_name)]
saveRDS(True_labels, file = "fls/celltypes_amp_synovium_true.rds")

Seurat

Start with the standard pre-processing steps for a Seurat object.

library(Seurat)

Create a Seurat object, and then perform SCTransform normalization. Note:

  • You can use the legacy functions here (i.e., NormalizeData, ScaleData, etc.), use SCTransform or any other normalization method (including no normalization). We did not notice a significant difference in cell type annotations with different normalization methods.
  • We think that it is best practice to use SCTransform, but it is not a necessary step. Signac will work fine without it.
# load data
synovium <- CreateSeuratObject(counts = E, project = "FACs")

# run sctransform
synovium <- SCTransform(synovium, verbose = F)

Perform dimensionality reduction by PCA and UMAP embedding. Note:

  • Signac actually needs these functions since it uses the nearest neighbor graph generated by Seurat.
# These are now standard steps in the Seurat workflow for visualization and clustering
synovium <- RunPCA(synovium, verbose = FALSE)
synovium <- RunUMAP(synovium, dims = 1:30, verbose = FALSE)
synovium <- FindNeighbors(synovium, dims = 1:30, verbose = FALSE)

SignacX

library(SignacX)

Generate Signac labels for the Seurat object. Note:

  • Optionally, you can do parallel computing by setting num.cores > 1 in the Signac function.
  • Run time is ~10 minutes for <100,000 cells.
# Run Signac
labels <- Signac(synovium, num.cores = 4)
celltypes = GenerateLabels(labels, E = synovium)

Compare SignacX and SingleR with FACs labels

SignacX (rows are FACs labels, columns are SignacX)

B F M NonImmune T Unclassified
B 945 0 2 0 0 19
F 0 2218 10 223 0 58
M 1 28 891 18 0 96
T 4 0 0 0 1768 21

SingleR (rows are FACs labels, columns are SingleR)

B Chondr. F M NK NonImmune T
B 958 1 0 6 1 0 0
F 2 1468 36 19 23 960 1
M 4 39 0 964 6 21 0
T 9 0 0 2 368 0 1414

Note:

  • These data were flow-sorted for CD3, and therefore lack NK cells. Signac detected no NK cells.
  • F = Fibroblasts; M = monocytes; T = T cells; B = B cells
  • Note that SingleR inaccurately annotated the majority of fibroblast cells as chondrocytes (Chondr.) and non immune cells, and also significantly identified NK cells (even though the data were flow-sorted for CD3, and thus lacked NK cells).

Signac accuracy

logik = xy != "Unclassified"
Signac_Accuracy = round(sum(xy[logik] == True_labels[logik])/sum(logik) * 100, 2)
Signac_Accuracy
## [1] 95.32

SingleR accuracy

SingleR_Accuracy = round(sum(xx == True_labels)/sum(logik) * 100, 2)
SingleR_Accuracy
## [1] 55.21

Save results

saveRDS(synovium, file = "synovium_signac.rds")
saveRDS(celltypes, file = "synovium_signac_celltypes.rds")
Session Info
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.49        
##  [5] maketools_1.3.1   cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1
##  [9] rmarkdown_2.28    buildtools_1.0.0  lifecycle_1.0.4   cli_3.6.3        
## [13] sass_0.4.9        jquerylib_0.1.4   compiler_4.4.1    highr_0.11       
## [17] sys_3.4.3         tools_4.4.1       evaluate_1.0.1    bslib_0.8.0      
## [21] yaml_2.3.10       formatR_1.14      jsonlite_1.8.9    rlang_1.1.4