Cell type classification with SignacX: 1k PBMCs from 10X Genomics

This vignette shows how to use SignacX with Seurat. There are three parts: Seurat, SignacX, and visualization. We use an example PBMC scRNA-seq data set from 10X Genomics.

Seurat

Start with the standard pre-processing steps for a Seurat object.

library(Seurat)

Download data from 10X Genomics.

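# create the output folder (no warning if it already exists)
dir.create("fls", showWarnings = FALSE)
# mode = "wb" keeps the binary HDF5 file intact on Windows
download.file("https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5",
    destfile = "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5", mode = "wb")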

Create a Seurat object, and then perform SCTransform normalization. Note:

  • You can use the legacy functions here (i.e., NormalizeData, ScaleData, etc.), use SCTransform or any other normalization method (including no normalization). We did not notice a significant difference in cell type annotations with different normalization methods.
  • We think that it is best practice to use SCTransform, but it is not a necessary step. Signac will work fine without it.
# load data (Read10X_h5 requires the hdf5r package)
E <- Read10X_h5(filename = "fls/pbmc_1k_v3_filtered_feature_bc_matrix.h5")
pbmc <- CreateSeuratObject(counts = E, project = "pbmc")

# run sctransform
pbmc <- SCTransform(pbmc, verbose = FALSE)

Perform dimensionality reduction by PCA and UMAP embedding. Note:

  • Signac requires these steps because it uses the nearest-neighbor graph that Seurat generates.
# These are now standard steps in the Seurat workflow for visualization and clustering
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)

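To make concrete what a nearest-neighbor graph is, here is a toy sketch in base R. It is not how FindNeighbors is implemented; the embedding, the cell names, and the choice of k are hypothetical and only illustrate the idea that each cell is linked to the cells closest to it in a low-dimensional space.

```r
# Toy 2-D embedding: 6 cells in two well-separated groups (hypothetical data)
embedding <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1)
)
rownames(embedding) <- paste0("cell", 1:6)

# Pairwise Euclidean distances between cells
d <- as.matrix(dist(embedding))
diag(d) <- Inf  # a cell is not counted as its own neighbor

# For each cell, the names of its k nearest neighbors
k <- 2
knn <- t(apply(d, 1, function(row) names(sort(row))[1:k]))
knn
```

Each row of knn lists the neighbors of one cell; cells from the first group only point to other cells in the first group, and likewise for the second.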
SignacX

First, make sure you have the SignacX package installed.

install.packages("SignacX")

Load the library.

# load library
library(SignacX)

Generate SignacX labels for the Seurat object. Note:

  • Optionally, you can do parallel computing by setting num.cores > 1 in the Signac function.
  • Run time is ~10 minutes for ~10,000 cells.
# Run Signac
labels <- Signac(pbmc, num.cores = 4)
celltypes = GenerateLabels(labels, E = pbmc)
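The call above hard-codes num.cores = 4. As a hedged sketch, the number of cores could instead be derived from the machine at hand using the parallel package that ships with R. The variable names and the cap of 4 are illustrative choices, not part of SignacX.

```r
library(parallel)  # ships with base R

# detectCores() can return NA on some platforms, so fall back to 1
n_available <- detectCores()
if (is.na(n_available)) n_available <- 1L

# Leave one core free, and never ask for more than 4 (the value used above)
num_cores <- max(1L, min(4L, n_available - 1L))
num_cores
```

The result could then be passed along as Signac(pbmc, num.cores = num_cores).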

Training the neural networks can take a long time. To make Signac faster, we implemented SignacFast, which uses an ensemble of pre-trained neural network models. Note:

  • SignacFast uses an ensemble of 1,800 pre-calculated neural networks rather than performing bespoke model training.
# Run SignacFast
labels_fast <- SignacFast(pbmc)
celltypes_fast = GenerateLabels(labels_fast, E = pbmc)

SignacFast took only ~30 seconds. Relative to Signac, the main difference is that SignacFast tends to leave a few more cells “Unclassified.”

How does SignacFast compare to Signac?

                 B    MPh    TNK    Unclassified
B              186      0      0               0
MPh              0    362      0              54
TNK              0      0    573               3
Unclassified     0      0      0              44
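The agreement between the two methods can be read off the comparison table above. The sketch below re-enters the reported counts by hand (it does not rerun either method) and computes the fraction of cells on the diagonal; the variable names are illustrative.

```r
labels <- c("B", "MPh", "TNK", "Unclassified")

# Counts copied from the comparison table, one table row per matrix row
counts <- matrix(
  c(186,   0,   0,  0,
      0, 362,   0, 54,
      0,   0, 573,  3,
      0,   0,   0, 44),
  nrow = 4, byrow = TRUE,
  dimnames = list(labels, labels)
)

n_cells  <- sum(counts)         # 1222 cells in total
n_agree  <- sum(diag(counts))   # 1165 cells with the same label
n_differ <- n_cells - n_agree   # 57 cells with different labels
agreement <- n_agree / n_cells
round(100 * agreement, 1)       # 95.3 (percent)
```

All 57 discordant cells sit in the Unclassified column, which is consistent with the statement that the two methods differ mainly in how many cells are left unclassified.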

Visualizations

Now we can visualize the cell type classifications at several levels of detail, starting with immune versus nonimmune cells.

pbmc <- AddMetaData(pbmc, metadata = celltypes$Immune, col.name = "immune")
pbmc <- SetIdent(pbmc, value = "immune")
png(filename = "fls/plot1.png")
DimPlot(pbmc)
dev.off()
Immune, Nonimmune (if any) and unclassified cells
pbmc <- AddMetaData(pbmc, metadata = celltypes$L2, col.name = "L2")
pbmc <- SetIdent(pbmc, value = "L2")
png(filename = "fls/plot2.png")
DimPlot(pbmc)
dev.off()
Myeloid and lymphocytes
lbls = factor(celltypes$CellTypes)
levels(lbls) <- sort(unique(lbls))
pbmc <- AddMetaData(pbmc, metadata = lbls, col.name = "celltypes")
pbmc <- SetIdent(pbmc, value = "celltypes")
png(filename = "./fls/plot3.png")
DimPlot(pbmc)
dev.off()
Cell types
pbmc <- AddMetaData(pbmc, metadata = celltypes$CellTypes_novel, col.name = "celltypes_novel")
pbmc <- SetIdent(pbmc, value = "celltypes_novel")
png(filename = "./fls/plot4.png")
DimPlot(pbmc)
dev.off()
Cell types with novel populations
pbmc <- AddMetaData(pbmc, metadata = celltypes$CellStates, col.name = "cellstates")
pbmc <- SetIdent(pbmc, value = "cellstates")
png(filename = "./fls/plot5.png")
DimPlot(pbmc)
dev.off()
Cell states
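Alongside the plots, it can help to check how many distinct labels each annotation level contains. The sketch below uses a small made-up list as a stand-in for the object returned by GenerateLabels; the element names are the ones accessed in this vignette, but every label value is hypothetical.

```r
# Hypothetical stand-in for the list of labels (values are made up)
celltypes_toy <- list(
  Immune     = c("Immune", "Immune", "Immune", "Immune"),
  L2         = c("Lymphocytes", "Lymphocytes", "Myeloid", "Myeloid"),
  CellTypes  = c("B", "TNK", "MPh", "MPh"),
  CellStates = c("state_a", "state_b", "state_c", "state_d")
)

# Number of distinct labels at each level, from coarse to fine
n_labels <- sapply(celltypes_toy, function(x) length(unique(x)))
n_labels

# Cell counts per label at one level
table(celltypes_toy$CellTypes)
```

With the real object, the same two calls would be applied to celltypes instead of celltypes_toy.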

Identify differentially expressed genes between cell types. Here, we see that Signac identified two novel cell populations that are positive for platelet and plasma cell markers, respectively.

pbmc <- SetIdent(pbmc, value = "celltypes_novel")

# Find marker genes for each cell type, and draw a heatmap
markers <- FindAllMarkers(pbmc, only.pos = TRUE, verbose = FALSE, logfc.threshold = 1)
library(dplyr)
# Seurat >= 4.0 names the fold-change column avg_log2FC (avg_logFC in Seurat 3)
top5 <- markers %>%
    group_by(cluster) %>%
    slice_max(order_by = avg_log2FC, n = 5)
png(filename = "./fls/plot6.png")
DoHeatmap(pbmc, features = unique(top5$gene), angle = 90)
dev.off()
Immune marker genes
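The per-cluster selection of top markers used for the heatmap can be illustrated without Seurat or dplyr. This is a minimal base R sketch on a made-up marker table; the gene names, values, and the helper top_n_per_cluster are hypothetical, and the column name avg_log2FC is an assumption that matches recent Seurat releases (older releases used avg_logFC).

```r
# Made-up marker table with the columns used in the selection step
markers_toy <- data.frame(
  cluster    = c("B", "B", "B", "TNK", "TNK", "TNK"),
  gene       = c("g1", "g2", "g3", "g4", "g5", "g6"),
  avg_log2FC = c(2.5, 1.2, 3.1, 1.8, 4.0, 2.2),
  stringsAsFactors = FALSE
)

# Keep the n rows with the largest fold change within each cluster
top_n_per_cluster <- function(df, n) {
  pieces <- lapply(split(df, df$cluster), function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], n)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top2 <- top_n_per_cluster(markers_toy, n = 2)
top2$gene
```

The resulting gene vector plays the role of unique(top5$gene) in the DoHeatmap call.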

Save results

saveRDS(pbmc, file = "fls/pbmcs_signac.rds")
saveRDS(celltypes, file = "fls/celltypes.rds")
saveRDS(celltypes_fast, file = "fls/celltypes_fast.rds")
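Objects written with saveRDS can be read back with readRDS in a later session. The sketch below shows the round trip on a small made-up list written to a temporary file; the object and path are hypothetical and stand in for the files saved above.

```r
# Made-up object standing in for the saved labels
labels_toy <- list(CellTypes = c("B", "TNK", "MPh"))

# Write to a temporary file, read it back, and compare
path <- tempfile(fileext = ".rds")
saveRDS(labels_toy, file = path)
restored <- readRDS(path)
identical(restored, labels_toy)

unlink(path)  # clean up the temporary file
```

For the files in this vignette, the equivalent call would be readRDS("fls/celltypes.rds").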

Session Info

## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.51        
##  [5] maketools_1.3.2   cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1
##  [9] rmarkdown_2.29    buildtools_1.0.0  lifecycle_1.0.4   cli_3.6.4        
## [13] sass_0.4.9        jquerylib_0.1.4   compiler_4.4.3    sys_3.4.3        
## [17] tools_4.4.3       evaluate_1.0.3    bslib_0.9.0       yaml_2.3.10      
## [21] formatR_1.14      jsonlite_2.0.0    rlang_1.1.5