Interoperability#
SingleCell natively reads and writes the data formats of all three major single-cell ecosystems, with no intermediate conversion steps:
- scverse/Scanpy – AnnData .h5ad files and in-memory AnnData objects
- Seurat – .rds and .h5Seurat files, plus in-memory Seurat objects via the ryp Python-R bridge
- Bioconductor – SingleCellExperiment .rds files and in-memory SCE objects via ryp

It also supports raw 10x Genomics data (.h5 or .mtx/.mtx.gz).
Loading from file#
The constructor auto-detects format from the file extension:
```python
from brisc import SingleCell

# scverse / Scanpy
sc = SingleCell('data.h5ad')

# Seurat (requires ryp)
sc = SingleCell('seurat_obj.rds')
sc = SingleCell('seurat_obj.h5Seurat')

# Bioconductor SingleCellExperiment (requires ryp)
sc = SingleCell('sce_obj.rds')

# 10x Genomics
sc = SingleCell('raw_feature_bc_matrix.h5')

# expects barcodes.tsv.gz and features.tsv.gz in the same directory
sc = SingleCell('matrix.mtx.gz')
```
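To make the extension rule concrete, here is a minimal sketch of how detection by file name could work. The function `guess_format`, the table `SUFFIX_TO_FORMAT`, and the returned labels are hypothetical and are not part of brisc; the real constructor may behave differently, for example in how it tells Seurat and SingleCellExperiment .rds files apart.

```python
from pathlib import Path

# Hypothetical suffix-to-label table for illustration; not brisc's actual code.
SUFFIX_TO_FORMAT = (
    ('.h5ad', 'anndata'),
    ('.h5seurat', 'seurat-h5'),
    ('.rds', 'r-object'),   # Seurat or SCE: only known once the object is read
    ('.h5', '10x-h5'),
    ('.mtx', '10x-mtx'),
    ('.mtx.gz', '10x-mtx'),
)

def guess_format(path):
    """Return a format label based on the file name alone."""
    # Match on the end of the lowercased name rather than Path.suffix,
    # because Path('matrix.mtx.gz').suffix is just '.gz'.
    name = Path(path).name.lower()
    for suffix, label in SUFFIX_TO_FORMAT:
        if name.endswith(suffix):
            return label
    raise ValueError(f'unrecognized file extension: {path!r}')

print(guess_format('data.h5ad'))
print(guess_format('seurat_obj.h5Seurat'))
print(guess_format('matrix.mtx.gz'))
```

Because an .rds file can hold either kind of R object, a sketch like this can only say "R object"; telling the two apart requires inspecting the object after it is read.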
Choosing the count matrix#
When a file contains both raw and normalized counts, SingleCell loads raw counts by default. Use X_key to choose a different layer:
```python
# .h5ad: use SingleCell.ls() to find the right key
SingleCell.ls('data.h5ad')

# load from a specific .h5ad slot
sc = SingleCell('data.h5ad', X_key='raw/X')

# Seurat: load normalized counts from the 'data' layer
sc = SingleCell('seurat_obj.rds', X_key='data')

# Seurat: load from a non-default assay
sc = SingleCell('seurat_obj.h5Seurat', assay='SCT')

# SingleCellExperiment: load log-normalized counts
sc = SingleCell('sce_obj.rds', X_key='logcounts')
```
Partial loading#
For large .h5ad and .h5Seurat files, you can load only the metadata columns you need:
```python
sc = SingleCell('data.h5ad',
                obs_columns=['cell_type', 'batch'],
                var_columns=['gene_symbol'])
```
You can skip loading the count matrix entirely (useful for metadata-only exploration or working with dimensionally reduced objects). Note that datasets loaded without X cannot be saved, converted, or used for analyses that require counts:
```python
sc = SingleCell('data.h5ad', X=False)
```
Reading individual slots#
You can also read obs, var, obsm, varm, or uns from an .h5ad file without loading the full dataset:
```python
# polars DataFrame
obs = SingleCell.read_obs('data.h5ad', columns=['cell_type', 'batch'])

# polars DataFrame
var = SingleCell.read_var('data.h5ad')

# dict: {key: NumPy array | polars DataFrame}
obsm = SingleCell.read_obsm('data.h5ad', keys=['X_pca'])

# dict (nested: scalars, arrays, sub-dicts)
uns = SingleCell.read_uns('data.h5ad')
```
Saving to file#
Format is inferred from the file extension; for example, sc.save('output.rds') writes an .rds file.
When saving to .rds, the X_key argument controls which layer X is stored in:
```python
# save as normalized counts in the 'data' layer
sc.save('output.rds', X_key='data')
```
Note
When saving to Seurat .rds, the X_ prefix is automatically stripped from obsm keys (e.g. X_umap becomes umap) to match Seurat’s conventions. Seurat also adds orig.ident, nCount_RNA, and nFeature_RNA by default, which can be slow; to suppress the latter two:
```python
from ryp import r
r('options(Seurat.object.assay.calcn = FALSE)')
```
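The key renaming described in this note amounts to dropping a leading X_ from each obsm key. The sketch below shows that mapping in plain Python; the helper name `seurat_reduction_names` is hypothetical and only illustrates the naming rule, not how brisc implements it.

```python
def seurat_reduction_names(obsm_keys):
    """Map each obsm key to a Seurat-style name by dropping a leading 'X_'."""
    return {
        key: key[2:] if key.startswith('X_') else key
        for key in obsm_keys
    }

# Keys without the prefix are assumed to pass through unchanged
names = seurat_reduction_names(['X_umap', 'X_pca', 'scVI'])
print(names)  # {'X_umap': 'umap', 'X_pca': 'pca', 'scVI': 'scVI'}
```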
In-memory conversion#
From AnnData#
Pass an AnnData object directly to the constructor. By default, raw counts are loaded from adata.layers['UMIs'] or adata.raw.X if present, falling back to adata.X:
```python
import scanpy as sc
from brisc import SingleCell

adata = sc.read_h5ad('data.h5ad')
sc_data = SingleCell(adata)

# explicitly choose which matrix to use
sc_data = SingleCell(adata, X=adata.layers['raw_counts'])
```
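The default lookup described above can be pictured as a short fallback chain. This is a sketch only: `pick_count_matrix` is a hypothetical helper, the objects are simple stand-ins rather than real AnnData, and checking layers['UMIs'] before raw.X is an assumption about the order.

```python
from types import SimpleNamespace

def pick_count_matrix(adata):
    """Return (source_name, matrix) using the assumed fallback order."""
    # Assumed priority: layers['UMIs'], then raw.X, then X
    if 'UMIs' in adata.layers:
        return "layers['UMIs']", adata.layers['UMIs']
    if adata.raw is not None:
        return 'raw.X', adata.raw.X
    return 'X', adata.X

# Stand-in objects with AnnData-like attribute names (not real AnnData)
normalized = [[0.5, 0.0], [0.0, 1.2]]
counts = [[1, 0], [0, 3]]
with_raw = SimpleNamespace(X=normalized, layers={}, raw=SimpleNamespace(X=counts))
plain = SimpleNamespace(X=counts, layers={}, raw=None)

print(pick_count_matrix(with_raw)[0])  # raw.X
print(pick_count_matrix(plain)[0])     # X
```

When none of the defaults point at the matrix you want, passing X explicitly, as in the example above, sidesteps the lookup entirely.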
To AnnData#
```python
adata = sc_data.to_scanpy()
```
Note
The count matrix is shared, not copied. Modifying adata.X will also modify the original SingleCell dataset. To avoid this, use sc_data.copy().to_scanpy().
There is no from_scanpy() method – SingleCell(adata) serves that purpose.
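The sharing behaviour mentioned in the note is ordinary Python aliasing: two names refer to one underlying object. The toy class below is a hypothetical stand-in (nested lists instead of a sparse matrix, and `to_view` instead of to_scanpy) that shows why copying first protects the original.

```python
import copy

class ToyDataset:
    """Hypothetical stand-in for a dataset that hands out its matrix by reference."""

    def __init__(self, X):
        self.X = X

    def copy(self):
        # Deep copy so the new dataset owns an independent matrix
        return ToyDataset(copy.deepcopy(self.X))

    def to_view(self):
        # Returns the same list object, mirroring a shared (not copied) matrix
        return self.X

# Without copy(): the edit is visible through the original
original = ToyDataset([[1, 0], [0, 2]])
shared = original.to_view()
shared[0][0] = 99
print(original.X[0][0])  # 99

# With copy(): the original is left untouched
untouched = ToyDataset([[1, 0], [0, 2]])
independent = untouched.copy().to_view()
independent[0][0] = 99
print(untouched.X[0][0])  # 1
```

The trade-off is memory: copying a large count matrix doubles its footprint, which is presumably why sharing is the default.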
The ryp Python-R bridge#
Seurat and SingleCellExperiment .rds files are handled transparently via ryp, a Python-R bridge. Loading and saving .rds files works like any other format. For in-memory conversion between SingleCell and R objects, use from_seurat() / to_seurat() and from_sce() / to_sce().
Note
R’s sparse matrices use 32-bit indices, so Seurat and SingleCellExperiment objects cannot hold count matrices with more than 2,147,483,647 (INT32_MAX) non-zero elements. Large datasets may exceed this limit.
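A quick way to see whether a dataset is at risk is to compare its number of stored non-zero entries against INT32_MAX before converting. The helpers below are hypothetical and use plain integers; for a SciPy sparse matrix, the count would come from its nnz attribute.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647

def fits_in_r_sparse_matrix(n_nonzero):
    """True if a matrix with this many non-zero entries stays within the limit."""
    return n_nonzero <= INT32_MAX

def estimate_nonzeros(n_cells, mean_genes_detected_per_cell):
    """Rough non-zero count: cells times average detected genes per cell."""
    return n_cells * mean_genes_detected_per_cell

small = estimate_nonzeros(100_000, 2_000)    # 200,000,000 non-zeros
large = estimate_nonzeros(2_000_000, 2_000)  # 4,000,000,000 non-zeros

print(fits_in_r_sparse_matrix(small))  # True
print(fits_in_r_sparse_matrix(large))  # False
```

The per-cell figure of 2,000 detected genes is an illustrative assumption; at that density the limit is reached at roughly one million cells.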
Seurat#
```python
from ryp import r

# load a Seurat object in R
r('seurat_obj <- readRDS("seurat_obj.rds")')

# convert to SingleCell
sc = SingleCell.from_seurat('seurat_obj')

# convert from a specific assay/layer
sc = SingleCell.from_seurat('seurat_obj', assay='SCT', layer='data')

# convert back to Seurat (v5) in R's workspace
sc.to_seurat('seurat_out')

# convert to Seurat v3
sc.to_seurat('seurat_out', v3=True)

# or just save directly; no need for to_seurat + saveRDS
sc.save('output.rds')
```
SingleCellExperiment#
```python
from ryp import r

# load a SingleCellExperiment in R
r('sce <- readRDS("sce_obj.rds")')

# convert to SingleCell
sc = SingleCell.from_sce('sce')

# use log-normalized counts instead
sc = SingleCell.from_sce('sce', assay='logcounts')

# convert back to SCE in R's workspace
sc.to_sce('sce_out')

# or just save directly
sc.save('output_sce.rds', sce=True)
```
Example: combining Seurat and Scanpy#
A common reason to bridge ecosystems is to use tools that only exist in one. For instance, Azimuth provides automated cell type annotation via a Seurat reference atlas (R only), while scvi-tools provides deep generative models for integration and differential expression (Python only). SingleCell lets you chain these without writing intermediate files:
```python
import scvi
from brisc import SingleCell
from ryp import r

# start in Python: load and QC
sc = SingleCell('data.h5ad').skip_qc()

# pass to R: annotate cell types with Azimuth
sc.to_seurat('obj')
r('''
library(Azimuth)
obj <- RunAzimuth(obj, reference = "pbmcref")
''')
sc = SingleCell.from_seurat('obj')
# sc.obs now contains Azimuth's predicted.celltype.l1, l2, etc.

# back in Python: run scvi integration
adata = sc.to_scanpy()
scvi.model.SCVI.setup_anndata(adata, batch_key='batch')
model = scvi.model.SCVI(adata)
model.train()
adata.obsm['X_scVI'] = model.get_latent_representation()

# return to SingleCell for downstream analysis
sc = SingleCell(adata)
```
Constructing from scratch#
You can also build a SingleCell dataset from individual components:
```python
import polars as pl
from brisc import SingleCell
from scipy.sparse import csr_array

# 2 cells x 3 genes
X = csr_array([[1, 0, 3], [0, 2, 0]])
obs = pl.DataFrame({'cell_id': ['cell1', 'cell2']})
var = pl.DataFrame({'gene_id': ['g1', 'g2', 'g3']})
sc = SingleCell(X=X, obs=obs, var=var)
```
Summary#
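| Operation | Method |
|---|---|
| Load .h5ad | `SingleCell('data.h5ad')` |
| Load .rds / .h5Seurat | `SingleCell('seurat_obj.rds')` |
| Load 10x | `SingleCell('raw_feature_bc_matrix.h5')` |
| Load specific layer | `SingleCell('data.h5ad', X_key='raw/X')` |
| Load subset of columns | `SingleCell('data.h5ad', obs_columns=[...], var_columns=[...])` |
| Load without counts | `SingleCell('data.h5ad', X=False)` |
| Read individual slots | `SingleCell.read_obs()`, `SingleCell.read_var()`, `SingleCell.read_obsm()`, `SingleCell.read_uns()` |
| Save to any format | `sc.save('output.rds')` |
| From AnnData | `SingleCell(adata)` |
| To AnnData | `sc.to_scanpy()` |
| From Seurat (in-memory) | `SingleCell.from_seurat('seurat_obj')` |
| To Seurat (in-memory) | `sc.to_seurat('seurat_out')` |
| From SCE (in-memory) | `SingleCell.from_sce('sce')` |
| To SCE (in-memory) | `sc.to_sce('sce_out')` |
| Construct manually | `SingleCell(X=X, obs=obs, var=var)` |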