Pseudobulk
- class brisc.Pseudobulk(source=None, /, *, X=None, obs=None, var=None, num_threads=None)
A pseudobulked single-cell dataset resulting from calling pseudobulk() on a SingleCell dataset.
Has slots for:
X: a dict of NumPy arrays of counts per sample and gene for each cell type
obs: a dict of polars DataFrames of sample metadata for each cell type
var: a dict of polars DataFrames of gene metadata for each cell type
num_threads: the default number of threads to use for operations on the dataset that support multithreading (which can be overridden by individual functions)
as well as obs_names and var_names, shortcuts for dicts of the first column of obs and var (obs[:, 0] and var[:, 0]) for each cell type.
In many ways, Pseudobulk objects behave like dictionaries:
pb1 | pb2 combines pseudobulks with non-overlapping cell types into one big pseudobulk
cell_type in pb tests whether cell_type is a cell type in the pseudobulk
for cell_type in pb: and for cell_type in pb.keys(): yield the cell type names
for X, obs, var in pb.values(): yields each cell type’s X, obs, and var
for cell_type, (X, obs, var) in pb.items(): yields both the name and the X, obs, and var for each cell type
There are also custom iterators if you just want one field per cell type:
for X in pb.iter_X(): yields just the X for each cell type
for obs in pb.iter_obs(): yields just the obs
for var in pb.iter_var(): yields just the var
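The dictionary-like behavior listed above can be illustrated with a small stand-in. This is only a sketch: ToyPseudobulk is a hypothetical class written for this example, not brisc's implementation, and it stores plain Python lists where the real class holds NumPy arrays and polars DataFrames. The cell type, sample, and gene names are made up.

```python
from collections.abc import Iterator


class ToyPseudobulk:
    """Hypothetical stand-in mimicking the dict-like protocol described above."""

    def __init__(self, X: dict, obs: dict, var: dict) -> None:
        # All three dicts must be keyed by the same cell types.
        if not (X.keys() == obs.keys() == var.keys()):
            raise ValueError("X, obs and var must share the same cell types")
        self.X, self.obs, self.var = dict(X), dict(obs), dict(var)

    def __contains__(self, cell_type: str) -> bool:
        return cell_type in self.X

    def __iter__(self) -> Iterator[str]:
        return iter(self.X)

    def keys(self):
        return self.X.keys()

    def values(self) -> list:
        return [(self.X[ct], self.obs[ct], self.var[ct]) for ct in self.X]

    def items(self) -> list:
        return [(ct, (self.X[ct], self.obs[ct], self.var[ct])) for ct in self.X]

    def __or__(self, other: "ToyPseudobulk") -> "ToyPseudobulk":
        # Combining is only defined for non-overlapping cell types.
        overlap = self.keys() & other.keys()
        if overlap:
            raise ValueError(f"overlapping cell types: {sorted(overlap)}")
        return ToyPseudobulk(
            {**self.X, **other.X},
            {**self.obs, **other.obs},
            {**self.var, **other.var},
        )


pb1 = ToyPseudobulk(
    X={"Astro": [[5, 0], [3, 2]]},
    obs={"Astro": ["s1", "s2"]},
    var={"Astro": ["g1", "g2"]},
)
pb2 = ToyPseudobulk(
    X={"Micro": [[1, 4]]},
    obs={"Micro": ["s1"]},
    var={"Micro": ["g1", "g2"]},
)
pb = pb1 | pb2

for cell_type, (X, obs, var) in pb.items():
    print(cell_type, len(obs), "samples,", len(var), "genes")
```

In this sketch, combining two objects that share a cell type raises a ValueError; how the real class reports that case is not stated here.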
- Parameters:
source : str | Path | None
X : dict[str, np.ndarray[np.dtype[np.integer | np.floating]]] | None
obs : dict[str, pl.DataFrame] | None
var : dict[str, pl.DataFrame] | None
num_threads : int | np.integer | None
I/O
Load a saved Pseudobulk dataset, or create one from an in-memory count matrix and metadata for each cell type.
Save a Pseudobulk dataset to directory (which must not exist unless overwrite=True, and will be created) with three files per cell type: the X at f'{cell_type}.X.npy', the obs at f'{cell_type}.obs.parquet', and the var at f'{cell_type}.var.parquet'.
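To make the on-disk layout concrete, the following sketch lists the file paths that the description above implies for a given directory and set of cell types. The helper expected_files, the directory name, and the cell type names are hypothetical and exist only for this example; the helper does not read or write anything.

```python
from pathlib import Path


def expected_files(directory, cell_types):
    """List the three per-cell-type paths described for a saved dataset."""
    directory = Path(directory)
    return [
        directory / f"{cell_type}.{suffix}"
        for cell_type in cell_types
        for suffix in ("X.npy", "obs.parquet", "var.parquet")
    ]


# Hypothetical directory and cell types, for illustration only.
paths = expected_files("my_pseudobulk", ["Astro", "Micro"])
for path in paths:
    print(path)
```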
Properties
A dictionary of count matrices for each cell type, as NumPy arrays.
A dictionary of Polars DataFrames of sample-level metadata for each cell type.
A dictionary of Polars DataFrames of gene-level metadata for each cell type.
A shortcut to access the first column of obs for each cell type.
A shortcut to access the first column of var for each cell type.
The default number of threads used for this Pseudobulk dataset's operations.
A dictionary mapping each cell type to a length-2 tuple, where the first element is the number of samples and the second is the number of genes.
Data access
Get the row of X[cell_type] corresponding to a single sample, based on the sample's name in obs_names.
Get the column of X[cell_type] corresponding to a single gene, based on the gene's name in var_names.
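The name-based lookups above amount to finding the position of a name and then indexing the count matrix by that position. A minimal sketch for one cell type, assuming plain nested lists in place of a NumPy array; the helper names row_for_sample and column_for_gene and all sample and gene names are hypothetical.

```python
def row_for_sample(X, obs_names, sample):
    """Return the row of X whose position matches `sample` in obs_names."""
    idx = obs_names.index(sample)  # raises ValueError if the name is absent
    return X[idx]


def column_for_gene(X, var_names, gene):
    """Return the column of X whose position matches `gene` in var_names."""
    idx = var_names.index(gene)  # raises ValueError if the name is absent
    return [row[idx] for row in X]


# One cell type's data: rows are samples, columns are genes.
X = [[5, 0, 2], [3, 1, 7]]
obs_names = ["donor1", "donor2"]
var_names = ["GeneA", "GeneB", "GeneC"]

# (number of samples, number of genes), as in the shape property.
shape = (len(X), len(X[0]))

donor2_counts = row_for_sample(X, obs_names, "donor2")
gene_c_counts = column_for_gene(X, var_names, "GeneC")
```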
Dictionary interface
Get a KeysView (like you would get from dict.keys()) of this Pseudobulk dataset's cell types.
Get a ValuesView (like you would get from dict.values()) of (X, obs, var) tuples for each cell type in this Pseudobulk dataset.
Get an ItemsView (like you would get from dict.items()) of (cell_type, (X, obs, var)) tuples for each cell type in this Pseudobulk dataset.
Iterate over each cell type's X.
Iterate over each cell type's obs.
Iterate over each cell type's var.
Check if this Pseudobulk dataset contains the specified cell type.
Combine the cell types of this Pseudobulk dataset with another.
Test for equality with another Pseudobulk dataset.
Manipulation
Sets a column as the new first column of obs, i.e. the obs_names.
Sets a column as the new first column of var, i.e. the var_names.
Return a new Pseudobulk dataset with a different default number of threads.
Equivalent to df.filter() from polars, but applied to both obs and X for each cell type.
Equivalent to df.filter() from polars, but applied to both var and X for each cell type.
Equivalent to df.select() from polars, but applied to each cell type's obs.
Equivalent to df.select() from polars, but applied to each cell type's var.
Create a new Pseudobulk dataset subset to the cell type(s) in cell_types and more_cell_types.
Equivalent to df.with_columns() from polars, but applied to each cell type's obs.
Equivalent to df.with_columns() from polars, but applied to each cell type's var.
Create a new Pseudobulk dataset with columns and more_columns removed from obs.
Create a new Pseudobulk dataset with columns and more_columns removed from var.
Create a new Pseudobulk dataset with cell_types and more_cell_types removed.
Create a new Pseudobulk dataset with column(s) of obs renamed for each cell type.
Create a new Pseudobulk dataset with column(s) of var renamed for each cell type.
Create a new Pseudobulk dataset with cell type(s) renamed.
Cast each cell type's X to the specified data type.
Cast column(s) of each cell type's obs to the specified data type(s).
Cast column(s) of each cell type's var to the specified data type(s).
Left-join each cell type's obs with another DataFrame, using the same logic as polars.DataFrame.join().
Left-join each cell type's var with another DataFrame, using the same logic as polars.DataFrame.join().
Subsample a specific number or fraction of samples.
Subsample a specific number or fraction of genes.
Split this Pseudobulk dataset into a tuple of Pseudobulk datasets with one cell type each.
Concatenate one or more other Pseudobulk datasets with this one, sample-wise.
Concatenate one or more other Pseudobulk datasets with this one, gene-wise.
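The obs filter described above differs from a plain DataFrame filter in that the rows of X must be dropped along with the matching rows of obs, so that the two stay aligned. The sketch below shows that idea for a single cell type, assuming a list of dicts in place of a polars DataFrame and nested lists in place of a NumPy array. The helper filter_samples and the num_cells column are hypothetical.

```python
def filter_samples(X, obs, keep):
    """Keep the samples for which keep(obs_row) is true, in both obs and X."""
    mask = [bool(keep(row)) for row in obs]
    new_obs = [row for row, kept in zip(obs, mask) if kept]
    new_X = [counts for counts, kept in zip(X, mask) if kept]
    return new_X, new_obs


# One cell type: obs has one row per sample, X has one row of counts per sample.
obs = [
    {"ID": "donor1", "num_cells": 120},
    {"ID": "donor2", "num_cells": 8},
    {"ID": "donor3", "num_cells": 45},
]
X = [[5, 0], [1, 1], [3, 2]]

# Drop samples with fewer than 10 cells; X loses the same rows as obs.
X_kept, obs_kept = filter_samples(X, obs, lambda row: row["num_cells"] >= 10)
```

The var filter follows the same pattern, with the mask applied to the columns of X instead of its rows.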
Transformation
Make a copy of this Pseudobulk dataset.
Convert this Pseudobulk object to a polars DataFrame, with one row per (sample, cell type) pair and one column per gene.
Apply a function to each cell type's X.
Apply a function to each cell type's obs.
Apply a function to each cell type's var.
Analysis
Subset each cell type to samples passing quality control (QC).
Calculate normalization factor-adjusted library sizes for each sample in each cell type, via the approach of edgeR's calcNormFactors().
Calculate counts per million for each cell type.
Calculate log counts per million for each cell type.
Regress out covariates from obs.
Perform differential expression (DE) on a Pseudobulk dataset with limma-voom.
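As a reminder of what counts per million means, the sketch below rescales each sample's counts so that they sum to one million. It assumes the raw row sum as the library size; the library's own method may instead use the normalization factor-adjusted library sizes mentioned above, so treat this as an illustration of the basic formula only. The helper counts_per_million and the example counts are hypothetical.

```python
def counts_per_million(X):
    """Scale each sample's counts (one row of X) to sum to one million."""
    cpm = []
    for row in X:
        library_size = sum(row)  # assumption: raw total counts per sample
        if library_size == 0:
            raise ValueError("sample has zero total counts")
        cpm.append([count * 1e6 / library_size for count in row])
    return cpm


# Two samples (rows) by three genes (columns) for one cell type.
X = [[10, 30, 60], [1, 1, 2]]
cpm = counts_per_million(X)
```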
Utility
Print a row of obs (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.
Print a row of var (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.
Apply a function to a Pseudobulk dataset.
Apply a function to a Pseudobulk dataset's X.
Apply a function to a Pseudobulk dataset's obs.
Apply a function to a Pseudobulk dataset's var.