Pseudobulk

class brisc.Pseudobulk(source=None, /, *, X=None, obs=None, var=None, num_threads=None)

A pseudobulked single-cell dataset resulting from calling pseudobulk() on a SingleCell dataset.

Has slots for:

X: a dict of NumPy arrays of counts per sample and gene for each cell type
obs: a dict of polars DataFrames of sample metadata for each cell type
var: a dict of polars DataFrames of gene metadata for each cell type
num_threads: the default number of threads to use for operations on the dataset that support multithreading (individual functions can override it)

as well as obs_names and var_names, shortcuts to dicts of obs[:, 0] and var[:, 0] (the first column of obs and of var) for each cell type.
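To make the slot layout concrete, here is a minimal sketch of the data a Pseudobulk holds, using plain Python dicts keyed by cell type. It assumes NumPy only: obs and var are represented as dicts of lists standing in for polars DataFrames, and all cell type, sample, and gene names are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: two cell types, each with its own samples and genes.
# In brisc, obs and var would be polars DataFrames; dicts of lists stand in here.
X = {
    "T": np.array([[5, 0, 3], [2, 1, 0]]),    # 2 samples x 3 genes
    "B": np.array([[7, 4], [0, 9], [1, 1]]),  # 3 samples x 2 genes
}
obs = {
    "T": {"sample": ["s1", "s2"], "age": [34, 51]},
    "B": {"sample": ["s1", "s2", "s3"], "age": [34, 51, 47]},
}
var = {
    "T": {"gene": ["CD3E", "IL7R", "GZMK"]},
    "B": {"gene": ["CD19", "MS4A1"]},
}

def first_column(table):
    """Return the first column of a dict-of-lists table (dicts keep insertion order)."""
    return next(iter(table.values()))

# obs_names / var_names: the first column of obs / var for each cell type.
obs_names = {cell_type: first_column(table) for cell_type, table in obs.items()}
var_names = {cell_type: first_column(table) for cell_type, table in var.items()}

# Each row of X matches one row of obs; each column of X matches one row of var.
for cell_type in X:
    assert X[cell_type].shape == (len(obs_names[cell_type]), len(var_names[cell_type]))
```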

In many ways, Pseudobulk objects behave like dictionaries:

pb1 | pb2 combines pseudobulks with non-overlapping cell types into a single Pseudobulk
cell_type in pb tests whether cell_type is a cell type in the pseudobulk
for cell_type in pb: and for cell_type in pb.keys(): yield the cell type names
for X, obs, var in pb.values(): yields each cell type’s X, obs, and var
for cell_type, (X, obs, var) in pb.items(): yields both the name and the X, obs, and var for each cell type
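The dictionary-like behavior listed above can be mimicked with a small wrapper around a dict of (X, obs, var) tuples. This is a hypothetical stand-in (ToyPseudobulk is not part of brisc), sketched to show how `|`, `in`, iteration, keys(), values() and items() fit together; raising ValueError on overlapping cell types is an assumption, since the exception brisc raises is not documented here.

```python
class ToyPseudobulk:
    """Hypothetical stand-in that mimics the dict-like behavior described above."""

    def __init__(self, X, obs, var):
        # X, obs and var are dicts keyed by the same cell types.
        if not (X.keys() == obs.keys() == var.keys()):
            raise ValueError("X, obs and var must have the same cell types")
        self._data = {ct: (X[ct], obs[ct], var[ct]) for ct in X}

    def __contains__(self, cell_type):
        return cell_type in self._data

    def __iter__(self):
        # Iterating yields cell type names, like iterating over a dict.
        return iter(self._data)

    def keys(self):
        return self._data.keys()

    def values(self):
        # Each value is an (X, obs, var) tuple.
        return self._data.values()

    def items(self):
        # Each item is a (cell_type, (X, obs, var)) tuple.
        return self._data.items()

    def __or__(self, other):
        # Only pseudobulks with non-overlapping cell types can be combined (assumed ValueError).
        overlap = self.keys() & other.keys()
        if overlap:
            raise ValueError(f"overlapping cell types: {sorted(overlap)}")
        combined = ToyPseudobulk({}, {}, {})
        combined._data = {**self._data, **other._data}
        return combined


pb1 = ToyPseudobulk({"T": [[5, 0]]}, {"T": ["s1"]}, {"T": ["g1", "g2"]})
pb2 = ToyPseudobulk({"B": [[7, 4]]}, {"B": ["s1"]}, {"B": ["g1", "g2"]})
pb = pb1 | pb2
for cell_type, (X, obs, var) in pb.items():
    print(cell_type, X, obs, var)
```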

There are also custom iterators if you just want one field per cell type:

for X in pb.iter_X(): yields just the X for each cell type
for obs in pb.iter_obs(): yields just the obs for each cell type
for var in pb.iter_var(): yields just the var for each cell type
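Each of these iterators amounts to unpacking values() and discarding the fields you don't need. A minimal sketch, assuming a plain dict of hypothetical (X, obs, var) tuples in place of a real Pseudobulk:

```python
# Hypothetical per-cell-type data: cell type -> (X, obs, var)
data = {
    "T": ([[5, 0], [2, 1]], ["s1", "s2"], ["g1", "g2"]),
    "B": ([[7, 4]], ["s3"], ["g1", "g2"]),
}

def iter_X(data):
    """Yield each cell type's X, in cell-type order."""
    for X, _obs, _var in data.values():
        yield X

def iter_obs(data):
    """Yield each cell type's obs."""
    for _X, obs, _var in data.values():
        yield obs

def iter_var(data):
    """Yield each cell type's var."""
    for _X, _obs, var in data.values():
        yield var

# Equivalent to unpacking values() and ignoring the unused fields.
assert list(iter_X(data)) == [X for X, _, _ in data.values()]
```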

Parameters:

source : str | Path | None
X : dict[str, np.ndarray[np.dtype[np.integer | np.floating]]] | None
obs : dict[str, pl.DataFrame] | None
var : dict[str, pl.DataFrame] | None
num_threads : int | np.integer | None

I/O

`Pseudobulk.__init__`	Load a saved Pseudobulk dataset, or create one from an in-memory count matrix + metadata for each cell type.
`Pseudobulk.save`	Saves a Pseudobulk dataset to directory, which will be created and must not already exist unless overwrite=True. Three files are written per cell type: the X at f'{cell_type}.X.npy', the obs at f'{cell_type}.obs.parquet', and the var at f'{cell_type}.var.parquet'.
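The on-disk layout written by save() follows directly from the file names above. This sketch only computes the expected paths (it does not write anything); saved_file_paths is a hypothetical helper, and "pb_dir", "T" and "B" are placeholder names.

```python
from pathlib import Path

def saved_file_paths(directory, cell_types):
    """Hypothetical helper listing the three files written per cell type by save()."""
    directory = Path(directory)
    paths = []
    for cell_type in cell_types:
        paths.append(directory / f"{cell_type}.X.npy")        # count matrix
        paths.append(directory / f"{cell_type}.obs.parquet")  # sample metadata
        paths.append(directory / f"{cell_type}.var.parquet")  # gene metadata
    return paths

for path in saved_file_paths("pb_dir", ["T", "B"]):
    print(path)
```

Since `source` accepts a str or Path and `__init__` loads saved datasets, reading such a directory back should be a matter of passing it as the first positional argument, e.g. `brisc.Pseudobulk("pb_dir")`.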

Properties

`Pseudobulk.X`	A dictionary of count matrices for each cell type, as NumPy arrays.
`Pseudobulk.obs`	A dictionary of Polars DataFrames of sample-level metadata for each cell type.
`Pseudobulk.var`	A dictionary of Polars DataFrames of gene-level metadata for each cell type.
`Pseudobulk.obs_names`	A shortcut to access the first column of obs for each cell type.
`Pseudobulk.var_names`	A shortcut to access the first column of var for each cell type.
`Pseudobulk.num_threads`	The default number of threads used for this Pseudobulk dataset's operations.
`Pseudobulk.shape`	A dictionary mapping each cell type to a length-2 tuple of (number of samples, number of genes).
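Because each cell type keeps its own X, shapes can differ between cell types. A NumPy-only sketch of how such a mapping can be derived from a dict of X matrices (the zero-filled arrays are placeholders):

```python
import numpy as np

# Placeholder arrays: each cell type has its own samples and genes.
X = {"T": np.zeros((4, 100)), "B": np.zeros((3, 80))}

# shape-style mapping: cell type -> (number of samples, number of genes)
shape = {cell_type: X_ct.shape for cell_type, X_ct in X.items()}
```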

Data access

`Pseudobulk.sample`	Get the row of X[cell_type] corresponding to a single sample, based on the sample's name in obs_names.
`Pseudobulk.gene`	Get the column of X[cell_type] corresponding to a single gene, based on the gene's name in var_names.
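Both lookups translate a name into an index and then take a row (sample) or a column (gene) of one cell type's X. A sketch for a single cell type, assuming NumPy and hypothetical helpers sample_row and gene_column; the exact signatures of brisc's sample() and gene() (for example, how the cell type is passed) are not shown here.

```python
import numpy as np

# Hypothetical single cell type: 3 samples x 4 genes.
X_ct = np.arange(12).reshape(3, 4)
obs_names_ct = ["s1", "s2", "s3"]
var_names_ct = ["CD3E", "IL7R", "GZMK", "NKG7"]

def sample_row(X_ct, obs_names_ct, sample_name):
    """Row of X for one sample, looked up by its name in obs_names."""
    return X_ct[obs_names_ct.index(sample_name)]

def gene_column(X_ct, var_names_ct, gene_name):
    """Column of X for one gene, looked up by its name in var_names."""
    return X_ct[:, var_names_ct.index(gene_name)]
```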

Dictionary interface

`Pseudobulk.keys`	Get a KeysView (like you would get from dict.keys()) of this Pseudobulk dataset's cell types.
`Pseudobulk.values`	Get a ValuesView (like you would get from dict.values()) of (X, obs, var) tuples for each cell type in this Pseudobulk dataset.
`Pseudobulk.items`	Get an ItemsView (like you would get from dict.items()) of (cell_type, (X, obs, var)) tuples for each cell type in this Pseudobulk dataset.
`Pseudobulk.iter_X`	Iterate over each cell type's X.
`Pseudobulk.iter_obs`	Iterate over each cell type's obs.
`Pseudobulk.iter_var`	Iterate over each cell type's var.
`Pseudobulk.__contains__`	Check if this Pseudobulk dataset contains the specified cell type.
`Pseudobulk.__or__`	Combine the cell types of this Pseudobulk dataset with another.
`Pseudobulk.__eq__`	Test for equality with another Pseudobulk dataset.

Manipulation

`Pseudobulk.set_obs_names`	Sets a column as the new first column of obs, i.e. the obs_names.
`Pseudobulk.set_var_names`	Sets a column as the new first column of var, i.e. the var_names.
`Pseudobulk.set_num_threads`	Return a new Pseudobulk dataset with a different default number of threads.
`Pseudobulk.filter_obs`	Equivalent to df.filter() from polars, but applied to both obs and X for each cell type.
`Pseudobulk.filter_var`	Equivalent to df.filter() from polars, but applied to both var and X for each cell type.
`Pseudobulk.select_obs`	Equivalent to df.select() from polars, but applied to each cell type's obs.
`Pseudobulk.select_var`	Equivalent to df.select() from polars, but applied to each cell type's var.
`Pseudobulk.select_cell_types`	Create a new Pseudobulk dataset subset to the cell type(s) in cell_types and more_cell_types.
`Pseudobulk.with_columns_obs`	Equivalent to df.with_columns() from polars, but applied to each cell type's obs.
`Pseudobulk.with_columns_var`	Equivalent to df.with_columns() from polars, but applied to each cell type's var.
`Pseudobulk.drop_obs`	Create a new Pseudobulk dataset with columns and more_columns removed from obs.
`Pseudobulk.drop_var`	Create a new Pseudobulk dataset with columns and more_columns removed from var.
`Pseudobulk.drop_cell_types`	Create a new Pseudobulk dataset with cell_types and more_cell_types removed.
`Pseudobulk.rename_obs`	Create a new Pseudobulk dataset with column(s) of obs renamed for each cell type.
`Pseudobulk.rename_var`	Create a new Pseudobulk dataset with column(s) of var renamed for each cell type.
`Pseudobulk.rename_cell_types`	Create a new Pseudobulk dataset with cell type(s) renamed.
`Pseudobulk.cast_X`	Cast each cell type's X to the specified data type.
`Pseudobulk.cast_obs`	Cast column(s) of each cell type's obs to the specified data type(s).
`Pseudobulk.cast_var`	Cast column(s) of each cell type's var to the specified data type(s).
`Pseudobulk.join_obs`	Left-join each cell type's obs with another DataFrame, using the same logic as polars.DataFrame.join().
`Pseudobulk.join_var`	Left-join each cell type's var with another DataFrame, using the same logic as polars.DataFrame.join().
`Pseudobulk.subsample_obs`	Subsample a specific number or fraction of samples.
`Pseudobulk.subsample_var`	Subsample a specific number or fraction of genes.
`Pseudobulk.split_by_cell_type`	Split this Pseudobulk dataset into a tuple of Pseudobulk datasets with one cell type each.
`Pseudobulk.concat_obs`	Concatenate one or more other Pseudobulk datasets with this one, sample-wise.
`Pseudobulk.concat_var`	Concatenate one or more other Pseudobulk datasets with this one, gene-wise.
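The key invariant behind filter_obs and concat_obs is that row i of X keeps describing sample i of obs. A NumPy-only sketch for one cell type, with dicts of lists standing in for polars DataFrames and made-up data; brisc's own handling of column matching across datasets is not covered here.

```python
import numpy as np

# Hypothetical single cell type: 3 samples x 3 genes, with matching obs rows.
X_ct = np.array([[5, 0, 3], [2, 1, 0], [9, 9, 9]])
obs_ct = {"sample": ["s1", "s2", "s3"], "age": [34, 51, 47]}

# filter_obs-style: one boolean mask per sample, applied to obs rows AND X rows.
mask = np.array(obs_ct["age"]) > 40
filtered_obs = {
    column: [value for value, keep in zip(values, mask) if keep]
    for column, values in obs_ct.items()
}
filtered_X = X_ct[mask]

# concat_obs-style: stack samples (rows) from another dataset with the same genes.
X_other = np.array([[1, 1, 1]])
obs_other = {"sample": ["s4"], "age": [29]}
stacked_X = np.concatenate([filtered_X, X_other], axis=0)
stacked_obs = {column: filtered_obs[column] + obs_other[column] for column in filtered_obs}

# After both steps, X and obs still have one row per sample.
assert stacked_X.shape[0] == len(stacked_obs["sample"])
```

filter_var and concat_var are the gene-wise counterparts: the mask selects columns (`X_ct[:, mask]`) and concatenation uses `axis=1`.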

Transformation

`Pseudobulk.copy`	Make a copy of this Pseudobulk dataset.
`Pseudobulk.to_df`	Convert this Pseudobulk object to a polars DataFrame, with one row per (sample, cell type) pair and one column per gene.
`Pseudobulk.map_X`	Apply a function to each cell type's X.
`Pseudobulk.map_obs`	Apply a function to each cell type's obs.
`Pseudobulk.map_var`	Apply a function to each cell type's var.
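Conceptually, map_X applies one function to every cell type's matrix independently. A sketch using a hypothetical map_X helper over a plain dict of NumPy arrays; it returns a new dict, and whether brisc's map_X accepts extra arguments for the function is not documented here.

```python
import numpy as np

X = {"T": np.array([[1, 3], [0, 8]]), "B": np.array([[15, 2]])}

def map_X(X, function):
    """Apply function to each cell type's X, returning a new dict and leaving X unchanged."""
    return {cell_type: function(X_ct) for cell_type, X_ct in X.items()}

log1p_X = map_X(X, np.log1p)
```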

Analysis

`Pseudobulk.qc`	Subsets each cell type to samples passing quality control (QC).
`Pseudobulk.library_size`	Calculate normalization factor-adjusted library sizes for each sample in each cell type, via the approach of edgeR's calcNormFactors().
`Pseudobulk.CPM`	Calculate counts per million for each cell type.
`Pseudobulk.log_CPM`	Calculate log counts per million for each cell type.
`Pseudobulk.regress_out`	Regress out covariates from obs.
`Pseudobulk.DE`	Perform differential expression (DE) analysis on a Pseudobulk dataset with limma-voom.
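The normalization and covariate-removal steps can be sketched for a single cell type as follows. This is not brisc's implementation: it uses raw per-sample totals as library sizes rather than the calcNormFactors-style adjusted sizes that library_size() computes, assumes log base 2 with a pseudocount of 1, and regresses out covariates by ordinary least squares with an intercept, which is one common approach but not necessarily the one brisc uses. The counts and the age covariate are made up. DE with limma-voom is not shown.

```python
import numpy as np

def cpm(X_ct, library_size=None):
    """Counts per million for one cell type (rows = samples, columns = genes).

    Defaults to raw per-sample totals; brisc's library_size() instead applies
    edgeR-style normalization factors, which this sketch does not compute.
    """
    if library_size is None:
        library_size = X_ct.sum(axis=1)
    return X_ct / np.asarray(library_size)[:, None] * 1e6

def log_cpm(X_ct, pseudocount=1.0, library_size=None):
    """log2(CPM + pseudocount); the pseudocount of 1 is an illustrative assumption."""
    return np.log2(cpm(X_ct, library_size) + pseudocount)

def regress_out(Y, covariates):
    """Residuals of an OLS fit of each column of Y on covariates plus an intercept."""
    design = np.column_stack([np.ones(len(Y)), covariates])
    coefficients, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return Y - design @ coefficients

X_ct = np.array([[10, 30, 60], [1, 1, 2], [4, 8, 8]])
age = np.array([34.0, 51.0, 47.0])  # hypothetical obs covariate, one value per sample
residuals = regress_out(log_cpm(X_ct), age)
```

OLS residuals are orthogonal to every column of the design matrix, so after regressing out, each gene's residuals sum to zero and are uncorrelated with the covariate.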

Utility

`Pseudobulk.peek_obs`	Print a row of obs (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.
`Pseudobulk.peek_var`	Print a row of var (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.
`Pseudobulk.pipe`	Apply a function to a Pseudobulk dataset.
`Pseudobulk.pipe_X`	Apply a function to a Pseudobulk dataset's X.
`Pseudobulk.pipe_obs`	Apply a function to a Pseudobulk dataset's obs.
`Pseudobulk.pipe_var`	Apply a function to a Pseudobulk dataset's var.