Pseudobulk

class brisc.Pseudobulk(source=None, /, *, X=None, obs=None, var=None, num_threads=None)

A pseudobulked single-cell dataset resulting from calling pseudobulk() on a SingleCell dataset.

Has slots for:

  • X: a dict of NumPy arrays of counts per sample and gene for each cell type

  • obs: a dict of polars DataFrames of sample metadata for each cell type

  • var: a dict of polars DataFrames of gene metadata for each cell type

  • num_threads: the default number of threads to use for operations on the dataset that support multithreading (which can be overridden by individual functions)

as well as obs_names and var_names, aliases for a dict of obs[:, 0] and var[:, 0] for each cell type.

In many ways, Pseudobulk objects behave like dictionaries:

  • pb1 | pb2 combines pseudobulks with non-overlapping cell types into one big pseudobulk

  • cell_type in pb tests whether cell_type is a cell type in the pseudobulk

  • for cell_type in pb: and for cell_type in pb.keys(): yield the cell type names

  • for X, obs, var in pb.values(): yields each cell type’s X, obs, and var

  • for cell_type, (X, obs, var) in pb.items(): yields both the name and the X, obs, and var for each cell type

There are also custom iterators if you just want one field per cell type:

  • for X in pb.iter_X(): yields just the X for each cell type

  • for obs in pb.iter_obs(): yields just the obs for each cell type

  • for var in pb.iter_var(): yields just the var for each cell type
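
The dictionary-like behavior listed above can be pictured with a toy container. This is a minimal sketch for illustration only: ToyPseudobulk is a hypothetical stand-in written in plain Python, not brisc's implementation, and it uses nested lists where the real class holds NumPy arrays and polars DataFrames.

```python
# Toy stand-in for illustration only; this is NOT brisc's implementation.
class ToyPseudobulk:
    def __init__(self, X, obs, var):
        # Each argument is a dict keyed by cell type.
        if not (X.keys() == obs.keys() == var.keys()):
            raise ValueError("X, obs and var must share the same cell types")
        self.X, self.obs, self.var = dict(X), dict(obs), dict(var)

    def __contains__(self, cell_type):
        return cell_type in self.X

    def __iter__(self):
        # Iterating yields the cell type names, like a dict.
        return iter(self.X)

    def keys(self):
        return self.X.keys()

    def values(self):
        # The toy returns a generator; the real class is documented
        # as returning a ValuesView.
        return ((self.X[ct], self.obs[ct], self.var[ct]) for ct in self.X)

    def items(self):
        return ((ct, (self.X[ct], self.obs[ct], self.var[ct]))
                for ct in self.X)

    def iter_X(self):
        return iter(self.X.values())

    def __or__(self, other):
        # Only datasets with non-overlapping cell types can be combined.
        overlap = self.X.keys() & other.X.keys()
        if overlap:
            raise ValueError(f"overlapping cell types: {sorted(overlap)}")
        return ToyPseudobulk({**self.X, **other.X},
                             {**self.obs, **other.obs},
                             {**self.var, **other.var})


# Made-up data: one sample and two genes per cell type.
pb1 = ToyPseudobulk({"Astro": [[1, 2]]}, {"Astro": ["s1"]},
                    {"Astro": ["g1", "g2"]})
pb2 = ToyPseudobulk({"Micro": [[3, 4]]}, {"Micro": ["s1"]},
                    {"Micro": ["g1", "g2"]})
pb = pb1 | pb2
cell_types = list(pb)
all_X = list(pb.iter_X())
first_item = next(iter(pb.items()))
```

Here cell_types holds the names of both cell types in insertion order, and combining pb1 with itself would raise an error because the cell types overlap.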

Parameters:
  • source : str | Path | None

  • X : dict[str, np.ndarray[np.dtype[np.integer | np.floating]]] | None

  • obs : dict[str, pl.DataFrame] | None

  • var : dict[str, pl.DataFrame] | None

  • num_threads : int | np.integer | None

I/O

Pseudobulk.__init__

Load a saved Pseudobulk dataset, or create one from an in-memory count matrix + metadata for each cell type.

Pseudobulk.save

Save a Pseudobulk dataset to directory, which is created and must not already exist unless overwrite=True. Three files are written per cell type: the X at f'{cell_type}.X.npy', the obs at f'{cell_type}.obs.parquet', and the var at f'{cell_type}.var.parquet'.
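
To make the on-disk layout concrete, the sketch below lists the file paths that the description of save implies for a given directory and set of cell types. The helper expected_files, the directory name, and the cell type names are hypothetical; the code only builds paths and does not call brisc or write anything.

```python
from pathlib import Path


def expected_files(directory, cell_types):
    """List the three files implied by save() for each cell type."""
    directory = Path(directory)
    files = []
    for cell_type in cell_types:
        files += [
            directory / f"{cell_type}.X.npy",        # count matrix
            directory / f"{cell_type}.obs.parquet",  # sample metadata
            directory / f"{cell_type}.var.parquet",  # gene metadata
        ]
    return files


# Hypothetical directory and cell type names.
paths = expected_files("my_pseudobulk", ["Astro", "Micro"])
names = [p.name for p in paths]
```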

Properties

Pseudobulk.X

A dictionary of count matrices for each cell type, as NumPy arrays.

Pseudobulk.obs

A dictionary of Polars DataFrames of sample-level metadata for each cell type.

Pseudobulk.var

A dictionary of Polars DataFrames of gene-level metadata for each cell type.

Pseudobulk.obs_names

A shortcut to access the first column of obs for each cell type.

Pseudobulk.var_names

A shortcut to access the first column of var for each cell type.

Pseudobulk.num_threads

The default number of threads used for this Pseudobulk dataset's operations.

Pseudobulk.shape

A dictionary mapping each cell type to a length-2 tuple where the first element is the number of samples, and the second is the number of genes.

Data access

Pseudobulk.sample

Get the row of X[cell_type] corresponding to a single sample, based on the sample's name in obs_names.

Pseudobulk.gene

Get the column of X[cell_type] corresponding to a single gene, based on the gene's name in var_names.
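
The two lookups above amount to indexing a samples-by-genes matrix by name. The sketch below shows the equivalent operation in plain NumPy for a single cell type; the data and names are made up, and it does not use the sample or gene methods themselves, whose exact signatures are not shown on this page.

```python
import numpy as np

# Made-up data for one cell type: 3 samples (rows) x 4 genes (columns).
X = np.array([[5, 0, 2, 1],
              [3, 1, 0, 0],
              [8, 2, 4, 6]])
obs_names = ["donor1", "donor2", "donor3"]            # first column of obs
var_names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # first column of var

# Row of X for one sample, looked up by its name.
sample_counts = X[obs_names.index("donor2")]

# Column of X for one gene, looked up by its name.
gene_counts = X[:, var_names.index("GENE_C")]
```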

Dictionary interface

Pseudobulk.keys

Get a KeysView (like you would get from dict.keys()) of this Pseudobulk dataset's cell types.

Pseudobulk.values

Get a ValuesView (like you would get from dict.values()) of (X, obs, var) tuples for each cell type in this Pseudobulk dataset.

Pseudobulk.items

Get an ItemsView (like you would get from dict.items()) of (cell_type, (X, obs, var)) tuples for each cell type in this Pseudobulk dataset.

Pseudobulk.iter_X

Iterate over each cell type's X.

Pseudobulk.iter_obs

Iterate over each cell type's obs.

Pseudobulk.iter_var

Iterate over each cell type's var.

Pseudobulk.__contains__

Check if this Pseudobulk dataset contains the specified cell type.

Pseudobulk.__or__

Combine the cell types of this Pseudobulk dataset with another.

Pseudobulk.__eq__

Test for equality with another Pseudobulk dataset.

Manipulation

Pseudobulk.set_obs_names

Sets a column as the new first column of obs, i.e. the obs_names.

Pseudobulk.set_var_names

Sets a column as the new first column of var, i.e. the var_names.

Pseudobulk.set_num_threads

Return a new Pseudobulk dataset with a different default number of threads.

Pseudobulk.filter_obs

Equivalent to df.filter() from polars, but applied to both obs and X for each cell type.

Pseudobulk.filter_var

Equivalent to df.filter() from polars, but applied to both var and X for each cell type.

Pseudobulk.select_obs

Equivalent to df.select() from polars, but applied to each cell type's obs.

Pseudobulk.select_var

Equivalent to df.select() from polars, but applied to each cell type's var.

Pseudobulk.select_cell_types

Create a new Pseudobulk dataset subset to the cell type(s) in cell_types and more_cell_types.

Pseudobulk.with_columns_obs

Equivalent to df.with_columns() from polars, but applied to each cell type's obs.

Pseudobulk.with_columns_var

Equivalent to df.with_columns() from polars, but applied to each cell type's var.

Pseudobulk.drop_obs

Create a new Pseudobulk dataset with columns and more_columns removed from obs.

Pseudobulk.drop_var

Create a new Pseudobulk dataset with columns and more_columns removed from var.

Pseudobulk.drop_cell_types

Create a new Pseudobulk dataset with cell_types and more_cell_types removed.

Pseudobulk.rename_obs

Create a new Pseudobulk dataset with column(s) of obs renamed for each cell type.

Pseudobulk.rename_var

Create a new Pseudobulk dataset with column(s) of var renamed for each cell type.

Pseudobulk.rename_cell_types

Create a new Pseudobulk dataset with cell type(s) renamed.

Pseudobulk.cast_X

Cast each cell type's X to the specified data type.

Pseudobulk.cast_obs

Cast column(s) of each cell type's obs to the specified data type(s).

Pseudobulk.cast_var

Cast column(s) of each cell type's var to the specified data type(s).

Pseudobulk.join_obs

Left-join each cell type's obs with another DataFrame, using the same logic as polars.DataFrame.join().

Pseudobulk.join_var

Left-join each cell type's var with another DataFrame, using the same logic as polars.DataFrame.join().

Pseudobulk.subsample_obs

Subsample a specific number or fraction of samples.

Pseudobulk.subsample_var

Subsample a specific number or fraction of genes.

Pseudobulk.split_by_cell_type

Split this Pseudobulk dataset into a tuple of Pseudobulk datasets with one cell type each.

Pseudobulk.concat_obs

Concatenate one or more other Pseudobulk datasets with this one, sample-wise.

Pseudobulk.concat_var

Concatenate one or more other Pseudobulk datasets with this one, gene-wise.
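
The descriptions of filter_obs and filter_var say the filter is applied to both the metadata and X. A rough sketch of what that means for one cell type follows: the same boolean mask subsets the metadata rows and the matching axis of X, so the two stay aligned. Plain dicts of lists stand in for polars DataFrames, and filter_table is a hypothetical helper, not part of brisc.

```python
import numpy as np

# Made-up data for one cell type: 3 samples (rows) x 3 genes (columns).
X = np.array([[5, 0, 2],
              [3, 1, 0],
              [8, 2, 4]])
obs = {"ID": ["donor1", "donor2", "donor3"], "age": [71, 64, 80]}
var = {"gene": ["GENE_A", "GENE_B", "GENE_C"]}


def filter_table(table, mask):
    """Keep the entries of every column where mask is True."""
    return {name: [v for v, keep in zip(values, mask) if keep]
            for name, values in table.items()}


# Filtering samples: the mask subsets obs and the rows of X together.
keep_samples = np.array(obs["age"]) >= 70
obs_kept = filter_table(obs, keep_samples)
X_samples_kept = X[keep_samples]

# Filtering genes: the mask subsets var and the columns of X together.
keep_genes = (X > 0).all(axis=0)  # genes detected in every sample
var_kept = filter_table(var, keep_genes)
X_genes_kept = X[:, keep_genes]
```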

Transformation

Pseudobulk.copy

Make a copy of this Pseudobulk dataset.

Pseudobulk.to_df

Convert this Pseudobulk object to a polars DataFrame, with one row per (sample, cell type) pair and one column per gene.

Pseudobulk.map_X

Apply a function to each cell type's X.

Pseudobulk.map_obs

Apply a function to each cell type's obs.

Pseudobulk.map_var

Apply a function to each cell type's var.
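
Conceptually, the map methods apply one function to each cell type's entry and collect the results per cell type. The sketch below shows that pattern on a plain dict of arrays; map_per_cell_type is a hypothetical helper, and the actual signatures and return types of map_X, map_obs, and map_var are not shown on this page.

```python
import numpy as np

# Made-up count matrices for two cell types.
X = {
    "Astro": np.array([[1, 3], [0, 7]]),
    "Micro": np.array([[2, 2]]),
}


def map_per_cell_type(mapping, function, *args, **kwargs):
    """Apply function to each cell type's value, keeping the keys."""
    return {cell_type: function(value, *args, **kwargs)
            for cell_type, value in mapping.items()}


# Example: log(1 + count) for every cell type's matrix.
log_X = map_per_cell_type(X, np.log1p)
```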

Analysis

Pseudobulk.qc

Subsets each cell type to samples passing quality control (QC).

Pseudobulk.library_size

Calculate normalization factor-adjusted library sizes for each sample in each cell type, via the approach of edgeR's calcNormFactors().

Pseudobulk.CPM

Calculate counts per million for each cell type.

Pseudobulk.log_CPM

Calculate log counts per million for each cell type.

Pseudobulk.regress_out

Regress out covariates from obs.

Pseudobulk.DE

Perform differential expression (DE) on a Pseudobulk dataset with limma-voom.
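
As a reminder of what counts per million means, the sketch below computes CPM and a log2 CPM for one cell type in plain NumPy. It makes two simplifying assumptions that may differ from brisc: library sizes are raw per-sample totals (library_size is described above as additionally adjusting by edgeR-style normalization factors), and the log transform uses the offsets of the voom convention.

```python
import numpy as np

# Made-up counts for one cell type: 2 samples (rows) x 3 genes (columns).
X = np.array([[10, 30, 60],
              [0, 5, 15]], dtype=float)

# Assumption: raw library size, i.e. all normalization factors equal 1.
library_size = X.sum(axis=1)

# Counts per million: scale each sample's counts to sum to one million.
cpm = X / library_size[:, None] * 1e6

# Assumption: voom-style offsets (0.5 on counts, 1 on library size)
# so that zero counts do not produce log(0).
log_cpm = np.log2((X + 0.5) / (library_size[:, None] + 1) * 1e6)
```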

Utility

Pseudobulk.peek_obs

Print a row of obs (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.

Pseudobulk.peek_var

Print a row of var (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.

Pseudobulk.pipe

Apply a function to a Pseudobulk dataset.

Pseudobulk.pipe_X

Apply a function to a Pseudobulk dataset's X.

Pseudobulk.pipe_obs

Apply a function to a Pseudobulk dataset's obs.

Pseudobulk.pipe_var

Apply a function to a Pseudobulk dataset's var.
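
The pipe methods support method chaining with user-defined functions. The sketch below illustrates the general pattern on a toy class, assuming pipe follows the same convention as polars' DataFrame.pipe (the object is passed as the first argument, followed by any extra arguments). ToyDataset, scale, and shift are hypothetical names, and whether brisc's pipe accepts extra arguments is an assumption.

```python
class ToyDataset:
    """Toy stand-in for illustration; not a brisc class."""

    def __init__(self, values):
        self.values = list(values)

    def pipe(self, function, *args, **kwargs):
        # Call function with this object as its first argument.
        return function(self, *args, **kwargs)


def scale(dataset, factor):
    return ToyDataset(v * factor for v in dataset.values)


def shift(dataset, offset=0):
    return ToyDataset(v + offset for v in dataset.values)


# Chained calls read left to right instead of nesting shift(scale(...)).
result = ToyDataset([1, 2, 3]).pipe(scale, 10).pipe(shift, offset=1)
```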