Pseudobulk
A pseudobulked single-cell dataset resulting from calling pseudobulk() on a SingleCell dataset. Has slots for:
X: a dict of NumPy arrays of counts per sample and gene for each cell type
obs: a dict of polars DataFrames of sample metadata for each cell type
var: a dict of polars DataFrames of gene metadata for each cell type
num_threads: the default number of threads to use for operations on the dataset that support multithreading (which can be overridden by individual functions)
as well as obs_names and var_names, aliases for a dict of obs[:, 0] and var[:, 0] for each cell type.
In many ways, Pseudobulk datasets behave like dictionaries:
pb1 | pb2 combines pseudobulks with non-overlapping cell types into one big pseudobulk
cell_type in pb tests whether cell_type is a cell type in the pseudobulk
for cell_type in pb: and for cell_type in pb.keys(): yield the cell type names
for X, obs, var in pb.values(): yields each cell type’s X, obs, and var
for cell_type, (X, obs, var) in pb.items(): yields both the name and the X, obs, and var for each cell type
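The dictionary-like behaviour listed above can be illustrated with a toy stand-in. This is a sketch only: ToyPseudobulk is a hypothetical class written for this example, not brisc's implementation. It stores plain Python lists instead of NumPy arrays and polars DataFrames, and raising an error on overlapping cell types is an assumption of the sketch.

```python
class ToyPseudobulk:
    """Hypothetical stand-in holding per-cell-type X, obs, and var (not brisc's class)."""

    def __init__(self, X, obs, var):
        if not (X.keys() == obs.keys() == var.keys()):
            raise ValueError("X, obs, and var must share the same cell types")
        self.X, self.obs, self.var = X, obs, var

    def __contains__(self, cell_type):
        # Supports `cell_type in pb`
        return cell_type in self.X

    def __iter__(self):
        # Supports `for cell_type in pb:`
        return iter(self.X)

    def keys(self):
        return self.X.keys()

    def values(self):
        return [(self.X[ct], self.obs[ct], self.var[ct]) for ct in self.X]

    def items(self):
        return [(ct, (self.X[ct], self.obs[ct], self.var[ct])) for ct in self.X]

    def __or__(self, other):
        # Supports `pb1 | pb2`. Assumption of this sketch: overlapping
        # cell types are rejected rather than merged.
        overlap = self.X.keys() & other.X.keys()
        if overlap:
            raise ValueError(f"overlapping cell types: {sorted(overlap)}")
        return ToyPseudobulk(
            {**self.X, **other.X},
            {**self.obs, **other.obs},
            {**self.var, **other.var},
        )


pb1 = ToyPseudobulk(
    X={"Astro": [[5, 0], [3, 2]]},        # 2 samples x 2 genes
    obs={"Astro": ["donor1", "donor2"]},  # sample names only
    var={"Astro": ["GeneA", "GeneB"]},    # gene names only
)
pb2 = ToyPseudobulk(
    X={"Micro": [[1, 4]]},
    obs={"Micro": ["donor1"]},
    var={"Micro": ["GeneA", "GeneB"]},
)

combined = pb1 | pb2
print("Astro" in combined)  # True
for cell_type, (X, obs, var) in combined.items():
    print(cell_type, len(obs), "samples,", len(var), "genes")
```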
There are also custom iterators if you just want one field (X, obs, or var) per cell type; these are listed under Dictionary interface.
- class brisc.Pseudobulk(source=None, /, *, X=None, obs=None, var=None, num_threads=None)
- Parameters:
source: str | Path | None
a directory to load a saved Pseudobulk dataset from (see save()). Mutually exclusive with X, obs, and var.
X: dict[str, ndarray[dtype[floating]]] | None
a {cell type: NumPy array} dictionary of counts or log CPMs. Mutually exclusive with source.
obs: dict[str, DataFrame] | None
a {cell type: polars DataFrame} dict of metadata per sample, when X is a dictionary. The first column must be String, Enum, Categorical, or integer. Mutually exclusive with source.
var: dict[str, DataFrame] | None
a {cell type: polars DataFrame} dict of metadata per gene, when X is a dictionary. The first column must be String, Enum, Categorical, or integer. Mutually exclusive with source.
num_threads: int | None
the default number of threads to use for all subsequent operations on this Pseudobulk dataset. By default (num_threads=None), use all available cores, as determined by os.cpu_count().
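The num_threads default described above can be sketched as follows. resolve_num_threads is a hypothetical helper name used only for illustration. The fallback to 1 when os.cpu_count() returns None is an assumption of this sketch, not documented brisc behaviour.

```python
import os


def resolve_num_threads(num_threads=None):
    """Hypothetical helper: turn num_threads=None into a concrete thread count."""
    if num_threads is None:
        # os.cpu_count() may return None if the core count cannot be
        # determined; falling back to 1 is an assumption of this sketch.
        return os.cpu_count() or 1
    return num_threads


print(resolve_num_threads())   # number of available cores on this machine
print(resolve_num_threads(4))  # 4
```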
Analysis
Subset each cell type to samples passing quality control (QC).
Calculate normalization factor-adjusted library sizes for each sample in each cell type, via the approach of edgeR's
Calculate counts per million for each cell type.
Calculate log counts per million for each cell type.
Regress out covariates from obs.
Perform differential expression (DE) on a Pseudobulk dataset with limma-voom.
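As a rough illustration of the counts-per-million step, the sketch below scales each sample's counts by that sample's raw total. It is a simplification under stated assumptions: cpm_per_cell_type is a hypothetical function, it uses plain row sums rather than normalization factor-adjusted library sizes, and the log2(CPM + 1) transform at the end is just one simple convention, not necessarily the one brisc uses.

```python
import numpy as np


def cpm_per_cell_type(X):
    """Return {cell type: CPM matrix} from {cell type: samples x genes counts}."""
    result = {}
    for cell_type, counts in X.items():
        counts = np.asarray(counts, dtype=float)
        # One library size per sample (row); keepdims allows broadcasting
        library_sizes = counts.sum(axis=1, keepdims=True)
        result[cell_type] = counts / library_sizes * 1e6
    return result


X = {
    "Astro": np.array([[10, 30, 60], [5, 5, 90]]),  # 2 samples x 3 genes
    "Micro": np.array([[1, 1, 2]]),                 # 1 sample x 3 genes
}
cpm = cpm_per_cell_type(X)

# Assumption: a pseudocount of 1 is an arbitrary choice for this sketch
log_cpm = {cell_type: np.log2(m + 1) for cell_type, m in cpm.items()}

print(cpm["Astro"])
```

After this scaling, each sample's row sums to one million, which is what makes samples with different sequencing depths comparable.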
I/O
Properties
A dictionary of count matrices for each cell type, as NumPy arrays.
A dictionary of Polars DataFrames of sample-level metadata for each cell type.
A dictionary of Polars DataFrames of gene-level metadata for each cell type.
A shortcut to access the first column of obs for each cell type.
A shortcut to access the first column of var for each cell type.
The default number of threads used for this Pseudobulk dataset's operations.
A dictionary mapping each cell type to a length-2 tuple where the first element is the number of samples, and the second is the number of genes.
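The last property above, a per-cell-type (samples, genes) tuple, follows directly from the shapes of the arrays in X. A minimal sketch on a plain dict of NumPy arrays, with made-up cell type names and dimensions:

```python
import numpy as np

# Stand-in for a Pseudobulk dataset's X: {cell type: samples x genes array}
X = {
    "Astro": np.zeros((4, 100)),  # 4 samples, 100 genes
    "Micro": np.zeros((3, 100)),  # 3 samples, 100 genes
}

# Map each cell type to (number of samples, number of genes)
shape = {cell_type: counts.shape for cell_type, counts in X.items()}
print(shape)
```

The number of samples can differ between cell types, for example when a sample is dropped from one cell type but kept in another.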
Data access
Get the row of X[cell_type] corresponding to a single sample, based on the sample's name in obs_names.
Get the column of X[cell_type] corresponding to a single gene, based on the gene's name in var_names.
Print a row of obs (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.
Print a row of var (the first row, by default) for a cell type (the first cell type, by default) with each column on its own line.
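The name-based row and column lookups described above amount to finding a name's position and indexing X with it. The sketch below is an illustration only: get_sample and get_gene are hypothetical helper names, and obs_names and var_names are modelled as dicts of plain Python lists rather than polars columns.

```python
import numpy as np


def get_sample(X, obs_names, cell_type, sample):
    """Return the row of X[cell_type] for the sample named `sample`."""
    row = obs_names[cell_type].index(sample)  # raises ValueError if absent
    return X[cell_type][row]


def get_gene(X, var_names, cell_type, gene):
    """Return the column of X[cell_type] for the gene named `gene`."""
    col = var_names[cell_type].index(gene)  # raises ValueError if absent
    return X[cell_type][:, col]


X = {"Astro": np.array([[5, 0, 2], [3, 2, 1]])}  # 2 samples x 3 genes
obs_names = {"Astro": ["donor1", "donor2"]}
var_names = {"Astro": ["GeneA", "GeneB", "GeneC"]}

print(get_sample(X, obs_names, "Astro", "donor2"))  # counts for one sample
print(get_gene(X, var_names, "Astro", "GeneC"))     # counts for one gene
```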
Dictionary interface
Get a KeysView (like you would get from dict.keys()) of this Pseudobulk dataset's cell types.
Get a ValuesView (like you would get from dict.values()) of (X, obs, var) tuples for each cell type in this Pseudobulk dataset.
Get an ItemsView (like you would get from dict.items()) of (cell_type, (X, obs, var)) tuples for each cell type in this Pseudobulk dataset.
Iterate over each cell type's X.
Iterate over each cell type's obs.
Iterate over each cell type's var.
Check if this Pseudobulk dataset contains the specified cell type.
Combine the cell types of this Pseudobulk dataset with another.
Test for equality with another Pseudobulk dataset.
Manipulation
Set a column as the new first column of obs, i.e. the obs_names.
Set a column as the new first column of var, i.e. the var_names.
Return a new Pseudobulk dataset with a different default number of threads.
Equivalent to
Equivalent to
Equivalent to
Equivalent to
Create a new Pseudobulk dataset subset to the cell type(s) in cell_types and more_cell_types.
Equivalent to
Equivalent to
Create a new Pseudobulk dataset with columns and more_columns removed from obs.
Create a new Pseudobulk dataset with columns and more_columns removed from var.
Create a new Pseudobulk dataset with cell_types and more_cell_types removed.
Create a new Pseudobulk dataset with column(s) of obs renamed for each cell type.
Create a new Pseudobulk dataset with column(s) of var renamed for each cell type.
Create a new Pseudobulk dataset with cell type(s) renamed.
Cast each cell type's X to the specified data type.
Cast column(s) of each cell type's obs to the specified data type(s).
Cast column(s) of each cell type's var to the specified data type(s).
Left-join each cell type's obs with another DataFrame, using the same logic as
Left-join each cell type's var with another DataFrame, using the same logic as
Subsample a specific number or fraction of samples.
Subsample a specific number or fraction of genes.
Split this Pseudobulk dataset into a tuple of Pseudobulk datasets with one cell type each.
Concatenate one or more other Pseudobulk datasets with this one, sample-wise.
Concatenate one or more other Pseudobulk datasets with this one, gene-wise.
Make a copy of this Pseudobulk dataset.
Convert this Pseudobulk object to a polars DataFrame, with one row per (sample, cell type) pair and one column per gene.
Apply a function to each cell type's X.
Apply a function to each cell type's obs.
Apply a function to each cell type's var.
Apply a function to a Pseudobulk dataset.
Apply a function to a Pseudobulk dataset's X.
Apply a function to a Pseudobulk dataset's obs.
Apply a function to a Pseudobulk dataset's var.
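To make the difference between sample-wise and gene-wise concatenation concrete, the sketch below shows the effect on a single cell type's X using plain NumPy. It covers only the shape of X. How obs and var are reconciled during concatenation is not covered here, and the arrays are made up for illustration.

```python
import numpy as np

astro_batch1 = np.array([[5, 0, 2], [3, 2, 1]])  # 2 samples x 3 genes
astro_batch2 = np.array([[7, 1, 0]])             # 1 sample x 3 genes

# Sample-wise: stack rows, so the genes (columns) must match
sample_wise = np.concatenate([astro_batch1, astro_batch2], axis=0)
print(sample_wise.shape)  # (3, 3)

extra_genes = np.array([[4], [6]])               # 2 samples x 1 gene

# Gene-wise: stack columns, so the samples (rows) must match
gene_wise = np.concatenate([astro_batch1, extra_genes], axis=1)
print(gene_wise.shape)  # (2, 4)
```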