log_cpm

Pseudobulk.log_cpm(*, library_size_column='library_size', prior_count=2, allow_float=False)

Calculate log counts per million for each cell type.

Must be run after library_size(). Do not run this before de(), since de() already normalizes the data internally.

Results were verified to match edgeR to within floating-point error.
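For orientation, the following is a minimal NumPy sketch of an edgeR-style log-CPM calculation. It is not this method's actual implementation. It assumes a samples-by-genes count matrix and follows the convention of edgeR's cpm(log=TRUE): the prior count is scaled by each sample's library size relative to the mean, and the result is on the log2 scale. The names log_cpm_sketch, counts, and library_size are hypothetical and exist only for this illustration.

```python
import numpy as np


def log_cpm_sketch(counts, library_size, prior_count=2.0):
    """Illustrative edgeR-style log2(CPM); NOT the Pseudobulk implementation.

    counts: (n_samples, n_genes) array of raw counts (assumed layout)
    library_size: (n_samples,) array of per-sample library sizes
    """
    counts = np.asarray(counts, dtype=float)
    library_size = np.asarray(library_size, dtype=float)
    # Scale the prior count by each sample's library size relative to the
    # mean, so larger libraries receive a proportionally larger pseudocount
    scaled_prior = prior_count * library_size / library_size.mean()
    # Inflate the library size by twice the scaled prior so that the
    # resulting proportions stay strictly between 0 and 1
    adjusted_size = library_size + 2 * scaled_prior
    return np.log2(
        (counts + scaled_prior[:, None]) / adjusted_size[:, None] * 1e6)


# Toy data: 2 samples x 3 genes, with library size taken as the row sum
counts = np.array([[10, 0, 90],
                   [200, 50, 750]])
library_size = counts.sum(axis=1)
log_cpm = log_cpm_sketch(counts, library_size)
```

Because of the pseudocount, genes with zero counts still receive a finite log-CPM value.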

Parameters:

library_size_column: PseudobulkColumn
a floating-point column of obs containing each sample’s library size. Can be a column name, a polars expression, a polars Series, a 1D NumPy array, or a function that takes in this Pseudobulk dataset and a cell type and returns a polars Series or 1D NumPy array. Alternatively, a dictionary mapping cell-type names to any of the above; every cell type in this Pseudobulk dataset must be present as a key.
prior_count: int | float
the pseudocount to add before log-transforming. The corresponding argument in edgeR, prior.count, now defaults to 2 instead of the old default of 0.5.
allow_float: bool
if False, raise an error if self.X.dtype is floating-point (suggesting the user may not be using the raw counts); if True, disable this sanity check
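The allow_float sanity check can be pictured with the sketch below. It only illustrates the idea of rejecting floating-point input. The helper name check_raw_counts and the choice of TypeError are assumptions made for this example; the error type the library actually raises is not documented here.

```python
import numpy as np


def check_raw_counts(X, allow_float=False):
    """Hypothetical helper: reject floating-point matrices unless allowed."""
    X = np.asarray(X)
    # A floating-point dtype suggests the matrix may already have been
    # normalized or transformed rather than holding raw integer counts
    if not allow_float and np.issubdtype(X.dtype, np.floating):
        raise TypeError(
            'X is floating-point; pass allow_float=True if these really '
            'are raw counts')
    return X


raw = check_raw_counts(np.array([[3, 0], [7, 12]]))          # integers pass
forced = check_raw_counts(np.array([[3.0, 0.0]]), allow_float=True)
```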

Returns:

A new Pseudobulk dataset containing the log(CPMs).

Return type:

Pseudobulk