split_by_obs

SingleCell.split_by_obs(by_column, /, *, QC_column='passed_QC', sort_by_size=False, num_threads=None)

The opposite of concat_obs(): splits a SingleCell dataset into a dictionary of SingleCell datasets, one per unique value of a column of obs.

Parameters:

by_column: SingleCellColumn
a String, Enum, Categorical, or integer column of obs to split by. Can be a column name, a polars expression, a polars Series, a 1D NumPy array, or a function that takes in this SingleCell dataset and returns a polars Series or 1D NumPy array. Can contain null entries: the corresponding cells will not be included in the result.
QC_column: SingleCellColumn | None
an optional Boolean column of obs indicating which cells passed QC. Can be a column name, a polars expression, a polars Series, a 1D NumPy array, or a function that takes in this SingleCell dataset and returns a polars Series or 1D NumPy array. Set to None to include all cells. Cells failing QC are excluded from every dataset in the result.
sort_by_size: bool
if True, datasets in the returned dictionary will be sorted in decreasing order of size. If False, they will be sorted in ascending order of their by_column value, following the sort order of by_column’s data type.
num_threads: int | None
the number of threads to use when splitting X. Set num_threads=-1 to use all available cores, as determined by os.cpu_count(). By default (num_threads=None), use self.num_threads cores. Can only be specified when X is not None.

Returns:

A dictionary mapping each unique value of by_column to a SingleCell dataset containing only the cells where by_column has that value.

Return type:

dict[str, SingleCell]
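
Example:

The splitting rules described above (null entries dropped, cells failing QC dropped, and the two orderings) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation and does not use SingleCell at all: the helper split_by_column, the cells list, and the cell_type values are hypothetical stand-ins, and "size" is assumed here to mean the number of cells in each group.

```python
from collections import defaultdict


def split_by_column(rows, by_column, qc_column=None, sort_by_size=False):
    """Hypothetical stand-in: group row dicts by the value of rows[by_column]."""
    groups = defaultdict(list)
    for row in rows:
        key = row[by_column]
        if key is None:
            continue  # null entries are left out of the result
        if qc_column is not None and not row[qc_column]:
            continue  # rows failing QC are left out of the result
        groups[key].append(row)
    if sort_by_size:
        # Largest group first (assumption: size = number of rows in the group)
        ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    else:
        # Ascending order of the grouping value
        ordered = sorted(groups.items(), key=lambda kv: kv[0])
    return dict(ordered)


# Toy "obs" table: one dict per cell
cells = [
    {"cell_type": "T", "passed_QC": True},
    {"cell_type": "B", "passed_QC": True},
    {"cell_type": "T", "passed_QC": False},   # fails QC, so it is dropped
    {"cell_type": None, "passed_QC": True},   # null label, so it is dropped
    {"cell_type": "T", "passed_QC": True},
    {"cell_type": "NK", "passed_QC": True},
]

by_value = split_by_column(cells, "cell_type", qc_column="passed_QC")
by_size = split_by_column(cells, "cell_type", qc_column="passed_QC", sort_by_size=True)

print({key: len(group) for key, group in by_value.items()})
print({key: len(group) for key, group in by_size.items()})
```

With the default ordering the keys come out in ascending order of the label; with sort_by_size=True the largest group comes first. How the real method orders groups of equal size is not stated in this reference, so the sketch makes no claim about it.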