split_by_var

SingleCell.split_by_var(by_column, /, *, sort_by_size=False, num_threads=None)

The opposite of concat_var(): splits a SingleCell dataset into a dictionary of SingleCell datasets, one per unique value of a column of var.

Parameters:
  • by_column: SingleCellColumn

    a String, Enum, Categorical, or integer column of var to split by. Can be a column name, a polars expression, a polars Series, a 1D NumPy array, or a function that takes in this SingleCell dataset and returns a polars Series or 1D NumPy array. Can contain null entries: the corresponding genes will not be included in the result.

  • sort_by_size: bool

    if True, the datasets in the returned dictionary will be sorted in decreasing order of size. If False, they will be sorted in ascending order of their by_column value, according to the sort order of by_column’s data type.

  • num_threads: int | None

    the number of threads to use when splitting X. Set num_threads=-1 to use all available cores, as determined by os.cpu_count(). By default (num_threads=None), use self.num_threads cores. Can only be specified when X is not None.

Returns:

A dictionary mapping each unique value of by_column to a SingleCell dataset subset to the genes where by_column has that value.

Return type:

dict[str, SingleCell]
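
Example:

The grouping behavior described above (null entries dropped, keys in ascending order by default, largest group first when sort_by_size=True) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: split_indices_by_value and the gene_type values are hypothetical, "size" is assumed to mean the number of genes in each subset, and plain Python string ordering stands in for the sort order of by_column's data type.

```python
from collections import defaultdict


def split_indices_by_value(values, sort_by_size=False):
    """Group the positions in `values` by value, skipping None entries.

    Hypothetical helper for illustration only; not part of SingleCell.
    """
    groups = defaultdict(list)
    for index, value in enumerate(values):
        if value is None:
            # Null entries: this gene is left out of every subset
            continue
        groups[value].append(index)
    if sort_by_size:
        # Assumption: "size" is the number of genes in each subset
        ordered_keys = sorted(groups, key=lambda key: -len(groups[key]))
    else:
        # Ascending order of the values themselves
        ordered_keys = sorted(groups)
    # Python dicts preserve insertion order, so the result is ordered
    return {key: groups[key] for key in ordered_keys}


# Hypothetical per-gene annotation, standing in for a column of var
gene_type = ["protein_coding", "lncRNA", None, "protein_coding",
             "lncRNA", "protein_coding", "pseudogene"]

by_value = split_indices_by_value(gene_type)
by_size = split_indices_by_value(gene_type, sort_by_size=True)
print(list(by_value))  # ['lncRNA', 'protein_coding', 'pseudogene']
print(list(by_size))   # ['protein_coding', 'lncRNA', 'pseudogene']
```

Going by the signature above, the corresponding call on a real dataset would look like dataset.split_by_var('gene_type', sort_by_size=True), where dataset and the gene_type column are assumed names.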