concat_obs

SingleCell.concat_obs(datasets, /, *more_datasets, dataset_column=None, dataset_labels=None, flexible=False, num_threads=None)

Concatenate one or more other SingleCell datasets with this one, cell-wise. All datasets must have distinct obs_names: no cell name may appear in more than one dataset.
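
The distinct-obs_names requirement can be illustrated with a small check. This is a minimal sketch, not the library's implementation: each plain list of strings stands in for one dataset's obs_names, and the helper name find_shared_obs_names is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def find_shared_obs_names(obs_names_per_dataset: Iterable[list[str]]) -> list[str]:
    """Return cell names that occur more than once in total across datasets.

    Hypothetical helper: each inner list stands in for one dataset's obs_names.
    """
    counts: Counter[str] = Counter()
    for names in obs_names_per_dataset:
        counts.update(names)
    # Any name counted more than once would violate the distinct-obs_names rule.
    return sorted(name for name, n in counts.items() if n > 1)


shared = find_shared_obs_names([["AAAC-1", "AAAG-1"], ["AAAC-1", "TTTG-1"]])
# shared == ["AAAC-1"], so these two datasets could not be concatenated as-is.
```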

By default, all datasets must have the same var, varm, varp, and uns. They must also have the same columns in obs and the same keys in obsm, with the same data types. obsp will be discarded during the concatenation.

In contrast, if flexible=True, the datasets are first subset to the genes present in all of them (according to the first column of var, i.e. the var_names). Only the columns of var and the keys of varm, varp, and uns that are identical in all datasets after this subsetting are kept. Likewise, only the columns of obs and the keys of obsm that are present in all datasets, with the same data types, are kept. All datasets' obs_names must still have the same name and data type, and the same holds for their var_names.
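
The gene and obs-column selection performed under flexible=True can be sketched as set intersections. This is a simplified illustration under stated assumptions: DatasetSummary and flexible_plan are hypothetical stand-ins rather than part of the SingleCell API, kept genes and columns are assumed to follow the first dataset's order, and the sketch ignores both the Enum/Categorical exception and the identity checks on var, varm, varp, and uns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetSummary:
    """Hypothetical stand-in for the parts of a dataset that flexible=True
    inspects: its var_names and a mapping of obs column name -> dtype name."""
    var_names: list[str]
    obs_schema: dict[str, str]


def flexible_plan(datasets: list[DatasetSummary]) -> tuple[list[str], list[str]]:
    """Return (genes to keep, obs columns to keep) for flexible concatenation.

    Assumption: kept genes and columns follow the first dataset's order.
    """
    first, rest = datasets[0], datasets[1:]
    # Genes: keep only var_names that appear in every dataset.
    other_gene_sets = [set(d.var_names) for d in rest]
    genes = [g for g in first.var_names if all(g in s for s in other_gene_sets)]
    # obs columns: keep only those present in every dataset with the same dtype.
    columns = [
        col
        for col, dtype in first.obs_schema.items()
        if all(d.obs_schema.get(col) == dtype for d in rest)
    ]
    return genes, columns


a = DatasetSummary(["CD3E", "CD19", "MS4A1"], {"cell_type": "String", "n_genes": "Int32"})
b = DatasetSummary(["MS4A1", "CD3E"], {"cell_type": "String", "n_genes": "Int64"})
genes, columns = flexible_plan([a, b])
# genes == ["CD3E", "MS4A1"]; columns == ["cell_type"] (n_genes differs in dtype)
```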

The one exception to the obs "same data type" rule: if a column is Enum in some datasets and Categorical in others, or is Enum in all datasets but with categories that differ between datasets, that column is retained as an Enum column in the concatenated obs, with the union of the categories.
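
The category union for such a column can be sketched as follows. This is an illustration under assumptions: the order of categories in the resulting Enum is not specified above, so first-seen order is used here as one plausible choice; for a Categorical column, its distinct values are assumed to serve as its categories; and union_categories is a hypothetical name.

```python
from __future__ import annotations


def union_categories(category_lists: list[list[str]]) -> list[str]:
    """Union of categories across datasets, keeping first-seen order.

    Assumption: first-seen ordering; the real method's ordering may differ.
    """
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for categories in category_lists:
        for category in categories:
            seen.setdefault(category, None)
    return list(seen)


merged = union_categories([["B cell", "T cell"], ["T cell", "NK cell"]])
# merged == ["B cell", "T cell", "NK cell"]
```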

If the datasets’ X are a mix of CSR and CSC sparse arrays, they will all be coerced to CSR.
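
Cell-wise concatenation stacks rows (cells) on top of each other, and CSR stores each row's nonzeros contiguously, so stacking CSR matrices only requires appending their arrays and shifting the row pointers. The sketch below shows this on plain Python lists for illustration; a real implementation would operate on NumPy-backed sparse arrays (for example via scipy.sparse.vstack), and the CSR tuple and vstack_csr function here are hypothetical. It assumes every input is already CSR with indptr starting at 0, and does not show the CSC-to-CSR conversion itself.

```python
from __future__ import annotations

from typing import NamedTuple


class CSR(NamedTuple):
    """Minimal CSR matrix: row i's values are data[indptr[i]:indptr[i+1]],
    stored in the columns given by indices[indptr[i]:indptr[i+1]]."""
    data: list[float]
    indices: list[int]
    indptr: list[int]
    num_cols: int


def vstack_csr(matrices: list[CSR]) -> CSR:
    """Stack CSR matrices row-wise. Assumes each indptr starts at 0."""
    num_cols = matrices[0].num_cols
    data: list[float] = []
    indices: list[int] = []
    indptr = [0]
    for m in matrices:
        if m.num_cols != num_cols:
            raise ValueError("all matrices must have the same number of columns (genes)")
        # Row pointers of this matrix are shifted by the nonzeros already written.
        offset = len(data)
        data.extend(m.data)
        indices.extend(m.indices)
        indptr.extend(offset + p for p in m.indptr[1:])
    return CSR(data, indices, indptr, num_cols)


top = CSR([1.0, 2.0], [0, 1], [0, 1, 2], 2)  # [[1, 0], [0, 2]]
bottom = CSR([3.0, 4.0], [0, 1], [0, 2], 2)  # [[3, 4]]
stacked = vstack_csr([top, bottom])
# stacked.indptr == [0, 1, 2, 4]; stacked represents [[1, 0], [0, 2], [3, 4]]
```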

Parameters:
  • datasets : SingleCell | Iterable[SingleCell]

    one or more SingleCell datasets to concatenate with this one

  • *more_datasets : SingleCell

    additional SingleCell datasets to concatenate with this one, specified as positional arguments

  • dataset_column : str | None

    the name of an Enum column to be added to the concatenated dataset’s obs labeling which dataset each cell came from. The labels themselves are determined by the dataset_labels argument.

  • dataset_labels : Iterable[str] | None

    a sequence of labels, one per dataset, used to populate dataset_column. There must be exactly one label per dataset being concatenated. If dataset_labels is not specified, the labels default to {dataset_column}_0, {dataset_column}_1, …, {dataset_column}_{N - 1}. Can only be specified when dataset_column is not None.

  • flexible : bool

    whether to subset to the genes, columns of obs and var, and keys of obsm, varm, varp, and uns common to all datasets before concatenating, rather than raising an error on any mismatch

  • num_threads : int | None

    the number of threads to use when concatenating. Does not affect the concatenated SingleCell dataset’s num_threads; this will always be the same as the first dataset’s num_threads.
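
The rules for dataset_column and dataset_labels can be summarized as a small resolution function. This is a minimal sketch, not the library's code: resolve_dataset_labels is a hypothetical name, and num_datasets is assumed to count this dataset plus all the others being concatenated (N in the default labels).

```python
from __future__ import annotations

from collections.abc import Iterable


def resolve_dataset_labels(
    num_datasets: int,
    dataset_column: str | None,
    dataset_labels: Iterable[str] | None,
) -> list[str] | None:
    """Return the per-dataset labels for dataset_column, or None if no column.

    num_datasets (assumption) counts this dataset plus all others.
    """
    if dataset_column is None:
        # dataset_labels is only meaningful when a column is being added.
        if dataset_labels is not None:
            raise ValueError("dataset_labels can only be specified when dataset_column is not None")
        return None
    if dataset_labels is None:
        # Default labels: {dataset_column}_0, ..., {dataset_column}_{N - 1}.
        return [f"{dataset_column}_{i}" for i in range(num_datasets)]
    labels = list(dataset_labels)
    if len(labels) != num_datasets:
        raise ValueError(f"expected {num_datasets} labels, got {len(labels)}")
    return labels


defaults = resolve_dataset_labels(3, "batch", None)
# defaults == ["batch_0", "batch_1", "batch_2"]
```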

Returns:

The concatenated SingleCell dataset.

Return type:

SingleCell