concat_var

Pseudobulk.concat_var(datasets, /, *more_datasets, dataset_column=None, dataset_labels=None, flexible=False)

Concatenate one or more other Pseudobulk datasets with this one, gene-wise. This is much less common than the sample-wise concatenation provided by concat_obs(). All datasets must have the same cell types, and all datasets must have distinct var_names.

By default, all datasets must have the same obs. They must also have the same columns in var, with the same data types.

Conversely, if flexible=True, first subset to the samples present in all datasets (matched on the first column of obs, i.e. the obs_names), then keep only the columns of obs that are identical in all datasets after this subsetting. Likewise, keep only the columns of var that are present in all datasets with the same data type. All datasets’ obs_names must have the same name and dtype, and similarly for their var_names.
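The flexible=True subsetting can be pictured with plain Python containers. This is only a sketch of the logic described above, not the library's implementation: the helper names `common_samples` and `common_var_columns` are hypothetical, dtypes are represented as strings, and keeping the first dataset's sample order is an assumption. The check for identical obs columns is omitted for brevity.

```python
def common_samples(obs_names_per_dataset):
    """Return the sample names present in every dataset.

    The result follows the first dataset's order (an assumption made
    for this illustration).
    """
    first, *rest = obs_names_per_dataset
    shared = set(first).intersection(*map(set, rest))
    return [name for name in first if name in shared]


def common_var_columns(schemas):
    """Return {column: dtype} for the var columns that appear in every
    dataset with the same dtype."""
    first, *rest = schemas
    return {
        column: dtype
        for column, dtype in first.items()
        if all(other.get(column) == dtype for other in rest)
    }
```

For example, with samples `["s1", "s2", "s3"]` in one dataset and `["s3", "s1", "s4"]` in another, only `s1` and `s3` would survive the subsetting.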

The one exception to the var “same data type” rule: if a column is Enum in some datasets and Categorical in others, or Enum in all datasets but with different categories in each dataset, that column will be retained as an Enum column (with the union of the categories) in the concatenated var.
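The category union for such a column can be sketched as follows. The helper `union_categories` is hypothetical, and the first-seen ordering of the merged categories is an assumption made for illustration; the ordering the library actually produces is not stated here.

```python
def union_categories(*category_lists):
    """Merge several category lists into one list without duplicates,
    keeping each category at the position where it was first seen."""
    merged = {}
    for categories in category_lists:
        for category in categories:
            # dicts preserve insertion order, so this de-duplicates in order
            merged.setdefault(category, None)
    return list(merged)
```

Under these assumptions, a column with categories `["protein_coding", "lncRNA"]` in one dataset and `["lncRNA", "pseudogene"]` in another would end up with the three categories `protein_coding`, `lncRNA` and `pseudogene`.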

Parameters:
  • datasets : Pseudobulk | Iterable[Pseudobulk]

    one or more Pseudobulk datasets to concatenate with this one

  • *more_datasets : Pseudobulk

    additional Pseudobulk datasets to concatenate with this one, specified as positional arguments

  • dataset_column : str | None

    the name of an Enum column to be added to the concatenated dataset’s var, indicating which dataset each gene came from. The labels themselves are determined by the dataset_labels argument.

  • dataset_labels : Iterable[str] | None

    a sequence of labels, one per dataset being concatenated, used to populate dataset_column. If dataset_labels is not specified, the labels default to {dataset_column}_0, {dataset_column}_1, …, {dataset_column}_{N - 1}. Can only be specified when dataset_column is not None.

  • flexible : bool

    whether to subset to samples and columns of obs and var common to all datasets before concatenating, rather than raising an error on any mismatches
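The default label pattern described for dataset_labels can be reproduced with a short helper. `default_dataset_labels` is a hypothetical name used only for this sketch, and `num_datasets` stands for the N in the pattern above.

```python
def default_dataset_labels(dataset_column, num_datasets):
    """Build the default labels {dataset_column}_0 ... {dataset_column}_{N - 1}."""
    return [f"{dataset_column}_{i}" for i in range(num_datasets)]
```

With `dataset_column="panel"` and three datasets, this yields `panel_0`, `panel_1` and `panel_2`.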

Returns:

The concatenated Pseudobulk dataset.

Return type:

Pseudobulk
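A minimal usage sketch, based only on the signature documented above. The wrapper `concat_gene_panels` and the column name `"panel"` are hypothetical; `base` and each element of `others` are assumed to be Pseudobulk objects with the same cell types and distinct var_names.

```python
def concat_gene_panels(base, others, *, flexible=False):
    """Gene-wise concatenate `others` onto `base`.

    A "panel" Enum column (hypothetical name) is added to var to record
    which dataset each gene came from; since dataset_labels is not
    passed, the default labels are used.
    """
    return base.concat_var(others, dataset_column="panel", flexible=flexible)
```

Passing `flexible=True` would subset to the shared samples and columns rather than raising an error on mismatches.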