join_obs

Pseudobulk.join_obs(other, /, *, cell_types=None, excluded_cell_types=None, on=None, left_on=None, right_on=None, suffix='_right', validate='m:m', nulls_equal=False, coalesce=True)

Left-join each cell type’s obs with another DataFrame, using the same logic as the polars DataFrame method df.join(). Because this is a left join, every row of obs is kept.

Parameters:
  • other: DataFrame

    a polars DataFrame to join each cell type’s obs with

  • on: str | Expr | Sequence[str | Expr] | None

    the name(s) of the join column(s) in both DataFrames

  • cell_types: str | Iterable[str] | None

    one or more cell types to operate on; if None, operate on all cell types. Mutually exclusive with excluded_cell_types.

  • excluded_cell_types: str | Iterable[str] | None

    one or more cell types to exclude from the operation; mutually exclusive with cell_types

  • left_on: str | Expr | Sequence[str | Expr] | None

    the name(s) of the join column(s) in obs

  • right_on: str | Expr | Sequence[str | Expr] | None

    the name(s) of the join column(s) in other

  • suffix: str

    a suffix to append to columns with a duplicate name

  • validate: Literal['m:m', 'm:1', '1:m', '1:1']

    checks whether the join is of the specified type. Can be:

    • 'm:m' (many-to-many): the default, no checks performed.

    • '1:1' (one-to-one): check that none of the values in the join column(s) appear more than once in obs or more than once in other.

    • '1:m' (one-to-many): check that none of the values in the join column(s) appear more than once in obs.

    • 'm:1' (many-to-one): check that none of the values in the join column(s) appear more than once in other.

  • nulls_equal: bool

    whether null values in the join column(s) count as equal to each other and can therefore match. By default (False), null values never produce matches.

  • coalesce: bool

    if True, coalesce each of the pairs of join columns (the columns in on or left_on/right_on) from obs and other into a single column, filling missing values from one with the corresponding values from the other. If False, include both as separate columns, adding suffix to the join columns from other.
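The validate and nulls_equal behaviour described above can be sketched in plain Python. This is only an illustration of the documented semantics on lists of dicts, not the library's implementation: the helpers check_validate and left_join and the sample columns (ID, n, age) are hypothetical, the sketch joins on a single column, and it ignores suffix and coalesce.

```python
from collections import Counter


def has_duplicates(keys):
    """Return True if any join key appears more than once."""
    return any(count > 1 for count in Counter(keys).values())


def check_validate(obs_keys, other_keys, validate="m:m"):
    """Hypothetical helper: raise ValueError if the keys break `validate`."""
    obs_side, other_side = validate.split(":")
    # A "1" on the left means keys must be unique in obs.
    if obs_side == "1" and has_duplicates(obs_keys):
        raise ValueError(f"join keys are not unique in obs ({validate!r})")
    # A "1" on the right means keys must be unique in other.
    if other_side == "1" and has_duplicates(other_keys):
        raise ValueError(f"join keys are not unique in other ({validate!r})")


def left_join(obs, other, on, nulls_equal=False):
    """Hypothetical helper: left-join two lists of dicts on one column."""
    # Columns contributed by other, in first-seen order.
    all_columns = dict.fromkeys(col for row in other for col in row)
    other_columns = [col for col in all_columns if col != on]
    joined = []
    for left_row in obs:
        key = left_row[on]
        # A None key only matches another None key when nulls_equal is True.
        matches = [
            right_row for right_row in other
            if right_row[on] == key and (key is not None or nulls_equal)
        ]
        if not matches:
            # Unmatched obs rows are kept, with None in other's columns.
            joined.append({**left_row, **dict.fromkeys(other_columns)})
        for right_row in matches:
            extra = {col: right_row.get(col) for col in other_columns}
            joined.append({**left_row, **extra})
    return joined


obs = [{"ID": "s1", "n": 10}, {"ID": "s2", "n": 20}, {"ID": None, "n": 30}]
other = [{"ID": "s1", "age": 61}, {"ID": None, "age": 75}]

default_rows = left_join(obs, other, "ID")
null_rows = left_join(obs, other, "ID", nulls_equal=True)
```

With the default nulls_equal=False, the obs row whose ID is None gets None for age; with nulls_equal=True it picks up the value from the row of other whose ID is also None.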

Returns:

A new Pseudobulk dataset with the columns from other joined to each cell type’s obs.

Return type:

Pseudobulk

Note

If a column of on, left_on or right_on is Enum in obs and Categorical in other (or vice versa), or Enum in both but with different categories in each, that pair of columns will be automatically cast to a common Enum data type (with the union of the categories) before joining.
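To make the "union of the categories" in the note concrete, here is a minimal plain-Python sketch. The helper union_categories and the sample categories are hypothetical, and the ordering shown (categories from obs first, then any new ones from other) is an assumption; the note only states that the common Enum contains the union.

```python
def union_categories(obs_categories, other_categories):
    """Hypothetical helper: merge two category lists without repeats.

    dict.fromkeys keeps the first occurrence of each category, so the
    result lists obs's categories first (an assumed ordering).
    """
    return list(dict.fromkeys([*obs_categories, *other_categories]))


# Both join columns would be cast to an Enum with these categories.
common = union_categories(["control", "case"], ["case", "unknown"])
```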