umap
- SingleCell.umap(*, QC_column='passed_QC', PC_key='pca', neighbors_key='neighbors', distances_key='distances', embedding_key='umap', num_iterations=200, alpha=1, gamma=1, negative_sample_rate=5, a=None, b=None, spread=1, min_dist=0.5, seed=0, overwrite=False, hogwild=False, num_threads=None)
Calculate a two-dimensional embedding of this SingleCell dataset with UMAP (Uniform Manifold Approximation and Projection), suitable for plotting with plot_embedding().
Use hogwild=True to optimize the embedding in parallel, at the cost of reproducibility: results will not be reproducible!
This function is intended to be run after pca() and neighbors(). By default, it uses obsm['pca'], obsm['neighbors'], and obsm['distances'] as the inputs to UMAP, and stores the output in obsm['umap'] as a len(obs) × 2 NumPy array. It can also be run on Harmony embeddings by running harmonize() and then specifying PC_key='harmony'.
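As a rough illustration of the call pattern described above, the sketch below wraps umap() in a small helper. The helper name `embed_with_umap` and the `FakeDataset` stand-in are hypothetical and exist only so the snippet runs without the library installed; the keyword arguments (`PC_key`, `seed`) are taken from the signature above. With a real dataset, `dataset` would be the object returned after pca() and neighbors() (and harmonize(), when `use_harmony` is set).

```python
def embed_with_umap(dataset, *, use_harmony=False, seed=0):
    """Call dataset.umap() using keyword arguments from its documented signature.

    `dataset` is assumed to have already been processed by pca() and
    neighbors(), and by harmonize() when use_harmony is True.
    """
    return dataset.umap(
        # Switch the input embedding between PCA and Harmony.
        PC_key='harmony' if use_harmony else 'pca',
        seed=seed,
    )


class FakeDataset:
    """Hypothetical stand-in that only records how umap() was called."""

    def __init__(self):
        self.calls = []

    def umap(self, **kwargs):
        self.calls.append(kwargs)
        return self


fake = FakeDataset()
embed_with_umap(fake, use_harmony=True)
print(fake.calls[0]['PC_key'])
```

Keeping the choice of `PC_key` in one place makes it harder to mix PCA-based and Harmony-based inputs by accident.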
- Parameters:
QC_column: SingleCellColumn | None
an optional Boolean column of obs indicating which cells passed QC. Can be a column name, a polars expression, a polars Series, a 1D NumPy array, or a function that takes in this SingleCell dataset and returns a polars Series or 1D NumPy array. Set to None to include all cells. Cells failing QC will be ignored and have their embeddings set to NaN.
PC_key: str
the key of obsm containing the principal components calculated with pca(), to use as an input for the embedding calculation
neighbors_key: str
the key of obsm containing the nearest-neighbor indices for each cell, to use as an input for the embedding calculation
distances_key: str
the key of obsm containing the squared Euclidean distance to each nearest neighbor in neighbors_key, to use as an input for the embedding calculation
embedding_key: str
the key of obsm where the embeddings will be stored
num_iterations: int
the number of optimization iterations. In umap-learn, this defaulted to 500 for datasets of 10,000 elements or fewer, and 200 for datasets larger than 10,000 elements.
alpha: int | float
the initial learning rate for optimization
gamma: int | float
the weight applied to negative samples during optimization
negative_sample_rate: int
the number of negative samples per positive sample
a: int | float | None
UMAP curve parameter; if None, will be fit based on the values of spread and min_dist. Either both or neither of a and b must be None.
b: int | float | None
UMAP curve parameter; if None, will be fit based on the values of spread and min_dist. Either both or neither of a and b must be None.
spread: float
the effective scale of embedded points. Only used when a and b are None.
min_dist: float
the minimum distance between points in the embedding. Only used when a and b are None.
seed: int
the random seed to use for UMAP
overwrite: bool
if True, overwrite embedding_key if already present in obsm, instead of raising an error
hogwild: bool
if True, go Hogwild! and optimize the embedding in parallel. Results will not be reproducible!
num_threads: int | None
the number of threads to use when running umap(). Cannot be specified when hogwild=False. When hogwild=True, must be explicitly specified and greater than 1 unless self.num_threads is greater than 1, in which case it can be left unset. Set num_threads=-1 to use all available cores, as determined by os.cpu_count(), or leave unset to use self.num_threads cores when hogwild=True and one core when hogwild=False. Does not affect the returned SingleCell dataset's num_threads; this will always be the same as the original dataset's num_threads.
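The interaction between hogwild and num_threads is easier to follow as code. The function below is a hypothetical helper, not part of the library, that restates the documented rules; the name `resolve_num_threads`, the error messages, and the choice of `ValueError` are assumptions made for illustration.

```python
import os


def resolve_num_threads(hogwild, num_threads, dataset_num_threads):
    """Return the thread count implied by the documented rules (a sketch)."""
    if not hogwild:
        # num_threads cannot be specified when hogwild=False.
        if num_threads is not None:
            raise ValueError('num_threads cannot be specified when hogwild=False')
        return 1  # the serial optimizer uses one core
    if num_threads is None:
        # Leaving it unset is allowed only if the dataset itself is multithreaded.
        if dataset_num_threads > 1:
            return dataset_num_threads
        raise ValueError('hogwild=True requires num_threads to be specified')
    if num_threads == -1:
        # -1 means all available cores; os.cpu_count() may return None.
        return os.cpu_count() or 1
    if num_threads <= 1:
        raise ValueError('num_threads must be greater than 1 when hogwild=True')
    return num_threads
```

For example, `resolve_num_threads(True, None, 8)` falls back to the dataset's eight threads, while `resolve_num_threads(False, 4, 8)` raises because a thread count was given without hogwild.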
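To give a sense of what "fit based on the values of spread and min_dist" can mean for a and b: in the reference umap-learn implementation, the curve 1 / (1 + a·d^(2b)) is fitted by least squares to a target that equals 1 for distances below min_dist and decays as exp(-(d - min_dist) / spread) beyond it, sampled at 300 points on [0, 3·spread]. Whether this method follows exactly the same procedure is an assumption, not something the text above confirms. The sketch defines both curves and the squared-error objective; it does not perform the fit, and all function names are hypothetical.

```python
import math


def target_curve(d, spread=1.0, min_dist=0.5):
    """Target membership strength at embedded distance d (umap-learn convention)."""
    if d < min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def low_dim_curve(d, a, b):
    """The smooth curve 1 / (1 + a * d^(2b)) controlled by a and b."""
    return 1.0 / (1.0 + a * d ** (2 * b))


def fit_error(a, b, spread=1.0, min_dist=0.5, num_points=300):
    """Sum of squared differences between the two curves on [0, 3 * spread]."""
    step = 3 * spread / (num_points - 1)
    return sum(
        (low_dim_curve(i * step, a, b) - target_curve(i * step, spread, min_dist)) ** 2
        for i in range(num_points)
    )
```

A fitting routine would search for the (a, b) pair that minimizes `fit_error`; passing a and b explicitly skips that step, which is why spread and min_dist are ignored in that case.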
- Returns:
A new SingleCell dataset with the UMAP embedding stored in obsm[embedding_key].
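Because cells failing QC have their embeddings set to NaN, code that reads obsm[embedding_key] directly may need to skip those rows. The sketch below uses plain Python so that it runs without dependencies; the toy list stands in for the len(obs) × 2 array, and the helper name `split_by_qc` is hypothetical. With a NumPy array, a Boolean mask such as `~np.isnan(embedding).any(axis=1)` is the more usual idiom.

```python
import math


def split_by_qc(embedding):
    """Split row indices into rows with finite coordinates and rows containing NaN."""
    kept, dropped = [], []
    for index, (x, y) in enumerate(embedding):
        if math.isnan(x) or math.isnan(y):
            dropped.append(index)  # this cell was ignored by umap()
        else:
            kept.append(index)
    return kept, dropped


# Toy stand-in for obsm[embedding_key]: the second cell failed QC.
toy_embedding = [(0.1, 2.3), (float('nan'), float('nan')), (-1.5, 0.7)]
kept, dropped = split_by_qc(toy_embedding)
```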
- Return type: