qc_metrics

SingleCell.qc_metrics(*, num_counts_column='num_counts', num_genes_column='num_genes', mito_fraction_column='mito_fraction', allow_float=False, overwrite=False, num_threads=None)

Adds quality-control metrics to obs for each cell: the sum of counts across all genes (num_counts), the number of genes with non-zero expression (num_genes), and the fraction of counts that are mitochondrial (mito_fraction).

This function is intended to be run before qc() for users interested in better understanding the quality of their dataset. It is not a required step, since qc() calculates its own filters internally.
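To make the three metrics concrete, here is a minimal sketch that computes them by hand on a toy count matrix with NumPy and SciPy. It illustrates the definitions above and is not the library's implementation; the gene names, the matrix values and the plain "starts with MT" test are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy raw-count matrix: 3 cells (rows) x 4 genes (columns). Values are made up.
gene_names = np.array(["MT-CO1", "ACTB", "MT-ND1", "GAPDH"])
X = csr_matrix(np.array([
    [5, 0, 5, 10],
    [0, 3, 0, 0],
    [1, 1, 1, 1],
]))

# Assumption: mitochondrial genes are those whose name starts with "MT".
mito_columns = np.flatnonzero(np.char.startswith(gene_names, "MT"))

# num_counts: sum of counts across all genes, per cell.
num_counts = np.asarray(X.sum(axis=1)).ravel()

# num_genes: number of stored entries per row, which equals the number of
# genes with non-zero expression when the matrix has no explicit zeros.
num_genes = X.getnnz(axis=1)

# mito_fraction: share of each cell's counts that come from mitochondrial genes.
mito_counts = np.asarray(X[:, mito_columns].sum(axis=1)).ravel()
mito_fraction = mito_counts / num_counts
```

In this toy example the first cell has 20 counts across 3 expressed genes, half of them mitochondrial.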

Parameters:
  • num_counts_column: str

    the name of an integer column to be added to obs containing each cell’s sum of counts across all genes

  • num_genes_column: str

    the name of an integer column to be added to obs containing each cell’s number of genes with non-zero expression

  • mito_fraction_column: str

    the name of a floating-point column to be added to obs containing each cell’s fraction of counts that are mitochondrial (i.e. from genes starting with ‘MT’)

  • allow_float: bool

    if False, raise an error if self.X.dtype is floating-point (suggesting the user may not be using the raw counts); if True, disable this sanity check. Note that max_mito_fraction is not a parameter of this function: the statement that all steps except mitochondrial percent filtering give the same result on normalized counts when max_mito_fraction=None is specified (not typically recommended) appears to describe qc(). For qc_metrics itself, see the note below on normalized data.

  • overwrite: bool

    if False, raise an error if any of the new columns already exist in obs; if True, overwrite them.

  • num_threads: int | None

    the number of threads to use when calculating the quality-control metrics. Set num_threads=-1 to use all available cores, as determined by os.cpu_count(), or leave unset to use self.num_threads cores. Does not affect the resulting SingleCell dataset’s num_threads; this will always be the same as the original dataset’s num_threads.
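The num_threads rules described above can be sketched as a small function. This is only an illustration of the documented behaviour: resolve_num_threads and default_num_threads are hypothetical names, not part of the library, and the fallback to 1 when os.cpu_count() returns None is an assumption.

```python
import os


def resolve_num_threads(num_threads, default_num_threads):
    """Hypothetical helper mirroring the documented num_threads rules."""
    if num_threads is None:
        # Unset: fall back to the dataset's own setting (self.num_threads).
        return default_num_threads
    if num_threads == -1:
        # -1: use all available cores. os.cpu_count() may return None,
        # so fall back to a single thread in that case (an assumption).
        return os.cpu_count() or 1
    return num_threads
```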

Returns:

A new SingleCell dataset with the three metrics added as columns of obs.

Return type:

SingleCell

Note

This function will give an incorrect output when run on normalized data, since floating-point counts will be truncated to integers.
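The effect of truncation can be seen on a few made-up normalized values. This sketch only shows what casting floats to integers does in NumPy. It does not reproduce how the library performs the conversion internally.

```python
import numpy as np

# Hypothetical normalized (non-integer) expression values for one cell.
normalized = np.array([0.5, 0.25, 1.75])

# Casting to an integer dtype truncates toward zero, so small values vanish.
truncated = normalized.astype(np.int64)

sum_before = normalized.sum()                     # total of the normalized values
sum_after = truncated.sum()                       # total after truncation
nonzero_before = int(np.count_nonzero(normalized))
nonzero_after = int(np.count_nonzero(truncated))
```

Both the total and the number of non-zero entries shrink after truncation, so metrics derived from them would no longer describe the original data.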

Note

This function may give an incorrect output if the count matrix contains explicit zeros (i.e. if (sc.X.data == 0).any()): this is not checked for, due to speed considerations. In the unlikely event that your dataset contains explicit zeros, remove them by running sc.X.eliminate_zeros() (an in-place operation) first.
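As a minimal sketch of the check and fix described in this note, the following builds a small SciPy CSR matrix containing one explicit zero, detects it, and removes it in place. The matrix is made up, and the idea that a metric counting stored entries would be inflated by explicit zeros is an assumption about why the output may be incorrect.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build a 2 x 3 CSR matrix by hand so that one stored entry is an explicit zero.
data = np.array([4, 0, 2])      # the middle stored value is an explicit zero
indices = np.array([0, 1, 2])   # column index of each stored value
indptr = np.array([0, 2, 3])    # row 0 holds two stored values, row 1 holds one
X = csr_matrix((data, indices, indptr), shape=(2, 3))

# Same test as in the note: are any stored values equal to zero?
has_explicit_zeros = bool((X.data == 0).any())

# Stored entries per row, before and after removing explicit zeros in place.
stored_before = X.getnnz(axis=1).tolist()
X.eliminate_zeros()
stored_after = X.getnnz(axis=1).tolist()
```

Before the cleanup, the first row reports two stored entries even though only one gene has a non-zero count.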