scvelo.pp.filter_and_normalize

scvelo.pp.filter_and_normalize(data, min_counts=None, min_counts_u=None, min_cells=None, min_cells_u=None, min_shared_counts=None, min_shared_cells=None, n_top_genes=None, retain_genes=None, subset_highly_variable=True, flavor='seurat', log=True, layers_normalize=None, copy=False, **kwargs)

Filtering, normalization and log transform.

Expects non-logarithmized data. If using logarithmized data, pass log=False.

Runs the following steps:

scv.pp.filter_genes(adata)
scv.pp.normalize_per_cell(adata)
if n_top_genes is not None:
    scv.pp.filter_genes_dispersion(adata)
if log:
    scv.pp.log1p(adata)
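
To make the order of these steps concrete, the following is a minimal NumPy sketch on a toy count matrix. It is not scVelo's implementation: the threshold value and the choice to rescale each cell to the median total count are assumptions made for illustration only.

```python
import numpy as np

# Toy spliced count matrix: 4 cells (rows) x 3 genes (columns).
counts = np.array([
    [4, 0, 1],
    [2, 0, 3],
    [6, 1, 0],
    [8, 0, 4],
], dtype=float)

# Step 1: gene filtering with a hypothetical threshold (analogous to min_counts).
min_counts = 2
keep = counts.sum(axis=0) >= min_counts
filtered = counts[:, keep]

# Step 2: per-cell normalization. Rescaling every cell to the median
# total count is an assumption of this sketch.
totals = filtered.sum(axis=1, keepdims=True)
normalized = filtered / totals * np.median(totals)

# Step 3: log transform, log(1 + x), applied only when log=True.
logged = np.log1p(normalized)
```

After step 2 every cell has the same total count, which is why the log transform comes last: it is applied to values that are already comparable across cells.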

Parameters

data: AnnData

Annotated data matrix.

min_counts: int (default: None)

Minimum number of counts required for a gene to pass filtering (spliced).

min_counts_u: int (default: None)

Minimum number of counts required for a gene to pass filtering (unspliced).

min_cells: int (default: None)

Minimum number of cells in which a gene must be expressed to pass filtering (spliced).

min_cells_u: int (default: None)

Minimum number of cells in which a gene must be expressed to pass filtering (unspliced).

min_shared_counts: int, optional (default: None)

Minimum number of counts (both unspliced and spliced) required for a gene.

min_shared_cells: int, optional (default: None)

Minimum number of cells in which a gene must be expressed (both unspliced and spliced).
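
The two "shared" thresholds can be illustrated with a small NumPy sketch. It assumes one plausible reading of "shared": a cell contributes to a gene only if that gene has both spliced and unspliced counts in that cell. The matrices and threshold values are made up, and the code is not taken from scVelo.

```python
import numpy as np

# Toy layers: 3 cells (rows) x 3 genes (columns).
spliced = np.array([[3, 0, 2], [1, 5, 0], [0, 2, 4]])
unspliced = np.array([[1, 0, 0], [2, 1, 0], [0, 0, 3]])

# A cell counts as "shared" for a gene when both layers are nonzero there.
both = (spliced > 0) & (unspliced > 0)

# Per gene: total counts over shared cells, and number of shared cells.
shared_counts = ((spliced + unspliced) * both).sum(axis=0)
shared_cells = both.sum(axis=0)

# Hypothetical thresholds, analogous to min_shared_counts / min_shared_cells.
pass_counts = shared_counts >= 7
pass_cells = shared_cells >= 2
```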

n_top_genes: int (default: None)

Number of top highly variable genes to keep. If None, the dispersion-based gene selection step is skipped.

retain_genes: list, optional (default: None)

List of gene names to be retained independent of thresholds.

subset_highly_variable: bool (default: True)

Whether to subset to the highly variable genes, or to keep all genes and only flag them in .var['highly_variable'].
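
The difference between subsetting and flagging can be sketched as follows. The ranking here uses a plain variance-to-mean ratio as a stand-in; it is not the normalized dispersion computed by any of the supported flavors, and the data are made up.

```python
import numpy as np

# Toy expression matrix: 4 cells (rows) x 4 genes (columns).
expr = np.array([
    [1.0, 5.0, 2.0, 0.0],
    [1.0, 0.0, 2.0, 4.0],
    [1.0, 7.0, 3.0, 0.0],
    [1.0, 0.0, 1.0, 4.0],
])

# Simplified dispersion: variance divided by mean, per gene.
dispersion = expr.var(axis=0) / expr.mean(axis=0)

# Flag the n_top_genes genes with the largest dispersion.
n_top_genes = 2
top = np.argsort(dispersion)[::-1][:n_top_genes]
highly_variable = np.zeros(expr.shape[1], dtype=bool)
highly_variable[top] = True

# subset_highly_variable=True corresponds to keeping only flagged columns;
# with False, the matrix is left intact and only the boolean flag is stored.
subset = expr[:, highly_variable]
```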

flavor: {'seurat', 'cell_ranger', 'svr'}, optional (default: 'seurat')

Choose the flavor for computing normalized dispersion. If choosing 'seurat', this expects non-logarithmized data.

log: bool (default: True)

Whether to log-transform the data (log1p) after normalization.

layers_normalize: list of str (default: None)

List of layers to be normalized. If None, the layers {'X', 'spliced', 'unspliced'} are considered, and each is normalized only if it does not appear to be normalized already. This is tested by checking the type of the entries: int means unprocessed, float means processed.
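
A check of this kind can be sketched in a few lines. The helper name `looks_unnormalized` is hypothetical, and testing for integer-valued entries is only one way to approximate the test described above; it is not scVelo's code.

```python
import numpy as np

def looks_unnormalized(matrix):
    """Return True if every entry is integer-valued, i.e. looks like raw counts."""
    values = np.asarray(matrix, dtype=float)
    return bool(np.all(values == np.round(values)))

raw = np.array([[1, 0], [3, 2]])
scaled = raw / raw.sum(axis=1, keepdims=True)

raw_is_unprocessed = looks_unnormalized(raw)
scaled_is_unprocessed = looks_unnormalized(scaled)
```

A heuristic like this can misclassify a layer whose normalized values happen to be whole numbers, which is one reason to pass layers_normalize explicitly when the state of the data is known.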

copy: bool (default: False)

Return a copy of the AnnData object instead of updating it in place.

**kwargs:

Keyword arguments passed to pp.normalize_per_cell (e.g. counts_per_cell).

Returns

If copy=True, returns a processed copy of adata; otherwise updates adata in place and returns None.
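
A typical call might look like the sketch below. The wrapper `preprocess` is a hypothetical helper, and the threshold values 20 and 2000 are illustrative choices rather than recommendations from this page. Only parameters listed in the signature above are used.

```python
def preprocess(adata, min_shared_counts=20, n_top_genes=2000):
    """Return a filtered, normalized and log-transformed copy of `adata`.

    Hypothetical convenience wrapper; `adata` is expected to hold raw
    (non-logarithmized) counts with 'spliced' and 'unspliced' layers.
    """
    # Imported lazily so this helper can be defined without scVelo installed.
    import scvelo as scv

    return scv.pp.filter_and_normalize(
        adata,
        min_shared_counts=min_shared_counts,
        n_top_genes=n_top_genes,
        copy=True,  # leave the input object untouched
    )
```

Passing copy=True keeps the raw object available for comparison; with the default copy=False the input is modified in place.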