swnn.utils.process module

@CreateDate: 2020/07/18 @Author: Xingyan Liu @File: process.py @Project: stagewiseNN

swnn.utils.process.reverse_dict(d): the values of the dict must be list-like.
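
As a minimal sketch of what reversing a dict with list-like values could look like: the helper `reverse_dict_sketch` and the example data below are hypothetical and only illustrate the expected input shape; they are not taken from the library source.

```python
def reverse_dict_sketch(d):
    """Map each element of every list-like value back to its key."""
    reversed_map = {}
    for key, values in d.items():
        for value in values:
            # If a value appears under several keys, the last key wins.
            reversed_map[value] = key
    return reversed_map


# Hypothetical marker-gene lists keyed by cell type
cell_types = {"T cell": ["CD3D", "CD3E"], "B cell": ["CD79A", "MS4A1"]}
gene_to_type = reverse_dict_sketch(cell_types)
```

This also shows why the values must be list-like: each value is iterated over, so a plain string value would be split into single characters.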

swnn.utils.process.describe_series(srs, max_cats=100, asstr=False): inspect data-structure

swnn.utils.process.set_adata_hvgs(adata, gene_list=None, indicator=None, slim=True, copy=False): Set the given (possibly pre-computed) set of genes as highly variable. If copy is False, changes are made to the input adata. If slim is True and adata.raw is None, the raw data will be backed up.

swnn.utils.process.change_names(seq, mapping=None, **kwmaps)

swnn.utils.process.normalize_default(adata, target_sum=None, copy=False, log_only=False)

Normalizing datasets with default settings (total-counts normalization followed by log(x+1) transform).

Parameters

adata – AnnData object
target_sum – scale factor of total-count normalization
copy – whether to copy the dataset
log_only – whether to skip the “total-counts normalization” and only perform log(x+1) transform

Returns

Return type

AnnData or None
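
The two steps named above (total-counts normalization followed by a log(x+1) transform) can be sketched without any dependencies. This is an illustration of the arithmetic only, not the library's implementation; the function name `normalize_total_then_log` and the toy counts are assumptions made for the example.

```python
import math


def normalize_total_then_log(counts, target_sum):
    """Scale each row (cell) to `target_sum` total counts, then apply log(x + 1)."""
    result = []
    for row in counts:
        total = sum(row)
        # Guard against empty cells to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(x * scale) for x in row])
    return result


# Two hypothetical cells with the same composition but different sequencing depth
counts = [[1, 3, 0], [10, 30, 0]]
normalized = normalize_total_then_log(counts, target_sum=100)
```

After scaling, both cells have identical values, which is the point of total-counts normalization: differences in depth are removed before the log transform.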

swnn.utils.process.normalize_log_then_total(adata, target_sum=None, copy=False): For SplitSeq data, performing log(x+1) BEFORE total-sum normalization can result in a better UMAP visualization (e.g. clusters are less confounded by differences in total counts).
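
The reversed order of operations can be sketched as follows. This is a dependency-free illustration under stated assumptions: the function name `log_then_normalize_total`, the toy counts, and the explicit `target_sum=10` are made up for the example, and the value the library uses when `target_sum` is None is not assumed here.

```python
import math


def log_then_normalize_total(counts, target_sum):
    """Apply log(x + 1) to each count, then rescale each row to `target_sum`."""
    result = []
    for row in counts:
        logged = [math.log1p(x) for x in row]
        total = sum(logged)
        # Guard against all-zero rows to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        result.append([x * scale for x in logged])
    return result


# Two hypothetical cells with different sequencing depths
counts = [[1, 3, 0], [10, 30, 0]]
rescaled = log_then_normalize_total(counts, target_sum=10)
row_sums = [sum(row) for row in rescaled]
```

Here every row of the log-transformed matrix sums to `target_sum`, whereas in the default order the row sums are equalized before the log is taken.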

swnn.utils.process.groupwise_hvgs_freq(adata, groupby='batch', return_hvgs=True, **hvg_kwds)

Separately compute highly variable genes (HVGs) for each group, and count the frequencies of genes being selected as HVGs among those groups.

Parameters

adata – the AnnData object
groupby – a column name in adata.obs specifying the batches or groups for which HVGs are computed independently.
return_hvgs (bool) – whether to return the computed dict of HVG-lists for each group
hvg_kwds – other keyword arguments passed to sc.pp.highly_variable_genes
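
The counting step described for this function can be illustrated with the standard library. The per-group HVG lists below are hypothetical stand-ins for the output of a per-group HVG computation; the sketch only shows how selection frequencies could be tallied and is not the library's code.

```python
from collections import Counter

# Hypothetical per-group HVG lists, e.g. one list per batch
hvgs_per_group = {
    "batch1": ["GeneA", "GeneB", "GeneC"],
    "batch2": ["GeneB", "GeneC", "GeneD"],
    "batch3": ["GeneC", "GeneE"],
}

hvg_freq = Counter()
for genes in hvgs_per_group.values():
    hvg_freq.update(set(genes))  # count each gene at most once per group

# Genes selected as HVGs in at least two groups
shared = sorted(gene for gene, n in hvg_freq.items() if n >= 2)
```

A frequency table of this kind makes it possible to keep only genes that are variable in several groups rather than in a single batch.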

Returns

swnn.utils.process.set_precomputed_neighbors(adata, distances, connectivities, n_neighbors=15, metric='cosine', method='umap', metric_kwds=None, use_rep=None, n_pcs=None, key_added=None)

swnn.utils.process.quick_preprocess_raw(adata, target_sum=None, hvgs=None, batch_key=None, copy=True, log_first=False, **hvg_kwds)

Run the data-analysis pipeline, including normalization, HVG selection, and z-scoring (centering and scaling).

Parameters

adata (AnnData) – the AnnData object
target_sum (Optional[int]) – the target total counts after normalization. If None, after normalization, each observation (cell) has a total count equal to the median of total counts for observations (cells) before normalization.
hvgs (Optional[Sequence]) – highly variable genes to be used for dimensionality reduction (centering and PCA)
batch_key – a column name in adata.obs specifying the batch labels
copy – whether to make a copy of the input data. If False, the data object is changed in place.
log_first (bool) – for some data distributions, performing log(x+1) before total-count normalization might give a better result (e.g. clustering results may be less affected by the sequencing depths)
hvg_kwds – other keyword arguments for sc.pp.highly_variable_genes

Return type

AnnData
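
The final z-scoring step (centering and scaling each gene) can be sketched as below. The function name `zscore_columns` and the toy matrix are assumptions, and the population standard deviation is used for simplicity; the convention applied by the library may differ.

```python
import statistics


def zscore_columns(matrix):
    """Center each column (gene) to mean 0 and scale it to unit variance."""
    columns = list(zip(*matrix))
    scaled_columns = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # population standard deviation
        # Constant columns have zero variance; set them to 0 instead of dividing.
        scaled_columns.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]


# Three hypothetical cells (rows) by two genes (columns); the second gene is constant
expression = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
scaled = zscore_columns(expression)
```

Z-scoring puts all genes on a comparable scale, so that highly expressed genes do not dominate a subsequent PCA.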

swnn.utils.process.label_binarize_each(labels, classes, sparse_out=True)

swnn.utils.process.group_mean(X, labels, binary=False, classes=None, features=None, print_groups=True)

This function may be more efficient than df.groupby().mean() when handling sparse matrices.
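
One reason a sparse-aware group mean can be cheaper is that only the stored non-zero entries need to be visited. The sketch below illustrates that idea with plain (row, column, value) triplets; it is an assumption about how such a computation could be organized, not a description of the library's implementation, and `group_mean_sparse` is a hypothetical name.

```python
from collections import defaultdict


def group_mean_sparse(entries, labels, n_features):
    """Group-wise column means from (row, col, value) triplets of non-zero entries."""
    group_sizes = defaultdict(int)
    for label in labels:
        group_sizes[label] += 1

    sums = {g: [0.0] * n_features for g in group_sizes}
    # Only stored (non-zero) entries are visited; zeros add nothing to the sums.
    for row, col, value in entries:
        sums[labels[row]][col] += value

    return {g: [s / group_sizes[g] for s in sums[g]] for g in sums}


# 4 hypothetical cells x 3 genes, stored as non-zero triplets
entries = [(0, 0, 2.0), (1, 0, 4.0), (2, 2, 6.0), (3, 1, 1.0)]
labels = ["a", "a", "b", "b"]
means = group_mean_sparse(entries, labels, n_features=3)
```

The sums are divided by the full group size, so implicit zeros still count toward the mean even though they are never iterated over.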

Parameters

swnn.utils.process.group_mean_dense(X, labels, binary=False, index_name='group', classes=None)

swnn.utils.process.group_median_dense(X, labels, binary=False, index_name='group', classes=None)

swnn.utils.process.group_mean_adata(adata, groupby, features=None, binary=False, use_raw=False)

Compute averaged feature values for each group.

Parameters

adata (AnnData) –
groupby (str) – a column name in adata.obs
features – a subset of names in adata.var_names (or adata.raw.var_names)
binary (bool) – if True, the results are the proportions of non-zero values for all (or the given) features
use_raw (bool) – whether to access adata.raw to compute the averages.

Returns

Return type

a pd.DataFrame with features as index and groups as columns
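
To make the effect of binary=True concrete, the sketch below computes the per-group proportion of non-zero values on a small dense matrix. The helper `nonzero_proportions` and the data are hypothetical, and the result is a plain dict keyed by group rather than the pd.DataFrame (features as index, groups as columns) returned by the function.

```python
def nonzero_proportions(matrix, labels):
    """Fraction of non-zero values per feature within each group."""
    groups = sorted(set(labels))
    n_features = len(matrix[0])
    result = {}
    for g in groups:
        rows = [row for row, label in zip(matrix, labels) if label == g]
        result[g] = [
            sum(1 for row in rows if row[j] != 0) / len(rows)
            for j in range(n_features)
        ]
    return result


# 4 hypothetical cells (rows) x 2 features (columns)
matrix = [[0, 2], [1, 0], [3, 4], [0, 5]]
labels = ["a", "a", "b", "b"]
props = nonzero_proportions(matrix, labels)
```

For expression data, such proportions can be read as the fraction of cells in each group in which a gene is detected.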