swnn.utils.process module
@CreateDate: 2020/07/18 @Author: Xingyan Liu @File: process.py @Project: stagewiseNN
- swnn.utils.process.check_dirs(path)
- swnn.utils.process.reverse_dict(d)
The values of the dict must be list-like.
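The signature does not say what the reversed mapping looks like. One plausible reading, sketched here as an assumption and not as the library's implementation, is that every element of a value list becomes a key pointing back to the key it was listed under. `reverse_dict_sketch` is a hypothetical stand-in name.

```python
def reverse_dict_sketch(d):
    """Hypothetical stand-in: map every element of each list-like value
    back to the key it was listed under (later keys win on duplicates)."""
    reversed_d = {}
    for key, values in d.items():
        for v in values:
            reversed_d[v] = key
    return reversed_d


# Toy input: group name -> list of member names.
cell_types = {"neuron": ["Snap25", "Syt1"], "glia": ["Gfap"]}
gene_to_type = reverse_dict_sketch(cell_types)
```

Under this reading the list-like requirement is natural: a scalar value cannot be iterated, and a bare string would be split into single characters.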
- swnn.utils.process.describe_dataframe(df, **kwargs)
- swnn.utils.process.describe_series(srs, max_cats=100, asstr=False)
inspect data-structure
- swnn.utils.process.make_binary(mat)
- swnn.utils.process.set_adata_hvgs(adata, gene_list=None, indicator=None, slim=True, copy=False)
Set the given (possibly pre-computed) set of genes as highly variable. If copy is False, changes are made to the input adata. If slim is True and adata.raw is None, the raw data will be backed up.
- swnn.utils.process.change_names(seq, mapping=None, **kwmaps)
- Return type
list
- swnn.utils.process.normalize_default(adata, target_sum=None, copy=False, log_only=False)
Normalizing datasets with default settings (total-counts normalization followed by log(x+1) transform).
- Parameters
adata – AnnData object
target_sum – scale factor of total-count normalization
copy – whether to copy the dataset
log_only – whether to skip the “total-counts normalization” and only perform log(x+1) transform
- Return type
AnnData or None
- swnn.utils.process.normalize_log_then_total(adata, target_sum=None, copy=False)
For SplitSeq data, performing log(x+1) BEFORE total-sum normalization may result in a better UMAP visualization (e.g. clusters are less confounded by differences in total counts).
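The two orderings described for normalize_default and normalize_log_then_total can be compared on a toy matrix. The sketch below uses plain NumPy and is not the library's own code. `total_count_normalize` is a hypothetical helper, and falling back to the median row total when target_sum is None follows the behaviour documented for quick_preprocess_raw.

```python
import numpy as np


def total_count_normalize(X, target_sum=None):
    """Scale each row (cell) so that its entries sum to target_sum.
    With target_sum=None the median of the row totals is used."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    if target_sum is None:
        target_sum = np.median(totals)
    # Guard against empty cells to avoid division by zero.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return X / safe_totals * target_sum


# Rows 0 and 1 have the same composition but a 10x difference in depth.
counts = np.array([[1, 3, 0], [10, 30, 0], [0, 2, 2]])

# Default order: total-count normalization, then log(x + 1).
total_then_log = np.log1p(total_count_normalize(counts, target_sum=100))

# Alternative order: log(x + 1) first, then total-count normalization.
log_then_total = total_count_normalize(np.log1p(counts), target_sum=100)
```

In the second ordering every row of the result sums to target_sum. In the first, the depth difference between rows 0 and 1 is removed before the log transform, so those two rows become identical.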
- swnn.utils.process.groupwise_hvgs_freq(adata, groupby='batch', return_hvgs=True, **hvg_kwds)
Separately compute highly variable genes (HVGs) for each group, and count the frequencies of genes being selected as HVGs among those groups.
- Parameters
adata – the AnnData object
groupby – a column name in adata.obs specifying the batches or groups for which HVGs are computed independently
return_hvgs (bool) – whether to return the computed dict of HVG lists for each group
hvg_kwds – other parameters for sc.pp.highly_variable_genes
- Returns
hvg_freq (dict) – the HVG frequencies
hvg_dict (dict) – returned only if return_hvgs is True
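The frequency count itself can be illustrated without AnnData. The sketch below assumes the per-group HVG lists are already available and uses made-up gene and batch names. The thresholding step at the end is an assumption about how a helper such as take_high_freq_elements might be applied, not a description of its implementation.

```python
from collections import Counter

# Hypothetical per-group HVG lists, standing in for `hvg_dict`.
hvg_dict = {
    "batch1": ["GeneA", "GeneB", "GeneC"],
    "batch2": ["GeneA", "GeneC", "GeneD"],
    "batch3": ["GeneA", "GeneB", "GeneD"],
}

# Count in how many groups each gene was selected (stand-in for `hvg_freq`).
# set() ensures a gene is counted at most once per group.
hvg_freq = Counter(gene for genes in hvg_dict.values() for gene in set(genes))

# Keep genes selected in at least `min_freq` groups (assumed threshold rule).
min_freq = 3
frequent = sorted(g for g, n in hvg_freq.items() if n >= min_freq)
```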
- swnn.utils.process.take_high_freq_elements(freq, min_freq=3)
- swnn.utils.process.set_precomputed_neighbors(adata, distances, connectivities, n_neighbors=15, metric='cosine', method='umap', metric_kwds=None, use_rep=None, n_pcs=None, key_added=None)
- swnn.utils.process.quick_preprocess_raw(adata, target_sum=None, hvgs=None, batch_key=None, copy=True, log_first=False, **hvg_kwds)
Go through the data-analysis pipeline, including normalization, HVG selection, and z-scoring (centering and scaling)
- Parameters
adata (AnnData) – the AnnData object
target_sum (Optional[int]) – the target total counts after normalization. If None, each observation (cell) has, after normalization, a total count equal to the median of the total counts of all observations (cells) before normalization.
hvgs (Optional[Sequence]) – highly variable genes to be used for dimensionality reduction (centering and PCA)
batch_key – a column name in adata.obs specifying the batch labels
copy – whether to make a copy of the input data. If False, the data object is changed in place.
log_first (bool) – for some data distributions, performing log(x+1) before total-count normalization might give a better result (e.g. clustering results may be less affected by sequencing depth)
hvg_kwds – other keyword parameters for sc.pp.highly_variable_genes
- Return type
AnnData
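The three stages named above (normalization, HVG selection, z-scoring) can be sketched on a dense toy matrix. This is an illustration under simplifying assumptions and not the library's pipeline: `preprocess_sketch` and `n_top` are hypothetical names, and plain variance ranking replaces sc.pp.highly_variable_genes.

```python
import numpy as np


def preprocess_sketch(counts, n_top=2, target_sum=None, log_first=False):
    """Illustrative only: normalize, pick high-variance genes, z-score."""
    X = np.asarray(counts, dtype=float)

    def scale_rows(M):
        # Total-count normalization; the median row total is the default goal.
        totals = M.sum(axis=1, keepdims=True)
        goal = np.median(totals) if target_sum is None else target_sum
        return M / np.where(totals == 0, 1.0, totals) * goal

    # 1. Normalization, in either order.
    X = scale_rows(np.log1p(X)) if log_first else np.log1p(scale_rows(X))

    # 2. Gene selection: plain variance ranking, a simplified stand-in
    #    for sc.pp.highly_variable_genes.
    top = np.sort(np.argsort(X.var(axis=0))[::-1][:n_top])
    X = X[:, top]

    # 3. Z-scoring: center each gene and scale it to unit variance.
    std = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)
    return X, top


# Rows are cells, columns are genes.
counts = np.array([[5, 0, 1, 4], [0, 6, 1, 3], [7, 1, 1, 2], [1, 8, 1, 1]])
Z, selected = preprocess_sketch(counts, n_top=2)
```

`Z` holds the z-scored values of the selected genes and `selected` their column indices in the input matrix.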
- swnn.utils.process.label_binarize_each(labels, classes, sparse_out=True)
- swnn.utils.process.group_mean(X, labels, binary=False, classes=None, features=None, print_groups=True)
This function may be more efficient than df.groupby().mean() when handling sparse matrices.
- Parameters
X (shape (n_samples, n_features)) –
labels (shape (n_samples, )) –
classes (optional) – names of groups
features (optional) – names of features
print_groups (bool) – whether to inspect the groups
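One common way to compute group means without a DataFrame groupby is a matrix product with a one-hot label matrix, which also carries over to sparse inputs. The sketch below shows that idea with dense NumPy arrays. It is an assumption about the general technique, not a copy of the library's code, and `group_mean_sketch` is a hypothetical name that returns groups as rows.

```python
import numpy as np


def group_mean_sketch(X, labels, classes=None):
    """Per-group feature means via an indicator-matrix product."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if classes is None:
        classes = np.unique(labels)
    # indicator[i, j] is 1 when sample i belongs to classes[j].
    indicator = (labels[:, None] == np.asarray(classes)[None, :]).astype(float)
    sums = indicator.T @ X                  # shape (n_groups, n_features)
    sizes = indicator.sum(axis=0)[:, None]  # shape (n_groups, 1)
    # Empty groups keep a zero row instead of triggering a division by zero.
    return sums / np.where(sizes == 0, 1.0, sizes), list(classes)


X = np.array([[1, 0], [3, 0], [0, 4]])
means, groups = group_mean_sketch(X, ["a", "a", "b"])
```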
- swnn.utils.process.group_mean_dense(X, labels, binary=False, index_name='group', classes=None)
- swnn.utils.process.group_median_dense(X, labels, binary=False, index_name='group', classes=None)
- swnn.utils.process.group_mean_adata(adata, groupby, features=None, binary=False, use_raw=False)
Compute averaged feature-values for each group
- Parameters
adata (AnnData) –
groupby (str) – a column name in adata.obs
features – a subset of names in adata.var_names (or adata.raw.var_names)
binary (bool) – if True, the results are the non-zero proportions for all (or the given) features
use_raw (bool) – whether to access adata.raw to compute the averages.
- Returns
a pd.DataFrame with features as index and groups as columns
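The effect of binary=True can be illustrated on a small array: averaging a 0/1 indicator of non-zero entries within each group gives the proportion of non-zero values. The sketch below uses made-up data and group names and plain NumPy in place of AnnData and pandas.

```python
import numpy as np

# Rows are cells, columns are features; two cells per group.
X = np.array([[2, 0, 0], [5, 1, 0], [0, 3, 0], [0, 7, 4]])
groups = np.array(["g1", "g1", "g2", "g2"])

# Binarize first: the mean of a 0/1 indicator is the non-zero proportion.
is_nonzero = (X != 0).astype(float)

# Features as rows, groups as columns (mirroring the documented layout).
proportions = np.column_stack(
    [is_nonzero[groups == g].mean(axis=0) for g in ("g1", "g2")]
)
```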