came.pipeline.preprocess_aligned

came.pipeline.preprocess_aligned(adatas: [anndata.AnnData, anndata.AnnData], key_class: str, df_varmap_1v1: pandas.DataFrame | None = None, use_scnets: bool = True, n_pcs: int = 30, nneigh_scnet: int = 5, nneigh_clust: int = 20, deg_cuts: dict = {}, ntop_deg: int | None = 50, ntop_deg_nodes: int | None = 50, key_clust: str = 'clust_lbs', node_source: str = 'hvg,deg', ext_feats: typing.Sequence | None = None, ext_nodes: typing.Sequence | None = None)

Wrapper function for preprocessing a pair of adatas with aligned features (i.e., features in one-to-one correspondence).

Processing Steps:

1. align variables

2. preprocessing

3. candidate genes (HVGs and DEGs)

4. pre-clustering query data

5. computing single-cell network
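
The first step, aligning variables, can be illustrated with a minimal sketch. This is not CAME's implementation: `align_var_names` is a hypothetical helper, and it assumes the first two columns of the mapping table hold the reference and query variable names, in that order. When no mapping is given, variables are paired by their original names.

```python
import pandas as pd


def align_var_names(vars_ref, vars_query, df_varmap_1v1=None):
    """Return paired (ref, query) variable names present in both datasets.

    Hypothetical helper for illustration only; not part of CAME.
    """
    if df_varmap_1v1 is None:
        # No mapping given: pair variables that share the same name.
        query_set = set(vars_query)
        shared = [v for v in vars_ref if v in query_set]
        return pd.DataFrame({"ref": shared, "query": shared})

    # Assumption: column 0 holds reference names, column 1 holds query names.
    pairs = df_varmap_1v1.iloc[:, :2].copy()
    pairs.columns = ["ref", "query"]
    # Keep only pairs whose members both occur in the respective dataset.
    keep = pairs["ref"].isin(vars_ref) & pairs["query"].isin(vars_query)
    return pairs[keep].reset_index(drop=True)


# Toy one-to-one mapping between two species (made-up example data).
varmap = pd.DataFrame({
    "human": ["GAPDH", "CD3E", "MS4A1"],
    "mouse": ["Gapdh", "Cd3e", "Ms4a1"],
})
aligned = align_var_names(
    ["GAPDH", "CD3E", "ALB"], ["Gapdh", "Cd3e", "Ms4a1"], varmap
)
print(aligned)
```

Pairs with a member missing from either dataset are dropped, so both datasets can then be subset to the same ordered set of features.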

Parameters:

adatas – a pair of sc.AnnData objects holding the raw reference and query data, in that order
key_class – the key of the type labels; must be a column name of adatas[0].obs
df_varmap_1v1 – dataframe containing only 1-to-1 correspondences between features in adatas; if not provided, the variables are mapped by their original names.
use_scnets – whether to use the cell-cell-similarity edges (single-cell-network)
n_pcs – the number of PCs for computing the single-cell-network
nneigh_scnet – the number of nearest neighbors used to build the single-cell-network
nneigh_clust – the number of nearest neighbors used for pre-clustering
deg_cuts – dict with keys ‘cut_padj’, ‘cut_pts’, and ‘cut_logfc’, used for filtering DEGs.
ntop_deg – the number of top DEGs to take as the node-features
ntop_deg_nodes – the number of top DEGs to take as the graph nodes
key_clust – where to add the pre-clustering labels to the query data, i.e., the column name in adatas[1].obs. By default, it is set to came.pipeline.KEY_CLUSTER
node_source – source of the node genes, using both DEGs and HVGs by default
ext_feats – extra variables (genes) to be added to the auto-selected ones as the observation(cell)-node features.
ext_nodes – extra variables (genes) to be added to the auto-selected ones as the variable(gene)-nodes.
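
To make the roles of deg_cuts and ntop_deg more concrete, here is a minimal sketch of threshold-based DEG filtering followed by a per-group top-N selection. It is an illustration under assumptions, not CAME's code: `filter_top_degs`, the column names (`group`, `gene`, `padj`, `pts`, `logfc`), the direction of each comparison, and the fallback threshold values are all hypothetical. Only the three key names come from the parameter description above.

```python
import pandas as pd


def filter_top_degs(df_deg, deg_cuts=None, ntop=50):
    """Filter a DEG table by thresholds, then keep the top `ntop` genes per group.

    Hypothetical helper; column names and threshold semantics are assumptions.
    """
    # Illustrative fallback thresholds, NOT CAME's documented defaults.
    cuts = {"cut_padj": 0.05, "cut_pts": 0.1, "cut_logfc": 1.0}
    cuts.update(deg_cuts or {})

    keep = (
        (df_deg["padj"] < cuts["cut_padj"])      # adjusted p-value small enough
        & (df_deg["pts"] > cuts["cut_pts"])      # expressed in enough cells
        & (df_deg["logfc"] > cuts["cut_logfc"])  # log fold-change large enough
    )
    filtered = df_deg[keep].sort_values("logfc", ascending=False)
    if ntop is None:
        # Assumption: None means "keep every gene that passes the cuts".
        return filtered
    return filtered.groupby("group", sort=False).head(ntop)


# Made-up DEG table for two cell groups.
df_deg = pd.DataFrame({
    "group": ["T", "T", "T", "B", "B"],
    "gene": ["CD3E", "CD3D", "ACTB", "MS4A1", "CD79A"],
    "padj": [1e-10, 1e-8, 0.5, 1e-12, 1e-3],
    "pts": [0.9, 0.8, 0.99, 0.85, 0.05],
    "logfc": [3.0, 2.5, 0.1, 4.0, 2.0],
})
top = filter_top_degs(df_deg, {"cut_logfc": 0.5}, ntop=1)
print(top[["group", "gene"]])
```

Passing a partial dict overrides only the listed thresholds, which matches the idea of an empty dict meaning "use the defaults".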

Returns:

came_inputs (a dict containing CAME inputs)
(adata1, adata2) (a tuple of the preprocessed AnnData objects)
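
A call might look like the following sketch. It uses only the parameters listed in the signature above and passes their documented defaults explicitly. The wrapper name `run_preprocessing` and the column name `'cell_type'` are hypothetical, and the unpacking assumes the two return values arrive as `(came_inputs, (adata1, adata2))`.

```python
def run_preprocessing(adata_ref, adata_query, key_class="cell_type"):
    """Call preprocess_aligned on a reference/query pair and unpack the result."""
    # Imported lazily so this sketch can be defined without CAME installed.
    from came import pipeline

    came_inputs, (adata1, adata2) = pipeline.preprocess_aligned(
        [adata_ref, adata_query],  # reference first, query second
        key_class=key_class,       # hypothetical column name in adata_ref.obs
        df_varmap_1v1=None,        # map variables by their original names
        use_scnets=True,
        n_pcs=30,
        nneigh_scnet=5,
        nneigh_clust=20,
        ntop_deg=50,
        ntop_deg_nodes=50,
        node_source="hvg,deg",
    )
    return came_inputs, adata1, adata2
```

Both inputs are expected to be raw AnnData objects whose features are already in one-to-one correspondence, either through shared names or through df_varmap_1v1.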