came.pipeline.preprocess_unaligned

came.pipeline.preprocess_unaligned(adatas: [anndata.AnnData, anndata.AnnData], key_class: str, use_scnets: bool = True, n_pcs: int = 30, nneigh_scnet: int = 5, nneigh_clust: int = 20, deg_cuts: dict = {}, ntop_deg: int | None = 50, ntop_deg_nodes: int | None = 50, key_clust: str = 'clust_lbs', node_source: str = 'hvg,deg', ext_feats: Sequence[Sequence] | None = None, ext_nodes: Sequence[Sequence] | None = None)

Convenience function for preprocessing a pair of AnnData objects with unaligned features, i.e., features (genes) that may have one-to-many or many-to-one correspondences between the two datasets.

Processing Steps:

preprocessing

candidate genes (HVGs and DEGs)

pre-clustering query data

computing single-cell network
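
The last step can be pictured as building a k-nearest-neighbor graph over cells in a low-dimensional space. The following is only a rough sketch of that idea in pure Python, not CAME's implementation: `knn_edges` is a hypothetical helper, and the toy 2-D points stand in for the top principal components (`n_pcs`) of each cell, with `k` playing the role of `nneigh_scnet`.

```python
import math


def knn_edges(points, k):
    """Return directed (i, j) edges linking each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        # keep the k closest points as neighbors of i
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Toy embedding of four "cells": two tight pairs that are far apart
cells = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
edges = knn_edges(cells, k=1)
print(edges)
```

With `k=1`, each cell is linked only to the other member of its pair; a larger `k` gives a denser cell-cell graph.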

Parameters:

adatas – A pair of sc.AnnData objects holding the raw reference and query data, in that order
key_class – the key to the type-labels; should be a column name of adatas[0].obs
use_scnets – whether to use the cell-cell-similarity edges (single-cell-network)
n_pcs – the number of PCs for computing the single-cell-network
nneigh_scnet – the number of nearest neighbors to use for the single-cell-network
nneigh_clust – the number of nearest neighbors to use for pre-clustering
deg_cuts – dict with keys ‘cut_padj’, ‘cut_pts’, and ‘cut_logfc’
ntop_deg – the number of top DEGs to take as the node-features
ntop_deg_nodes – the number of top DEGs to take as the graph nodes, which can be directly displayed on the UMAP plot.
key_clust – where to add the pre-clustering labels to the query data, i.e., the column name in adatas[1].obs
node_source – source of the node genes, using both DEGs and HVGs by default
ext_feats – A tuple of two lists of variable names. Extra variables (genes) to be added to the auto-selected ones as the observation(cell)-node features.
ext_nodes – A tuple of two lists of variable names. Extra variables (genes) to be added to the auto-selected ones as the variable(gene)-nodes.
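
As an illustration of how these parameters fit together, the sketch below assembles a keyword-argument dict that mirrors the documented signature. `build_preprocess_kwargs` is a hypothetical helper, not part of CAME; the `deg_cuts` threshold values, the `'cell_type'` column name, and the gene names are assumptions chosen only for the example. Only the parameter names and defaults come from the signature above.

```python
def build_preprocess_kwargs(key_class, ref_extra_genes=(), query_extra_genes=()):
    """Assemble keyword arguments matching the documented signature (hypothetical helper)."""
    ext_nodes = None
    if ref_extra_genes or query_extra_genes:
        # one list of gene names per dataset: (reference, query)
        ext_nodes = (list(ref_extra_genes), list(query_extra_genes))
    return dict(
        key_class=key_class,
        use_scnets=True,
        n_pcs=30,
        nneigh_scnet=5,
        nneigh_clust=20,
        # keys are the documented ones; the values are illustrative assumptions
        deg_cuts={"cut_padj": 0.05, "cut_pts": 0.1, "cut_logfc": 0.5},
        ntop_deg=50,
        ntop_deg_nodes=50,
        key_clust="clust_lbs",
        node_source="hvg,deg",
        ext_nodes=ext_nodes,
    )


# One reference gene with two query counterparts, i.e., a one-to-many case
kwargs = build_preprocess_kwargs("cell_type", ["INS"], ["Ins1", "Ins2"])
print(sorted(kwargs))
```

The resulting dict could then be passed as `**kwargs` alongside the `adatas` pair; leaving `ext_nodes` as `None` keeps only the auto-selected gene nodes.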

Returns:

came_inputs (a dict containing CAME inputs)
(adata1, adata2) (a tuple of the preprocessed AnnData objects)
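
A minimal sketch of unpacking the two return values, assuming they come back in the order listed above. `run_preprocessing` is a hypothetical wrapper, and `_stub_preprocess` is a stand-in with placeholder outputs so the snippet runs without CAME installed; in real use, omit the `preprocess` argument and pass actual AnnData objects.

```python
def run_preprocessing(adatas, key_class, preprocess=None, **kwargs):
    """Call the preprocessing function and unpack its two documented outputs."""
    if preprocess is None:
        # Imported lazily so the sketch can be stubbed without CAME installed
        from came.pipeline import preprocess_unaligned as preprocess
    came_inputs, (adata_ref, adata_query) = preprocess(
        adatas, key_class=key_class, **kwargs
    )
    return came_inputs, adata_ref, adata_query


def _stub_preprocess(adatas, key_class, **kwargs):
    # Stand-in only: mimics the documented return shape with placeholder values
    return {"key_class": key_class}, (adatas[0], adatas[1])


came_inputs, ref, query = run_preprocessing(
    ["ref", "query"], "cell_type", preprocess=_stub_preprocess
)
print(came_inputs, ref, query)
```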