came.pipeline.preprocess_unaligned

came.pipeline.preprocess_unaligned(adatas: [AnnData, AnnData], key_class: str, use_scnets: bool = True, n_pcs: int = 30, nneigh_scnet: int = 5, nneigh_clust: int = 20, deg_cuts: dict = {}, ntop_deg: int | None = 50, ntop_deg_nodes: int | None = 50, key_clust: str = 'clust_lbs', node_source: str = 'hvg,deg', ext_feats: Sequence[Sequence] | None = None, ext_nodes: Sequence[Sequence] | None = None)

A packed function for processing a pair of AnnData objects with unaligned features, i.e., features that may have one-to-many or many-to-one correspondences between the two datasets.
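
To make "unaligned" concrete, the sketch below classifies gene pairs from a homology table by correspondence type. It is an illustration only, not CAME's code: the function name `classify_homologies` and the toy gene names are hypothetical, and the table is assumed to be a plain list of (reference gene, query gene) pairs.

```python
from collections import defaultdict


def classify_homologies(pairs):
    """Label each (ref_gene, query_gene) pair by its correspondence type.

    Hypothetical helper for illustration; `pairs` is a list of
    (reference_gene, query_gene) tuples.
    """
    ref_to_query = defaultdict(set)  # reference gene -> its query partners
    query_to_ref = defaultdict(set)  # query gene -> its reference partners
    for g1, g2 in pairs:
        ref_to_query[g1].add(g2)
        query_to_ref[g2].add(g1)

    labels = {}
    for g1, g2 in pairs:
        n_out = len(ref_to_query[g1])
        n_in = len(query_to_ref[g2])
        if n_out == 1 and n_in == 1:
            labels[(g1, g2)] = "one-to-one"
        elif n_out > 1 and n_in == 1:
            labels[(g1, g2)] = "one-to-many"
        elif n_out == 1 and n_in > 1:
            labels[(g1, g2)] = "many-to-one"
        else:
            labels[(g1, g2)] = "many-to-many"
    return labels


# Toy homology table (made-up gene names).
homologs = [
    ("GeneA", "geneA"),
    ("GeneB", "geneB1"),
    ("GeneB", "geneB2"),
    ("GeneC", "geneC"),
    ("GeneD", "geneC"),
]
for pair, kind in classify_homologies(homologs).items():
    print(pair, kind)
```

Only the one-to-one pairs could be matched column-by-column between the two expression matrices; the other pairs are what make the features "unaligned".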

Processing Steps:

  • preprocessing

  • selecting candidate genes (HVGs and DEGs)

  • pre-clustering the query data

  • computing the single-cell network
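
The single-cell network step can be pictured as a k-nearest-neighbor graph over cells. The sketch below is a minimal, brute-force version that assumes a plain Euclidean kNN graph on a low-dimensional (PCA-like) embedding. The exact procedure CAME uses (scaling, PCA, distance metric, edge weighting) is not described on this page, and the names `knn_edges` and `symmetrize` are hypothetical.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) from each cell i to its k nearest cells j.

    `points` stands in for the per-cell embedding (the role of `n_pcs`);
    `k` plays the role of `nneigh_scnet`. Brute force, O(n^2): for
    illustration only.
    """
    edges = []
    for i, p in enumerate(points):
        # Sort the other cells by Euclidean distance; ties break on index.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


def symmetrize(edges):
    """Turn directed kNN edges into an undirected edge set."""
    return {tuple(sorted(e)) for e in edges}


# Two well-separated toy groups of cells in a 2-D embedding.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
print(sorted(symmetrize(knn_edges(cells, k=1))))
```

Symmetrizing by union is one common choice; a mutual-kNN graph (keeping only edges found in both directions) would be sparser.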

Parameters:
  • adatas – a pair of sc.AnnData objects holding the raw reference and query data, in that order

  • key_class – the key to the cell-type labels; must be a column name of adatas[0].obs

  • use_scnets – whether to use cell-cell similarity edges (the single-cell network)

  • n_pcs – the number of principal components used to compute the single-cell network

  • nneigh_scnet – the number of nearest neighbors used to build the single-cell network

  • nneigh_clust – the number of nearest neighbors used for pre-clustering the query data

  • deg_cuts – a dict of DEG-filtering cutoffs, with keys ‘cut_padj’, ‘cut_pts’, and ‘cut_logfc’

  • ntop_deg – the number of top DEGs to take as the node-features

  • ntop_deg_nodes – the number of top DEGs to take as the graph nodes, which can be directly displayed on the UMAP plot.

  • key_clust – the column of the query data's .obs (i.e., adatas[1].obs) in which to store the pre-clustering labels

  • node_source – source of the node genes, using both DEGs and HVGs by default

  • ext_feats – A tuple of two lists of variable names. Extra variables (genes) to be added to the auto-selected ones as the observation(cell)-node features.

  • ext_nodes – A tuple of two lists of variable names. Extra variables (genes) to be added to the auto-selected ones as the variable(gene)-nodes.
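
The parameters deg_cuts, ntop_deg_nodes, node_source and ext_nodes together determine which genes become graph nodes. The sketch below shows one plausible way they could interact for a single dataset. Several assumptions are made here, not confirmed by this page: the DEG table layout (dicts with 'gene', 'padj', 'pts', 'logfc'), the filter direction (padj below its cutoff, pts and logFC above theirs), ranking by logFC, and an order-preserving union. The helpers `select_deg_nodes`, `union_ordered` and `assemble_nodes` are hypothetical.

```python
def select_deg_nodes(deg_table, deg_cuts, ntop):
    """Keep DEGs passing all cutoffs, then take the `ntop` with the largest logFC.

    Assumed semantics: keep rows with padj < cut_padj, pts > cut_pts and
    logfc > cut_logfc. In this sketch, ntop=None keeps every passing gene.
    """
    passed = [
        row for row in deg_table
        if row["padj"] < deg_cuts.get("cut_padj", 1.0)
        and row["pts"] > deg_cuts.get("cut_pts", 0.0)
        and row["logfc"] > deg_cuts.get("cut_logfc", 0.0)
    ]
    passed.sort(key=lambda row: row["logfc"], reverse=True)
    return [row["gene"] for row in passed[:ntop]]


def union_ordered(*gene_lists):
    """Concatenate gene lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(g for genes in gene_lists for g in genes))


def assemble_nodes(degs, hvgs, node_source="hvg,deg", ext_nodes=None):
    """Merge the gene sources named in `node_source`, then append extra genes.

    `ext_nodes` here is the list for ONE dataset; the real parameter is a
    tuple of two such lists (reference, query).
    """
    sources = {"deg": degs, "hvg": hvgs}
    chosen = [sources[s.strip()] for s in node_source.split(",")]
    return union_ordered(*chosen, ext_nodes or [])


# Toy DEG table for one dataset (made-up genes and statistics).
deg_table = [
    {"gene": "g1", "padj": 0.001, "pts": 0.6, "logfc": 2.0},
    {"gene": "g2", "padj": 0.20, "pts": 0.7, "logfc": 3.0},   # fails cut_padj
    {"gene": "g3", "padj": 0.010, "pts": 0.05, "logfc": 1.5},  # fails cut_pts
    {"gene": "g4", "padj": 0.002, "pts": 0.4, "logfc": 0.8},
]
cuts = {"cut_padj": 0.05, "cut_pts": 0.1, "cut_logfc": 0.25}
degs = select_deg_nodes(deg_table, cuts, ntop=50)  # ['g1', 'g4']
nodes = assemble_nodes(degs, hvgs=["g4", "g5"], ext_nodes=["g9"])
print(nodes)
```

Deduplicating with an order-preserving union keeps the result deterministic, which matters if node order is later used to index matrix columns.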

Returns:

  • came_inputs (a dict containing CAME inputs)

  • (adata1, adata2) (a tuple of the preprocessed AnnData objects)