came.pipeline.preprocess_aligned

came.pipeline.preprocess_aligned(adatas: [AnnData, AnnData], key_class: str, df_varmap_1v1: pandas.DataFrame | None = None, use_scnets: bool = True, n_pcs: int = 30, nneigh_scnet: int = 5, nneigh_clust: int = 20, deg_cuts: dict = {}, ntop_deg: int | None = 50, ntop_deg_nodes: int | None = 50, key_clust: str = 'clust_lbs', node_source: str = 'hvg,deg', ext_feats: Sequence | None = None, ext_nodes: Sequence | None = None)

Wrapper function that preprocesses a pair of AnnData objects with aligned features (i.e., features in one-to-one correspondence).

Processing Steps:

  • align variables

  • preprocessing

  • candidate genes (HVGs and DEGs)

  • pre-clustering query data

  • computing single-cell network

Parameters:
  • adatas – a pair of sc.AnnData objects holding the raw reference and query data, in that order (adatas[0] is the reference, adatas[1] is the query)

  • key_class – the key to the cell-type labels; should be a column name of adatas[0].obs

  • df_varmap_1v1 – DataFrame containing only 1-to-1 correspondences between the features of the two adatas; if not provided, the variables are mapped by their original names.

  • use_scnets – whether to use the cell-cell-similarity edges (single-cell-network)

  • n_pcs – the number of PCs for computing the single-cell-network

  • nneigh_scnet – the number of nearest neighbors used for building the single-cell-network

  • nneigh_clust – the number of nearest neighbors used for pre-clustering

  • deg_cuts – dict with keys ‘cut_padj’, ‘cut_pts’, and ‘cut_logfc’, used for filtering DEGs.

  • ntop_deg – the number of top DEGs to take as the node-features

  • ntop_deg_nodes – the number of top DEGs to take as the graph nodes

  • key_clust – where to add the pre-clustering labels to the query data, i.e., the column name in adatas[1].obs. By default, it is set to came.pipeline.KEY_CLUSTER

  • node_source – source of the node genes, using both DEGs and HVGs by default

  • ext_feats – extra variables (genes) to be added to the auto-selected ones as the observation(cell)-node features.

  • ext_nodes – extra variables (genes) to be added to the auto-selected ones as the variable(gene)-nodes.
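
The df_varmap_1v1 parameter expects only 1-to-1 feature correspondences. The sketch below illustrates what that constraint means, using plain Python on toy data: a pair is kept only if both its reference feature and its query feature occur exactly once. The helper name take_one_to_one and the gene names are hypothetical, and the two-column (reference, query) layout is an assumption; in practice the resulting pairs would be wrapped in a pandas DataFrame before being passed in.

```python
from collections import Counter


def take_one_to_one(pairs):
    """Keep only (ref_gene, query_gene) pairs where both genes occur exactly once.

    Hypothetical helper for illustration; not part of the CAME API.
    """
    # Drop exact duplicate pairs while preserving the original order.
    pairs = list(dict.fromkeys(pairs))
    ref_counts = Counter(ref for ref, _ in pairs)
    qry_counts = Counter(qry for _, qry in pairs)
    return [
        (ref, qry)
        for ref, qry in pairs
        if ref_counts[ref] == 1 and qry_counts[qry] == 1
    ]


# Toy mapping table (made-up names): REF_C maps to two query genes,
# so both of its rows are discarded.
homologies = [
    ("REF_A", "qry_a"),
    ("REF_B", "qry_b"),
    ("REF_C", "qry_c1"),
    ("REF_C", "qry_c2"),
    ("REF_D", "qry_d"),
]

one_to_one = take_one_to_one(homologies)
print(one_to_one)
```

Only the pairs for REF_A, REF_B, and REF_D survive, since each side of those pairs is unique in the table.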

Returns:

  • came_inputs (a dict containing CAME inputs)

  • (adata1, adata2) (a tuple of the preprocessed AnnData objects)
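
A minimal usage sketch, based only on the signature and return description above. The column name "cell_type" and the deg_cuts threshold values are hypothetical placeholders, not recommended settings; build_preprocess_kwargs and run_preprocess are illustrative helpers, not part of CAME. The import of came is deferred so that the keyword arguments can be assembled and inspected without the package installed.

```python
def build_preprocess_kwargs(key_class, df_varmap_1v1=None):
    """Assemble keyword arguments for came.pipeline.preprocess_aligned."""
    # The keys come from the parameter description; the values are
    # hypothetical thresholds chosen only for illustration.
    deg_cuts = {"cut_padj": 0.05, "cut_pts": 0.1, "cut_logfc": 0.25}
    return dict(
        key_class=key_class,          # column in adatas[0].obs holding type labels
        df_varmap_1v1=df_varmap_1v1,  # None: map variables by their original names
        use_scnets=True,
        n_pcs=30,
        nneigh_scnet=5,
        nneigh_clust=20,
        deg_cuts=deg_cuts,
        ntop_deg=50,
        ntop_deg_nodes=50,
        node_source="hvg,deg",
    )


def run_preprocess(adata_ref, adata_query, **kwargs):
    """Call preprocess_aligned and unpack its documented return values."""
    import came  # deferred so the helper above works without CAME installed

    came_inputs, (adata1, adata2) = came.pipeline.preprocess_aligned(
        [adata_ref, adata_query], **kwargs
    )
    return came_inputs, adata1, adata2


# "cell_type" is an assumed column name in the reference data's .obs.
kwargs = build_preprocess_kwargs(key_class="cell_type")
print(sorted(kwargs))
```

With two AnnData objects at hand, run_preprocess(adata_ref, adata_query, **kwargs) would return the came_inputs dict together with the two preprocessed AnnData objects.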