came.pipeline.main_for_unaligned

came.pipeline.main_for_unaligned(adatas: Sequence[AnnData], vars_feat: Sequence[Sequence], vars_as_nodes: Sequence[Sequence], df_varmap: DataFrame, df_varmap_1v1: DataFrame | None = None, scnets: Sequence[spmatrix] | None = None, union_var_nodes: bool = True, union_node_feats: bool = True, keep_non1v1_feats: bool = False, col_weight: str | None = None, non1v1_trans_to: int = 0, dataset_names: Sequence[str] = ('reference', 'query'), key_class1: str = 'cell_ontology_class', key_class2: str | None = None, do_normalize: bool = True, batch_keys=None, n_epochs: int = 350, resdir: Path | str | None = None, tag_data: str | None = None, params_model: dict = {}, params_lossfunc: dict = {}, n_pass: int = 100, batch_size: int | None = None, pred_batch_size: int | str | None = 'auto', plot_results: bool = False, norm_target_sum: float | None = None, save_hidden_list: bool = True, save_dpair: bool = True)

Run the main process of CAME (model training) for integrating 2 datasets with unaligned features (e.g., cross-species integration).

Parameters:
  • adatas – A pair of sc.AnnData objects holding the raw reference and query data, in that order

  • vars_feat – A list or tuple of 2 variable-name lists, one per dataset, for example differentially expressed genes or highly variable features.

  • vars_as_nodes (list or tuple of 2) – variables to be taken as the graph nodes

  • df_varmap – A pd.DataFrame with (at least) 2 columns; required. It describes the relationships between features in the 2 datasets and is used to build the adjacency matrix (vv_adj) between variables from these 2 datasets.

  • df_varmap_1v1 (None, pd.DataFrame; optional.) – A DataFrame containing only the 1-to-1 correspondences between features in the 2 datasets. If not provided, it will be inferred from df_varmap.

  • scnets – two single-cell-networks or a merged one

  • union_var_nodes (bool) – whether to take the union of the variable-nodes

  • union_node_feats (bool) – whether to take the union of the observation(cell)-node features

  • keep_non1v1_feats (bool) – whether to use the non-1v1 variables as node features. If most of the homologies are non-1v1, setting this to True is recommended.

  • col_weight – A column in df_varmap specifying the weights between homologies.

  • non1v1_trans_to (int) – the direction in which to transform non-1v1 features; should be either 0 or 1. Set to 0 to transform the query data to the reference (default), or to 1 to transform the reference data to the query. If keep_non1v1_feats=False, this parameter is ignored.

  • dataset_names – a tuple of two names for reference and query, respectively

  • key_class1 – the key to the type-labels for the reference data, should be a column name of adatas[0].obs.

  • key_class2 – the key to the type-labels for the query data. Optional, if provided, should be a column name of adatas[1].obs.

  • do_normalize – whether to normalize the input data (if they have already been normalized, set it to False)

  • batch_keys – a list of two strings (or None), specifying the batch keys for data1 and data2, respectively. If given, features (of cell nodes) will be scaled within each batch.

  • n_epochs – number of training epochs. A recommended setting is 200-400 for whole-graph training, and 80-200 for sub-graph training.

  • resdir – directory for saving results output by CAME

  • tag_data – a tag for auto-creating the result directory resdir

  • params_model – the model parameters

  • params_lossfunc – parameters for loss function

  • n_pass – number of initial epochs to skip; model checkpoints are not backed up until n_pass epochs have passed.

  • batch_size – the number of observation nodes in each mini-batch, from which the sub-graphs used for mini-batch training are built. If None, the model will be trained on the whole graph.

  • pred_batch_size – batch-size in prediction process

  • plot_results – whether to automatically plot the classification results

  • norm_target_sum – the scale factor for library-size normalization

  • save_hidden_list – whether to save the hidden states for all the layers

  • save_dpair – whether to save the elements of the DataPair object
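
The relationship between df_varmap and df_varmap_1v1 can be illustrated with a minimal sketch. It is not CAME's own implementation: the helper infer_one_to_one and the gene names are hypothetical, and plain tuples stand in for the rows of the DataFrame. It assumes that "1-to-1" means each feature name occurs exactly once on its side of the mapping.

```python
from collections import Counter


def infer_one_to_one(pairs):
    """Keep only the pairs whose left and right names each occur exactly once.

    Hypothetical helper for illustration; CAME's internal rule may differ.
    """
    # Drop exact duplicate rows while preserving the original order.
    pairs = list(dict.fromkeys(pairs))
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)
    return [
        (left, right)
        for left, right in pairs
        if left_counts[left] == 1 and right_counts[right] == 1
    ]


# Made-up homology table standing in for the first two columns of df_varmap.
varmap = [
    ("GENE_A", "Gene_a"),   # 1-to-1
    ("GENE_B", "Gene_b1"),  # 1-to-many
    ("GENE_B", "Gene_b2"),
    ("GENE_C", "Gene_d"),   # many-to-1
    ("GENE_D", "Gene_d"),
    ("GENE_E", "Gene_e"),   # 1-to-1
]

one_to_one = infer_one_to_one(varmap)
print(one_to_one)
```

In this toy table only the GENE_A and GENE_E rows survive; the rest are the non-1v1 homologies that keep_non1v1_feats and non1v1_trans_to are concerned with.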

Returns:

outputs

Return type:

dict
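
A call might be assembled as in the following sketch. The keyword names come from the signature above; everything else is an assumption made for illustration: the helpers build_call_kwargs and run_came are hypothetical, the same gene lists are reused for vars_feat and vars_as_nodes as a simplification, and the mini-batch values for n_epochs and n_pass are merely picked to fit the guidance given for those parameters.

```python
def build_call_kwargs(adata_ref, adata_query, genes_ref, genes_query,
                      df_varmap, batch_size=None):
    """Collect keyword arguments for came.pipeline.main_for_unaligned.

    Hypothetical helper; the inputs are assumed to be prepared beforehand.
    """
    whole_graph = batch_size is None
    return dict(
        adatas=[adata_ref, adata_query],        # reference first, then query
        vars_feat=[genes_ref, genes_query],     # e.g. DEGs or HVGs per dataset
        vars_as_nodes=[genes_ref, genes_query], # simplification: same lists
        df_varmap=df_varmap,                    # feature relationships
        dataset_names=("reference", "query"),
        key_class1="cell_ontology_class",       # column of adatas[0].obs
        key_class2=None,                        # query labels are optional
        do_normalize=True,                      # set False for normalized input
        # 200-400 epochs are suggested for whole-graph training and
        # 80-200 for sub-graph training; 100 and 50 are assumed values.
        n_epochs=350 if whole_graph else 100,
        n_pass=100 if whole_graph else 50,      # keep n_pass below n_epochs
        batch_size=batch_size,
    )


def run_came(**kwargs):
    """Run the pipeline; CAME is imported lazily so the sketch loads without it."""
    import came
    return came.pipeline.main_for_unaligned(**kwargs)


# Usage, assuming adata_ref, adata_query, genes_ref, genes_query and
# df_varmap have already been prepared:
# outputs = run_came(**build_call_kwargs(
#     adata_ref, adata_query, genes_ref, genes_query, df_varmap))
```

Passing batch_size=None keeps whole-graph training; passing an integer switches the sketch to the shorter mini-batch schedule.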