came.pipeline.main_for_aligned

came.pipeline.main_for_aligned(adatas: Sequence[AnnData], vars_feat: Sequence, vars_as_nodes: Sequence | None = None, scnets: Sequence[spmatrix] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), key_class1: str = 'cell_ontology_class', key_class2: str | None = None, do_normalize: bool | Sequence[bool] = True, batch_keys=None, n_epochs: int = 350, resdir: Path | str | None = None, tag_data: str | None = None, params_model: dict = {}, params_lossfunc: dict = {}, n_pass: int = 100, batch_size: int | None = None, pred_batch_size: int | str | None = 'auto', plot_results: bool = False, norm_target_sum: float | None = None, save_hidden_list: bool = True, save_dpair: bool = True)

Run the main process of CAME (model training) to integrate two datasets with aligned features (e.g., cross-species integration).

Parameters:
  • adatas – A pair of sc.AnnData objects, the reference and query raw data

  • vars_feat (a sequence of strings) – variables to be taken as the node-features of the observations

  • vars_as_nodes (a sequence of strings) – variables to be taken as the graph nodes

  • scnets – two single-cell-networks or a merged one

  • dataset_names – a tuple of two names for reference and query, respectively

  • key_class1 – the key to the type-labels for the reference data, should be a column name of adatas[0].obs.

  • key_class2 – the key to the type-labels for the query data. Optional; if provided, it should be a column name of adatas[1].obs.

  • do_normalize – whether to normalize the input data (if they have already been normalized, set it to False)

  • batch_keys – a list of two strings (or None), specifying the batch-keys for the reference and query data, respectively. If given, features (of cell nodes) will be scaled within each batch.

  • n_epochs – number of training epochs. A recommended setting is 200-400 for whole-graph training, and 80-200 for sub-graph training.

  • resdir – directory for saving results output by CAME

  • tag_data – a tag for auto-creating result directory

  • params_model – the model parameters

  • params_lossfunc – parameters for loss function

  • n_pass – number of initial epochs to skip; model checkpoints are not backed up until n_pass epochs have passed.

  • batch_size – the number of observation nodes in each mini-batch; sub-graphs are sampled from these nodes for mini-batch training. If None, the model will be trained on the whole graph.

  • pred_batch_size – batch-size in prediction process

  • plot_results – whether to automatically plot the classification results

  • norm_target_sum – the scale factor for library-size normalization

  • save_hidden_list – whether to save the hidden states for all the layers

  • save_dpair – whether to save the elements of the DataPair object
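
Several of the parameters above interact: the recommended range for n_epochs depends on whether batch_size is set, do_normalize should be False for pre-normalized input, and batch_keys is expected to hold one entry per dataset. The following is a minimal sketch of how those choices could be collected into keyword arguments. The helper names suggest_n_epochs and build_aligned_kwargs, the already_normalized flag, and the specific epoch values are illustrative assumptions and are not part of CAME.

```python
from typing import Optional, Sequence


def suggest_n_epochs(batch_size: Optional[int]) -> int:
    """Pick an epoch count inside the ranges recommended for n_epochs.

    The concrete values are illustrative choices within the documented
    ranges, not per-mode defaults defined by CAME.
    """
    if batch_size is None:
        return 350  # whole-graph training: recommended range is 200-400
    return 100  # sub-graph (mini-batch) training: recommended range is 80-200


def build_aligned_kwargs(
    key_class1: str = "cell_ontology_class",
    key_class2: Optional[str] = None,
    batch_keys: Optional[Sequence[str]] = None,
    batch_size: Optional[int] = None,
    already_normalized: bool = False,
) -> dict:
    """Collect keyword arguments for main_for_aligned (hypothetical helper)."""
    # batch_keys is documented as a list of two strings (or None):
    # one batch key for the reference and one for the query.
    if batch_keys is not None and len(batch_keys) != 2:
        raise ValueError("batch_keys needs two entries: (reference, query)")
    return dict(
        key_class1=key_class1,
        key_class2=key_class2,
        # Skip normalization when the input has already been normalized.
        do_normalize=not already_normalized,
        batch_keys=batch_keys,
        n_epochs=suggest_n_epochs(batch_size),
        batch_size=batch_size,
    )
```

The returned dict can be unpacked into the call, e.g. main_for_aligned(adatas, vars_feat, **kwargs), leaving all other parameters at their defaults.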

Returns:

outputs

Return type:

dict
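
A call might look like the sketch below. It assumes that "aligned features" means both AnnData objects index their variables by the same names, so that vars_feat can be drawn from the names they share; the helpers common_vars and run_aligned, the result directory name, and the choice of passing every shared variable as vars_feat are assumptions made for illustration. The keys of the returned dict are not listed here, so the sketch returns it unchanged.

```python
from typing import List, Sequence


def common_vars(ref_names: Sequence[str], query_names: Sequence[str]) -> List[str]:
    """Variable names present in both datasets, in reference order (hypothetical helper)."""
    query_set = set(query_names)
    return [name for name in ref_names if name in query_set]


def run_aligned(adata_ref, adata_query, resdir="_came_results"):
    """Train CAME on a reference/query pair of AnnData objects (illustrative wrapper)."""
    # Imported lazily so that this sketch can be read and loaded without CAME installed.
    from came import pipeline

    # Assumption: all shared variables are used as node features.
    shared = common_vars(list(adata_ref.var_names), list(adata_query.var_names))
    outputs = pipeline.main_for_aligned(
        adatas=[adata_ref, adata_query],
        vars_feat=shared,
        dataset_names=("reference", "query"),
        key_class1="cell_ontology_class",  # must be a column of adata_ref.obs
        n_epochs=350,
        resdir=resdir,
    )
    return outputs  # a dict, per the return type documented above
```

In practice vars_feat is often a curated subset (for example, highly variable or differentially expressed genes) rather than every shared variable; how that subset is chosen is outside the scope of this sketch.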