came.pipeline.main_for_aligned

came.pipeline.main_for_aligned(adatas: Sequence[AnnData], vars_feat: Sequence, vars_as_nodes: Sequence | None = None, scnets: Sequence[spmatrix] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), key_class1: str = 'cell_ontology_class', key_class2: str | None = None, do_normalize: bool | Sequence[bool] = True, batch_keys=None, n_epochs: int = 350, resdir: Path | str | None = None, tag_data: str | None = None, params_model: dict = {}, params_lossfunc: dict = {}, n_pass: int = 100, batch_size: int | None = None, pred_batch_size: int | str | None = 'auto', plot_results: bool = False, norm_target_sum: float | None = None, save_hidden_list: bool = True, save_dpair: bool = True)

Run the main CAME pipeline (model training) to integrate two datasets whose features are aligned (e.g., cross-species integration).
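"Aligned features" means both datasets are expressed over the same variables in the same order. The sketch below shows only that final alignment step, assuming the variable names of both datasets already live in a common namespace (for cross-species data, for example, after mapping genes to one-to-one homologs). It keeps the shared variables in reference order and subsets both expression matrices accordingly. `align_features` is a hypothetical helper for illustration, not part of CAME.

```python
import numpy as np


def align_features(X1, genes1, X2, genes2):
    """Subset two cell-by-gene matrices to their shared genes, in reference order.

    Hypothetical helper: assumes both gene lists already use a common namespace
    (e.g., after mapping cross-species genes to one-to-one homologs).
    """
    index2 = {g: j for j, g in enumerate(genes2)}
    cols1 = [i for i, g in enumerate(genes1) if g in index2]  # keeps reference order
    shared = [genes1[i] for i in cols1]
    cols2 = [index2[g] for g in shared]  # query columns reordered to match
    return X1[:, cols1], X2[:, cols2], shared


# toy data: rows are cells, columns are genes
X_ref = np.array([[1, 0, 3], [0, 2, 1]])
X_que = np.array([[5, 4], [0, 1]])
Xa, Xb, genes = align_features(X_ref, ["A", "B", "C"], X_que, ["C", "A"])
print(genes)  # ['A', 'C']
```

After this step, column j of both matrices refers to the same variable, which is the property the function name refers to.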

Parameters:

adatas – a pair of sc.AnnData objects holding the raw reference and query data, in that order
vars_feat (a sequence of strings) – variables to be used as the node features of the observations (cells)
vars_as_nodes (a sequence of strings) – variables to be used as nodes in the graph
scnets – either two single-cell networks (one per dataset) or a single merged network, given as sparse matrices
dataset_names – a tuple of two names for the reference and query datasets, respectively
key_class1 – the key to the cell-type labels of the reference data; must be a column name of adatas[0].obs.
key_class2 – the key to the cell-type labels of the query data. Optional; if provided, it must be a column name of adatas[1].obs.
do_normalize – whether to normalize the input data (if the data have already been normalized, set it to False)
batch_keys – a list of two strings (or None) specifying the batch keys for data1 and data2, respectively. If given, the features of cell nodes will be scaled within each batch.
n_epochs – number of training epochs. A recommended setting is 200-400 for whole-graph training, and 80-200 for sub-graph training.
resdir – directory for saving results output by CAME
tag_data – a tag used to auto-create the result directory
params_model – parameters for the model
params_lossfunc – parameters for the loss function
n_pass – number of initial epochs to skip; model checkpoints are not backed up until n_pass epochs have passed.
batch_size – the number of observation nodes in each mini-batch; sub-graphs built from these nodes are used for mini-batch training. If None, the model is trained on the whole graph.
pred_batch_size – batch size used in the prediction process
plot_results – whether to automatically plot the classification results
norm_target_sum – the scale factor for library-size normalization
save_hidden_list – whether to save the hidden states for all the layers
save_dpair – whether to save the elements of the DataPair object
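To give intuition for do_normalize and norm_target_sum, the sketch below implements a common library-size normalization: scale each cell's counts to a fixed total, then apply log1p. Whether CAME performs exactly these steps is an assumption here, as is the fallback when norm_target_sum is None (the median of the non-zero per-cell totals, following scanpy's normalize_total convention). `normalize_library_size` is a hypothetical name.

```python
import numpy as np


def normalize_library_size(X, target_sum=None):
    """Scale each row (cell) to `target_sum` total counts, then apply log1p.

    Hypothetical helper. Assumption: if target_sum is None, the median of the
    non-zero per-cell totals is used (scanpy's normalize_total convention).
    """
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    if target_sum is None:
        nonzero = totals[totals > 0]
        target_sum = float(np.median(nonzero)) if nonzero.size else 1.0
    # empty cells get a scale of 0 and stay all-zero instead of producing NaN
    scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
    return np.log1p(X * scale)


counts = np.array([[1, 3], [2, 8]])  # per-cell totals: 4 and 10
Xn = normalize_library_size(counts, target_sum=10)
```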
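The batch_keys option scales cell-node features within each batch. A minimal sketch of per-batch z-scoring (zero mean and unit variance for each feature inside each batch) follows; the exact scaling CAME applies (variance estimator, clipping, handling of constant features) is an assumption, and `scale_within_batches` is a hypothetical name.

```python
import numpy as np


def scale_within_batches(X, batches):
    """Z-score each feature separately within each batch (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(X)
    for b in np.unique(batches):
        rows = batches == b
        mean = X[rows].mean(axis=0)
        std = X[rows].std(axis=0)
        std[std == 0] = 1.0  # constant features become all-zero instead of NaN
        out[rows] = (X[rows] - mean) / std
    return out


X = np.array([[1.0, 5.0], [3.0, 5.0], [10.0, 0.0], [20.0, 2.0]])
Z = scale_within_batches(X, ["b1", "b1", "b2", "b2"])
```

Scaling per batch rather than over all cells removes batch-level shifts in feature means, so cells from different batches are compared on a common scale.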
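scnets accepts either one network per dataset or a single merged network. Assuming the merged form is a cell-by-cell adjacency matrix listing reference cells first and query cells second, with no edges between the two datasets, two per-dataset networks can be combined block-diagonally with scipy.sparse.block_diag. That ordering and structure are assumptions about CAME's convention, and `merge_scnets` is a hypothetical name.

```python
import numpy as np
from scipy import sparse


def merge_scnets(net_ref, net_que):
    """Combine two cell-cell networks into one block-diagonal sparse matrix.

    Hypothetical helper. Assumption: reference cells come first and no edges
    link cells across the two datasets.
    """
    return sparse.block_diag([net_ref, net_que], format="csr")


net_ref = sparse.csr_matrix(np.array([[0, 1], [1, 0]]))  # 2 reference cells
net_que = sparse.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))  # 3 query cells
merged = merge_scnets(net_ref, net_que)
```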
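With batch_size set, training iterates over mini-batches of observation (cell) nodes; with None, the whole graph is used. The sketch below only illustrates how cell indices might be partitioned in one epoch. The per-epoch shuffling is an assumption, how CAME then builds sub-graphs around each batch is not shown, and `iter_cell_batches` is a hypothetical name.

```python
import numpy as np


def iter_cell_batches(n_cells, batch_size=None, seed=0):
    """Yield arrays of cell-node indices for one epoch (hypothetical helper).

    batch_size=None yields all cells at once, i.e. whole-graph training.
    """
    if batch_size is None:
        yield np.arange(n_cells)
        return
    order = np.random.default_rng(seed).permutation(n_cells)  # assumed shuffling
    for start in range(0, n_cells, batch_size):
        yield order[start:start + batch_size]


batches = list(iter_cell_batches(10, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```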
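One reading of n_pass is that checkpoints from the first n_pass epochs are never backed up, so an early, not-yet-stable epoch cannot be chosen as the final model. The sketch below encodes that reading with a hypothetical per-epoch score list; CAME's actual checkpoint-selection criterion is not described here, and `best_epoch_after_warmup` is a hypothetical name.

```python
def best_epoch_after_warmup(scores, n_pass):
    """Return the 0-based epoch with the highest score among epochs >= n_pass.

    Hypothetical helper: `scores` stands in for whatever metric is tracked.
    """
    eligible = range(n_pass, len(scores))
    if not eligible:
        return None  # training ended before any checkpoint was backed up
    return max(eligible, key=lambda e: scores[e])


scores = [0.9, 0.5, 0.6, 0.7, 0.65]
print(best_epoch_after_warmup(scores, n_pass=2))  # 3
```

With n_pass=0, epoch 0 (score 0.9) would win here, which is exactly the early-epoch pick that skipping the first n_pass epochs avoids.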

Returns:

outputs

Return type:

dict