came.pipeline.main_for_unaligned
- came.pipeline.main_for_unaligned(adatas: Sequence[AnnData], vars_feat: Sequence[Sequence], vars_as_nodes: Sequence[Sequence], df_varmap: DataFrame, df_varmap_1v1: DataFrame | None = None, scnets: Sequence[spmatrix] | None = None, union_var_nodes: bool = True, union_node_feats: bool = True, keep_non1v1_feats: bool = False, col_weight: str | None = None, non1v1_trans_to: int = 0, dataset_names: Sequence[str] = ('reference', 'query'), key_class1: str = 'cell_ontology_class', key_class2: str | None = None, do_normalize: bool = True, batch_keys=None, n_epochs: int = 350, resdir: Path | str | None = None, tag_data: str | None = None, params_model: dict = {}, params_lossfunc: dict = {}, n_pass: int = 100, batch_size: int | None = None, pred_batch_size: int | str | None = 'auto', plot_results: bool = False, norm_target_sum: float | None = None, save_hidden_list: bool = True, save_dpair: bool = True)
Run the main CAME process (model training) for integrating two datasets with unaligned features (e.g., cross-species integration).
- Parameters:
adatas – A pair of sc.AnnData objects: the reference and query raw data.
vars_feat – A list or tuple of 2 variable name-lists, for example, differentially expressed genes or highly variable features.
vars_as_nodes (list or tuple of 2) – variables to be taken as the graph nodes
df_varmap – A pd.DataFrame with (at least) 2 columns; required. It gives the relationships between features in the 2 datasets and is used to build the adjacency matrix (vv_adj) between variables from these 2 datasets.
df_varmap_1v1 (None, pd.DataFrame; optional) – A dataframe containing only the 1-to-1 correspondences between features in the 2 datasets. If not provided, it will be inferred from df_varmap.
scnets – two single-cell-networks or a merged one
union_var_nodes (bool) – whether to take the union of the variable-nodes
union_node_feats (bool) – whether to take the union of the observation(cell)-node features
keep_non1v1_feats (bool) – whether to take the non-1v1 variables into account as node features. If most of the homologies are non-1v1, setting this to True is recommended.
col_weight – A column in df_varmap specifying the weights between homologies.
non1v1_trans_to (int) – the direction to transform non-1v1 features; should be either 0 or 1. Set to 0 to transform the query data to the reference (default), or 1 to transform the reference data to the query. If keep_non1v1_feats=False, this parameter is ignored.
dataset_names – a tuple of two names for the reference and query, respectively
key_class1 – the key to the type-labels for the reference data; should be a column name of adatas[0].obs.
key_class2 – the key to the type-labels for the query data. Optional; if provided, it should be a column name of adatas[1].obs.
do_normalize – whether to normalize the input data (if they have already been normalized, set it to False)
batch_keys – a list of two strings (or None), specifying the batch-keys for data1 and data2, respectively. If given, features (of cell nodes) will be scaled within each batch.
n_epochs – number of training epochs. A recommended setting is 200-400 for whole-graph training, and 80-200 for sub-graph training.
resdir – directory for saving results output by CAME
tag_data – a tag for auto-creating the result directory resdir
params_model – the model parameters
params_lossfunc – parameters for loss function
n_pass – number of epochs to skip; model checkpoints are not backed up until n_pass epochs have passed.
batch_size – the number of observation nodes in each mini-batch, based on which the sub-graphs will be used for mini-batch training. If None, the model will be trained on the whole graph.
pred_batch_size – batch-size in prediction process
plot_results – whether to automatically plot the classification results
norm_target_sum – the scale factor for library-size normalization
save_hidden_list – whether to save the hidden states for all the layers
save_dpair – whether to save the elements of the DataPair object
- Returns:
outputs
- Return type:
dict
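
A minimal usage sketch, assuming CAME is installed and that the positional inputs (adatas, vars_feat, vars_as_nodes, df_varmap) have already been prepared elsewhere. The helper names build_came_kwargs and run_came, and the result directory name, are hypothetical and introduced only for illustration. The keyword names are taken from the signature above, and the n_epochs values are simply picked from within the ranges recommended for whole-graph and sub-graph training.

```python
from pathlib import Path


def build_came_kwargs(resdir, batch_size=None):
    """Collect keyword arguments for came.pipeline.main_for_unaligned.

    Hypothetical helper: only the keyword names come from the documented
    signature; the chosen values are illustrative assumptions.
    """
    whole_graph = batch_size is None
    return dict(
        dataset_names=("reference", "query"),
        key_class1="cell_ontology_class",  # column of adatas[0].obs
        key_class2=None,                   # query labels are optional
        do_normalize=True,                 # set False if data are already normalized
        keep_non1v1_feats=True,            # useful when most homologies are non-1v1
        non1v1_trans_to=0,                 # 0: transform query data to the reference
        # 200-400 epochs are recommended for whole-graph training,
        # 80-200 for sub-graph (mini-batch) training.
        n_epochs=300 if whole_graph else 100,
        batch_size=batch_size,             # None trains on the whole graph
        resdir=Path(resdir),
        plot_results=False,
    )


def run_came(adatas, vars_feat, vars_as_nodes, df_varmap, **kwargs):
    """Call the pipeline; returns the `outputs` dict described above."""
    # Imported lazily so this sketch can be loaded without CAME installed.
    from came import pipeline
    return pipeline.main_for_unaligned(
        adatas, vars_feat, vars_as_nodes, df_varmap, **kwargs
    )


# Whole-graph settings versus mini-batch settings (batch size is arbitrary here).
kwargs_whole = build_came_kwargs("_came_results")
kwargs_mini = build_came_kwargs("_came_results", batch_size=8192)
```

With real data, the call would then look like run_came(adatas, vars_feat, vars_as_nodes, df_varmap, **kwargs_whole). Keeping the keyword arguments in a separate dict makes it easy to switch between whole-graph and mini-batch training without touching the call itself.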