came.DataPair

class came.DataPair(features: Sequence[spmatrix | ndarray], ov_adjs: Sequence[spmatrix | ndarray], vv_adj: spmatrix, oo_adjs: Sequence[spmatrix] | None = None, varnames_feat: Sequence[str] | None = None, varnames_node: Sequence[str] | None = None, obs_dfs: Sequence[DataFrame] | None = None, var_dfs: Sequence[DataFrame] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), ntypes: Dict[str, str] | None = None, etypes: Dict[str, str] | None = None, make_graph: bool = True, **kwds)

Paired datasets with un-aligned features (e.g., cross-species).

Parameters:

features (list or tuple) – a list or tuple of 2 feature matrices holding the common (aligned) features, used as node features for the observations. Shapes: (n_obs1, n_features) and (n_obs2, n_features).
ov_adjs (list or tuple) – a list or tuple of 2 (sparse) feature matrices holding the un-aligned features, used for making ov_adj. Shapes: (n_obs1, n_vnodes1) and (n_obs2, n_vnodes2).
vv_adj (scipy.sparse.spmatrix) – adjacency matrix between the variables of the 2 datasets (e.g., a gene-gene adjacency matrix), of shape (n_vnodes, n_vnodes), where n_vnodes (= n_vnodes1 + n_vnodes2) is the total number of variable nodes.
varnames_node (list or tuple) – a list or tuple of 2 name-lists, or one concatenated name-list. The lengths should be n_vnodes1 and n_vnodes2.
obs_dfs (list or tuple) – a list or tuple of 2 DataFrames
ntypes (dict) – A dict for specifying names of the node types
etypes (dict) – A dict for specifying names of the edge types
**kwds – other keyword arguments for the HeteroGraph construction

Examples

>>> dpair = DataPair(
...     [features1, features2],
...     [ov_adj1, ov_adj2],
...     vv_adj = vv_adj,
...     varnames_node = [vnodes1, vnodes2],
...     obs_dfs = [df1, df2],
...     dataset_names = ('reference', 'query'),
...     )
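
The shape constraints listed under Parameters can be checked before constructing a `DataPair`. The following is a minimal sketch only: `check_datapair_shapes` is a hypothetical helper, not part of `came`, and it operates on plain shape tuples (for example `[m.shape for m in features]`) so that it runs without any third-party package.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int]


def check_datapair_shapes(
    feature_shapes: Sequence[Shape],
    ov_adj_shapes: Sequence[Shape],
    vv_adj_shape: Shape,
) -> Tuple[int, int]:
    """Hypothetical helper (not part of came): validate the documented shapes.

    Returns (n_obs, n_vnodes), the totals over both datasets.
    """
    if len(feature_shapes) != 2 or len(ov_adj_shapes) != 2:
        raise ValueError("expected exactly 2 feature matrices and 2 ov_adjs")

    (n_obs1, n_feats1), (n_obs2, n_feats2) = feature_shapes
    (n_rows1, n_vnodes1), (n_rows2, n_vnodes2) = ov_adj_shapes

    # Aligned features must have the same number of columns in both datasets.
    if n_feats1 != n_feats2:
        raise ValueError("feature matrices must have the same n_features")
    # Each ov_adj has one row per observation of its own dataset.
    if (n_rows1, n_rows2) != (n_obs1, n_obs2):
        raise ValueError("ov_adjs rows must match n_obs1 and n_obs2")
    # vv_adj covers the variable nodes of both datasets together.
    n_vnodes = n_vnodes1 + n_vnodes2
    if tuple(vv_adj_shape) != (n_vnodes, n_vnodes):
        raise ValueError("vv_adj must be of shape (n_vnodes, n_vnodes)")
    return n_obs1 + n_obs2, n_vnodes


# Toy shapes: 100 reference and 80 query observations, 50 shared features,
# and 300 / 250 variable nodes respectively.
totals = check_datapair_shapes(
    [(100, 50), (80, 50)],
    [(100, 300), (80, 250)],
    (550, 550),
)
print(totals)
```

With real inputs, the shapes could be collected as `[f.shape for f in features]`, `[a.shape for a in ov_adjs]` and `vv_adj.shape` before calling `DataPair(...)`.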

__init__(features: Sequence[spmatrix | ndarray], ov_adjs: Sequence[spmatrix | ndarray], vv_adj: spmatrix, oo_adjs: Sequence[spmatrix] | None = None, varnames_feat: Sequence[str] | None = None, varnames_node: Sequence[str] | None = None, obs_dfs: Sequence[DataFrame] | None = None, var_dfs: Sequence[DataFrame] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), ntypes: Dict[str, str] | None = None, etypes: Dict[str, str] | None = None, make_graph: bool = True, **kwds)

Methods

`__init__`(features, ov_adjs, vv_adj[, ...])
`get_feature_dict`([astensor, scale, unit_var])
`get_obs_anno`(keys[, which, concat])	get the annotations of samples (observations)
`get_obs_dataset`()	Get the dataset-identities for the observations
`get_obs_features`([astensor, scale, ...])
`get_obs_ids`([which, astensor])	get node indices for obs-nodes (samples). Choices are: all the node ids (which=None); only the ids of the "reference" data (which=0); only the ids of the "query" data (which=1).
`get_obs_labels`(keys[, astensor, train_use, ...])	make labels for model training
`get_vnode_ids`([which, astensor])	get node indices for var-nodes, choices are:
`get_vnode_ids_by_name`(varlist[, which, ...])	looking-up var-node indices for the given names
`get_vnode_names`([vnode_ids, tolist])
`get_whole_net`([rebuild])
`load`(fp)	load object. fp: file path to a `DataPair` object, e.g., 'datapair_init.pickle'
`make_ov_adj`([link2ord])	observation-variable bipartite network
`make_whole_net`([link2ord, selfloop_o, ...])	make the whole hetero-graph (e.g.
`save_init`([path])	save object for reloading
`set_common_obs_annos`([df, ignore_index])	Shared and merged annotation labels for ALL of the observations in both datasets.
`set_dataset_names`(dataset_names)
`set_etypes`(etypes)
`set_features`(features[, varnames_feat])	set the feature matrices, whose features are aligned across datasets. varnames_feat: if provided, should be a sequence of two name-lists.
`set_ntypes`(ntypes)
`set_obs_dfs`([obs1, obs2])	Set private observation annotations (should be done AFTER running `self.set_features(..)`)
`set_oo_adj`([oo_adjs])
`set_ov_adj`(ov_adjs)	set un-aligned features, for making the observation-variable adjacency matrix
`set_var_dfs`(var1, var2)
`set_vnode_annos`([df, ignore_index, force_reset])
`set_vv_adj`(vv_adj[, varnames_node])	vv_adj:
`summary_graph`()
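
The `which` argument described for `get_obs_ids` (and, presumably, `get_vnode_ids`) selects all nodes, the reference nodes, or the query nodes. The sketch below illustrates that selection logic in plain Python. `select_obs_ids` is a hypothetical stand-in, not the library's implementation, and it assumes that observation nodes are numbered consecutively with the reference data first, which this page does not confirm.

```python
from typing import List, Optional


def select_obs_ids(n_obs1: int, n_obs2: int, which: Optional[int] = None) -> List[int]:
    """Hypothetical stand-in for the `which` argument of get_obs_ids.

    Assumption (not confirmed by the docs): observation nodes are numbered
    consecutively, reference data first and query data second.
    """
    ids1 = list(range(n_obs1))                   # "reference" observations
    ids2 = list(range(n_obs1, n_obs1 + n_obs2))  # "query" observations
    if which is None:
        return ids1 + ids2  # all observation node ids
    if which == 0:
        return ids1
    if which == 1:
        return ids2
    raise ValueError("which must be None, 0 or 1")


all_ids = select_obs_ids(3, 2)            # which=None: every observation
ref_ids = select_obs_ids(3, 2, which=0)   # reference only
qry_ids = select_obs_ids(3, 2, which=1)   # query only
print(all_ids, ref_ids, qry_ids)
```

On an actual `DataPair` instance the corresponding calls would be `dpair.get_obs_ids(None)`, `dpair.get_obs_ids(0)` and `dpair.get_obs_ids(1)`.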

Attributes

`G`	The graph structure, of type `dgl.Heterograph`
`classes`	Unique classes (types) in the reference data; may contain "unknown" if any types occur in the query data but not in the reference, or if the query data is un-labeled.
`etypes`
`feat_names1`	Feature names of the observation (e.g., cell) nodes in the reference data
`feat_names2`	Feature names of the observation (e.g., cell) nodes in the query data
`labels`	Labels for each observation, used as the supervised information for model training.
`n_feats`	Number of dimensions of the observation-node features
`n_obs`	Total number of the observations (e.g., cells)
`n_obs1`	Number of observations (e.g., cells) in the reference data
`n_obs2`	Number of observations (e.g., cells) in the query data
`n_vnodes`	Total number of the variables (e.g., genes)
`n_vnodes1`	Number of variables (e.g., genes) in the reference data
`n_vnodes2`	Number of variables (e.g., genes) in the query data
`ntypes`
`obs_ids`	All of the observation (e.g., cell) indices
`obs_ids1`	Indices of the observation (e.g., cell) nodes in the reference data
`obs_ids2`	Indices of the observation (e.g., cell) nodes in the query data
`oo_adj`	observation-by-variable adjacency matrix (e.g.
`ov_adj`	merged adjacency matrix between observation and variable nodes (e.g.
`var_ids1`	Indices of the variable (e.g., gene) nodes in the reference data
`var_ids2`	Indices of the variable (e.g., gene) nodes in the query data
`vnode_names1`	Names of the variable (e.g., gene) nodes in the reference data
`vnode_names2`	Names of the variable (e.g., gene) nodes in the query data
`vv_adj`	var-var adjacency matrix (e.g.