came.DataPair
- class came.DataPair(features: Sequence[spmatrix | ndarray], ov_adjs: Sequence[spmatrix | ndarray], vv_adj: spmatrix, oo_adjs: Sequence[spmatrix] | None = None, varnames_feat: Sequence[str] | None = None, varnames_node: Sequence[str] | None = None, obs_dfs: Sequence[DataFrame] | None = None, var_dfs: Sequence[DataFrame] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), ntypes: Dict[str, str] | None = None, etypes: Dict[str, str] | None = None, make_graph: bool = True, **kwds)
Paired datasets with unaligned features (e.g., cross-species).
- Parameters:
features (list or tuple) – A list or tuple of 2 feature matrices holding the common (aligned) features, used as node features for the observations; of shape (n_obs1, n_features) and (n_obs2, n_features).
ov_adjs (list or tuple) – A list or tuple of 2 (sparse) feature matrices holding the unaligned features, used for making ov_adj; of shape (n_obs1, n_vnodes1) and (n_obs2, n_vnodes2).
vv_adj (scipy.sparse.spmatrix) – Adjacency matrix between the variables of the 2 datasets (e.g., a gene-gene adjacency matrix), of shape (n_vnodes, n_vnodes), where n_vnodes (= n_vnodes1 + n_vnodes2) is the total number of variable nodes.
varnames_node (list or tuple) – A list or tuple of 2 name lists, or one concatenated name list. The lengths should be n_vnodes1 and n_vnodes2.
obs_dfs (list or tuple) – A list or tuple of 2 DataFrame objects.
ntypes (dict) – A dict specifying the names of the node types.
etypes (dict) – A dict specifying the names of the edge types.
**kwds – Other keyword arguments for the HeteroGraph construction.
Examples
>>> dpair = DataPair(
...     [features1, features2],
...     [ov_adj1, ov_adj2],
...     vv_adj=vv_adj,
...     varnames_node=[vnodes1, vnodes2],
...     obs_dfs=[df1, df2],
...     dataset_names=('reference', 'query'),
... )
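The shape constraints listed under Parameters can be checked on toy inputs before constructing a DataPair. The following is a minimal sketch, assuming NumPy and SciPy are available. The sizes, the random data, and the names (features1, ov_adj1, vnodes1, cross, ...) are illustrative assumptions and not part of CAME. The block layout of vv_adj (variables of the first dataset first, then those of the second) is also an assumption. The DataPair call itself is left as a comment so that the block runs without CAME installed.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not taken from the CAME documentation).
n_obs1, n_obs2 = 6, 4          # observations in the reference / query data
n_features = 3                 # aligned (shared) features
n_vnodes1, n_vnodes2 = 5, 7    # unaligned variables in each dataset
n_vnodes = n_vnodes1 + n_vnodes2

# Aligned features: both matrices share the same number of columns.
features1 = rng.random((n_obs1, n_features))
features2 = rng.random((n_obs2, n_features))

# Unaligned features: one sparse observation-by-variable matrix per dataset.
ov_adj1 = sparse.random(n_obs1, n_vnodes1, density=0.5, format="csr", random_state=0)
ov_adj2 = sparse.random(n_obs2, n_vnodes2, density=0.5, format="csr", random_state=1)

# Variable-variable adjacency over all n_vnodes variable nodes.
# Assumption: dataset-1 variables come first, then dataset-2 variables;
# `cross` links variables across the two datasets and is mirrored to keep
# the matrix symmetric.
cross = sparse.random(n_vnodes1, n_vnodes2, density=0.3, format="csr", random_state=2)
vv_adj = sparse.bmat([[None, cross], [cross.T, None]], format="csr")

# Hypothetical variable names, one list per dataset.
vnodes1 = [f"geneA{i}" for i in range(n_vnodes1)]
vnodes2 = [f"geneB{i}" for i in range(n_vnodes2)]

# Sanity checks mirroring the documented shapes.
assert features1.shape == (n_obs1, n_features)
assert features2.shape == (n_obs2, n_features)
assert ov_adj1.shape == (n_obs1, n_vnodes1)
assert ov_adj2.shape == (n_obs2, n_vnodes2)
assert vv_adj.shape == (n_vnodes, n_vnodes)
assert len(vnodes1) + len(vnodes2) == n_vnodes

# With CAME installed, these inputs could then be passed on, for instance:
# from came import DataPair
# dpair = DataPair([features1, features2], [ov_adj1, ov_adj2],
#                  vv_adj=vv_adj, varnames_node=[vnodes1, vnodes2])
```

Checking the shapes up front makes mismatches between ov_adjs, vv_adj and varnames_node easier to locate than an error raised during graph construction.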
- __init__(features: Sequence[spmatrix | ndarray], ov_adjs: Sequence[spmatrix | ndarray], vv_adj: spmatrix, oo_adjs: Sequence[spmatrix] | None = None, varnames_feat: Sequence[str] | None = None, varnames_node: Sequence[str] | None = None, obs_dfs: Sequence[DataFrame] | None = None, var_dfs: Sequence[DataFrame] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), ntypes: Dict[str, str] | None = None, etypes: Dict[str, str] | None = None, make_graph: bool = True, **kwds)
Methods
__init__(features, ov_adjs, vv_adj[, ...])
get_feature_dict([astensor, scale, unit_var])
get_obs_anno(keys[, which, concat])
Get the annotations of samples (observations).
get_obs_dataset()
Get the dataset identities for the observations.
get_obs_features([astensor, scale, ...])
get_obs_ids([which, astensor])
Get node indices for obs-nodes (samples). Choices are: 1. all the node ids (which=None); 2. only the ids of the "reference" data (which=0); 3. only the ids of the "query" data (which=1).
get_obs_labels(keys[, astensor, train_use, ...])
Make labels for model training.
get_vnode_ids([which, astensor])
Get node indices for var-nodes, choices are:
get_vnode_ids_by_name(varlist[, which, ...])
Look up var-node indices for the given names.
get_vnode_names([vnode_ids, tolist])
get_whole_net([rebuild])
load(fp)
Load object. fp: file path to a DataPair object, e.g., 'datapair_init.pickle'.
make_ov_adj([link2ord])
Observation-variable bipartite network.
make_whole_net([link2ord, selfloop_o, ...])
Make the whole hetero-graph (e.g.
save_init([path])
Save object for reloading.
set_common_obs_annos([df, ignore_index])
Shared and merged annotation labels for ALL of the observations in both datasets.
set_dataset_names(dataset_names)
set_etypes(etypes)
set_features(features[, varnames_feat])
Set feature matrices, where features are aligned across datasets. varnames_feat: if provided, should be a sequence of two name-lists.
set_ntypes(ntypes)
set_obs_dfs([obs1, obs2])
Set private observation annotations (should be done AFTER running self.set_features(..)).
set_oo_adj([oo_adjs])
set_ov_adj(ov_adjs)
Set unaligned features, for making the observation-variable adjacency matrix.
set_var_dfs(var1, var2)
set_vnode_annos([df, ignore_index, force_reset])
set_vv_adj(vv_adj[, varnames_node])
vv_adj:
summary_graph()
Attributes
G
The graph structure, of type dgl.Heterograph
classes
Unique classes (types) in the reference data, may contain "unknown" if there are any types in the query data but not in the reference, or if the query data is un-labeled.
etypes
feat_names1
Feature names of the observation (e.g., cell) nodes in the reference data
feat_names2
Feature names of the observation (e.g., cell) nodes in the query data
labels
Labels for each observation, used as the supervised information for model training.
n_feats
Number of dimensions of the observation-node features
n_obs
Total number of the observations (e.g., cells)
n_obs1
Number of observations (e.g., cells) in the reference data
n_obs2
Number of observations (e.g., cells) in the query data
n_vnodes
Total number of the variables (e.g., genes)
n_vnodes1
Number of variables (e.g., genes) in the reference data
n_vnodes2
Number of variables (e.g., genes) in the query data
ntypes
obs_ids
All of the observation (e.g., cell) indices
obs_ids1
Indices of the observation (e.g., cell) nodes in the reference data
obs_ids2
Indices of the observation (e.g., cell) nodes in the query data
oo_adj
observation-by-observation adjacency matrix (e.g.
ov_adj
merged adjacency matrix between observation and variable nodes (e.g.
var_ids1
Indices of the variable (e.g., gene) nodes in the reference data
var_ids2
Indices of the variable (e.g., gene) nodes in the query data
vnode_names1
Names of the variable (e.g., gene) nodes in the reference data
vnode_names2
Names of the variable (e.g., gene) nodes in the query data
vv_adj
var-var adjacency matrix (e.g., a gene-gene adjacency matrix)
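The description of the classes attribute states a small rule: the class list comes from the reference labels, and "unknown" is added when the query data is unlabeled or contains a type missing from the reference. The sketch below restates that rule in plain Python. The helper name infer_classes, its arguments, and the sorted ordering are hypothetical and only illustrate the documented behavior; this is not CAME's implementation.

```python
from typing import List, Optional, Sequence


def infer_classes(
    ref_labels: Sequence[str],
    query_labels: Optional[Sequence[str]] = None,
    unknown: str = "unknown",
) -> List[str]:
    """Hypothetical helper: unique reference classes, plus `unknown` if needed."""
    ref_set = set(ref_labels)
    classes = sorted(ref_set)  # ordering is an assumption made for this sketch
    # `unknown` is needed when the query is unlabeled, or when it holds
    # at least one type that never occurs in the reference.
    needs_unknown = query_labels is None or any(
        label not in ref_set for label in query_labels
    )
    if needs_unknown and unknown not in classes:
        classes.append(unknown)
    return classes


# Query contains a type ("NK") that is absent from the reference.
print(infer_classes(["T", "B", "T"], ["B", "NK"]))  # ['B', 'T', 'unknown']
# Query types are all covered by the reference.
print(infer_classes(["T", "B"], ["T"]))             # ['B', 'T']
# Unlabeled query data.
print(infer_classes(["T", "B"]))                    # ['B', 'T', 'unknown']
```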