came.AlignedDataPair

class came.AlignedDataPair(features: Sequence[spmatrix | ndarray], ov_adjs: Sequence[spmatrix | ndarray], oo_adjs: Sequence[spmatrix] | None = None, varnames_feat: Sequence[str] | None = None, varnames_node: Sequence[str] | None = None, obs_dfs: Sequence[DataFrame] | None = None, var_dfs: Sequence[DataFrame] | None = None, dataset_names: Sequence[str] = ('reference', 'query'), ntypes: Dict[str, str] | None = None, etypes: Dict[str, str] | None = None, make_graph: bool = True, **kwds)

Paired datasets with aligned features (e.g., cross-dataset or cross-omics).

Parameters:
  • features (list or tuple) – a list or tuple of 2 feature matrices, of shape (n_obs1, n_features) and (n_obs2, n_features). These are the common (aligned) features, used as node features for the observations.

  • ov_adjs (list or tuple) – a list or tuple of 2 (sparse) feature matrices, of shape (n_obs1, n_vnodes1) and (n_obs2, n_vnodes2). These are the unaligned features, used for making ov_adj.

  • varnames_feat (list or tuple) – names of variables that will be treated as node-features for observations

  • varnames_node (list or tuple) – names of variables that will be treated as nodes.

  • obs_dfs (list or tuple) – a list or tuple of 2 DataFrames

  • ntypes (dict) – A dict for specifying names of the node types

  • etypes (dict) – A dict for specifying names of the edge types

  • **kwds – other keyword arguments for the HeteroGraph construction

Examples

>>> dpair = AlignedDataPair(
...     [features1, features2],
...     [ov_adj1, ov_adj2],
...     varnames_feat = vars_feat,
...     varnames_node = vars_node,
...     obs_dfs = [obs1, obs2],
...     dataset_names=dataset_names,
...     )
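
The shape requirements listed under Parameters can be checked before the object is constructed. The following is a minimal sketch, not part of came: `check_pair_shapes` is a hypothetical helper, and the toy matrices are random placeholders. It relies only on the `.shape` attribute, so it should accept both NumPy arrays and SciPy sparse matrices.

```python
import numpy as np


def check_pair_shapes(features, ov_adjs):
    """Hypothetical helper (not a came function): validate the documented shapes.

    features: 2 matrices of shape (n_obs1, n_features) and (n_obs2, n_features)
    ov_adjs:  2 matrices of shape (n_obs1, n_vnodes1) and (n_obs2, n_vnodes2)
    """
    if len(features) != 2 or len(ov_adjs) != 2:
        raise ValueError("expected exactly 2 feature matrices and 2 ov_adjs")
    (f1, f2), (a1, a2) = features, ov_adjs
    # Aligned features must share the same columns across both datasets.
    if f1.shape[1] != f2.shape[1]:
        raise ValueError("feature matrices must have the same number of columns")
    # Each ov_adj must have one row per observation of its own dataset.
    for name, f, a in (("first", f1, a1), ("second", f2, a2)):
        if f.shape[0] != a.shape[0]:
            raise ValueError(f"{name} dataset: features and ov_adj differ in n_obs")
    return {
        "n_obs1": f1.shape[0],
        "n_obs2": f2.shape[0],
        "n_obs": f1.shape[0] + f2.shape[0],
        "n_feats": f1.shape[1],
    }


# Toy inputs: 5 reference and 4 query observations, 3 aligned features.
rng = np.random.default_rng(0)
features1, features2 = rng.random((5, 3)), rng.random((4, 3))
ov_adj1, ov_adj2 = rng.random((5, 6)), rng.random((4, 7))

summary = check_pair_shapes([features1, features2], [ov_adj1, ov_adj2])
```

The returned dictionary reuses the attribute names documented for this class (`n_obs1`, `n_obs2`, `n_obs`, `n_feats`) purely for readability.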

Methods

__init__(features, ov_adjs[, oo_adjs, ...])

get_feature_dict([astensor, scale, unit_var])

get_obs_anno(keys[, which, concat])

get the annotations of samples (observations)

get_obs_dataset()

get_obs_features([astensor, scale, ...])

get_obs_ids([which, astensor])

get node indices for obs-nodes (samples)

get_obs_labels(keys[, astensor, train_use, ...])

make labels for model training

get_whole_net([rebuild])

load(fp)

load a saved object; fp is the file path to the AlignedDataPair object, e.g., 'datapair_init.pickle'

make_ov_adj()

make_whole_net([selfloop_o, selfloop_v])

make the whole hetero-graph

save_init([path])

save object for reloading

set_common_obs_annos([df, ignore_index])

Shared and merged annotation labels for ALL of the observations in both datasets.

set_dataset_names(dataset_names)

set_etypes(etypes)

set_features(features[, varnames_feat])

setting feature matrices, where features are aligned across datasets.

set_ntypes(ntypes)

set_obs_dfs([obs1, obs2])

set_oo_adj([oo_adjs])

set_ov_adj(ov_adjs)

set observation-by-variable adjacency matrices

set_varnames_node([varnames_node, index])

summary_graph()
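
The `save_init` / `load` pair above describes saving an object for later reloading from a file such as 'datapair_init.pickle'. One common way to build such a pair is to pickle the constructor arguments and rebuild the object from them. The sketch below illustrates that pattern only; `ToyPair` is a hypothetical stand-in, and whether came stores its objects this way is an assumption, not something this page states.

```python
import os
import pickle
import tempfile


class ToyPair:
    """Hypothetical stand-in for a paired-dataset object (not came's class)."""

    def __init__(self, features, dataset_names=("reference", "query")):
        self.features = features
        self.dataset_names = tuple(dataset_names)
        # Keep the constructor arguments so the object can be rebuilt later.
        self._init_args = {
            "features": features,
            "dataset_names": self.dataset_names,
        }

    def save_init(self, path="datapair_init.pickle"):
        """Write the constructor arguments to a pickle file."""
        with open(path, "wb") as f:
            pickle.dump(self._init_args, f)

    @classmethod
    def load(cls, fp):
        """Rebuild an object from a file written by save_init."""
        with open(fp, "rb") as f:
            init_args = pickle.load(f)
        return cls(**init_args)


# Round trip through a temporary directory.
tmp_dir = tempfile.mkdtemp()
fp = os.path.join(tmp_dir, "datapair_init.pickle")
pair = ToyPair([[1.0, 2.0], [3.0, 4.0]])
pair.save_init(fp)
restored = ToyPair.load(fp)
```

Storing only the initial arguments keeps the file small and avoids serializing derived structures such as a graph, at the cost of rebuilding them on load.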

Attributes

G

The graph structure, of type dgl.Heterograph

classes

Unique classes (types) in the reference data. May contain "unknown" if the query data has types that are absent from the reference, or if the query data is unlabeled.

etypes

labels

Labels for each observation, used as the supervised information for model training.

n_feats

Number of dimensions of the observation-node features

n_obs

Total number of the observations (e.g., cells)

n_obs1

Number of observations (e.g., cells) in the reference data

n_obs2

Number of observations (e.g., cells) in the query data

n_vnodes

Total number of the variables (e.g., genes)

ntypes

obs_ids

All of the observation (e.g., cell) indices

obs_ids1

Indices of the observation (e.g., cell) nodes in the reference data

obs_ids2

Indices of the observation (e.g., cell) nodes in the query data

oo_adj

observation-by-observation adjacency matrix

ov_adj

merged adjacency matrix between observation and variable nodes

varnames_feat

The observation feature names

varnames_node

Names of variable nodes

vnode_ids

All of the variable (e.g., gene) indices
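
The `classes` and `labels` attributes describe how an "unknown" class can appear when the query data is unlabeled or holds types missing from the reference. The sketch below illustrates that rule in plain Python. `merge_labels` is a hypothetical function, unlabeled query observations are represented by `None` as an assumption, and the encoding came actually uses (for example, integer codes) may differ.

```python
def merge_labels(ref_labels, query_labels, unknown="unknown"):
    """Hypothetical illustration of the "unknown" rule (not came's code).

    Classes are taken from the reference labels. Any query label that is
    missing (None) or absent from the reference is mapped to `unknown`.
    """
    ref_classes = sorted(set(ref_labels))
    known = set(ref_classes)
    query_mapped = [lab if lab in known else unknown for lab in query_labels]
    # Add the extra class only when some query observation needs it.
    classes = ref_classes + ([unknown] if unknown in query_mapped else [])
    # Reference observations first, then query observations.
    labels = list(ref_labels) + query_mapped
    return classes, labels


ref = ["B cell", "T cell", "B cell"]
query = ["T cell", "NK cell", None]  # one unseen type, one unlabeled cell
classes, labels = merge_labels(ref, query)
```

In this toy case, "NK cell" does not occur in the reference and the last query observation has no label, so both are assigned to "unknown".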