Load pre-computed results
Before running this notebook, make sure that model training (CAME's pipeline) has finished and that its results have been saved to a result directory.
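As a quick sanity check before loading anything, the result directory can be inspected for the files you expect. The snippet below is a minimal sketch, not part of CAME: `check_result_dir` is a hypothetical helper, and the only file name it looks for by default is `predictor.json`, which is the one file this notebook loads by name. Other files written by the pipeline are not checked here.

```python
from pathlib import Path
import tempfile


def check_result_dir(resdir, required=("predictor.json",)):
    """Return the names in `required` that are missing from `resdir`.

    Hypothetical helper for illustration; raises if the directory
    itself does not exist.
    """
    resdir = Path(resdir)
    if not resdir.is_dir():
        raise FileNotFoundError(f"Result directory not found: {resdir}")
    return [name for name in required if not (resdir / name).is_file()]


# Demonstration on a throwaway directory (not a real CAME result folder).
with tempfile.TemporaryDirectory() as tmp:
    missing_before = check_result_dir(tmp)
    (Path(tmp) / "predictor.json").write_text("{}")
    missing_after = check_result_dir(tmp)

print(missing_before, missing_after)
```

In practice you would call `check_result_dir(came_resdir)` on your own result directory and stop if the returned list is not empty.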
[1]:
import os
from pathlib import Path
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import scanpy as sc
from scipy import sparse
from scipy.special import softmax
import networkx as nx
import torch
[2]:
import sys
sys.path.append('../')
import came
from came import pipeline, pp, pl
Using backend: pytorch
Load CAME results
Three main objects are included:
dpair
model
predictor
[3]:
# the result directory
came_resdir = Path("../_temp/('Baron_human', 'Baron_mouse')-(07-15 23.57.51)")
dpair, model = came.load_dpair_and_model(came_resdir)
predictor = came.Predictor.load(came_resdir / 'predictor.json')
[*] Setting dataset names:
0-->Baron_human
1-->Baron_mouse
[*] Setting aligned features for observation nodes (self._features)
[*] Setting un-aligned features (`self._ov_adjs`) for making links connecting observation and variable nodes
[*] Setting adjacent matrix connecting variables from these 2 datasets (`self._vv_adj`)
Index(['cell_ontology_class', 'cell_ontology_id', 'cell_type1', 'dataset_name',
'donor', 'latent_1', 'latent_10', 'latent_2', 'latent_3', 'latent_4',
'latent_5', 'latent_6', 'latent_7', 'latent_8', 'latent_9', 'library',
'organ', 'organism', 'platform', 'tSNE1', 'tSNE2'],
dtype='object')
Index(['cell_ontology_class', 'cell_ontology_id', 'cell_type1', 'dataset_name',
'donor', 'latent_1', 'latent_10', 'latent_2', 'latent_3', 'latent_4',
'latent_5', 'latent_6', 'latent_7', 'latent_8', 'latent_9', 'library',
'organ', 'organism', 'platform', 'tSNE1', 'tSNE2', 'clust_lbs'],
dtype='object')
-------------------- Summary of the DGL-Heterograph --------------------
Graph(num_nodes={'cell': 4028, 'gene': 6556},
num_edges={('cell', 'express', 'gene'): 1513823, ('cell', 'self_loop_cell', 'cell'): 4028, ('cell', 'similar_to', 'cell'): 25908, ('gene', 'expressed_by', 'cell'): 1513823, ('gene', 'homolog_with', 'gene'): 12462},
metagraph=[('cell', 'gene', 'express'), ('cell', 'cell', 'self_loop_cell'), ('cell', 'cell', 'similar_to'), ('gene', 'cell', 'expressed_by'), ('gene', 'gene', 'homolog_with')])
second-order connection: False
self-loops for observation-nodes: True
self-loops for variable-nodes: True
Common variables that can be used in the downstream analysis
Including:
The model inputs
the feature dict
the cell-gene heterogeneous graph
reference and query sample-ids
reference classes (type space)
[4]:
# the feature dict
feat_dict = dpair.get_feature_dict(scale=True)
# the heterogeneous cell-gene graph
g = dpair.get_whole_net()
# reference and query sample-ids
obs_ids1, obs_ids2 = dpair.obs_ids1, dpair.obs_ids2
# reference classes (type space)
classes = predictor.classes
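Before passing these variables on, it can help to confirm their shapes. The following is a small sketch with stand-in data: `summarize_features` is a hypothetical helper, and the key `"cell"` and the array sizes in the toy dictionary are assumptions chosen for illustration, not values taken from CAME. It only relies on each value exposing a `.shape` attribute, which NumPy arrays and PyTorch tensors both do.

```python
import numpy as np


def summarize_features(feat_dict):
    """Map each key of a feature dict to the shape of its array-like value."""
    return {key: tuple(value.shape) for key, value in feat_dict.items()}


# Stand-in for the real feature dict; key name and sizes are assumptions.
toy_feat_dict = {"cell": np.zeros((5, 3), dtype=np.float32)}
shapes = summarize_features(toy_feat_dict)
print(shapes)
```

In the notebook, `summarize_features(feat_dict)` would be called on the dictionary returned by `dpair.get_feature_dict(scale=True)`.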
Get cell-to-gene attentions
Output format: an n_cells x n_genes sparse matrix in CSR format
[8]:
import came
attns = came.model.get_attentions(model, feat_dict, g)
[9]:
attns
[9]:
<4028x6556 sparse matrix of type '<class 'numpy.float32'>'
with 1513823 stored elements in Compressed Sparse Row format>
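Because the attentions are stored in CSR format, the nonzero entries of each cell (row) can be read directly from `indptr`, `indices`, and `data`. The sketch below ranks, for every row, the columns with the largest stored values. It runs on a small toy matrix, not on real CAME output; `top_genes_per_cell` and the gene names `g0` to `g3` are hypothetical, and for real data you would need a list of gene names aligned with the matrix columns.

```python
import numpy as np
from scipy import sparse


def top_genes_per_cell(attn_csr, gene_names, k=3):
    """For each row, return up to k column names with the largest stored values."""
    attn_csr = sparse.csr_matrix(attn_csr)
    gene_names = np.asarray(gene_names)
    result = []
    for i in range(attn_csr.shape[0]):
        # Slice out the stored entries belonging to row i.
        start, end = attn_csr.indptr[i], attn_csr.indptr[i + 1]
        cols = attn_csr.indices[start:end]
        vals = attn_csr.data[start:end]
        # Sort stored values in descending order and keep the first k.
        order = np.argsort(-vals, kind="stable")[:k]
        result.append(gene_names[cols[order]].tolist())
    return result


# Toy 2-cell x 4-gene matrix standing in for the real attention matrix.
toy_attns = sparse.csr_matrix(
    np.array(
        [
            [0.0, 0.7, 0.3, 0.0],
            [0.5, 0.0, 0.1, 0.4],
        ],
        dtype=np.float32,
    )
)
toy_genes = ["g0", "g1", "g2", "g3"]
top2 = top_genes_per_cell(toy_attns, toy_genes, k=2)
print(top2)
```

Reading rows through `indptr` avoids converting the matrix to a dense array, which matters at the size shown above (4028 x 6556 with about 1.5 million stored elements).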