Load pre-computed results

Make sure that you have finished the model training process (CAME’s pipeline) and that the results have been properly stored.

[1]:
import os
from pathlib import Path
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

import scanpy as sc
from scipy import sparse
from scipy.special import softmax

import networkx as nx
import torch
[2]:
import sys
sys.path.append('../')

import came
from came import pipeline, pp, pl
Using backend: pytorch

Load CAME results

Three main objects are included:

  • dpair

  • model

  • predictor

[3]:
# the result directory
came_resdir = Path("../_temp/('Baron_human', 'Baron_mouse')-(07-15 23.57.51)")

dpair, model = came.load_dpair_and_model(came_resdir)
predictor = came.Predictor.load(came_resdir / 'predictor.json')
[*] Setting dataset names:
        0-->Baron_human
        1-->Baron_mouse
[*] Setting aligned features for observation nodes (self._features)
[*] Setting un-aligned features (`self._ov_adjs`) for making links connecting observation and variable nodes
[*] Setting adjacent matrix connecting variables from these 2 datasets (`self._vv_adj`)
Index(['cell_ontology_class', 'cell_ontology_id', 'cell_type1', 'dataset_name',
       'donor', 'latent_1', 'latent_10', 'latent_2', 'latent_3', 'latent_4',
       'latent_5', 'latent_6', 'latent_7', 'latent_8', 'latent_9', 'library',
       'organ', 'organism', 'platform', 'tSNE1', 'tSNE2'],
      dtype='object')
Index(['cell_ontology_class', 'cell_ontology_id', 'cell_type1', 'dataset_name',
       'donor', 'latent_1', 'latent_10', 'latent_2', 'latent_3', 'latent_4',
       'latent_5', 'latent_6', 'latent_7', 'latent_8', 'latent_9', 'library',
       'organ', 'organism', 'platform', 'tSNE1', 'tSNE2', 'clust_lbs'],
      dtype='object')
-------------------- Summary of the DGL-Heterograph --------------------
Graph(num_nodes={'cell': 4028, 'gene': 6556},
      num_edges={('cell', 'express', 'gene'): 1513823, ('cell', 'self_loop_cell', 'cell'): 4028, ('cell', 'similar_to', 'cell'): 25908, ('gene', 'expressed_by', 'cell'): 1513823, ('gene', 'homolog_with', 'gene'): 12462},
      metagraph=[('cell', 'gene', 'express'), ('cell', 'cell', 'self_loop_cell'), ('cell', 'cell', 'similar_to'), ('gene', 'cell', 'expressed_by'), ('gene', 'gene', 'homolog_with')])
second-order connection: False
self-loops for observation-nodes: True
self-loops for variable-nodes: True

Common variables that can be used in downstream analysis

Including:

  • The model inputs

    • the feature dict

    • the cell-gene heterogeneous graph

  • reference and query sample-ids

  • reference classes (type space)

[4]:
# the feature dict
feat_dict = dpair.get_feature_dict(scale=True)

# the heterogeneous cell-gene graph
g = dpair.get_whole_net()

# reference and query sample-ids
obs_ids1, obs_ids2 = dpair.obs_ids1, dpair.obs_ids2

# reference classes (type space)
classes = predictor.classes
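
As a quick sanity check, you can inspect the objects defined above. The following is a small sketch (not part of the original notebook) that only uses these variables; it assumes the feature dict maps node-type names to tensors, as produced by `dpair.get_feature_dict`.

# inspect the model inputs and related variables (sketch)
for key, feat in feat_dict.items():
    print(key, tuple(feat.shape))    # feature matrix for each node type
print(g)                             # the DGL heterograph summarized above
print(len(obs_ids1), len(obs_ids2))  # numbers of reference / query cells
print(classes)                       # reference cell-type space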

Get hidden states

The hidden states are saved in a format like:

[dict0, dict1, dict2]

where dict_i is a dict with ‘cell’ and ‘gene’ as keys, and the corresponding hidden-state matrices as the values.

[5]:
# Load all hidden states (saved during CAME's pipeline)
hidden_list = came.load_hidden_states(came_resdir / 'hidden_list.h5')
len(hidden_list)
[5]:
3
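
To see how the list is organized (layers in order: embedding, first hidden layer, second hidden layer, each a dict keyed by ‘cell’ and ‘gene’), a small inspection loop, added here as a sketch:

# print the shape of each hidden-state matrix, per layer and node type
for i, layer_dict in enumerate(hidden_list):
    for key, mat in layer_dict.items():
        print(f"layer {i}, '{key}': {mat.shape}")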

The cell hidden-states

By default, the reference and query cells are concatenated.

[6]:
# with reference and query concatenated

# the embedding layer
embed_cell = hidden_list[0]['cell']

# the first hidden layer
h1_cell = hidden_list[1]['cell']

# the second hidden layer
h2_cell = hidden_list[2]['cell']

# separate reference and query
h2_cell1 = h2_cell[obs_ids1]
h2_cell2 = h2_cell[obs_ids2]

print(f"h2_cell1.shape={h2_cell1.shape}")
print(f"h2_cell2.shape={h2_cell2.shape}")
h2_cell1.shape=(2142, 128)
h2_cell2.shape=(1886, 128)
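
A common downstream use of the cell hidden states is joint visualization of reference and query cells. The following is a hedged sketch (not from the original pipeline) that builds an AnnData from the second hidden layer and embeds it with UMAP via scanpy; the ‘dataset’ labels are constructed here from obs_ids1, and it assumes obs_ids1 can be used as integer indices.

# hedged sketch: UMAP of the concatenated cell hidden states
dataset_labels = np.asarray(['query'] * h2_cell.shape[0], dtype=object)
dataset_labels[np.asarray(obs_ids1)] = 'reference'

adt_cell = sc.AnnData(h2_cell)
adt_cell.obs['dataset'] = pd.Categorical(dataset_labels)
sc.pp.neighbors(adt_cell, use_rep='X')
sc.tl.umap(adt_cell)
sc.pl.umap(adt_cell, color='dataset')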

The gene hidden-states

By default, the reference and query genes are concatenated.

[7]:
var_ids1, var_ids2 = dpair.get_vnode_ids(0), dpair.get_vnode_ids(1)

# the embedding layer
embed_gene = hidden_list[0]['gene']

# the first hidden layer
h1_gene = hidden_list[1]['gene']

# the second hidden layer
h2_gene = hidden_list[2]['gene']

# separate reference and query
h2_gene1 = h2_gene[var_ids1]
h2_gene2 = h2_gene[var_ids2]

print(f"h2_gene1.shape={h2_gene1.shape}")
print(f"h2_gene2.shape={h2_gene2.shape}")
h2_gene1.shape=(3421, 128)
h2_gene2.shape=(3135, 128)
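
Since the reference and query genes live in the same embedding space, one possible downstream step (a hedged sketch, not part of the original notebook; it assumes scikit-learn is available) is to match each query gene to its most similar reference gene by cosine similarity of the second-layer hidden states:

from sklearn.metrics.pairwise import cosine_similarity

# cosine similarity between query genes (rows) and reference genes (columns)
sim = cosine_similarity(h2_gene2, h2_gene1)   # shape: (3135, 3421)
best_ref_idx = sim.argmax(axis=1)             # closest reference gene for each query gene
best_sim = sim.max(axis=1)
print(best_ref_idx[:5], best_sim[:5])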

Get cell-to-gene attentions

format: n_cells x n_genes CSR sparse matrix

[8]:
attns = came.model.get_attentions(model, feat_dict, g)
[9]:
attns
[9]:
<4028x6556 sparse matrix of type '<class 'numpy.float32'>'
        with 1513823 stored elements in Compressed Sparse Row format>
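
The attention matrix has one row per cell and one column per gene, so a typical use (a hedged usage sketch, not from the original notebook) is to rank, for a given cell, the genes it attends to most strongly:

# hedged sketch: top-10 attended genes for the first cell (row 0)
cell_idx = 0
row = attns[cell_idx].toarray().ravel()    # attention weights of this cell over all genes
top_gene_idx = np.argsort(row)[::-1][:10]  # indices of the most-attended genes
print(top_gene_idx)
print(row[top_gene_idx])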