{ "cells": [ { "cell_type": "markdown", "id": "d6495895", "metadata": {}, "source": [ "# Getting started (for datasets with aligned features)\n", "\n", "Import the required packages:" ] }, { "cell_type": "code", "execution_count": 1, "id": "6935113b", "metadata": {}, "outputs": [], "source": [ "import os\n", "import sys\n", "from pathlib import Path\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt # optional\n", "import seaborn as sns # optional\n", "\n", "import scanpy as sc\n", "from scipy import sparse\n", "\n", "import networkx as nx\n", "import torch" ] }, { "cell_type": "markdown", "id": "68f64687", "metadata": {}, "source": [ "If you have trouble installing CAME, you can download the source code from GitHub\n", "and append its path to `sys.path`. For example:\n", "\n", "```python\n", "CAME_ROOT = Path('path/to/CAME')\n", "sys.path.append(str(CAME_ROOT))\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "id": "c20e3fe2", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using backend: pytorch\n" ] } ], "source": [ "import came\n", "from came import pipeline, pp, pl\n", "\n", "ROOT = Path(\".\") # set root" ] }, { "cell_type": "markdown", "id": "7396c5e5", "metadata": {}, "source": [ "## 0 Load datasets\n", "\n", "### 0.1 Load the example datasets" ] }, { "cell_type": "code", "execution_count": 3, "id": "00d1ffd3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['adatas', 'varmap', 'varmap_1v1', 'dataset_names', 'key_class'])\n", "a new directory made:\n", "\t_temp\\('Baron_human', 'Baron_mouse')-(12-16 18.13.20)\\figs\n" ] } ], "source": [ "from came import load_example_data\n", "\n", "example_data_dict = load_example_data()\n", "print(example_data_dict.keys())\n", "\n", "adatas = example_data_dict['adatas']\n", "dsnames = example_data_dict['dataset_names']\n", "\n", "adata_raw1, adata_raw2 = adatas\n", "key_class1 
= key_class2 = example_data_dict['key_class']\n", "\n", "df_varmap_1v1 = example_data_dict['varmap_1v1'] # set as None if NOT cross species\n", "\n", "# setting directory for results\n", "time_tag = came.make_nowtime_tag()\n", "resdir = ROOT / '_temp' / f'{dsnames}-{time_tag}'\n", "figdir = resdir / 'figs'\n", "came.check_dirs(figdir) # check and make the directory" ] }, { "cell_type": "markdown", "id": "b134821e", "metadata": {}, "source": [ "### 0.2 Load your own datasets\n", "\n", "To load your own datasets, see the code example below:\n", "\n", "```python\n", "# ========= customized paths ==========\n", "\n", "dsnames = ('Baron_human', 'Baron_mouse') # the dataset names, set by user\n", "dsn1, dsn2 = dsnames\n", "\n", "path_rawdata1 = CAME_ROOT / 'came/sample_data/raw-Baron_human.h5ad'\n", "path_rawdata2 = CAME_ROOT / 'came/sample_data/raw-Baron_mouse.h5ad'\n", "path_varmap_1v1 = CAME_ROOT / 'came/sample_data/gene_matches_1v1_human2mouse.csv'\n", "```\n", "\n", "Load the scRNA-seq datasets and the gene mapping:\n", "\n", "```python\n", "# ========= load data =========\n", "# 1-to-1 homologous gene pairs; set `df_varmap_1v1 = None` if NOT cross species.\n", "# If you only have a full gene-mapping table `df_varmap`, the 1-to-1 pairs\n", "# can be taken with `came.pp.take_1v1_matches(df_varmap)` instead.\n", "df_varmap_1v1 = pd.read_csv(path_varmap_1v1)\n", "\n", "adata_raw1 = sc.read_h5ad(path_rawdata1)\n", "adata_raw2 = sc.read_h5ad(path_rawdata2)\n", "adatas = [adata_raw1, adata_raw2]\n", "```\n", "\n", 
"Specify the column names of the cell-type labels, where `key_class1` is for the reference data and `key_class2` is for the query data. If the query cells have no cell-type or clustering labels, set `key_class2 = None`.\n", "\n", "```python\n", "key_class1 = 'cell_ontology_class' # set by user\n", "key_class2 = 'cell_ontology_class' # set by user\n", "```\n", "\n", "Set the directory for results:\n", "\n", "```python\n", "time_tag = came.make_nowtime_tag()\n", "resdir = ROOT / '_temp' / f'{dsnames}-{time_tag}' # set by user\n", "figdir = resdir / 'figs'\n", "came.check_dirs(figdir) # check and make the directory\n", "```" ] }, { "cell_type": "markdown", "id": "14ef61ed", "metadata": {}, "source": [ "Filter genes (an optional preprocessing step); here, genes detected in fewer than 3 cells are removed:\n", "\n", "```python\n", "sc.pp.filter_genes(adata_raw1, min_cells=3)\n", "sc.pp.filter_genes(adata_raw2, min_cells=3)\n", "```" ] }, { "cell_type": "markdown", "id": "6b9aa256", "metadata": {}, "source": [ "### 0.3 Inspect the compositions of different classes" ] }, { "cell_type": "code", "execution_count": 4, "id": "9242f5e1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | cell_ontology_class | cell_ontology_class |\n", "
|---|---|---|\n", "
| B cell | NaN | 10.0 |\n", "
| Schwann cell | 13.0 | 6.0 |\n", "
| T cell | 7.0 | 7.0 |\n", "
| endothelial cell | 252.0 | 139.0 |\n", "
| leukocyte | NaN | 8.0 |\n", "
| macrophage | 55.0 | 36.0 |\n", "
| mast cell | 25.0 | NaN |\n", "
| pancreatic A cell | 2326.0 | 191.0 |\n", "
| pancreatic D cell | 601.0 | 218.0 |\n", "
| pancreatic PP cell | 255.0 | 41.0 |\n", "
| pancreatic acinar cell | 958.0 | NaN |\n", "
| pancreatic ductal cell | 1077.0 | 275.0 |\n", "
| pancreatic epsilon cell | 18.0 | NaN |\n", "
| pancreatic stellate cell | 457.0 | 61.0 |\n", "
| type B pancreatic cell | 2525.0 | 894.0 |\n", "