1 Installation

Install the package using Bioconductor. Start R and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MouseAgingData")

2 Setup

Now, load the package and dependencies used in the vignette.

library(scran)
library(scater)
library(ggplot2)
library(bluster)
library(SingleCellExperiment)
library(ExperimentHub)
library(MouseAgingData)

3 Introduction

Single-cell sequencing technology can reveal intricate details about individual cells, allowing researchers to interrogate the genetic makeup of cells within a heterogeneous sample. It can provide insights into many aspects of cellular biology, such as characterizing cell populations, identifying rare cell types, and quantifying expression levels in cell types across experimental treatments. Given its wide utility, single-cell sequencing has expanded scientific knowledge in fields including cancer research, immunology, developmental biology, neurobiology, and microbiology.

Several methods exist for generating single-cell sequencing data from the DNA or RNA of individual cells. These include, but are not limited to:

  1. Droplet-based platforms, such as the 10x Genomics Chromium system, inDrop, and Drop-seq, which use microfluidic devices to encapsulate individual cells in tiny droplets together with uniquely barcoded beads.

  2. Plate- or microwell-based methods, such as the Smart-seq2 protocol (plates), the Fluidigm C1 system (microfluidic chips), or Seq-Well (microwell arrays). Instead of droplets, these platforms capture and process each cell in its own well or chamber; in plate-based protocols, cells are sorted into the wells manually or automatically.

The MouseAgingData package provides analysis-ready data from a single-cell study of parabiosis in the aging mouse brain by Ximerakis & Holton et al. (2023). The contents of the package can be accessed by querying ExperimentHub with the package name.

4 Data

Ximerakis & Holton et al. investigated how heterochronic parabiosis (surgically joining the circulatory systems of two animals of different ages) affects aging and rejuvenation in the mouse brain. They identified aging-associated gene signatures in specific cell types, with a particular focus on brain endothelial cells, which showed dynamic transcriptional changes that affect vascular structure and function.

The parabiosis single-cell RNA-seq data set (Ximerakis & Holton et al., Nature Aging 2023) contains 105,329 cells and 20,905 features, covering 31 cell types from 50 animals: 8 OX, 8 YX, 7 YY, 9 YO, 7 OO, and 11 OY.
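As a quick sanity check on these figures, the group sizes add up to 50 animals, or roughly 2,100 cells per animal on average. The snippet below only restates the numbers given above; the variable names are illustrative.

```r
# Number of animals per group, as reported for the parabiosis data set
animals <- c(OX = 8, YX = 8, YY = 7, YO = 9, OO = 7, OY = 11)
total_animals <- sum(animals)
total_animals     # 50

# Average number of cells contributed per animal
cells_per_animal <- 105329 / total_animals
cells_per_animal  # 2106.58
```

After loading the data below, a comparable per-group count of animals could be obtained from the animal and animal_type columns of colData(sce), for example with tapply(as.character(sce$animal), sce$animal_type, function(x) length(unique(x))).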

This vignette performs a simple analysis of the parabiosis 10x Genomics single-cell data set, following the Quick Start Workflow of Single-Cell Analysis in the OSCA Bioconductor Book.

Briefly, it walks through normalization, feature selection, dimensionality reduction, clustering, and visualization; quality control is discussed but skipped because the authors already filtered the data. The PCA, UMAP, and tSNE coordinates used in the study were provided by the authors and are used here for visualization.

5 Load the data set from ExperimentHub

sce <- parabiosis10X()
#> see ?MouseAgingData and browseVignettes('MouseAgingData') for documentation
#> loading from cache
# View the data
sce
#> class: SingleCellExperiment 
#> dim: 20905 105329 
#> metadata(1): cell_colors
#> assays(1): counts
#> rownames(20905): Xkr4 Gm37381 ... DHRSX CAAA01147332.1
#> rowData names(2): geneID HVG
#> colnames: NULL
#> colData names(10): barcode nCount_RNA ... cell_type subpopulation
#> reducedDimNames(3): PCA UMAP TSNE
#> mainExpName: NULL
#> altExpNames(0):

Check that the data loaded correctly and contain what we expect.

# Sample metadata
head(colData(sce)) 
#> DataFrame with 6 rows and 10 columns
#>            barcode nCount_RNA nFeature_RNA   animal    batch animal_type
#>        <character>  <numeric>    <integer> <factor> <factor>    <factor>
#> 1 AAACCTGGTCAGTGGA    2100.06          815     OO1L   Batch1          OO
#> 2 AAACCTGGTGTCAATC    4356.88         3120     OO1L   Batch1          OO
#> 3 AAACCTGTCAAACCAC    2679.97         1208     OO1L   Batch1          OO
#> 4 AAACCTGTCGTTACAG    3647.74         2137     OO1L   Batch1          OO
#> 5 AAACGGGCACGAGAGT    1904.85          703     OO1L   Batch1          OO
#> 6 AAAGATGAGCGTAGTG    3732.96         2247     OO1L   Batch1          OO
#>   percent_mito percent_ribo cell_type subpopulation
#>      <numeric>    <numeric>  <factor>      <factor>
#> 1     1.253203      5.81833     OPC         qOPC   
#> 2     0.510883      3.48925     NendC       NendC_3
#> 3     0.789625      3.67955     OPC         qOPC   
#> 4     0.607773      3.99532     GABA        GABA_3 
#> 5     1.746996      8.52778     EC          EC_1   
#> 6     0.652196      3.85105     GABA        GABA_13
# Includes cell colors from the original paper
metadata(sce)
#> $cell_colors
#>               OPC               OLG               OEG               NSC 
#>      "olivedrab4"      "olivedrab3"      "olivedrab1"      "royalblue4" 
#>               ARP               ASC               EPC            HypEPC 
#>      "steelblue4"      "steelblue1" "lightgoldenrod4" "lightgoldenrod3" 
#>               TNC               CPC               NRP              ImmN 
#> "lightgoldenrod2"            "gold"     "darkmagenta"         "purple3" 
#>              GABA              DOPA              GLUT              CHOL 
#>   "mediumorchid3"      "violetred3"   "palevioletred"          "violet" 
#>             NendC                EC                PC              VSMC 
#>       "lightpink"         "sienna4"         "sienna3"         "sienna1" 
#>             Hb_VC              VLMC               ABC                MG 
#>            "peru"      "peachpuff4"      "peachpuff3"            "red4" 
#>               MAC               MNC                DC              NEUT 
#>            "red3"         "tomato3"            "red1"         "tomato1" 
#>            T_cell                NK            B_cell 
#>         "salmon3"      "indianred2"           "coral"

6 Quality control

In this step, we would normally explore and visualize the mitochondrial content and read counts of each cell. However, the authors have already removed low-quality cells and animals, so we skip this step here. For details on their workflow, refer to the original article (Ximerakis & Holton et al., 2023). The OSCA Bioconductor book also provides several examples of quality control steps.
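If you do want to screen cells yourself, a common approach described in the OSCA book is an adaptive threshold: flag cells whose value lies more than a few median absolute deviations (MADs) above the median. The base R sketch below applies this idea to a hypothetical vector of mitochondrial percentages. It is not the filtering the authors used, and the 3-MAD cutoff is an assumption chosen to mirror the default of scuttle::isOutlier().

```r
# Hypothetical per-cell mitochondrial percentages (the last cell looks damaged)
percent_mito <- c(1.2, 0.5, 0.8, 0.6, 1.7, 0.65, 15)

# Adaptive upper threshold: median + 3 * MAD
# (mad() applies the usual 1.4826 scaling constant by default)
mito_median <- median(percent_mito)
mito_mad <- mad(percent_mito)
threshold <- mito_median + 3 * mito_mad

# Cells above the threshold would be discarded as likely low quality
high_mito <- percent_mito > threshold
which(high_mito)  # 7
```

On the loaded object, the same idea could be applied to sce$percent_mito, and scuttle::isOutlier(sce$percent_mito, type = "higher") implements it directly.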

7 Normalization

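Normalize the expression counts. quickCluster() groups similar cells, computeSumFactors() estimates a size factor for each cell by pooling cells within those clusters, and logNormCounts() divides the counts by the size factors and log2-transforms them with a pseudo-count of 1. For demonstration, we subset this SingleCellExperiment object to its first 1000 cells. Note that this is not a random sample: the first cells shown by head(colData(sce)) all come from animal OO1L, so the subset is likely dominated by one or a few animals.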

sce_subset <- sce[, 1:1000]
set.seed(101000110)
clusters <- quickCluster(sce_subset)
sce_subset <- computeSumFactors(sce_subset, clusters=clusters)
sce_subset <- logNormCounts(sce_subset)

logcounts(sce_subset)[1:10, 1:10]
#> 10 x 10 sparse Matrix of class "dgCMatrix"
#>                                                                  
#> Xkr4    .        .         . . . .         . .        .         .
#> Gm37381 .        .         . . . .         . .        .         .
#> Rp1     .        .         . . . .         . .        .         .
#> Sox17   .        .         . . . .         . .        1.5643042 .
#> Mrpl15  .        0.5233882 . . . .         . .        0.5746703 .
#> Lypla1  .        .         . . . .         . .        0.5746703 .
#> Gm37988 .        .         . . . .         . .        .         .
#> Tcea1   1.837044 .         . . . .         . 1.337954 .         .
#> Rgs20   .        .         . . . 0.7554552 . .        .         .
#> Gm16041 .        .         . . . .         . .        .         .
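To make the transformation concrete, the sketch below reproduces the arithmetic behind logNormCounts() on a tiny hypothetical count matrix: counts are divided by per-cell size factors centred to a mean of 1, a pseudo-count of 1 is added, and the result is log2-transformed. One simplification is assumed: the size factors here come from library size (total counts per cell), whereas computeSumFactors() estimates them with the pooling-based deconvolution method.

```r
# Hypothetical count matrix: 3 genes x 4 cells
counts <- matrix(c(0, 2, 10,
                   1, 0, 5,
                   4, 4, 20,
                   0, 1, 3),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Library-size factors, centred so that their mean is 1
lib_size <- colSums(counts)
sf <- lib_size / mean(lib_size)

# Divide each column (cell) by its size factor, add a pseudo-count of 1, take log2
logcounts <- log2(sweep(counts, 2, sf, "/") + 1)
round(logcounts, 3)
```

Zero counts stay exactly zero after the transformation, which is why the log-normalized matrix above can remain sparse.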

8 Feature selection

At this point in a typical workflow, we could select an appropriate set of highly variable genes (HVGs), say the top 10% of genes with the highest variability in expression. Below is an example of how to do this with our subsetted SingleCellExperiment example.

dec <- modelGeneVar(sce_subset)
hvg <- getTopHVGs(dec, prop=0.1)
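Conceptually, getTopHVGs(dec, prop=0.1) keeps the 10% of genes with the largest variability. modelGeneVar() first fits a trend of variance against mean expression and ranks genes by the variance remaining above that trend (the bio column). The base R sketch below skips the trend fitting and simply ranks genes by the raw variance of simulated log-expression values, so it only illustrates the "top proportion" selection step; rounding the count up with ceiling() is an assumption of this sketch.

```r
set.seed(42)
# Simulated log-expression matrix: 200 genes x 50 cells
logexpr <- matrix(rnorm(200 * 50), nrow = 200,
                  dimnames = list(paste0("gene", 1:200), NULL))
# Make the first 20 genes more variable than the rest
logexpr[1:20, ] <- logexpr[1:20, ] * 3

# Per-gene variance, then keep the top 10% of genes
gene_var <- apply(logexpr, 1, var)
n_top <- ceiling(0.1 * nrow(logexpr))
hvg_sketch <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top)]
head(hvg_sketch)
```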


The 2000 HVGs used in the original study are also flagged in the HVG column of rowData(sce) in the original SingleCellExperiment object. Note that this column is stored as a factor with values FALSE and TRUE, not as a logical vector.

head(rowData(sce))
#> DataFrame with 6 rows and 2 columns
#>              geneID      HVG
#>         <character> <factor>
#> Xkr4           Xkr4    FALSE
#> Gm37381     Gm37381    FALSE
#> Rp1             Rp1    FALSE
#> Sox17         Sox17    FALSE
#> Mrpl15       Mrpl15    FALSE
#> Lypla1       Lypla1    FALSE
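Because the HVG column is a factor, it should be converted before it is used as an index: indexing with a factor in R uses its integer codes (1 for "FALSE", 2 for "TRUE") rather than its labels. The toy example below, with hypothetical gene names, shows the difference.

```r
# Toy stand-in for rowData(sce)$HVG: a factor with levels "FALSE" and "TRUE"
hvg_flag <- factor(c("FALSE", "TRUE", "FALSE", "TRUE"))
genes <- c("geneA", "geneB", "geneC", "geneD")

# Wrong: the factor's integer codes (1, 2, 1, 2) are used as positions
genes[hvg_flag]           # "geneA" "geneB" "geneA" "geneB"

# Right: compare against the label to get a logical vector
genes[hvg_flag == "TRUE"] # "geneB" "geneD"
```

Applied to the real object, and assuming the factor levels are "FALSE" and "TRUE", rownames(sce)[rowData(sce)$HVG == "TRUE"] should return the authors' 2000 HVGs, and intersect() with hvg shows how many of them were also selected from the 1000-cell subset.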

9 PCA

Below, we run principal component analysis (PCA) on the HVGs defined above. Because this step can take a long time on the full data set, we again apply it only to the 1000-cell subset as a demonstration.

# The authors already provided PCA coordinates (stored as "PCA"), so we store
# the newly computed coordinates under a different name, "osca_PCA"

set.seed(1234)
sce_subset <- runPCA(sce_subset, ncomponents=25, subset_row=hvg, 
                     name = "osca_PCA")


# Show the names of the elements in the ReducedDims slot
reducedDims(sce_subset)
#> List of length 4
#> names(4): PCA UMAP TSNE osca_PCA
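For intuition about what runPCA() computes, the base R sketch below runs an ordinary PCA with prcomp() on simulated log-expression values restricted to a hypothetical HVG subset, keeping 25 components as above. Genes are centred but not scaled, which we believe matches the runPCA() default; runPCA() may use a different SVD algorithm internally, and the signs of PCs are arbitrary, so values need not match exactly on real data.

```r
set.seed(1234)
# Simulated log-expression matrix: 100 genes x 60 cells
logexpr <- matrix(rnorm(100 * 60), nrow = 100)
hvg_idx <- 1:30  # hypothetical HVG subset (row indices)

# prcomp() expects cells as rows, so transpose the HVG rows;
# centre each gene but do not scale it, and keep 25 components
pca <- prcomp(t(logexpr[hvg_idx, ]), center = TRUE, scale. = FALSE, rank. = 25)

# Per-cell PC coordinates, analogous to reducedDim(sce_subset, "osca_PCA")
pcs <- pca$x
dim(pcs)  # 60 25
```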


As mentioned, the authors also provided the first 50 PCs used in their study, stored as "PCA" in the full SingleCellExperiment object. The subset inherits these coordinates for its 1000 cells, so let's take a look.

reducedDim(sce_subset, "PCA")[1:5, 1:5]
#>             PC1        PC2         PC3         PC4        PC5
#> [1,]  3.7896377 -2.0354706  -1.3315832   0.3672115  1.0848995
#> [2,]  1.5902497 -5.5788552 -10.3563694   3.4867199 -0.3234046
#> [3,]  3.8162534 -3.3919824  -1.8970444   0.7165197  1.2934682
#> [4,]  1.7208498 -4.2866718  -9.3561901   2.3150841 -0.4629331
#> [5,] -0.7078829 -0.6319163   0.4068132 -18.8230666  2.9753249

10 Clustering

At this point, we could use the PCs computed above to cluster cells based on their expression profiles. More details are provided in the clustering chapter of the OSCA book. As an example, let's apply graph-based clustering with the Louvain algorithm to our subsetted object.

colLabels(sce_subset) <- clusterCells(sce_subset, use.dimred='osca_PCA',
                               BLUSPARAM=NNGraphParam(cluster.fun="louvain"))
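NNGraphParam(cluster.fun="louvain") builds a nearest-neighbour graph of cells in PC space and splits it into communities with the Louvain algorithm. The sketch below mimics that idea with the igraph package on simulated PC coordinates. It is a simplification: it uses a plain, unweighted k-nearest-neighbour graph with k = 10, whereas bluster by default builds a weighted shared-nearest-neighbour graph.

```r
library(igraph)

set.seed(42)
# Simulated PC coordinates: three well-separated groups of 50 cells in 5 dimensions
pcs <- rbind(matrix(rnorm(250, mean = 0), ncol = 5),
             matrix(rnorm(250, mean = 5), ncol = 5),
             matrix(rnorm(250, mean = -5), ncol = 5))
truth <- rep(c("A", "B", "C"), each = 50)

k <- 10
d <- as.matrix(dist(pcs))
diag(d) <- Inf  # a cell is not its own neighbour
# Indices of the k nearest neighbours of each cell (one row per cell)
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# Undirected kNN graph: one edge from each cell to each of its neighbours;
# simplify() merges the duplicate edges created by mutual neighbours
edges <- cbind(rep(seq_len(nrow(pcs)), each = k), as.vector(t(knn)))
g <- simplify(graph_from_edgelist(edges, directed = FALSE))

# Louvain community detection gives the cluster labels
labels <- membership(cluster_louvain(g))
table(labels, truth)
```

The same kind of contingency table, for example table(colLabels(sce_subset), sce_subset$cell_type), is a simple way to compare the clusters from the real subset with the authors' cell-type labels.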

11 Visualization

For this data set, the authors have already provided their exact UMAP and tSNE coordinates, as well as the color scheme representing the cell types in their paper. The colors are stored in the metadata slot of the SingleCellExperiment object and can be retrieved with the metadata() function. To reproduce their figures consistently, we plot using the provided coordinates.


# Color map matching each cell type to its color in the publication
# (the metadata element is named "cell_colors")
cell.color <- metadata(sce)$cell_colors

gg <- plotUMAP(sce, color_by = "cell_type", text_by = "cell_type") 
gg + theme(legend.title=element_blank()) + 
    scale_color_manual(values=c(cell.color))
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.

This plot is a recreation of Fig. 2C from Ximerakis & Holton et al. 2023.
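scale_color_manual() matches a named vector of colors to the factor levels by name, which is why the publication's color map can be passed directly. The ggplot2 sketch below, with a hypothetical three-type subset of the color map and made-up coordinates, shows the mapping together with a check that every cell type has a color; with named values, levels missing from the vector would be drawn in the scale's na.value color.

```r
library(ggplot2)

# Hypothetical subset of the publication color map
cell_colors <- c(OPC = "olivedrab4", EC = "sienna4", MG = "red4")

# Made-up 2-D embedding with cell-type labels
df <- data.frame(
  UMAP1 = c(0.1, 2.3, -1.2, 0.4, 2.0),
  UMAP2 = c(1.0, -0.5, 0.3, 1.2, -0.7),
  cell_type = factor(c("OPC", "EC", "MG", "OPC", "EC"))
)

# Every cell type should have a color before plotting
stopifnot(all(levels(df$cell_type) %in% names(cell_colors)))

p <- ggplot(df, aes(UMAP1, UMAP2, colour = cell_type)) +
  geom_point() +
  scale_colour_manual(values = cell_colors) +
  theme(legend.title = element_blank())
p
```

On the real data, the equivalent check would be setdiff(levels(sce$cell_type), names(cell.color)), which should be empty if every cell type has a color.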


We can also plot a tSNE with their provided coordinates.

gg <- plotTSNE(sce, color_by = "cell_type", text_by = "cell_type") 
gg + theme(legend.title=element_blank()) + 
    scale_color_manual(values=c(cell.color))
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.


If you would like to create your own UMAP and tSNE plots, please refer back to the OSCA Bioconductor book for more details.

12 Reference

Ximerakis & Holton et al. (2023). Heterochronic parabiosis reprograms the mouse brain transcriptome by shifting aging signatures in multiple cell types. Nature Aging 3, 327–345. DOI: https://doi.org/10.1038/s43587-023-00373-6.

13 Session Info

sessionInfo()
#> R Under development (unstable) (2024-01-16 r85808)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] MouseAgingData_0.99.2       ExperimentHub_2.11.1       
#>  [3] AnnotationHub_3.11.1        BiocFileCache_2.11.1       
#>  [5] dbplyr_2.4.0                bluster_1.13.0             
#>  [7] scater_1.31.2               ggplot2_3.4.4              
#>  [9] scran_1.31.2                scuttle_1.13.0             
#> [11] SingleCellExperiment_1.25.0 SummarizedExperiment_1.33.3
#> [13] Biobase_2.63.0              GenomicRanges_1.55.2       
#> [15] GenomeInfoDb_1.39.6         IRanges_2.37.1             
#> [17] S4Vectors_0.41.3            BiocGenerics_0.49.1        
#> [19] MatrixGenerics_1.15.0       matrixStats_1.2.0          
#> [21] BiocStyle_2.31.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] DBI_1.2.1                 bitops_1.0-7             
#>  [3] gridExtra_2.3             rlang_1.1.3              
#>  [5] magrittr_2.0.3            compiler_4.4.0           
#>  [7] RSQLite_2.3.5             DelayedMatrixStats_1.25.1
#>  [9] png_0.1-8                 vctrs_0.6.5              
#> [11] pkgconfig_2.0.3           crayon_1.5.2             
#> [13] fastmap_1.1.1             XVector_0.43.1           
#> [15] labeling_0.4.3            utf8_1.2.4               
#> [17] rmarkdown_2.25            ggbeeswarm_0.7.2         
#> [19] purrr_1.0.2               bit_4.0.5                
#> [21] xfun_0.42                 zlibbioc_1.49.0          
#> [23] cachem_1.0.8              beachmat_2.19.1          
#> [25] jsonlite_1.8.8            blob_1.2.4               
#> [27] highr_0.10                DelayedArray_0.29.1      
#> [29] BiocParallel_1.37.0       irlba_2.3.5.1            
#> [31] parallel_4.4.0            cluster_2.1.6            
#> [33] R6_2.5.1                  bslib_0.6.1              
#> [35] limma_3.59.2              jquerylib_0.1.4          
#> [37] Rcpp_1.0.12               bookdown_0.37            
#> [39] knitr_1.45                Matrix_1.6-5             
#> [41] igraph_2.0.1.1            tidyselect_1.2.0         
#> [43] abind_1.4-5               yaml_2.3.8               
#> [45] viridis_0.6.5             codetools_0.2-19         
#> [47] curl_5.2.0                lattice_0.22-5           
#> [49] tibble_3.2.1              KEGGREST_1.43.0          
#> [51] withr_3.0.0               evaluate_0.23            
#> [53] Biostrings_2.71.2         filelock_1.0.3           
#> [55] pillar_1.9.0              BiocManager_1.30.22      
#> [57] generics_0.1.3            RCurl_1.98-1.14          
#> [59] BiocVersion_3.19.1        sparseMatrixStats_1.15.0 
#> [61] munsell_0.5.0             scales_1.3.0             
#> [63] glue_1.7.0                metapod_1.11.1           
#> [65] tools_4.4.0               BiocNeighbors_1.21.2     
#> [67] ScaledMatrix_1.11.0       locfit_1.5-9.8           
#> [69] cowplot_1.1.3             grid_4.4.0               
#> [71] AnnotationDbi_1.65.2      edgeR_4.1.16             
#> [73] colorspace_2.1-0          GenomeInfoDbData_1.2.11  
#> [75] beeswarm_0.4.0            BiocSingular_1.19.0      
#> [77] vipor_0.4.7               cli_3.6.2                
#> [79] rsvd_1.0.5                rappdirs_0.3.3           
#> [81] fansi_1.0.6               S4Arrays_1.3.3           
#> [83] viridisLite_0.4.2         dplyr_1.1.4              
#> [85] gtable_0.3.4              sass_0.4.8               
#> [87] digest_0.6.34             SparseArray_1.3.4        
#> [89] ggrepel_0.9.5             dqrng_0.3.2              
#> [91] farver_2.1.1              memoise_2.0.1            
#> [93] htmltools_0.5.7           lifecycle_1.0.4          
#> [95] httr_1.4.7                mime_0.12                
#> [97] statmod_1.5.0             bit64_4.0.5