MIDAS Seminar

Statistical Methods for Flexible Differential Analysis of Cross-Sample Single-Cell RNA-Seq Datasets

Mark Robinson, PhD
Associate Professor of Statistical Genomics
Institute of Molecular Sciences
University of Zurich

Abstract: Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to characterize the transcriptomes of individual cells. A primary task in the analysis of scRNA-seq data is differential expression analysis (DE). Most early analyses of DE in scRNA-seq data have aimed at identifying differences between cell types, and thus are focused on finding markers for cell sub-populations (experimental units are cells).

Multi-sample, multi-condition scRNA-seq datasets are now emerging where the goal is to make sample-level inferences (experimental units are samples), with hundreds to thousands of cells measured per replicate. To tackle such complex experimental designs, so-called differential state (DS) analysis follows cell types across a set of samples (e.g., individuals) and experimental conditions (e.g., treatments), in order to identify cell-type-specific responses, i.e., changes in cell state. DS analysis: i) should be able to detect expression changes that affect only a single cell type, a subset of cell types, or even a subset of cells within a cell type; and, ii) is orthogonal to clustering or cell type assignment (i.e., genes typically associated with cell types are not of direct interest for DS). Furthermore, cell-type-level DE analysis is arguably more interpretable and biologically meaningful.

We compared three conceptually different approaches that act on the cell, sample, and group level: i) fitting mixed models to cell-level measurements (replicates are cells); ii) aggregating single cells into "pseudo-bulk" data at the sub-population level and leveraging existing robust bulk RNA-seq frameworks (replicates are samples); and, iii) as a reference, applying existing scRNA-seq DE methods that disregard sample labels and treat each group as a different cell type (no replicates).
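As an illustration of approach ii), the following is a minimal sketch of pseudo-bulk aggregation, assuming raw counts are summed over all cells that share a sub-population (cluster) and a sample. It is not the muscat implementation; the function name pseudobulk, the toy data, and the choice of summation (rather than, say, averaging) are assumptions made for the example.

```python
def pseudobulk(counts, samples, clusters):
    """Sum per-cell counts into one profile per (cluster, sample) pair.

    counts   : list of per-cell count vectors (one integer per gene)
    samples  : sample label of each cell
    clusters : cluster (sub-population) label of each cell
    """
    if not (len(counts) == len(samples) == len(clusters)):
        raise ValueError("counts, samples and clusters must have equal length")

    totals = {}
    for cell, sample, cluster in zip(counts, samples, clusters):
        key = (cluster, sample)
        if key not in totals:
            # Start a zero profile the first time this pair is seen.
            totals[key] = [0] * len(cell)
        # Add this cell's counts gene by gene.
        totals[key] = [a + b for a, b in zip(totals[key], cell)]
    return totals


# Hypothetical toy data: 5 cells, 3 genes, 2 samples, 2 clusters.
counts = [[1, 0, 2], [0, 3, 1], [2, 2, 0], [4, 0, 1], [1, 1, 1]]
samples = ["s1", "s1", "s2", "s2", "s1"]
clusters = ["T", "T", "T", "B", "B"]

pb = pseudobulk(counts, samples, clusters)
for (cluster, sample), profile in sorted(pb.items()):
    print(cluster, sample, profile)
```

Each aggregated profile then plays the role of one bulk RNA-seq library, so that within a cluster the replicates are samples rather than cells.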

To compare method performance, we implemented a flexible simulation framework that accommodates multiple clusters and samples across experimental conditions, with varying sample, cluster, and group sizes, and is able to introduce a broad range of differential expression patterns. Notably, our framework reproduces many of the structures characteristic of scRNA-seq data (e.g., dispersion-mean relationships, dropout percentages) as well as the cell-, sample-, and pseudobulk-level variability.
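To give a feel for what simulating a differential expression pattern can look like, here is a small sketch that draws counts for one gene in two conditions and applies a fold change to the mean in one of them. It assumes a negative binomial count model, sampled as a gamma-Poisson mixture; the abstract does not state which model the framework uses, and all names and parameter values below (sample_nb, simulate_gene, mu, dispersion, fold_change) are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for the small means used in this sketch.
    """
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_nb(rng, mu, dispersion):
    """Negative binomial draw with mean mu and variance mu + dispersion * mu**2."""
    shape = 1.0 / dispersion
    # gammavariate takes (shape, scale); the mean of the draw is shape * scale = mu.
    lam = rng.gammavariate(shape, mu * dispersion)
    return sample_poisson(rng, lam)


def simulate_gene(rng, n_cells, mu, dispersion, fold_change=1.0):
    """Simulate counts for one gene; fold_change scales the mean."""
    return [sample_nb(rng, mu * fold_change, dispersion) for _ in range(n_cells)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
control = simulate_gene(rng, 2000, mu=2.0, dispersion=0.5)
treated = simulate_gene(rng, 2000, mu=2.0, dispersion=0.5, fold_change=2.0)

mean_control = sum(control) / len(control)
mean_treated = sum(treated) / len(treated)
print("control mean:", round(mean_control, 2))
print("treated mean:", round(mean_treated, 2))
```

A change in the mean is only one possible pattern; a framework of the kind described would also need to vary which cells, clusters, and samples are affected.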

We have implemented this framework along with various DS analysis methods in muscat (https://github.com/HelenaLC/muscat), an R package that provides differential testing and visualization tools for multi-sample multi-group scRNA-seq data.

Bio: Dr. Robinson's research interests are diverse, but more or less encompass the general application of statistical methods and data science to experimental data with biological applications. Often, this is within the context of genomics data types, but he is interested in methodological challenges and robust solutions in data, generally. His group also tries to practice modern science, with a focus on reproducibility (repos for code) and open science (preprints).

1994-1999: B.Sc. (Applied Math and Stats CO-OP), Uni Guelph

1999-2001: M.Sc. (Statistics), Uni British Columbia

2001-2003: Research Assistant (Uni Toronto, with T. Hughes)

2003-2004: Research Scientist (MDS Proteomics, Toronto)

2004-2005: Research Assistant (Uni Toronto, with B. Frey)

2005-2008: Ph.D. Bioinformatics (Uni Melbourne, with T. Speed)

2008-2011: Postdoc (Cancer Epigenetics, Garvan Institute, with S. Clark)

2011-2017: Assistant Professor (IMLS, Uni Zurich)

2017-present: Associate Professor (IMLS, Uni Zurich)

Sponsored by

EECS