Name: 2nd scanpy session - Quality control and filtering
Uploaded: 2022-01-25T17:27:24.000Z
Duration: 14 min 14 s
Description: In the second session of the scanpy tutorial, we introduce quality control of single-cell data and how to determine appropriate filtering thresholds. This is the recording of the scanpy tutorial held at Helmholtz Munich in July 2020.


Pre-processing in 10x Genomics Experiments

Understanding Barcode Ranking

Pre-processing begins with barcode ranking: the molecules (UMIs) assigned to each barcode are counted, and the barcodes are ranked by that count.

Barcodes with many assigned molecules typically correspond to real cells, whereas barcodes with few molecules usually represent empty droplets.
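The ranking step can be sketched in a few lines of numpy. Plotted as log count against log rank, the ranked counts usually show a steep drop (the "knee") between cell-containing barcodes and empty droplets. The "largest drop" rule and the function names below are hypothetical simplifications for illustration; they are not what Cell Ranger or other cell-calling tools actually do.

```python
import numpy as np

def rank_barcodes(counts_per_barcode):
    """Sort barcodes by molecule count, highest first (rank 1 = most molecules)."""
    counts = np.asarray(counts_per_barcode)
    order = np.argsort(counts)[::-1]       # barcode indices in descending count order
    return order, counts[order]

def steepest_drop_cutoff(sorted_counts):
    """Hypothetical heuristic: number of top-ranked barcodes above the largest log-scale drop."""
    logc = np.log10(sorted_counts + 1)     # +1 keeps barcodes with 0 molecules finite
    drops = logc[:-1] - logc[1:]           # decrease between consecutive ranks
    return int(np.argmax(drops)) + 1

# Hypothetical molecule counts for 15 barcodes (5 cell-like, 10 near-empty).
counts = np.array([40, 8500, 3, 2500, 12, 6200, 0, 35, 4100, 18, 1, 3900, 22, 9, 5])

order, sorted_counts = rank_barcodes(counts)
n_cells = steepest_drop_cutoff(sorted_counts)
cell_barcodes = order[:n_cells]
print("barcodes called as cells:", sorted(cell_barcodes.tolist()))
```

On real data the boundary is rarely this clean, which is why the threshold is usually checked visually on the barcode-rank plot.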

Addressing Empty Droplets

In 10x experiments, the cell suspension is diluted so that doublets (two cells captured in one droplet) are rare; as a consequence, most droplets contain no cell at all.
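The trade-off behind dilution can be made concrete under the common idealized assumption that the number of cells per droplet follows a Poisson distribution with mean λ (the loading rate). This sketch computes the fractions of empty, single-cell and multi-cell droplets; the function name is illustrative.

```python
import math

def droplet_occupancy(lam):
    """Poisson model of cells per droplet with mean `lam` (an assumed, idealized model)."""
    p_empty = math.exp(-lam)            # P(0 cells)
    p_single = lam * math.exp(-lam)     # P(1 cell)
    p_multi = 1.0 - p_empty - p_single  # P(2 or more cells)
    # Among droplets that contain at least one cell, the fraction holding two or more.
    multiplet_rate = p_multi / (1.0 - p_empty)
    return p_empty, p_single, p_multi, multiplet_rate

for lam in (0.05, 0.1, 0.5):
    p0, p1, p2, rate = droplet_occupancy(lam)
    print(f"lambda={lam}: empty={p0:.3f}, single={p1:.3f}, multi={p2:.4f}, "
          f"multiplets among occupied={rate:.3f}")
```

At λ = 0.1, about 90% of droplets are empty and roughly 5% of cell-containing droplets hold more than one cell; raising λ to 0.5 lowers the empty fraction to about 61% but raises the multiplet share to about 23%.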

Even empty droplets can contain some RNA molecules from the surrounding cell suspension, which adds a background signal to the data.

Background Signal and Its Implications

The background signal arises when cells break apart during sample handling and release their contents into the suspension; this free-floating RNA is often referred to as the "cell soup."

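This background can make genes appear expressed in cells that do not actually express them. The effect is especially visible in data containing pancreatic beta cells: because beta cells produce very large amounts of insulin mRNA, insulin transcripts from lysed cells can show up in other cell types.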
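Because empty droplets contain mostly ambient RNA, pooling their counts gives a rough estimate of the soup's composition. This is similar in spirit to how tools such as SoupX estimate the soup from low-count droplets, but the function, the `max_empty_umis` cutoff and the toy matrix below are hypothetical illustrations, not any tool's implementation.

```python
import numpy as np

def ambient_profile(counts, genes, max_empty_umis=100):
    """Estimate the ambient ('soup') expression profile from likely-empty droplets.

    counts: barcodes x genes matrix of molecule counts.
    max_empty_umis: hypothetical cutoff; barcodes at or below it are treated as empty.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)              # count depth per barcode
    empty = totals <= max_empty_umis         # barcodes assumed to be empty droplets
    soup = counts[empty].sum(axis=0)         # pooled counts over empty droplets
    profile = soup / soup.sum()              # fraction of ambient molecules per gene
    return dict(zip(genes, profile))

genes = ["INS", "GCG", "ACTB", "KRT19"]
counts = [
    [5000, 20, 300, 1],   # beta-cell-like barcode, dominated by INS
    [150, 900, 250, 2],   # alpha-cell-like barcode: GCG plus INS from the soup
    [60, 3, 4, 0],        # empty droplet: mostly ambient INS
    [40, 2, 3, 1],        # empty droplet
]
profile = ambient_profile(counts, genes)
print({g: round(float(f), 3) for g, f in profile.items()})
```

In this toy example insulin makes up most of the ambient profile, mirroring the beta-cell case described above.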

Analyzing Ambient RNAs

To limit the impact of ambient RNA on the analysis, one option is to exclude the most strongly affected genes from the dataset.
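Excluding genes amounts to dropping columns of the barcodes x genes count matrix. A minimal numpy sketch follows; excluding `INS` is a hypothetical choice based on the insulin example, and `drop_genes` is an illustrative helper.

```python
import numpy as np

def drop_genes(counts, genes, exclude):
    """Remove the gene columns named in `exclude` from a barcodes x genes matrix."""
    excluded = set(exclude)
    keep = np.array([g not in excluded for g in genes], dtype=bool)  # column mask
    kept_genes = [g for g, k in zip(genes, keep) if k]
    return np.asarray(counts)[:, keep], kept_genes

genes = ["INS", "GCG", "ACTB", "KRT19"]
counts = np.array([
    [5000, 20, 300, 1],
    [150, 900, 250, 2],
])

filtered, kept_genes = drop_genes(counts, genes, exclude={"INS"})
print(kept_genes)
print(filtered)
```

With an AnnData object the same subsetting can be written as `adata[:, ~adata.var_names.isin(["INS"])]`. The trade-off is that an excluded gene is also lost as a genuine marker, for example for identifying beta cells.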

Threshold Setting for Data Analysis

Histograms of count depth (the total number of molecules detected per barcode) help set thresholds that separate viable cells from debris and empty droplets.

When unsure where to set a threshold, filter leniently: keeping questionable cells lets you examine them in downstream analysis and exclude them later if they turn out to be low quality.
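A lenient filtering step can be sketched as follows: inspect the count-depth distribution, remove only barcodes below a conservative cutoff, and flag a borderline band for a second look later. The depth values and both thresholds are hypothetical; in practice the histogram is plotted and the cutoffs are read off it.

```python
import numpy as np

# Hypothetical count depths (total molecules per barcode) for eight barcodes.
depth = np.array([120, 450, 800, 1500, 2300, 3100, 5200, 7800])

# Text histogram on a log10 scale, a stand-in for a plotted histogram.
hist, edges = np.histogram(np.log10(depth), bins=4)
for n, lo, hi in zip(hist, edges[:-1], edges[1:]):
    print(f"{10**lo:8.0f} - {10**hi:8.0f} molecules: {'#' * int(n)}")

# Hypothetical thresholds: only barcodes below LENIENT_MIN are removed now;
# those between the two cutoffs are kept but flagged for review after downstream analysis.
LENIENT_MIN = 300
STRICT_MIN = 1000

remove = depth < LENIENT_MIN
borderline = (depth >= LENIENT_MIN) & (depth < STRICT_MIN)
kept = depth[~remove]

print("removed:", depth[remove].tolist())
print("kept but flagged:", depth[borderline].tolist())
```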

Circular Nature of Data Analysis

Iterative Pre-processing Steps

Data analysis is described as a circular process: findings in later steps often send you back to adjust pre-processing, which is why pre-processing can take up to 80% of the total analysis time.

Utilizing 2D Scatter Plots

2D scatter plots of the number of detected genes against count depth can reveal distinct cell populations, which show up as patterns such as a V-shaped distribution.
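The two quantities on these axes are simple to compute from a barcodes x genes count matrix: count depth is the row sum, and the number of detected genes is the number of nonzero entries per row. In scanpy, `sc.pp.calculate_qc_metrics` stores them in `adata.obs` as `total_counts` and `n_genes_by_counts`, and `sc.pl.scatter(adata, x="total_counts", y="n_genes_by_counts")` draws the plot. The numpy version below, with a hypothetical toy matrix, just shows what is being plotted.

```python
import numpy as np

def qc_metrics(counts):
    """Per-barcode count depth and number of detected genes (genes with count > 0)."""
    counts = np.asarray(counts)
    total_counts = counts.sum(axis=1)       # x-axis: count depth
    n_genes = (counts > 0).sum(axis=1)      # y-axis: detected genes
    return total_counts, n_genes

# Hypothetical barcodes x genes matrix with three barcodes.
counts = np.array([
    [10, 0, 5, 3, 0, 2],   # moderately diverse transcriptome
    [0, 0, 50, 0, 0, 1],   # molecules concentrated in one highly expressed gene
    [1, 1, 1, 1, 1, 1],    # few molecules spread over many genes
])

total_counts, n_genes = qc_metrics(counts)
for t, g in zip(total_counts, n_genes):
    print(f"count depth={t:3d}  detected genes={g}  genes per molecule={g / t:.3f}")
```

Barcodes whose molecules are concentrated in a few highly expressed genes gain detected genes more slowly with depth, so populations like this can fall on different branches of the scatter plot. The arrays can be passed to any scatter function, for example matplotlib's `plt.scatter(total_counts, n_genes)`.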