2nd scanpy session - Quality control and filtering

Pre-Processing in Droplet-Based Experiments

Understanding Barcode Ranking

  • The process begins with barcode ranking: the molecules assigned to each barcode are counted, and the barcodes are ranked by that count.
  • Barcodes with many assigned molecules likely correspond to actual cells, while those with few assigned molecules often represent empty droplets.
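
The ranking step can be sketched in a few lines of plain Python. This is only an illustration, not the tutorial's code: the function name `rank_barcodes`, the (barcode, UMI) input layout, and the toy barcodes are all assumptions made for the example.

```python
from collections import Counter


def rank_barcodes(molecules):
    """Count molecules per barcode and rank barcodes by that count.

    `molecules` is an iterable of (barcode, umi) pairs, one pair per
    detected molecule (an assumed input layout for this sketch).
    Returns a list of (rank, barcode, n_molecules), highest count first.
    """
    counts = Counter(barcode for barcode, _umi in molecules)
    # most_common() returns (barcode, count) pairs sorted by count, descending
    return [
        (rank, barcode, n)
        for rank, (barcode, n) in enumerate(counts.most_common(), start=1)
    ]


# Toy data with made-up barcodes: two cell-like barcodes, two nearly empty ones.
molecules = (
    [("AAAC", f"umi{i}") for i in range(500)]
    + [("CCGT", f"umi{i}") for i in range(320)]
    + [("GGTA", f"umi{i}") for i in range(4)]
    + [("TTAG", f"umi{i}") for i in range(2)]
)

ranking = rank_barcodes(molecules)
for rank, barcode, n in ranking:
    print(rank, barcode, n)
```

Plotting rank against count on log-log axes would give the usual barcode rank plot, where the sharp drop separates likely cells from likely empty droplets.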

Addressing Empty Droplets

  • In droplet-based experiments, the cell suspension is diluted to reduce doublets (two cells in one droplet); as a consequence, most droplets contain no cell at all.
  • Even empty droplets may contain some RNA molecules from the cell suspension, which contributes background signal to the data.

Background Signal and Its Implications

  • The background signal arises when cells break apart during sample handling and release their contents into the suspension, often referred to as "cell soup."
  • This background can make gene expression data misleading. It is particularly evident in datasets containing beta cells, which produce so much insulin mRNA that insulin transcripts can show up in cells that do not express the gene.
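
One way to see the background is to pool the counts of barcodes that are almost certainly empty and look at which genes dominate. The sketch below assumes a nested dictionary of counts and a cutoff of 10 molecules for "empty"; both are illustrative choices, and the counts are toy values rather than real data.

```python
def ambient_profile(counts_by_barcode, max_total=10):
    """Estimate the gene composition of the background ("cell soup").

    Barcodes with at most `max_total` molecules are treated as empty
    droplets (an assumed cutoff) and their counts are pooled.
    Returns {gene: fraction of pooled background molecules}.
    """
    pooled = {}
    for gene_counts in counts_by_barcode.values():
        if sum(gene_counts.values()) <= max_total:
            for gene, n in gene_counts.items():
                pooled[gene] = pooled.get(gene, 0) + n
    total = sum(pooled.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in pooled.items()}


# Toy counts: one beta-cell-like barcode and two empty droplets.
counts_by_barcode = {
    "cell_1": {"INS": 900, "GCG": 5, "ACTB": 95},
    "empty_1": {"INS": 4, "ACTB": 1},
    "empty_2": {"INS": 3, "GCG": 1, "ACTB": 1},
}

profile = ambient_profile(counts_by_barcode)
top_gene = max(profile, key=profile.get)
print(top_gene, round(profile[top_gene], 2))
```

In this toy example insulin dominates the empty droplets, mirroring the beta-cell case described above.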

Analyzing Ambient RNAs

  • To mitigate the impact of ambient RNAs on analysis, researchers may choose to exclude certain genes from their datasets.
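
Excluding genes amounts to dropping them from every barcode's counts. A minimal sketch, assuming the same nested-dictionary layout as a stand-in for a real count matrix; `drop_genes` is a hypothetical helper, and which genes to exclude is a judgment call that depends on the dataset.

```python
def drop_genes(counts_by_barcode, genes_to_exclude):
    """Return a copy of the counts with the given genes removed everywhere."""
    excluded = set(genes_to_exclude)
    return {
        barcode: {g: n for g, n in gene_counts.items() if g not in excluded}
        for barcode, gene_counts in counts_by_barcode.items()
    }


# Toy counts; excluding INS here is purely illustrative.
counts = {
    "cell_1": {"INS": 900, "GCG": 5, "ACTB": 95},
    "cell_2": {"INS": 12, "GCG": 300, "ACTB": 80},
}

filtered = drop_genes(counts, ["INS"])
print(filtered)
```

The original counts are left untouched, so the excluded genes can still be inspected later if needed.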

Threshold Setting for Data Analysis

  • Histograms of count depth (the total number of molecules detected per barcode) help establish thresholds for distinguishing viable cells from debris.
  • When the right threshold is unclear, a lenient one is recommended: questionable cells can be kept for now and excluded later if downstream analysis shows them to be low quality.
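
The thresholding idea can be sketched without any plotting library: bin the count depths by order of magnitude, inspect the histogram, then apply a minimum-count cutoff. The function names, the decade-wide bins, and the two cutoffs (400 and 1000) are assumptions chosen for this toy example, not recommended values.

```python
import math
from collections import Counter


def count_depth_histogram(total_counts):
    """Histogram of count depth in log10 decades: {decade: n_barcodes}."""
    hist = Counter()
    for n in total_counts:
        if n > 0:
            # floor(log10(n)) is 0 for 1-9, 1 for 10-99, 2 for 100-999, ...
            hist[math.floor(math.log10(n))] += 1
    return dict(sorted(hist.items()))


def filter_by_count_depth(depth_by_barcode, min_counts):
    """Keep only barcodes with at least `min_counts` molecules."""
    return {bc: n for bc, n in depth_by_barcode.items() if n >= min_counts}


# Toy count depths with made-up barcode names.
depths = {"bc1": 5200, "bc2": 3100, "bc3": 900, "bc4": 450, "bc5": 12, "bc6": 3}

hist = count_depth_histogram(depths.values())
for decade, n in hist.items():
    print(f"10^{decade}: " + "#" * n)

lenient = filter_by_count_depth(depths, min_counts=400)   # keeps borderline barcodes
strict = filter_by_count_depth(depths, min_counts=1000)   # drops them right away
print(sorted(lenient), sorted(strict))
```

The lenient cutoff keeps the two borderline barcodes, which leaves room to remove them later if they turn out to be uninformative. In scanpy itself, a cutoff of this kind is typically applied with `sc.pp.filter_cells(adata, min_counts=...)`.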

Circular Nature of Data Analysis

Iterative Pre-processing Steps

  • Data analysis is described as a circular process where pre-processing can consume up to 80% of the total time spent on analysis due to ongoing adjustments and refinements.

Utilizing 2D Scatter Plots

  • 2D scatter plots of the number of detected genes versus count depth per cell can reveal patterns such as V-shapes in the distribution, which may point to different cell types.
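
The two axes of such a scatter plot are simple per-cell summaries. The sketch below computes them from a nested dictionary of toy counts; the layout and the helper name `qc_points` are assumptions for illustration, and the plotting itself is left out.

```python
def qc_points(counts_by_barcode):
    """Per-barcode (count depth, number of genes detected) pairs.

    Count depth is the total number of molecules; a gene counts as
    detected when it has at least one molecule.
    """
    points = {}
    for barcode, gene_counts in counts_by_barcode.items():
        total_counts = sum(gene_counts.values())
        n_genes = sum(1 for n in gene_counts.values() if n > 0)
        points[barcode] = (total_counts, n_genes)
    return points


# Toy counts with made-up barcode names.
counts_by_barcode = {
    "cell_a": {"INS": 40, "GCG": 0, "ACTB": 60, "GAPDH": 20},
    "cell_b": {"INS": 0, "GCG": 3, "ACTB": 2, "GAPDH": 0},
}

points = qc_points(counts_by_barcode)
for barcode, (depth, n_genes) in points.items():
    print(barcode, depth, n_genes)
```

Each pair becomes one point in the scatter plot; groups of cells with a different genes-per-count relationship form the separate arms of a V-shape.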

Video description

In the second session of the scanpy tutorial, we introduce quality control of single-cell data and how to determine appropriate filtering thresholds. This is the recording of the scanpy tutorial held at Helmholtz Munich in July 2020.