2nd scanpy session - Quality control and filtering
Pre-Processing in 10x Genomics Experiments
Understanding Barcode Ranking
- The process begins with barcode ranking: the molecules assigned to each barcode are counted, and the barcodes are then ranked by that count.
- Barcodes with a high number of assigned molecules indicate actual cells, while those with few assigned molecules often represent empty droplets.
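The ranking step above can be sketched in a few lines. This is a minimal illustration using NumPy, not the pipeline used in the session; the function name `rank_barcodes` and the toy counts are invented for the example.

```python
import numpy as np

def rank_barcodes(counts_per_barcode):
    """Sort barcodes by total molecule count, highest first.

    Returns the sorting order (indices into the input), the 1-based ranks,
    and the counts in ranked order.
    """
    counts = np.asarray(counts_per_barcode)
    order = np.argsort(-counts, kind="stable")  # descending by count
    ranked_counts = counts[order]
    ranks = np.arange(1, len(counts) + 1)
    return order, ranks, ranked_counts

# Toy data (hypothetical): three cell-like barcodes and four near-empty ones.
toy_counts = np.array([12, 5400, 3, 4800, 25, 6100, 8])
order, ranks, ranked_counts = rank_barcodes(toy_counts)
```

Plotting `ranked_counts` against `ranks` on log-log axes would typically show a steep drop between barcodes that look like cells and barcodes that look like empty droplets.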
Addressing Empty Droplets
- In 10x Genomics experiments, cells are loaded at a dilute concentration to limit doublets (two cells in one droplet); as a consequence, many droplets are empty.
- Even empty droplets may contain some RNA molecules from cell suspensions, contributing to background signals in data analysis.
Background Signal and Its Implications
- The background signal arises when cells break apart during sample treatment and release their contents into the suspension, often referred to as "cell soup."
- This background can lead to misleading gene expression data. It is particularly evident in studies involving beta cells, which produce high levels of insulin mRNA: that mRNA can then show up in droplets containing other cell types.
Analyzing Ambient RNAs
- To mitigate the impact of ambient RNAs on analysis, researchers may choose to exclude certain genes from their datasets.
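Excluding genes from a count matrix can be sketched as follows. This is an illustration only, assuming a cells-by-genes NumPy array; the helper `drop_genes`, the toy matrix, and the choice of gene to exclude are hypothetical.

```python
import numpy as np

def drop_genes(matrix, gene_names, genes_to_exclude):
    """Remove the columns of a cells-by-genes matrix that match excluded genes."""
    gene_names = np.asarray(gene_names)
    keep = ~np.isin(gene_names, list(genes_to_exclude))
    return matrix[:, keep], gene_names[keep]

# Toy cells-by-genes matrix (hypothetical values); the first column stands in
# for a gene that dominates the ambient background.
toy_matrix = np.array([
    [50, 2, 0],
    [900, 1, 3],
    [40, 0, 7],
])
toy_genes = ["INS", "GCG", "SST"]

filtered, kept_genes = drop_genes(toy_matrix, toy_genes, {"INS"})
```

Whether dropping a gene is acceptable depends on the biological question; a gene that dominates the background may also be the gene of interest.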
Threshold Setting for Data Analysis
- Histograms are used to visualize count depth (total detected molecules), helping establish thresholds for distinguishing between viable cells and debris.
- When the right threshold is unclear, a lenient approach is recommended: keep questionable cells for now, since downstream analysis can show whether they should be excluded later.
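A lenient count-depth threshold might be applied as in the sketch below. The count values and the cutoff of 500 are invented for illustration; in practice the cutoff would be read off the histogram of the actual data.

```python
import numpy as np

# Toy count depth per barcode (hypothetical values).
count_depth = np.array([30, 45, 60, 900, 1500, 2200, 2600, 3100, 4000])

# Histogram on a log scale, where the gap between low-count and high-count
# barcodes is usually easier to see.
hist, bin_edges = np.histogram(np.log10(count_depth), bins=6)

# Lenient cutoff (an assumption for this example): keep anything above the
# obvious low-count group and revisit the borderline cells later.
lenient_min_counts = 500
keep = count_depth >= lenient_min_counts
kept_depths = count_depth[keep]
```

In scanpy, a comparable filter is available through `sc.pp.filter_cells` with its `min_counts` argument; the NumPy version here only shows the logic.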
Circular Nature of Data Analysis
Iterative Pre-processing Steps
- Data analysis is described as a circular process where pre-processing can consume up to 80% of the total time spent on analysis due to ongoing adjustments and refinements.
Utilizing 2D Scatter Plots
- 2D scatter plots of the number of detected genes per cell versus count depth help identify different cell types by revealing patterns such as V-shapes in the data distribution.
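The two axes of such a scatter plot can be computed directly from a count matrix. The sketch below assumes a cells-by-genes NumPy array with invented values and only derives the per-cell metrics; plotting is left out.

```python
import numpy as np

# Toy cells-by-genes count matrix (hypothetical values).
toy_matrix = np.array([
    [5, 0, 2, 0],
    [100, 30, 0, 12],
    [0, 0, 1, 0],
])

# Count depth: total molecules detected per cell.
count_depth = toy_matrix.sum(axis=1)

# Number of genes with at least one count per cell.
n_genes = (toy_matrix > 0).sum(axis=1)
```

Passing `count_depth` and `n_genes` to a scatter function such as matplotlib's `plt.scatter` gives the plot described above. In scanpy, `sc.pp.calculate_qc_metrics` stores comparable per-cell values as `total_counts` and `n_genes_by_counts`.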