Abstract of "Accurate integration of single cell transcriptome replicates".




Accurate integration of single cell transcriptome replicates

Loza Lopez, Martin De Jesus, Osaka University, DOI:10.18910/88181



Single-cell RNA-sequencing (scRNA-seq) technologies have revolutionized the study of biological systems by capturing gene expression profiles from thousands of cells in a single experiment. One important application of scRNA-seq data is the comparison of two or more samples to describe genetic changes between conditions, e.g. disease or stimulation conditions. In this kind of analysis, replicated samples make it possible to investigate subtle changes in cell composition, improving the understanding and treatment of those conditions. However, these analyses are hindered by technical differences between samples, known as batch effects. Batch effects must be addressed in every joint analysis because they may be correlated with the main biological components. This task is not trivial, as non-linearities in the technical differences between samples can appear in distinct ways in every experiment. In the last three years, different methods have been developed for the integration of scRNA-seq data, enabling the creation of cell atlases and the joint analysis of datasets. However, these methods can over-correct, merging cells of different types. This issue is particularly problematic in the analysis of replicated experiments with small batch effects, where cells with subtle changes in gene expression could be masked, affecting the conclusions of the experiment. To address this problem, I designed Canek, a bioinformatics tool to integrate replicates of scRNA-seq data sequenced with the same technology. Assuming a linear batch effect within a group of similar cells, Canek uses linear estimation and fuzzy logic to obtain cell-specific correction vectors that are used to integrate the datasets. Using tests specifically designed to assess over-correction, I show that Canek integrates datasets with the least over-correction when compared with state-of-the-art methods.
To show how to use Canek within a workflow, I performed a complete analysis using Canek to characterize mouse spleen cells. In the same analysis, I show how the results depend on the parameters of tools commonly used in the study of scRNA-seq data, which can serve as a guide for other researchers. Canek is computationally efficient and can integrate thousands of cells without over-correction, a property that could help future experiments improve or design gene-specific treatments.
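The idea stated in the abstract (a linear batch effect within each group of similar cells, blended by fuzzy logic into cell-specific correction vectors) can be illustrated with a small sketch. This is not Canek's implementation. It assumes that cells have already been grouped and that each query group has a matching set of reference cells. It simplifies the linear estimation to a constant shift between group means. The fuzzy memberships are taken as inverse squared distances to the group centroids, which is a fuzzy c-means style choice made only for illustration. All function names and the toy data are hypothetical.

```python
from math import fsum


def mean_vector(cells):
    """Component-wise mean of a list of equal-length expression vectors."""
    n = len(cells)
    return [fsum(column) / n for column in zip(*cells)]


def group_correction(reference_cells, query_cells):
    """Assumed linear estimate: the constant shift that moves the mean of a
    query group onto the mean of its matching reference group."""
    ref_mean = mean_vector(reference_cells)
    qry_mean = mean_vector(query_cells)
    return [r - q for r, q in zip(ref_mean, qry_mean)]


def fuzzy_memberships(cell, centroids):
    """Illustrative memberships proportional to 1 / squared distance to each
    group centroid, normalised to sum to 1."""
    sq_dist = [fsum((x - c) ** 2 for x, c in zip(cell, centroid))
               for centroid in centroids]
    on_centroid = [i for i, d in enumerate(sq_dist) if d == 0.0]
    if on_centroid:
        # A cell sitting exactly on a centroid belongs fully to that group.
        share = 1.0 / len(on_centroid)
        return [share if i in on_centroid else 0.0 for i in range(len(sq_dist))]
    inverse = [1.0 / d for d in sq_dist]
    total = fsum(inverse)
    return [w / total for w in inverse]


def correct_cell(cell, centroids, corrections):
    """Cell-specific correction: membership-weighted blend of group shifts."""
    weights = fuzzy_memberships(cell, centroids)
    shift = [fsum(w * v[j] for w, v in zip(weights, corrections))
             for j in range(len(cell))]
    return [x + s for x, s in zip(cell, shift)]


# Toy data (hypothetical): two groups of cells measured on two genes.
query_groups = {"A": [[-1.0, 0.0], [1.0, 0.0]], "B": [[9.0, 0.0], [11.0, 0.0]]}
reference_groups = {"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[9.0, 2.0], [11.0, 2.0]]}

groups = ["A", "B"]
centroids = [mean_vector(query_groups[g]) for g in groups]
corrections = [group_correction(reference_groups[g], query_groups[g])
               for g in groups]

print(correct_cell([0.0, 0.0], centroids, corrections))  # [1.0, 0.0]
print(correct_cell([5.0, 0.0], centroids, corrections))  # [5.5, 1.0]
```

In this toy example, a cell at the centre of group A receives exactly that group's shift, while a cell halfway between the two groups receives the average of both shifts. Blending in this way gives each cell its own correction vector and avoids abrupt changes at group boundaries.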



