Machine Learning (ML)-Enabled Automation for High-Throughput Data Processing in Flow Cytometry

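Keywords: Computer science, Population, Flow cytometry, Cytometer, Metric (unit), Cluster analysis, Artificial intelligence, Medicine, Immunology, Engineering, Operations management, Environmental health
Authors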
Anna Kamysheva,Dmitrii V. Fastovets,Roman N. Kruglikov,Arseniy A. Sokolov,Anastasiya S. Fefler,Anastasiia A. Bolshakova,Anastasia Radko,Ilya Krauz,Sheila T. Yong,M. Goldberg,Ravshan Ataullakhanov,Aleksandr Zaitsev
Source
Journal: Blood [American Society of Hematology]
Volume/Issue: 142 (Supplement 1): 905-905 Citations: 1
Identifier
DOI:10.1182/blood-2023-180146
Abstract

Introduction: Flow cytometry is widely used in clinical and research laboratories for diagnostics, biomarker discovery, and immune system monitoring. Flow cytometry data processing still relies on gating- and clustering-based approaches that are highly time-consuming and subjective. Data processing time increases with panel size and the number of detected populations, posing challenges to the search for new biomarkers. Low reproducibility and method limitations have thus far hindered efforts to automate and standardize flow cytometry data processing; hence, these efforts have not yielded any significant advancements in data processing methods. Here we present a new ML-based algorithm for automated cell-type labeling. Our supervised ML approach allows us to classify every event in a flow cytometry data file solely based on the presence and absence of markers, without the need for prior knowledge or assumptions about cell population content in the sample. This approach enables the detection of rare and/or new cell populations with a high average quality metric (F1-score). The rapid and high-quality analysis our algorithm can perform renders it applicable in clinical settings, particularly for detecting hematological abnormalities and cancers.

Methods: We processed 500 blood samples from a cohort of healthy donors and patients with various cancer diagnoses using 10 different 18-channel multicolor flow cytometry panels. For each cytometry panel, we then split the data from either all or a portion of these 500 samples 3:1 into training and test datasets to train and test our algorithm. To do this, we manually matched cells with certain cellular phenotypes to create 10 high-quality training sets for supervised learning and 10 test datasets, one pair for each of the 10 panels. To train the cell type classifier, we set up a two-level boosting-based model.
The first-level model filters out outliers, including dead cells, cellular debris, beads, and other undefined particles, in order to home in on the target population. The second-level model, which predicts cell types within a target population, follows one of two approaches. The population-based approach detects major subpopulation types in a target population and predicts the precise population labels. This approach is useful for labeling a small number of previously known or predicted subpopulations. The marker-based approach is useful for target populations with large numbers of subpopulations, such as T cells harboring different combinations of cell-surface receptors. It predicts the presence or absence of specific markers on each cell to assign its phenotype. It also allows us to construct complex hierarchies in order to detect new populations that are challenging to identify manually. Figure 1 outlines our workflow.

Results: We validated our final set of 10 trained models on our test dataset. The total number of detected cell populations in the test dataset was 221, which corresponds to the number of unique cell types predicted by our models. Table 1 shows the evaluation metrics for our algorithm for populations comprising > 0.1% of whole blood cells (WBCs). The average quality metric (F1-score) for all antibody panels used is 0.86. This value is the mean of all F1-scores calculated for all cell populations identified by our algorithm. Mean F1-score is the highest (0.96) for large populations, lower (0.87) for mid-sized populations, and lowest but acceptable (0.77) for small populations. Mean quality score for the marker-based models is also high (0.96). Compared to manual evaluation that took approximately 1 hour to analyze one data file, the algorithm completed analysis within 10 seconds.
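
The averaging described in the Results (one F1-score per cell population, then the mean over populations) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the population labels, and the toy data below are assumptions made only for illustration.

```python
from collections import Counter


def f1_per_population(true_labels, predicted_labels):
    """Return {population: F1} computed one-vs-rest over per-event labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1   # event correctly assigned to its population
        else:
            fp[pred] += 1    # event wrongly assigned to `pred`
            fn[truth] += 1   # event missed for `truth`
    scores = {}
    for pop in set(true_labels) | set(predicted_labels):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[pop] + fp[pop] + fn[pop]
        scores[pop] = 2 * tp[pop] / denom if denom else 0.0
    return scores


def mean_f1(scores):
    """Unweighted mean of the per-population F1-scores."""
    return sum(scores.values()) / len(scores)


# Hypothetical manual labels vs. model predictions for six events.
manual = ["T", "T", "T", "B", "B", "NK"]
model = ["T", "T", "B", "B", "B", "NK"]
scores = f1_per_population(manual, model)
average = mean_f1(scores)
```

Because the mean is unweighted, a small population counts as much as a large one, which is consistent with the abstract reporting separate means for large, mid-sized, and small populations.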
Conclusion: Our new algorithm automates cell labeling and produces high-quality outputs that are comparable to manual processing, but with a much shorter turnaround time (TAT) and without the need for prior knowledge or expert competence from the user. Importantly, it allows us to effectively and accurately filter out outliers, identify the target population, and divide this target population into multiple cell subtypes including new and rare cell subpopulations, all without a priori assumptions about cell population content in the sample. Given its ability to perform high-quality cell population analysis and its short TAT, our algorithm provides rapid, unbiased, and precise cell typing that will have utility for the diagnosis of heme malignancies and immunoprofiling.
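
The two-level flow summarized above (filter out outliers, then assign a phenotype from per-marker presence or absence) might be sketched as follows. The abstract describes trained boosting-based models; the fixed thresholds here are hypothetical stand-ins for those models, and the channel names (`viability`, `CD3`, `CD4`, `CD8`) and cutoff values are assumptions chosen purely for illustration.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, float]  # channel name -> measured intensity


def label_events(
    events: List[Event],
    is_target: Callable[[Event], bool],
    marker_calls: Dict[str, Callable[[Event], bool]],
) -> List[Optional[str]]:
    """Two-level labeling: level 1 drops outliers, level 2 calls markers."""
    labels: List[Optional[str]] = []
    for event in events:
        # Level 1: discard debris, dead cells, beads, etc.
        if not is_target(event):
            labels.append(None)
            continue
        # Level 2 (marker-based): one presence/absence call per marker,
        # concatenated into a phenotype string such as "CD3+CD4+CD8-".
        phenotype = "".join(
            f"{marker}{'+' if call(event) else '-'}"
            for marker, call in marker_calls.items()
        )
        labels.append(phenotype)
    return labels


# Hypothetical stand-ins for the trained first- and second-level models.
is_live_cell = lambda e: e["viability"] < 0.5
calls = {
    "CD3": lambda e: e["CD3"] > 1.0,
    "CD4": lambda e: e["CD4"] > 1.0,
    "CD8": lambda e: e["CD8"] > 1.0,
}

events = [
    {"viability": 0.1, "CD3": 2.0, "CD4": 3.0, "CD8": 0.2},
    {"viability": 0.9, "CD3": 0.0, "CD4": 0.0, "CD8": 0.0},  # outlier
    {"viability": 0.2, "CD3": 1.5, "CD4": 0.1, "CD8": 2.2},
]
labels = label_events(events, is_live_cell, calls)
```

Composing the phenotype from independent marker calls means that combinations absent from the training labels can still be emitted, which is one plausible reading of how a marker-based approach could surface new populations.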
