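Heartbeat
Computer science
Feature selection
Artificial intelligence
Random forest
Test set
Pattern recognition (psychology)
Machine learning
Feature (linguistics)
Test data
Data set
Benchmark (surveying)
Classifier (UML)
Data mining
Artificial neural network
Binary classification
Training set
Support vector machine
Programming language
Geography
Philosophy
Linguistics
Computer security
Geodesy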
Authors
Stojancho Tudjarski,Aleksandar Stankovski,Marjan Gušev
Identifier
DOI:10.23919/mipro55190.2022.9803758
Abstract
This paper aims to model a classifier of ventricular (V) heartbeats by experimenting with the most advanced classic binary classifiers under different feature-engineering scenarios.

Methodology: The results were obtained by experimenting with the XGBoost and Random Forest algorithms, two of the most advanced classifiers that are not based on neural networks. Although the annotated ECG data sets contain records with several heartbeat classes, we focus on a model that distinguishes V heartbeats from all others (non-V heartbeats). Because the data set is highly imbalanced, we applied the SMOTE (Synthetic Minority Over-sampling Technique) algorithm to enrich the data and obtain a better-balanced training set. To improve the results, we added new computed features and evaluated them both with and without feature selection. For feature selection, we used the Fisher Selector algorithm.

Data: We used the MIT-BIH Arrhythmia benchmark database, with a train/test split that follows the patient-oriented approach, which separates the original data set into two subsets of approximately equal size and heartbeat-type distribution.

Conclusion: The best results were achieved with the XGBoost algorithm on the original feature set, with a precision of 91.36%, a recall of 88.31%, and an F1 score of 89.81%. Oversampling did not significantly improve overall model performance. We nevertheless recommend it, because in practice, with imbalanced data sets, it leads to more robust models that perform better on data outside the training and test sets, for example when the model is used in production.
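SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The abstract does not say which implementation or parameters were used. The following is therefore only a pure-Python sketch of the core idea. The function name `smote`, the default neighbour count `k=5`, and the choice to draw base samples at random with replacement are assumptions for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors.

    Hypothetical helper: a minimal SMOTE sketch, not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the k nearest minority neighbours (Euclidean distance) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # random base sample
        x = minority[i]
        y = minority[rng.choice(neighbours[i])]    # one of its k nearest neighbours
        gap = rng.random()                         # uniform in [0, 1)
        # New point lies on the segment between x and its neighbour y.
        synthetic.append([xi + gap * (yi - xi) for xi, yi in zip(x, y)])
    return synthetic
```

To balance V against non-V beats, one would set `n_new` to the difference between the two class counts and add the results, labelled V, to the training subset only. The test subset should stay untouched, so that the evaluation still reflects the real class distribution.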
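The abstract names a "Fisher Selector" without further detail. The sketch below assumes that it ranks features by the classic Fisher score, which is the between-class scatter of the per-class feature means divided by the within-class variance, and then keeps the top `m` features. The helper names `fisher_scores` and `select_top_features` are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature column: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean([row[j] for row in X])
        between = within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_features(X, y, m):
    """Indices of the m highest-scoring features, returned in original column order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:m])
```

A feature whose mean differs strongly between V and non-V beats, relative to its spread inside each class, scores high. Features whose class means coincide score near zero and are dropped first.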
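The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported value can be checked directly from the other two figures. The helper names below are illustrative. The counts-based function assumes V is treated as the positive class.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive (V) class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


# Check the reported figures: P = 91.36%, R = 88.31%.
print(round(f1_score(0.9136, 0.8831) * 100, 2))  # 89.81
```

The result matches the reported 89.81%, which confirms that the three figures in the abstract are mutually consistent.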