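Support vector machine
Naive Bayes classifier
Artificial intelligence
Machine learning
Computer science
Multilayer perceptron
Classifier (UML)
Feature selection
Missing data
Cross-validation
Data mining
Predictive modelling
Medical diagnosis
Perceptron
Process (computing)
Artificial neural network
Medicine
Operating system
Pathology
Authors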
Karthick Kanagarathinam, Durairaj Sankaran, R. Manikandan
Identifier
DOI:10.1016/j.datak.2022.102042
Abstract
Cardiovascular disease (CVD) is one of the most common causes of death in the world today. CVD prediction allows health professionals to make informed decisions about their patients' health. Data mining is the process of transforming large amounts of raw medical data into actionable insights that can be used to make intelligent forecasts and decisions. Machine learning (ML) based prediction models can support patient diagnosis in the health care industry. The objective of this research is to create a hybrid dataset to aid in the development of a better CVD risk prediction model. The Hungarian, Switzerland, Cleveland, and Long Beach datasets are the most commonly used datasets in heart disease (HD) prediction. These datasets have a maximum of 303 instances and contain missing values in their features, and the presence of missing values reduces the accuracy of the prediction model. In this article, we therefore created the "Sathvi" dataset by combining these datasets; it has 531 instances and 12 attributes, with no missing data. Pearson's correlation method was used to eliminate redundant features during the feature selection process. The Naive Bayes (NB), XGBoost, k-nearest neighbour (k-NN), multilayer perceptron (MLP), support vector machine (SVM), and CatBoost ML classifiers were applied for prediction. The CatBoost classifier was validated with 10-fold cross-validation; its accuracy ranged from 88.67% to 98.11%, with a mean of 94.34%.
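The two procedural steps named in the abstract, Pearson-correlation feature filtering and 10-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the 0.9 correlation cut-off, the greedy keep-first rule, and the toy columns are all assumptions made for illustration, since the abstract states none of them.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated (an assumption)
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.9):
    """Greedily keep a feature only if its |r| with every kept feature is <= threshold.

    columns: dict mapping feature name -> list of values.
    threshold: hypothetical cut-off; the abstract does not give one.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


# Toy data (made up): "age_months" duplicates "age" and should be dropped.
toy = {
    "age": [1, 2, 3, 4, 5],
    "age_months": [12, 24, 36, 48, 60],
    "chol": [5, 1, 4, 2, 3],
}
selected = drop_redundant(toy)

# Splits sized for the 531 instances mentioned in the abstract.
splits = list(k_fold_indices(531, k=10))
```

With 531 instances and 10 folds, each test fold holds 53 or 54 instances. A classifier would be fitted on each `train` index list and scored on the matching `test` list, and the ten scores averaged to give the mean accuracy.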