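Random forest
Computer science
Artificial intelligence
Feature selection
Cervical cancer
Machine learning
Transformation (genetics)
Pattern recognition (psychology)
Decision tree
Tree (set theory)
Cancer
Mathematics
Medicine
Gene
Internal medicine
Mathematical analysis
Biochemistry
Chemistry
Authors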
Mamun Ali,Kawsar Ahmed,Francis M. Bui,Iraj Sadegh Amiri,Syed Muhammad Ibrahim,Julian M.W. Quinn,Mohammad Ali Moni
Identifier
DOI:10.1016/j.compbiomed.2021.104985
Abstract
Cervical cancer (CC) is the most common type of cancer in women and remains a significant cause of mortality, particularly in less developed countries, although it can be treated effectively if detected at an early stage. This study aimed to find efficient machine-learning-based classification models to detect early-stage CC using clinical data. We obtained a CC dataset from the Kaggle data repository that contained four class attributes: biopsy, cytology, Hinselmann, and Schiller. The dataset was split into four datasets, one per class attribute. Three feature transformation methods (log, sine function, and Z-score) were applied to these datasets. Several supervised machine learning algorithms were assessed for their classification performance. A Random Tree (RT) algorithm provided the best classification accuracy for the biopsy (98.33%) and cytology (98.65%) data, whereas Random Forest (RF) and Instance-Based K-nearest neighbor (IBk) provided the best performance for the Hinselmann (99.16%) and Schiller (98.58%) data, respectively. Among the feature transformation methods, the logarithmic transformation gave the best performance for the biopsy dataset, whereas the sine function was superior for the cytology dataset. Both the logarithmic and sine functions performed best for the Hinselmann dataset, while Z-score was best for the Schiller dataset. Various feature selection techniques (FST) were applied to the transformed datasets to identify and prioritize important risk factors. The outcomes of this study indicate that, with appropriate system design and tuning, machine learning classification methods can detect CC accurately and efficiently in its early stages using clinical data.
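The three feature transformations named in the abstract can be sketched in a few lines of NumPy. This is an illustration only, not the authors' implementation: the abstract does not give the exact form of the log transform, so log(1 + x) is assumed here to keep zero-valued features finite, and the function names and the toy matrix are hypothetical.

```python
import numpy as np


def log_transform(X):
    # Assumption: log(1 + x), so that zero-valued entries stay finite.
    # The abstract only says "log"; the exact variant is not specified.
    return np.log1p(X)


def sine_transform(X):
    # Element-wise sine of each feature value.
    return np.sin(X)


def zscore_transform(X):
    # Standardize each column to zero mean and unit variance.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns have zero std; divide by 1 instead so they map to 0.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std


# Hypothetical toy data: rows are patients, columns are numeric features.
X = np.array([
    [18.0, 0.0, 1.0],
    [25.0, 2.0, 1.0],
    [40.0, 5.0, 1.0],
])

X_log = log_transform(X)
X_sin = sine_transform(X)
X_z = zscore_transform(X)
```

Each transformed matrix would then be passed to the classifiers separately, which is how a per-dataset comparison such as the one reported in the abstract could be set up.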