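Feature selection
Computer science
Feature (linguistics)
Artificial intelligence
Pattern recognition (psychology)
Support vector machine
Sequence (biology)
Feature vector
Selection (genetic algorithm)
Data mining
Machine learning
Algorithm
Linguistics
Genetics
Biology
Philosophy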
Authors
Rainer Pudimat,Rolf Backofen,Ernst Günter Schukat-Talamazzini
Identifier
DOI:10.1142/s0218001409007107
Abstract
Biological research produces a wealth of measured data. It is not easy for biologists to postulate hypotheses about the behavior or structure of the observed entity, because the relevant measured properties are lost in the ocean of measurements. For the same reason, it is not easy to design machine learning algorithms that classify or cluster the data items. Algorithms that automatically select a highly predictive subset of the measured features can help to overcome these difficulties. We present an efficient feature selection strategy that can be applied to arbitrary feature selection problems. The core technique is a new method for estimating the quality of a subset from previously calculated qualities of smaller subsets, by minimizing the mean standard error of the estimated values with an approach common to support vector machines. This method can be integrated into many feature subset search algorithms. We have applied it with sequential search algorithms and have been able to reduce the number of quality calculations needed to find accurate feature subsets by about 70%. We demonstrate these improvements by applying our approach to the problem of finding highly predictive feature subsets for transcription factor binding sites.
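The general idea of the abstract, using qualities already computed for smaller subsets to avoid some exact quality calculations during a sequential search, can be illustrated with a minimal sketch. This is not the authors' estimator: the abstract does not give its details, so the sketch swaps in a plain ridge regression over subset indicator vectors as a stand-in. The names `estimate_feature_scores`, `forward_select`, `top_m`, `lam`, `TOY_WEIGHTS`, and `toy_quality` are all hypothetical, and the toy quality function is an assumption chosen only to make the example runnable.

```python
import numpy as np


def estimate_feature_scores(history, n_features, lam=1e-3):
    """Stand-in estimator (an assumption, not the paper's method).

    Fits q(S) ~ b + sum of w_f for f in S by ridge regression on the
    subsets whose quality has already been computed exactly, and
    returns the per-feature weights w.
    """
    subsets = list(history)
    X = np.zeros((len(subsets), n_features + 1))
    X[:, -1] = 1.0  # intercept column
    for row, subset in enumerate(subsets):
        X[row, list(subset)] = 1.0  # indicator of subset membership
    y = np.array([history[s] for s in subsets])
    # Ridge normal equations; the penalty is applied to all coefficients.
    theta = np.linalg.solve(X.T @ X + lam * np.eye(n_features + 1), X.T @ y)
    return theta[:-1]


def forward_select(n_features, quality, k, top_m=None):
    """Sequential forward search for k features (assumes k <= n_features).

    If top_m is given, every round after the first evaluates exactly only
    the top_m candidates ranked by the estimator.
    Returns (selected features in order, number of exact quality calls).
    """
    selected, history, evaluations = [], {}, 0
    for _ in range(k):
        candidates = [f for f in range(n_features) if f not in selected]
        if top_m is not None and history:
            scores = estimate_feature_scores(history, n_features)
            candidates = sorted(candidates, key=lambda f: -scores[f])[:top_m]
        best_f, best_q = None, float("-inf")
        for f in candidates:
            subset = frozenset(selected + [f])
            q = quality(subset)  # the expensive exact calculation
            evaluations += 1
            history[subset] = q
            if q > best_q:
                best_f, best_q = f, q
        selected.append(best_f)
    return selected, evaluations


# Toy, purely additive quality function (an assumption for illustration).
TOY_WEIGHTS = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]


def toy_quality(subset):
    return sum(TOY_WEIGHTS[f] for f in subset)
```

Calling `forward_select(8, toy_quality, 3)` runs the plain search, while `forward_select(8, toy_quality, 3, top_m=3)` prunes candidates with the estimator and so makes fewer exact quality calls. On real data the quality function is rarely additive, so a pruned search of this kind may miss subsets that the full search would find; the savings reported in the abstract refer to the authors' own estimator, not to this sketch.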