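Feature selection
Entropy (arrow of time)
Mutual information
Computer science
Artificial intelligence
Data mining
Rough set
Feature (linguistics)
Computational intelligence
Granular computing
Pattern recognition (psychology)
Machine learning
Mathematics
Linguistics
Philosophy
Physics
Quantum mechanics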

Authors

Yuan Meng, Jiucheng Xu, Tao Li, Yuanhao Sun

Identifier

DOI: 10.1007/s40747-022-00882-8

Abstract

For incomplete datasets with mixed numerical and symbolic features, feature selection based on neighborhood multi-granulation rough sets (NMRS) is developing rapidly. However, its evaluation function considers only the information contained in the lower approximation of the neighborhood decision, which can easily lead to the loss of some information. To solve this problem, we construct a novel NMRS-based uncertainty measure for feature selection, named neighborhood multi-granulation self-information-based pessimistic neighborhood multi-granulation tolerance joint entropy (PTSIJE), which can be applied to incomplete neighborhood decision systems. First, from the algebra view, four kinds of neighborhood multi-granulation self-information measures of decision variables are proposed using the upper and lower approximations of NMRS. We discuss their properties and find that the fourth measure, the lenient neighborhood multi-granulation self-information measure (NMSI), has better classification performance. Then, drawing on the algebra view and the information view simultaneously, a feature selection method based on PTSIJE is proposed. Finally, the Fisher score method is used to delete uncorrelated features, which reduces the computational complexity for high-dimensional gene datasets, and a heuristic feature selection algorithm is proposed to improve classification performance on mixed and incomplete datasets. Experimental results on 11 datasets show that our method selects fewer features and achieves higher classification accuracy than related methods.
         
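The following minimal Python sketch makes the approximation step concrete. It computes neighborhood tolerance classes and multi-granulation lower and upper approximations for incomplete mixed data. It is not the authors' implementation. The tolerance rule, the helper names and the toy table are all assumptions made for illustration. Under that rule, a missing value (written None) matches anything, numerical values match within a radius delta, and symbolic values must be equal. The pessimistic variant puts an object in the lower approximation only when every granulation agrees. The optimistic variant needs only one granulation to agree.

```python
from typing import List, Optional, Sequence, Set

Row = Sequence[Optional[object]]  # None marks a missing value


def tolerance_class(data: List[Row], i: int, attrs: Sequence[int],
                    numeric: Set[int], delta: float) -> Set[int]:
    """Objects tolerant to object i on every attribute in attrs (assumed rule)."""
    neighbors = set()
    for j, other in enumerate(data):
        tolerant = True
        for a in attrs:
            u, v = data[i][a], other[a]
            if u is None or v is None:
                continue  # a missing value is treated as compatible with any value
            if a in numeric:
                if abs(u - v) > delta:  # numerical: neighborhood radius delta
                    tolerant = False
                    break
            elif u != v:  # symbolic: exact match required
                tolerant = False
                break
        if tolerant:
            neighbors.add(j)
    return neighbors


def multigranulation_approximations(data, granulations, target, numeric, delta,
                                    pessimistic=True):
    """Lower/upper approximations of target over several attribute subsets."""
    lower, upper = set(), set()
    for i in range(len(data)):
        classes = [tolerance_class(data, i, g, numeric, delta) for g in granulations]
        inside = [c <= target for c in classes]        # class contained in target
        touches = [bool(c & target) for c in classes]  # class overlaps target
        if pessimistic:
            # lower: every granulation must certify; upper: any one overlap suffices
            if all(inside):
                lower.add(i)
            if any(touches):
                upper.add(i)
        else:
            # lower: one certifying granulation suffices; upper: all must overlap
            if any(inside):
                lower.add(i)
            if all(touches):
                upper.add(i)
    return lower, upper


# Hypothetical toy table: attribute 0 is numerical, attribute 1 is symbolic.
data = [(0.10, "a"), (0.15, "a"), (0.80, "b"), (0.82, None), (0.85, "b")]
no_class = {2, 3, 4}  # objects whose (hypothetical) decision is "no"
low, up = multigranulation_approximations(data, [[0], [1]], no_class, {0}, 0.1)
print(sorted(low), sorted(up))  # [2, 4] [0, 1, 2, 3, 4]
```

In this toy run, objects 0, 1 and 3 fall in the pessimistic upper approximation of the "no" class but not in its lower approximation. This is the boundary information that an evaluation based only on the lower approximation leaves unused.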
            
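Below is a sketch of the Fisher score prefilter used on high-dimensional gene data. It uses the common textbook form of the score: the ratio of between-class to within-class scatter for each feature, F(f) = Σ_c n_c (μ_{c,f} − μ_f)² / Σ_c n_c σ²_{c,f}. The abstract does not give the paper's exact variant or how many features it keeps, so the cutoff k and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> List[float]:
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    n, d = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for f in range(d):
        overall_mean = sum(row[f] for row in X) / n
        between = within = 0.0
        for rows in groups.values():
            values = [r[f] for r in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # zero within-class spread: perfectly separating if the class means differ
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k: int) -> List[int]:
    """Indices of the k highest-scoring features (k is a user-chosen cutoff)."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]


X = [[1.0, 5.0, 0.0], [1.2, 3.0, 1.0], [3.0, 4.0, 0.0], [3.2, 6.0, 1.0]]
y = [0, 0, 1, 1]
print(top_k_features(X, y, 2))  # [0, 1]
```

Feature 2 has identical class means, so its score is 0 and it is the one dropped. The score is computed per feature, which is what makes it cheap enough to run before a more expensive subset search.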
 
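A heuristic subset search of this kind usually follows a greedy forward pattern. It starts from an empty subset and repeatedly adds the feature that most improves an evaluation function, stopping once the subset scores as well as the full feature set. In the sketch below, the evaluation is the classical rough-set dependency degree on symbolic data, in the style of QuickReduct. It is swapped in purely for illustration and is not PTSIJE. The `evaluate` parameter is a hypothetical hook where a neighborhood multi-granulation measure could be plugged in.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence

Table = Sequence[Sequence[Hashable]]


def dependency(data: Table, decisions: Sequence[Hashable], attrs: List[int]) -> float:
    """Classical rough-set dependency gamma_B(D) = |POS_B(D)| / |U| (stand-in measure)."""
    blocks = defaultdict(list)
    for i, row in enumerate(data):
        blocks[tuple(row[a] for a in attrs)].append(i)  # equivalence classes of B
    # a block is in the positive region if all its objects share one decision
    positive = sum(len(idx) for idx in blocks.values()
                   if len({decisions[i] for i in idx}) == 1)
    return positive / len(data)


def greedy_forward_select(data: Table, decisions: Sequence[Hashable],
                          evaluate: Callable[..., float] = dependency) -> List[int]:
    """Add the best-scoring attribute until the subset matches the full set's score."""
    all_attrs = list(range(len(data[0])))
    goal = evaluate(data, decisions, all_attrs)
    selected: List[int] = []
    current = evaluate(data, decisions, selected)
    while current < goal:
        remaining = [a for a in all_attrs if a not in selected]
        best = max(remaining, key=lambda a: evaluate(data, decisions, selected + [a]))
        selected.append(best)
        current = evaluate(data, decisions, selected)
    return selected


# Hypothetical symbolic table with three attributes and a binary decision.
data = [("high", "yes", "x"), ("high", "no", "x"), ("low", "yes", "y"), ("low", "no", "y")]
decisions = ["+", "-", "-", "-"]
print(greedy_forward_select(data, decisions))  # [0, 1]
```

Ties are broken by the lowest attribute index, because `max` returns the first maximal element. An uncertainty measure that should be minimized, such as an entropy, would need `min` and a reversed stopping test instead.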