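Biclustering
Computer science
Coherence (philosophical gambling strategy)
Data mining
Gene ontology
Cluster analysis
Heuristic
Ontology
Redundancy (engineering)
Correlation
Relation (database)
Kullback-Leibler divergence
Artificial intelligence
Pattern recognition (psychology)
Information retrieval
Gene
Mathematics
Statistics
Gene expression
Fuzzy clustering
Biology
Genetics
Philosophy
Operating system
Epistemology
CURE data clustering algorithm
Geometry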
Authors
Victor Alexandre Padilha, André C. P. L. F. de Carvalho
Identifier
DOI:10.1016/j.asoc.2019.105688
Abstract
Biclustering algorithms have become popular tools for gene expression data analysis. They can identify local patterns defined by subsets of genes and subsets of samples, which cannot be detected by traditional clustering algorithms. In spite of being useful, biclustering is an NP-hard problem. Therefore, the majority of biclustering algorithms look for biclusters optimizing a pre-established coherence measure. Many heuristics and validation measures have been proposed for biclustering over the last 20 years. However, there is a lack of an extensive comparison of bicluster coherence measures on practical scenarios. To deal with this lack, this paper experimentally analyzes 17 bicluster coherence measures and external measures calculated from information obtained in the gene ontologies. In this analysis, results were produced by 10 algorithms from the literature in 19 gene expression datasets. According to the experimental results, a few pairs of strongly correlated coherence measures could be identified, which suggests redundancy. Moreover, the pairs of strongly correlated measures might change when dealing with normalized or non-normalized data and biclusters enriched by different ontologies. Finally, there was no clear relation between coherence measures and assessment using information from gene ontology.
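To make the notion of a bicluster coherence measure concrete, the following is a minimal sketch of one widely known measure, the mean squared residue of Cheng and Church. The abstract does not name the 17 measures it compares, so the choice of this measure is an assumption made for illustration; the function name `mean_squared_residue`, the toy matrix, and the planted pattern are likewise hypothetical and not taken from the paper.

```python
import numpy as np


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix data[rows, cols].

    Lower values indicate a more coherent (additive) bicluster;
    a perfect shifting pattern scores 0.
    """
    sub = data[np.ix_(list(rows), list(cols))]
    row_means = sub.mean(axis=1, keepdims=True)   # one mean per gene
    col_means = sub.mean(axis=0, keepdims=True)   # one mean per sample
    overall_mean = sub.mean()
    residue = sub - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


# Hypothetical toy expression matrix: rows are genes, columns are samples.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 5))

# Plant an additive pattern (gene offset + sample offset) in a 3 x 4 block.
bic_rows, bic_cols = [0, 1, 2], [0, 1, 2, 3]
gene_offsets = np.array([[0.0], [1.0], [2.0]])
sample_offsets = np.array([0.0, 2.0, 4.0, 6.0])
data[np.ix_(bic_rows, bic_cols)] = gene_offsets + sample_offsets

planted_msr = mean_squared_residue(data, bic_rows, bic_cols)
whole_msr = mean_squared_residue(data, range(6), range(5))
print(planted_msr, whole_msr)
```

The planted block has a residue of zero because every entry is the sum of a gene effect and a sample effect, while the full matrix, which mixes the block with noise, scores higher. Comparing such scores across many biclusters and several measures is the kind of correlation analysis the abstract describes.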