Biclustering
Computer science
Heuristics
Data mining
Set (abstract data type)
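Mask (illustration)
Null (SQL)
Probabilistic logic
Node (physics)
Expression (computer science)
Algorithm
Data set
Cluster analysis
Artificial intelligence
Engineering
CURE data clustering algorithm
Art
Correlation clustering
Structural engineering
Visual arts
Programming language
Operating system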
Authors
Jiong Yang, Haixun Wang, Wei Wang, Philip S. Yu
Identifier
DOI:10.1109/bibe.2003.1188969
Abstract
Microarrays are one of the latest breakthroughs in experimental molecular biology. They provide a powerful tool for monitoring the expression patterns of thousands of genes simultaneously and are already producing a huge amount of valuable data. The concept of a bicluster was introduced by Cheng and Church (2000) to capture the coherence of a subset of genes under a subset of conditions. A set of heuristic algorithms was also designed to find either one bicluster or a set of biclusters. These algorithms iterate over several steps: masking null values and already-discovered biclusters, coarse and fine node deletion, node addition, and the inclusion of inverted data. These heuristics inevitably suffer from some serious drawbacks. Masking null values and discovered biclusters with random numbers may cause random interference, which in turn impairs the discovery of high-quality biclusters. To address this issue and to further accelerate the biclustering process, we generalize the bicluster model to incorporate null values and propose a probabilistic algorithm (FLOC) that can discover a set of k possibly overlapping biclusters simultaneously. Furthermore, this algorithm can easily be extended at little additional cost to support additional features that suit different requirements. An experimental study on yeast gene expression data shows that FLOC can offer substantial improvements over the previously proposed algorithm.
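The abstract contrasts masking null values with random numbers against a bicluster model that incorporates nulls directly. The sketch below is a minimal illustration of a coherence score. It assumes the mean squared residue of Cheng and Church (2000), computed as the average of (a_ij - a_iJ - a_Ij + a_IJ)^2 over the submatrix. Null entries (represented here as `None`) are skipped instead of being replaced with random values. The skip-nulls rule and the function name `mean_squared_residue` are illustrative assumptions and not the paper's exact definition or code.

```python
from collections import defaultdict
from typing import List, Optional, Sequence

Matrix = List[List[Optional[float]]]


def mean_squared_residue(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Cheng-Church style mean squared residue of the submatrix rows x cols.

    Assumption (illustrative only): null entries (None) are skipped. Row,
    column and overall means use specified entries only, and the residue is
    averaged over those entries.
    """
    # Collect the specified (non-null) entries of the candidate bicluster.
    cells = [(i, j, data[i][j]) for i in rows for j in cols if data[i][j] is not None]
    if not cells:
        raise ValueError("bicluster has no specified entries")

    # Accumulate per-row and per-column sums and counts over specified entries.
    row_tot, row_n = defaultdict(float), defaultdict(int)
    col_tot, col_n = defaultdict(float), defaultdict(int)
    for i, j, v in cells:
        row_tot[i] += v
        row_n[i] += 1
        col_tot[j] += v
        col_n[j] += 1
    overall = sum(v for _, _, v in cells) / len(cells)

    # Residue of each entry: value minus row mean minus column mean plus overall mean.
    total = 0.0
    for i, j, v in cells:
        r = v - row_tot[i] / row_n[i] - col_tot[j] / col_n[j] + overall
        total += r * r
    return total / len(cells)


# A purely additive (shifted) pattern is perfectly coherent: residue ~ 0.
shifted = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(mean_squared_residue(shifted, [0, 1, 2], [0, 1, 2]))
```

A low score means the selected genes rise and fall together across the selected conditions. A search procedure such as the one the abstract describes would then add or remove rows and columns to keep this score below a threshold.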