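Data stream mining
Computer science
False positive paradox
Data mining
Process (computing)
False discovery rate
Statistical process control
False positives and false negatives
Multiple comparisons problem
Stream
Control (management)
Artificial intelligence
Statistics
Mathematics
Computer network
Biochemistry
Chemistry
Gene
Operating system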
Authors
Wendong Li, Dongdong Xiang, Fugee Tsung, Xiaolong Pu
Source
Journal: Technometrics
[Informa]
Date: 2019-02-11
Volume (issue): 62 (1): 84-100
Citations: 36
Identifier
DOI:10.1080/00401706.2019.1575284
Abstract
Monitoring complex systems involving high-dimensional data streams (HDS) provides quick real-time detection of abnormal changes of system performance, but accurate and efficient diagnosis of the streams responsible has also become increasingly important in many data-rich statistical process control applications. Existing diagnostic procedures, designed for low/moderate dimensional multivariate process, may miss too much important information in the out-of-control streams with a high signal-to-noise ratio (SNR) or waste too many resources finding useless in-control streams with a low SNR. In addition, these procedures do not differentiate between streams according to their severity. In this article, we formulate the diagnosis problem of HDS as a multiple testing problem and provide a computationally fast diagnostic procedure to control the weighted missed discovery rate (wMDR) at some satisfactory level. The proposed procedure overcomes the limitations of conventional diagnostic procedures by controlling the wMDR and minimizing the expected number of false positives as well. We show theoretically that the proposed procedure is asymptotically valid and optimal in a certain sense. Simulation studies and a real data analysis from a semiconductor manufacturing process show that the proposed procedure works very well in practice.
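To make the trade-off in the abstract concrete, the following minimal sketch scores a diagnosis by a weighted missed discovery proportion and a false positive count. It is not the procedure proposed in the article. The plain thresholding rule, the function names, the simulated shift size, and the reading of weights as per-stream severity are all assumptions made for illustration only.

```python
import random


def weighted_missed_discovery_proportion(flagged, is_oc, weights):
    """Weighted share of truly out-of-control (OC) streams that were not flagged.

    Assumption: weights stand in for per-stream severity; this empirical
    proportion is an illustrative analogue, not the article's wMDR definition.
    """
    total = sum(w for w, oc in zip(weights, is_oc) if oc)
    if total == 0:
        return 0.0  # no OC streams, so nothing can be missed
    missed = sum(w for f, oc, w in zip(flagged, is_oc, weights) if oc and not f)
    return missed / total


def false_positive_count(flagged, is_oc):
    """Number of in-control streams that were flagged."""
    return sum(1 for f, oc in zip(flagged, is_oc) if f and not oc)


def flag_streams(stats, threshold):
    """Hypothetical diagnostic rule: flag a stream when |statistic| > threshold."""
    return [abs(z) > threshold for z in stats]


def simulate(n_streams=1000, n_oc=50, shift=3.0, seed=0):
    """Simulate one standardized statistic per stream (assumed Gaussian)."""
    rng = random.Random(seed)
    is_oc = [i < n_oc for i in range(n_streams)]
    stats = [rng.gauss(shift if oc else 0.0, 1.0) for oc in is_oc]
    weights = [1.0] * n_streams  # equal severity in this toy setting
    return stats, is_oc, weights


if __name__ == "__main__":
    stats, is_oc, weights = simulate()
    for threshold in (1.5, 2.0, 2.5, 3.0):
        flagged = flag_streams(stats, threshold)
        wmdp = weighted_missed_discovery_proportion(flagged, is_oc, weights)
        fp = false_positive_count(flagged, is_oc)
        print(f"threshold={threshold}: missed proportion={wmdp:.3f}, false positives={fp}")
```

Raising the threshold can only shrink the set of flagged streams, so the false positive count cannot increase while the missed proportion cannot decrease. A procedure of the kind the abstract describes would pick the decision rule that keeps the missed discovery rate below a target level while minimizing the expected number of false positives.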