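Computer science
Dynamic time warping
Data mining
Time series
Series (stratigraphy)
Cluster analysis
Nearest neighbor search
Euclidean distance
Bottleneck
Subroutine
Data stream mining
Machine learning
Artificial intelligence
Paleontology
Biology
Embedded system
Operating system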
Authors
Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo E. A. P. A. Batista, Brandon Westover, Qiaoming Zhu, Jesin Zakaria, Eamonn Keogh
Identifier
DOI:10.1145/2339530.2339576
Abstract
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
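For readers unfamiliar with the subroutine the abstract refers to, the following is a minimal sketch of brute-force subsequence similarity search under DTW, with Euclidean distance shown for comparison. It is not the paper's method: the abstract does not describe the four ideas, and none of them are reproduced here. The function names (`znorm`, `euclidean`, `dtw`, `best_match`), the warping-band parameter `window`, and the choice to z-normalize each candidate are all assumptions made for illustration.

```python
import math


def znorm(x):
    """Z-normalize a sequence (assumption: constant sequences map to zeros)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b, window):
    """DTW distance between equal-length sequences.

    Warping is restricted to a band of half-width `window` around the
    diagonal; window=0 reduces to the Euclidean distance.
    """
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        for j in range(max(1, i - window), min(n, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return math.sqrt(prev[n])


def best_match(series, query, window):
    """Scan every subsequence of `series` and return (start, distance)
    of the one closest to `query` under band-constrained DTW."""
    m = len(query)
    q = znorm(query)
    best_pos, best_dist = -1, float("inf")
    for start in range(len(series) - m + 1):
        cand = znorm(series[start:start + m])
        d = dtw(q, cand, window)
        if d < best_dist:
            best_pos, best_dist = start, d
    return best_pos, best_dist
```

This naive scan evaluates a quadratic-time DTW for every candidate subsequence, which illustrates why the abstract calls similarity search the bottleneck: the cost grows with both the series length and the query length.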