作者
Xun Wang,Ruibao Song,Junmin Xiao,Tong Li,Xueqi Li
摘要
In the data space, time-series analysis has emerged in many fields, including biology, healthcare, and numerous large-scale scientific facilities like astronomy, climate science, particle physics, and genomics. Clustering is one of the most critical methods in time-series analysis. So far, the state-of-art time series clustering algorithm k-Shape has been widely used not only because of its high accuracy, but also because of its relatively low computation cost. However, due to the high heterogeneity of time series data, it can not be simply regarded as a high-dimensional vector. Two time series often need some alignment method in similarity comparison. The alignment between sequences is often a time-consuming process. For example, when using dynamic time warping as a sequence alignment algorithm and if the length of time series is greater than 1,000, a single iteration in the clustering process may take hundreds to tens of thousands of seconds, while the entire clustering cycle often requires dozens of iterations. In this article, we propose a set of novel parallel strategies suitable for GPU's computation model, called Times-C, which is an abbreviation for Time Series Clustering. We define three stages in the analysis process: aggregation, centroid, and class assignment. Times-C includes efficient parallel algorithms and corresponding implementations for these three stages. Overall, the experimental results show that the Times-C algorithm exhibits a performance improvement of one to two orders of magnitude compared to the multi-core CPU version of k-Shape. Furthermore, compared to the GPU version of the k-Shape algorithm, the Times-C algorithm achieves a maximum acceleration of up to 345 times.