Journal: Social Science Research Network [Social Science Electronic Publishing] Date: 2022-01-01
Identifier
DOI:10.2139/ssrn.4308307
Abstract
In this paper, we propose a novel dynamic label assignment method, Optimal Action Segment Assignment (OASA), for temporal action detection (TAD). OASA casts label assignment as an optimal transportation problem by computing a cost matrix between predicted temporal action segments and ground truths. The unit transportation cost for a pair consisting of a predicted temporal segment and a ground truth is defined as the weighted sum of the action classification loss and the temporal localization loss. In addition, we deploy Adaptive Estimation of Candidate Segment Number (AE-CSN) to adaptively determine the number of positive samples for each ground truth. Under this formulation, label assignment reduces to finding a globally optimal assignment plan that minimizes the total transportation cost. OASA therefore eliminates the manually designed prior parameters that fixed label assignment requires, and it improves the generalization of the algorithm across datasets. To evaluate OASA, we also design a simple anchor-free temporal action detector, ActionMixer, which consists of a Temporal Mixer and a Channel Mixer. The Temporal Mixer employs depth-wise convolution layers with large kernels to capture temporal information, and the Channel Mixer mixes and extracts features across the channel dimension. Extensive experiments on THUMOS-14, ActivityNet-1.3, and EPIC-Kitchens-100 show that ActionMixer equipped with OASA achieves state-of-the-art performance, surpassing other advanced temporal action detection methods.
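The assignment idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract gives only the outline, so the following are assumptions made for illustration: the function names (`temporal_iou`, `sinkhorn`, `assign_labels`), the use of negative log-probability as the classification cost and `1 - IoU` as the localization cost, the extra background row, the Sinkhorn solver with its `eps` and `n_iters` settings, and the weight `alpha`. The candidate count per ground truth uses a top-q IoU sum as a stand-in, since the abstract does not say how AE-CSN computes it.

```python
import numpy as np


def temporal_iou(segments, gts):
    """Pairwise temporal IoU between (N, 2) segments and (M, 2) ground truths."""
    start = np.maximum(segments[:, None, 0], gts[None, :, 0])
    end = np.minimum(segments[:, None, 1], gts[None, :, 1])
    inter = np.clip(end - start, 0.0, None)
    len_seg = segments[:, 1] - segments[:, 0]
    len_gt = gts[:, 1] - gts[:, 0]
    union = len_seg[:, None] + len_gt[None, :] - inter
    return inter / np.maximum(union, 1e-8)


def sinkhorn(cost, supply, demand, eps=0.1, n_iters=100):
    """Entropy-regularized transport plan (assumed solver, not from the source)."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(supply)
    v = np.ones_like(demand)
    for _ in range(n_iters):
        u = supply / (kernel @ v)    # rescale rows toward the supplies
        v = demand / (kernel.T @ u)  # rescale columns toward the demands
    return u[:, None] * kernel * v[None, :]


def assign_labels(cls_prob, segments, gt_labels, gts, alpha=1.0, top_q=3):
    """Assign each predicted segment to a ground truth index or to background.

    cls_prob: (N, C) per-class probabilities; segments: (N, 2); gts: (M, 2).
    Returns (assigned, k, plan); assigned[j] == M means background.
    Assumes sum(k) <= N so that the background supply is non-negative.
    """
    n = segments.shape[0]
    iou = temporal_iou(segments, gts)                               # (N, M)
    cls_cost = -np.log(np.clip(cls_prob[:, gt_labels], 1e-8, 1.0))  # (N, M)
    fg_cost = (cls_cost + alpha * (1.0 - iou)).T                    # (M, N)
    # Hypothetical background cost: penalize confident action predictions.
    bg_cost = -np.log(np.clip(1.0 - cls_prob.max(axis=1), 1e-8, 1.0))
    cost = np.concatenate([fg_cost, bg_cost[None, :]], axis=0)      # (M+1, N)
    # Stand-in for AE-CSN: candidate count = truncated sum of the top-q IoUs.
    q = min(top_q, n)
    k = np.maximum(np.sort(iou, axis=0)[-q:, :].sum(axis=0).astype(int), 1)
    supply = np.concatenate([k, [n - k.sum()]]).astype(float)
    demand = np.ones(n)
    plan = sinkhorn(cost, supply, demand)
    return plan.argmax(axis=0), k, plan


# Toy input: two ground truths, six predicted segments, two action classes.
gts = np.array([[0.0, 10.0], [20.0, 30.0]])
gt_labels = np.array([0, 1])
segments = np.array([[0.0, 10.0], [1.0, 9.0], [20.0, 30.0],
                     [21.0, 31.0], [40.0, 50.0], [12.0, 18.0]])
cls_prob = np.array([[0.9, 0.05], [0.8, 0.05], [0.05, 0.9],
                     [0.05, 0.7], [0.05, 0.05], [0.1, 0.05]])
assigned, k, plan = assign_labels(cls_prob, segments, gt_labels, gts)
print("candidates per ground truth:", k)
print("assignment (2 = background):", assigned)
```

Each column of `plan` sums to one, so taking the arg-max over rows gives every predicted segment exactly one label. Because the plan is solved jointly over all segments and ground truths, a segment is weighed against the alternatives competing for the same ground truth, which is the property that separates this kind of assignment from a fixed IoU threshold.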