Recent advances in human activity recognition (HAR) rely on deep neural networks to classify activities with greater accuracy. A popular approach in the field is to encode time series data from inertial sensors into images and then apply computer vision techniques to analyze the resulting data. However, encoding into images often inflates the data volume substantially and drives up computational cost, making such methods less practical for real-world applications. In this paper, we propose a novel image-encoding approach, the alternating sampling amplitude-phase field (AS-APF), together with a multi-sensor fusion framework based on the selective kernel (SK). AS-APF reduces the amount of image data while preserving its integrity and representativeness, because it splits the time series and retains the main feature information. We introduce SK to learn multi-scale features for HAR instead of relying on a fixed receptive field (RF) size. Our experimental results demonstrate that our approach outperforms previous encoding methods in both accuracy and time efficiency.
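
To illustrate the multi-scale idea behind the SK fusion mentioned above, the following is a minimal sketch of a selective-kernel style block in PyTorch, in the spirit of SKNet: two convolution branches with different receptive fields are fused by channel-wise soft attention. The class name, the two-branch layout, the reduction ratio, and the kernel/dilation choices are illustrative assumptions and are not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn


class SKConv(nn.Module):
    """Minimal selective-kernel style block (sketch, not the paper's exact module).

    Two branches with different effective receptive fields (3x3, and 3x3 with
    dilation 2, i.e. a 5x5 field) are combined by a soft attention over branches,
    so the network can adaptively select its receptive field size.
    """

    def __init__(self, channels: int, r: int = 8):
        super().__init__()
        mid = max(channels // r, 8)
        # Split: branches with different effective receptive fields.
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Fuse: squeeze global context into a compact descriptor,
        # then predict per-branch, per-channel attention weights.
        self.fc = nn.Sequential(nn.Linear(channels, mid), nn.ReLU(inplace=True))
        self.att3 = nn.Linear(mid, channels)
        self.att5 = nn.Linear(mid, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=(2, 3))          # global average pooling -> (B, C)
        z = self.fc(s)                          # compact descriptor
        a = torch.stack([self.att3(z), self.att5(z)], dim=1)
        a = torch.softmax(a, dim=1)             # attention across the two branches
        a3 = a[:, 0, :, None, None]
        a5 = a[:, 1, :, None, None]
        return a3 * u3 + a5 * u5                # select: attention-weighted sum


if __name__ == "__main__":
    # Toy usage: a batch of encoded sensor "images" with 64 feature channels.
    x = torch.randn(4, 64, 32, 32)
    y = SKConv(64)(x)
    print(y.shape)  # torch.Size([4, 64, 32, 32])
```

The output keeps the input shape, so such a block can replace a fixed-kernel convolution inside a larger fusion network; how the branches and the attention are configured for multi-sensor inputs is specific to the framework described in the paper.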