Road damage detection is essential for road maintenance. Traditional manual visual inspection is time-consuming and labor-intensive. Advances in computer vision enable automated, efficient image-based road damage detection: with deep convolutional neural networks, road damage can be localized and classified simultaneously. This paper proposes an ensemble model with test-time augmentation, built on attention modules and the You Only Look Once version 5 (YOLOv5) network, a state-of-the-art object detector. To focus the detector on the road region of each image, five improved YOLOv5 models with attention modules are proposed. Ensemble learning and test-time augmentation are then adopted to improve model generalization and detection performance. The proposed method was evaluated in the IEEE Big Data Crowdsensing-based Road Damage Detection Challenge 2022, where different ensemble models achieved an average F1-score of 0.65177 on the five test datasets.
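
To make the combination of ensembling and test-time augmentation concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the fusion rule, so the sketch assumes that detections from every model, on both the original image and a left-right flipped copy, are pooled and then filtered by per-class non-maximum suppression. The names `Det`, `unflip_horizontal`, `nms`, and `fuse_detections`, the IoU threshold of 0.5, and the integer class ids are all hypothetical choices made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed detection format: (x1, y1, x2, y2, score, class_id) in pixels.
Det = Tuple[float, float, float, float, float, int]


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unflip_horizontal(det: Det, image_width: float) -> Det:
    """Map a box predicted on a left-right flipped image back to the original frame."""
    x1, y1, x2, y2, score, cls = det
    return (image_width - x2, y1, image_width - x1, y2, score, cls)


def nms(dets: Iterable[Det], iou_thr: float = 0.5) -> List[Det]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Det] = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        # Keep a box unless a higher-scoring box of the same class overlaps it.
        if all(det[5] != k[5] or iou(det, k) < iou_thr for k in kept):
            kept.append(det)
    return kept


def fuse_detections(
    plain: List[List[Det]],
    flipped: List[List[Det]],
    image_width: float,
    iou_thr: float = 0.5,
) -> List[Det]:
    """Pool detections from all models and both views, then apply NMS.

    plain[i]   : detections of model i on the original image
    flipped[i] : detections of model i on the horizontally flipped image
    """
    pooled: List[Det] = []
    for dets in plain:
        pooled.extend(dets)
    for dets in flipped:
        pooled.extend(unflip_horizontal(d, image_width) for d in dets)
    return nms(pooled, iou_thr)


if __name__ == "__main__":
    # Two hypothetical models on a 100-pixel-wide image.
    model_a = [(10.0, 10.0, 50.0, 50.0, 0.9, 0)]
    model_b = [(12.0, 11.0, 52.0, 49.0, 0.8, 0), (60.0, 60.0, 80.0, 80.0, 0.6, 1)]
    model_a_flipped = [(50.0, 10.0, 90.0, 50.0, 0.7, 0)]
    for det in fuse_detections([model_a, model_b], [model_a_flipped, []], 100.0):
        print(det)
```

In this sketch, overlapping boxes of the same class from different models or views collapse to the single highest-scoring box, while boxes of other classes pass through untouched. Other fusion rules, such as averaging the coordinates of overlapping boxes, would fit the same interface.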