Pipe caps are essential components of the suspension devices of a railway catenary. Their operational status affects both the safety of train movement and the stability of the catenary power supply. In high-resolution catenary monitoring images, the components are small and defective samples are scarce, which makes pipe cap defects difficult to detect. To address these problems, we propose a two-stage cascade network for pipe cap defect detection and use a WGAN-GP network to generate additional defect samples. First, we preprocess the monitoring images to handle blurred images, irrelevant background images, and images with abnormal brightness, and then use the YOLOv5 object detection algorithm to locate the pipe cap region against complex backgrounds. Second, we use the WGAN-GP algorithm to generate pipe cap defect images and merge the generated images with the original defect dataset. We train a ResNet50 network on the consolidated dataset to detect pipe cap defects. The experimental results verify the adaptability and effectiveness of the proposed method against complex backgrounds.
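The abstract does not state how blurred or abnormally bright images are identified during preprocessing. As a minimal sketch only, the snippet below assumes one common heuristic: the variance of a Laplacian response as a sharpness measure and the mean gray level as a brightness measure. The function names, the flag strings, and both thresholds are hypothetical and are not taken from the paper; it uses only the Python standard library and expects a grayscale image as a list of pixel rows.

```python
from statistics import mean, pvariance

# Hypothetical thresholds (assumptions, not values from the paper);
# they would need tuning on real catenary monitoring images.
BLUR_THRESHOLD = 100.0        # minimum variance of the Laplacian response
BRIGHTNESS_RANGE = (40, 215)  # acceptable mean gray level on a 0-255 scale


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a list of rows of gray levels and must be at least 3x3.
    Low values suggest few sharp edges, i.e. a possibly blurred image.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def screen_image(gray):
    """Return a list of quality flags for one grayscale image."""
    flags = []
    if laplacian_variance(gray) < BLUR_THRESHOLD:
        flags.append("blurred")
    brightness = mean(p for row in gray for p in row)
    low, high = BRIGHTNESS_RANGE
    if not low <= brightness <= high:
        flags.append("abnormal_brightness")
    return flags
```

In a pipeline of the kind described, flagged images could be enhanced or set aside before the detection stage; which of the two the authors do is not specified here. Note that a uniform image is flagged as blurred by this heuristic, since it contains no edges at all.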