Abstract In this study, a double‐channel convolutional neural network and weighted (DCW) transformer model is proposed to address two shortcomings of the traditional transformer model: insufficient extraction of local information and a lack of attention to channel‐step information. First, a double‐channel information extraction method is proposed so that both channel‐step and time‐step information receive attention. Second, local information in the time and channel dimensions of the data is extracted in depth and at multiple scales, improving the feature extraction capability for local information. Third, the long‐distance dependencies of the data are preserved by the attention mechanism, so the global correlation of the data is extracted effectively. Finally, the Gumbel‐SoftMax function is used to assign weights to the time‐step and channel‐step feature information, optimizing the extracted features. The proposed method was applied to the penicillin fermentation process to verify its efficacy. Experimental results show that the proposed method achieved higher fault detection accuracy than the existing models. Ablation experiments were further conducted to demonstrate the effectiveness of each component of the proposed model.
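The weighting step can be illustrated with a minimal sketch. It assumes that the time‐step and channel‐step branches each produce a feature vector of the same length and that two logits score the branches; the abstract does not state how these scores are parameterized or learned. The names `gumbel_softmax`, `fuse`, `logits`, and `tau` are hypothetical and chosen for illustration only, so this should be read as one possible instance of Gumbel‐Softmax weighting, not as the DCW transformer's implementation.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample relaxed (soft) one-hot weights with the Gumbel-Softmax trick.

    `logits` are unnormalized branch scores (assumed here; in a trained
    model they would typically be learned). `tau` is the temperature:
    smaller values push the weights closer to a hard one-hot choice.
    """
    noisy = []
    for logit in logits:
        # Clip the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(time_feat, channel_feat, logits, tau=1.0, rng=random):
    """Combine two equal-length feature vectors with Gumbel-Softmax weights."""
    w_time, w_channel = gumbel_softmax(logits, tau, rng)
    return [w_time * t + w_channel * c for t, c in zip(time_feat, channel_feat)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    time_feat = [0.2, 0.8, 0.5]      # hypothetical time-step features
    channel_feat = [0.6, 0.1, 0.9]   # hypothetical channel-step features
    print(fuse(time_feat, channel_feat, logits=[0.0, 0.0], tau=0.5, rng=rng))
```

Because the two weights are non‐negative and sum to one, each fused value lies between the corresponding time‐step and channel‐step values. Lowering `tau` makes the fusion behave more like selecting a single branch.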