Spiking ViT: spiking neural networks with transformer—attention for steel surface defect classification

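Subject tags: Spiking neural network; Artificial neural network; Computer science; Artificial intelligence; Pattern recognition (psychology); Classification; Encoder; Transformer; Voltage; Engineering; Electrical engineering; Operating system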

Authors

Liang Gong, Hang Dong, Xinyu Zhang, Xin Cheng, Fan Ye, Liangchao Guo, Zhenghui Ge

Source

Journal: Journal of Electronic Imaging [SPIE]
Date: 2024-05-02; Volume (issue): 33 (03); Citations: 5

Identifiers

DOI: 10.1117/1.jei.33.3.033001

Abstract

Throughout the steel production process, a variety of surface defects inevitably occur. These defects impair the quality of steel products and reduce manufacturing efficiency, so it is crucial to study and categorize the many kinds of defects found on the surface of steel strips. The vision transformer (ViT) is a neural network model based on self-attention that is widely used across many disciplines. Conventional ViT ignores the specifics of brain signaling and instead uses activation functions to simulate real neurons. One of the fundamental building blocks of a spiking neural network is the leaky integrate-and-fire (LIF) neuron, whose biodynamic characteristics resemble those of a real neuron. LIF neurons operate in an event-driven manner, which allows higher performance to be achieved with less power. The goal of this work is to integrate ViT and LIF neurons to build and train an end-to-end hybrid network architecture, the spiking vision transformer (S-ViT), for the classification of steel surface defects. The framework builds on the ViT architecture by replacing its activation functions with LIF neurons. It constructs a spiking transformer encoder that acts as a global spike-feature fusion module, together with a spiking-MLP classification head that performs classification, and uses these as the basic building blocks of S-ViT. In the experiments, our method demonstrated outstanding classification performance across all metrics. The overall test accuracies of S-ViT are 99.41%, 99.65%, 99.54%, and 99.77% on NEU-CLSs, and 95.70%, 95.93%, 96.94%, and 97.19% on XSDD. S-ViT achieves better classification performance than convolutional neural networks and other recently reported methods, and it also outperforms the original ViT model. Furthermore, robustness tests show that S-ViT maintains reliable accuracy when recognizing images corrupted with Gaussian noise.
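The core idea above, replacing an activation function with an LIF neuron, can be illustrated with a small sketch. It assumes a common discrete-time LIF formulation with a hard reset, similar to the one used in SNN toolkits such as SpikingJelly: the membrane potential leaks toward a reset value, integrates the input current, emits a spike when it crosses a threshold, and is then reset. The time constant, threshold, number of time steps, direct input encoding, and the helper names `lif_neuron` and `spiking_linear_rates` are hypothetical choices for illustration. They are not the paper's settings or code, and the sketch does not model the attention layers of S-ViT.

```python
from typing import List, Sequence


def lif_neuron(inputs: Sequence[float], tau: float = 2.0,
               v_threshold: float = 1.0, v_reset: float = 0.0) -> List[int]:
    """Simulate one LIF neuron over len(inputs) time steps (illustrative assumption).

    Discrete-time update with hard reset:
        v <- v + (x - (v - v_reset)) / tau   # leaky integration of input x
        spike = 1 if v >= v_threshold        # fire
        v <- v_reset after a spike           # reset
    """
    v = v_reset
    spikes = []
    for x in inputs:
        v = v + (x - (v - v_reset)) / tau  # leak toward v_reset while integrating x
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def spiking_linear_rates(features: Sequence[float],
                         weights: Sequence[Sequence[float]],
                         biases: Sequence[float],
                         time_steps: int = 8,
                         **lif_kwargs) -> List[float]:
    """Hypothetical spiking output layer: Linear -> LIF instead of Linear -> activation.

    The same static input current is fed at every time step (direct encoding,
    an assumption); each output neuron's firing rate serves as its class score.
    """
    rates = []
    for w_row, b in zip(weights, biases):
        current = sum(w * f for w, f in zip(w_row, features)) + b
        spikes = lif_neuron([current] * time_steps, **lif_kwargs)
        rates.append(sum(spikes) / time_steps)
    return rates


if __name__ == "__main__":
    rates = spiking_linear_rates([1.0, 0.5],
                                 [[1.0, 1.0], [0.2, 0.0], [3.0, 2.0]],
                                 [0.0, 0.0, 0.0], time_steps=4)
    print(rates)  # [0.5, 0.0, 1.0]
    print(max(range(len(rates)), key=rates.__getitem__))  # predicted class: 2
```

In this toy example, the second output neuron receives a current (0.2) that never drives its membrane potential to the threshold, so it emits no spikes at all. Computation in a spiking layer is driven only by the spikes that do occur, which is the event-driven behavior the abstract associates with lower power use.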
