MNIST database
Computer science
Artificial intelligence
Modality (human–computer interaction)
Spiking neural network
Pattern
Modal verb
Artificial neural network
Pattern recognition (psychology)
Machine learning
Speech recognition
Social science
Sociology
Chemistry
Polymer chemistry
Authors
Nitin Rathi, Kaushik Roy
Identifier
DOI:10.1109/tetci.2018.2872014
Abstract
Spiking neural networks perform reasonably well in recognition applications for a single modality (e.g., images, audio, or text). In this paper, we propose a multimodal spiking neural network that combines two modalities (image and audio). The two unimodal ensembles are connected with cross-modal connections, and the entire network is trained with unsupervised learning. The network receives inputs in both modalities for the same class and predicts the class label. The excitatory connections in the unimodal ensembles and the cross-modal connections are trained with a power-law weight-dependent spike-timing-dependent plasticity (STDP) learning rule. The cross-modal connections capture the correlation between neurons of different modalities. The multimodal network learns features of both modalities and improves classification accuracy compared to the unimodal topology, even when one of the modalities is distorted by noise. The cross-modal connections suppress the effect of noise on classification accuracy. Well-learned cross-modal connections invoke additional spiking activity in neurons of the correct label. The cross-modal connections are only excitatory and do not inhibit the normal activity of the unimodal ensembles. We evaluated our multimodal network on images from the MNIST dataset and utterances of digits from the TI46 speech corpus. The multimodal network achieved a classification accuracy of 98% on the combined MNIST and TI46 dataset.
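The abstract's power-law weight-dependent STDP rule can be sketched as follows. This is a minimal illustration of the general technique (in the style of common unsupervised SNN training setups), not the paper's exact implementation; the function name, parameter values, and the target-trace term `x_tar` are assumptions for the example.

```python
import numpy as np

def stdp_potentiate(w, pre_trace, eta=0.01, x_tar=0.0, w_max=1.0, mu=0.9):
    """Power-law weight-dependent STDP update, applied at a postsynaptic spike.

    The weight change scales with the presynaptic activity trace and with a
    power-law factor of the remaining headroom (w_max - w)**mu, so weights
    saturate softly as they approach w_max. Hyperparameters are illustrative.
    """
    dw = eta * (pre_trace - x_tar) * (w_max - w) ** mu
    # Keep weights in the excitatory range [0, w_max]; in the paper both
    # unimodal and cross-modal connections are excitatory only.
    return np.clip(w + dw, 0.0, w_max)
```

Synapses whose presynaptic neurons fired recently (large `pre_trace`) are potentiated, while already-strong weights change little, which is what lets the cross-modal connections add excitation without overwhelming the unimodal ensembles.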