RGB color model
Artificial intelligence
Pattern recognition (psychology)
Computer science
Computer vision
Transformer
Convolutional neural network
Engineering
Voltage
Electrical engineering
Authors
Wei He, Yang Mi, Xiangdong Ding, Gang Liu, Tao Li
Identifier
DOI:10.1016/j.compag.2023.107986
Abstract
Automatic non-contact estimation of pig weight can avoid porcine stress and prevent the spread of swine fever. Many recent relevant works employ convolutional neural networks to extract deeply learned features for regressing pig weight based on a single modality, either RGB images or depth images. However, utilizing only one modality may not be sufficient for pig-weight estimation, since both modalities are complementary for representing the spatial body information of pigs. In this paper, we propose a two-stream cross-attention vision Transformer for regressing pig weight based on both RGB and depth images. Specifically, we employ two separate Swin Transformers to extract texture appearance information and spatial structure information from RGB and depth images, respectively. Meanwhile, we design cross-attention blocks to learn mutual-modal representations from both modalities. Finally, we construct a feature fusion layer that combines the features from both streams for regressing pig weight. In the experiments, we collect a new dataset of paired RGB-D pig images, which contains 10,263 RGB-D pairs for training and 5,203 RGB-D pairs for testing. Comprehensive comparative experimental results show that the proposed method yields the best performance on this dataset, where the mean absolute error is 3.237.
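To illustrate the mutual-modal mechanism the abstract describes, here is a minimal NumPy sketch of cross-attention between two feature streams, followed by fusion and a linear regression head. This is not the authors' implementation: the token counts, dimensions, pooling, and linear head are simplified assumptions, and the actual model uses learned Swin Transformer backbones and projection matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, d_k):
    # Queries come from one modality while keys/values come from the
    # other, so each stream attends to the complementary modality.
    scores = q_feats @ kv_feats.T / np.sqrt(d_k)   # (N_q, N_kv)
    weights = softmax(scores, axis=-1)             # rows sum to 1
    return weights @ kv_feats                      # (N_q, d_k)

rng = np.random.default_rng(0)
rgb_tokens   = rng.standard_normal((16, 32))  # stand-in RGB-stream tokens
depth_tokens = rng.standard_normal((16, 32))  # stand-in depth-stream tokens

# Mutual-modal representations: each stream queries the other.
rgb_attended   = cross_attention(rgb_tokens, depth_tokens, 32)
depth_attended = cross_attention(depth_tokens, rgb_tokens, 32)

# Feature fusion layer (simplified): pool each stream, concatenate,
# then regress a scalar weight with a (randomly initialized) linear head.
fused = np.concatenate([rgb_attended.mean(axis=0), depth_attended.mean(axis=0)])
w_head = rng.standard_normal(fused.shape[0])
pred_weight = float(fused @ w_head)
print(fused.shape, pred_weight)
```

In the paper's architecture the role of this block is to let the texture stream borrow spatial structure cues (and vice versa) before fusion; the sketch only shows the attention arithmetic, not the learned projections.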