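Segmentation
Artificial intelligence
Convolutional neural network
Computer science
Feature (linguistics)
Computer vision
Pattern recognition (psychology)
Object detection
Feature extraction
Philosophy
Linguistics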
Authors
Xiaoqiang Du,Zhichao Meng,Zenghong Ma,Lijun Zhao,Wenwu Lu,Hongchao Cheng,Y. Wang
Identifier
DOI:10.1016/j.biosystemseng.2023.12.017
Abstract
The vision system of a tomato picking robot faces two difficult tasks: precise acquisition of the tomato pose and location of the stem. Tomato pose and stem location help determine the end-effector pose and achieve collision-free picking. To realise efficient crop picking, target location, pose detection, and obstacle semantic segmentation should be completed in one model to obtain comprehensive visual information. Therefore, the multitask convolutional neural network YOLO-MCNN is proposed as a new method that completes these tasks in one model. Four strategies are proposed for enhancing segmentation ability, based on fusing multi-scale features and determining the optimal locations for the semantic segmentation branch. The experimental results show that fusing the semantic segmentation branch with the second layer of shallow feature maps and placing the branch after the 17th layer gives the best segmentation performance. Fusing shallow feature maps improves small-target detection, while merging multi-scale feature maps enhances semantic segmentation performance. Moreover, ablation experiments are conducted to compare the multitask convolutional network with single-task networks. They indicate that running multiple tasks on the same backbone network does not affect their performance. For YOLO-MCNN, the target detection performance F1 is 87.8%, the semantic segmentation performance mIoU is 74.8%, the keypoint detection performance dlmk is 6.95 pixels, the network size is 15.2 MB, and the inference time is 19.9 ms. Compared with other target detection and semantic segmentation networks, YOLO-MCNN shows the best comprehensive performance. The method provides a theoretical foundation for constructing multitask convolutional neural networks.
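The abstract reports three evaluation metrics (F1, mIoU, and dlmk) without defining them. The sketch below uses the standard textbook definitions of F1 and mean intersection-over-union, and assumes that dlmk denotes the mean Euclidean distance in pixels between predicted and ground-truth keypoints; that reading, and all function names, are illustrative assumptions and not the authors' evaluation code.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union over classes.

    pred and truth are flat sequences of per-pixel class labels.
    Classes absent from both pred and truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def mean_keypoint_distance(pred_pts, true_pts):
    """Assumed meaning of dlmk: mean pixel distance between keypoint pairs."""
    dists = [math.dist(p, t) for p, t in zip(pred_pts, true_pts)]
    return sum(dists) / len(dists)
```

Under these definitions, higher F1 and mIoU are better, while a lower keypoint distance is better, which is consistent with dlmk being reported in pixels.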