Authors
Yuchong Li, Tong Xia, Huoling Luo, Baochun He, Fucang Jia
Identifier
DOI:10.1109/jbhi.2023.3299321
Abstract
Surgical action triplet recognition plays a significant role in helping surgeons with scene analysis and decision-making in computer-assisted surgeries. Compared to traditional context-aware tasks such as phase recognition, surgical action triplets, comprising the instrument, verb, and target, offer more comprehensive and detailed information. However, current triplet recognition methods fall short in distinguishing fine-grained subclasses and disregard temporal correlation in action triplets. In this article, we propose a multi-task fine-grained spatial-temporal framework for surgical action triplet recognition, named MT-FiST. The proposed method utilizes a multi-label mutual channel loss, which consists of diversity and discriminative components. This loss function decouples global task features into class-aligned features, enabling the learning of more local details from the surgical scene. The framework also utilizes partially shared-parameter LSTM units to capture temporal correlations between adjacent frames. We conducted experiments on the CholecT50 dataset proposed in the MICCAI 2021 Surgical Action Triplet Recognition Challenge. Our framework is evaluated on the private test set of the challenge to ensure fair comparison. Our model outperformed state-of-the-art models in instrument, verb, target, and action triplet recognition tasks, with mAPs of 82.1% (+4.6%), 51.5% (+4.0%), 45.5% (+7.8%), and 35.8% (+3.1%), respectively. The proposed MT-FiST boosts the recognition of surgical action triplets in a context-aware surgical assistant system, further solving multi-task recognition through effective temporal aggregation and fine-grained features.
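The abstract describes the multi-label mutual channel loss only at a high level (a discriminative component over class-aligned channel groups and a diversity component encouraging spatially distinct channels). As an illustration only, the sketch below implements the general mutual-channel-loss recipe in NumPy: the function name, the `xi` channels-per-class parameter, and the exact pooling order are assumptions, not details taken from the paper.

```python
import numpy as np

def mutual_channel_terms(features, num_classes, xi):
    """Illustrative sketch (not the paper's implementation).

    features: array of shape (C, H, W), where C = num_classes * xi,
    i.e. each class is assigned a group of xi feature channels.
    Returns per-class logits (discriminative term input) and a scalar
    diversity score.
    """
    C, H, W = features.shape
    assert C == num_classes * xi, "channels must split evenly into class groups"
    groups = features.reshape(num_classes, xi, H * W)

    # Discriminative component: cross-channel max pooling within each
    # class-aligned group, then global average pooling -> class logits,
    # which would feed a (multi-label) classification loss.
    ccmp = groups.max(axis=1)        # (num_classes, H*W)
    logits = ccmp.mean(axis=1)       # (num_classes,)

    # Diversity component: softmax over spatial positions per channel,
    # then cross-channel max and a spatial sum. The score is 1 when all
    # xi channels of a class attend to identical locations and grows
    # toward xi as they attend to disjoint locations.
    e = np.exp(groups - groups.max(axis=2, keepdims=True))
    spatial_softmax = e / e.sum(axis=2, keepdims=True)
    diversity = spatial_softmax.max(axis=1).sum(axis=1).mean()

    return logits, diversity
```

In training, the diversity score would be maximized (or its negation added to the loss) so that the xi channels of each class learn complementary local details, while the logits from cross-channel max pooling drive the discriminative term.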