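Computer science
Artificial intelligence
Transformer
Feature (linguistics)
Robot
Frame (networking)
Feature vector
Regression
Gesture
Pattern recognition (psychology)
Machine learning
Statistics
Mathematics
Telecommunications
Linguistics
Philosophy
Physics
Quantum mechanics
Voltage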
Authors
Dimitrios Anastasiou, Yueming Jin, Danail Stoyanov, Evangelos B. Mazomenos
Source
Journal: IEEE Robotics and Automation Letters
Date: 2023-02-06
Volume (issue): 8 (3), pages 1755-1762
Citations: 12
Identifier
DOI: 10.1109/lra.2023.3242466
Abstract
This letter proposes a novel video-based, contrastive regression architecture, Contra-Sformer, for automated surgical skill assessment in robot-assisted surgery. The proposed framework is structured to capture the differences in surgical performance between a test video and a reference video that represents optimal surgical execution. A feature extractor combining a spatial component (ResNet-18), supervised at the frame level with gesture labels, and a temporal component (TCN) generates spatio-temporal feature matrices for the test and reference videos. These are then fed into an action-aware Transformer with multi-head attention that produces inter-video contrastive features at the frame level, representative of the skill similarity or deviation between the two videos. Moments of sub-optimal performance can be identified and temporally localized in the resulting feature vectors, which are ultimately used to regress the manually assigned skill scores. Validated on the JIGSAWS dataset, Contra-Sformer achieves competitive performance (Spearman correlation of 0.65–0.89), with a normalized mean absolute error between 5.8% and 13.4% on all tasks and across validation setups.
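The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names are hypothetical, the predicted and ground-truth scores are made-up toy values, and normalizing the mean absolute error by the width of the score scale (here assumed to run from 6 to 30) is an assumption, since the abstract does not state the normalization used.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(pred: Sequence[float], true: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(pred), average_ranks(true))


def normalized_mae(pred: Sequence[float], true: Sequence[float],
                   score_min: float, score_max: float) -> float:
    """MAE as a percentage of the score range (assumed normalization)."""
    mae = sum(abs(p - t) for p, t in zip(pred, true)) / len(true)
    return 100.0 * mae / (score_max - score_min)


# Toy data (made up): manually assigned scores and model predictions.
true_scores = [10, 14, 19, 22, 27]
pred_scores = [12, 13, 21, 20, 26]

rho = spearman(pred_scores, true_scores)
# Assumed score scale of 6 to 30 for the normalization.
nmae = normalized_mae(pred_scores, true_scores, score_min=6, score_max=30)
print(f"Spearman: {rho:.2f}, normalized MAE: {nmae:.1f}%")
```

Spearman correlation depends only on the ordering of the scores, so it rewards a model that ranks trials correctly even when the predicted values are offset, while the normalized error captures how far the predictions are from the assigned scores in absolute terms.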