Computer science
Fusion
Artificial intelligence
Convolutional neural network
Frame (networking)
De facto
Deep learning
Pattern recognition (psychology)
Machine learning
Political science
Linguistics
Telecommunications
Philosophy
Law
Authors
Stefan Petscharnig,Klaus Schöffmann,Jenny Benois‐Pineau,Souad Chaabouni,J. Keckstein
Identifier
DOI:10.1109/cbms.2018.00071
Abstract
The most essential step towards semi-automatic extraction of relevant surgery scenes is semantic understanding of surgical actions in surgery videos. Currently, Convolutional Neural Networks (CNNs) are a de-facto standard for automatic content classification in many domains, including medical imaging. We aim to increase the predictive performance of surgical action recognition within gynecologic laparoscopy, a subfield of endoscopic surgery, by fusing temporal information into the input layer of CNNs (early fusion), as well as by temporal aggregation of single-frame prediction results (late fusion). Our evaluation shows that the proposed early fusion approaches outperform a single-frame baseline when using the GoogLeNet architecture. Moreover, early fusion of motion information benefits classification performance regardless of the late fusion strategy. Late fusion has a high impact on classification performance, and its increase is additive to the performance increase of early fusion. Finally, we found that CNN capacity influences these results drastically. We conclude that the proposed methods, in combination with a sufficiently high CNN capacity, allow for a substantial increase in predictive performance.
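The two fusion strategies the abstract contrasts can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the channel-stacking early-fusion layout and the probability-averaging late-fusion rule are common choices assumed here for clarity (the paper also fuses motion information, which is omitted).

```python
import numpy as np

def early_fusion_input(frames):
    """Early fusion (assumed variant): stack T consecutive frames along the
    channel axis, turning a (T, H, W, C) clip into one (H, W, T*C) tensor
    that a CNN's input layer consumes in a single forward pass."""
    t, h, w, c = frames.shape
    # (T, H, W, C) -> (H, W, T, C) -> (H, W, T*C)
    return frames.transpose(1, 2, 0, 3).reshape(h, w, t * c)

def late_fusion_average(frame_probs):
    """Late fusion (assumed strategy): average per-frame class probabilities
    over the clip, then pick the class with the highest mean score."""
    probs = np.asarray(frame_probs, dtype=float)
    return int(probs.mean(axis=0).argmax())
```

In this sketch, early fusion changes what the network sees (a temporal stack instead of one frame), while late fusion leaves single-frame inference untouched and only aggregates its outputs; the abstract reports that the two gains are additive.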