Modal verb
Computer science
Artificial intelligence
Recall
Machine learning
Process (computing)
Set (abstract data type)
Natural language processing
Linguistics
Philosophy
Chemistry
Polymer chemistry
Programming language
Operating system
Authors
Shivani Gowda, Yifan Hu, Mandy Korpusik
Identifier
DOI:10.1109/icassp49357.2023.10095762
Abstract
In this paper, we present multi-modal approaches to diet tracking. As health and well-being become increasingly important, mobile applications for diet tracking attract much interest. However, these applications often require users to log their meals based on relatively unreliable memory recall, thereby underestimating nutritional intake and, thus, undermining the efforts of nutrition tracking. To accurately record dietary intake, there is an increasing need for image-based computational methods. We investigated multi-modal transfer learning approaches on a novel, food-specific image-text dataset, specifically a Vision-and-Language Transformer that achieves a held-out test set Micro-F1 score of 77.70% and Macro-F1 score of 51.43% for 696 food categories. We aim to give other researchers new insight into the process of developing domain-specific, multi-modal deep learning models with small datasets.
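For readers who want a concrete picture of the kind of pipeline the abstract describes, below is a minimal sketch, not the authors' implementation, of fine-tuning a Vision-and-Language Transformer (ViLT) backbone for image-text food classification and scoring it with Micro- and Macro-F1. It assumes the public HuggingFace checkpoint `dandelin/vilt-b32-mlm`; the `FoodViltClassifier` wrapper, the 696-way linear head, and the placeholder label lists are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: ViLT backbone + linear head for food-category classification.
# The checkpoint, head, and labels below are assumptions for illustration only.
import torch
from torch import nn
from transformers import ViltProcessor, ViltModel
from sklearn.metrics import f1_score

NUM_CLASSES = 696  # number of food categories reported in the abstract

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
backbone = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")


class FoodViltClassifier(nn.Module):
    """ViLT backbone with a linear classification head (illustrative)."""

    def __init__(self, backbone: ViltModel, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(backbone.config.hidden_size, num_classes)

    def forward(self, **inputs):
        pooled = self.backbone(**inputs).pooler_output  # [batch, hidden_size]
        return self.head(pooled)                        # [batch, num_classes]


model = FoodViltClassifier(backbone, NUM_CLASSES)


def predict(image, text):
    """Encode one image-text pair and return the predicted category index."""
    inputs = processor(images=image, text=text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs)
    return logits.argmax(dim=-1).item()


# Micro- and Macro-F1 as reported in the abstract, computed with scikit-learn;
# y_true / y_pred are placeholder label lists, not results from the paper.
y_true, y_pred = [0, 1, 2, 1], [0, 1, 1, 1]
print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))
```

Micro-F1 pools true/false positives over all 696 classes and so tracks per-example accuracy, while Macro-F1 averages per-class F1 scores and so is pulled down by rare food categories, which is consistent with the gap between the two numbers quoted above.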