ChatCam: Embracing LLMs for Contextual Chatting-to-Camera with Interest-Oriented Video Summarization
Automatic summarization
Computer science
Multimedia
Artificial intelligence
Authors
Kaijie Xiao, Yi Gao, Fu Li, Weifeng Xu, P. H. Chen, Weifeng Xu
Source
Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies [Association for Computing Machinery] Date: 2024-11-21 Volume (Issue): 8 (4): 1-34
Identifier
DOI: 10.1145/3699731
Abstract
Cameras are ubiquitous in society, and users increasingly want to extract insights about the physical world from them. Current human-to-camera interaction methods, while advanced, do not yet support the intuitive, conversational interaction one would expect in human-to-human communication. To achieve a more natural interaction between humans and cameras, we propose a novel contextual chatting-to-camera paradigm. This paradigm allows users to interact with the camera in natural language, including expressing interests and asking questions. In response, the camera customizes tasks tailored to these interests and attempts to answer the questions asked. We design ChatCam, which embraces LLMs for contextual chatting-to-camera with interest-oriented video summarization. Using a novel prompt with an actor-critic LLM approach, ChatCam can understand users' interests and translate them into tasks and objects. With the help of a multi-modal large language model and deep reinforcement learning, ChatCam can also customize the relevant models on the resource-constrained edge while maintaining high accuracy. Results show that, compared to state-of-the-art methods in multiple settings, ChatCam achieves improvements of up to 43.9% in understanding user interests and up to 21.1% in model accuracy. Various examples and a user study further demonstrate the effectiveness of ChatCam in practice.