Keywords
Computer science
Modality
Training
Natural language processing
Artificial intelligence
Machine learning
Geography
Chemistry
Meteorology
Polymer chemistry
Authors
Yunkai Chen, Qimeng Wang, Shiwei Wu, Yan Gao, Tong Xu, Yao Hu
Source
Journal: ACM Transactions on Knowledge Discovery From Data
[Association for Computing Machinery]
Date: 2024-03-28
Volume/Issue: 18(7): 1-19
Citations: 5
Abstract
Multi-modal large language models (MLLMs), such as GPT-4, exhibit strong comprehension of human instructions, as well as zero-shot ability on new downstream multi-modal tasks. To integrate the different modalities within a unified embedding space, previous MLLMs conduct visual instruction tuning with massive, high-quality image-text pair data, which incurs substantial costs in data collection and training resources. In this article, we propose TOMGPT (Text-Only training Multi-modal GPT), a cost-effective MLLM tuned solely on easily accessible text data with far fewer resources. Building on the coupled visual-linguistic modality space of pre-trained models (e.g., CLIP and ALIGN), a text-only training strategy is devised to project the aligned multi-modal latent space into that of the LLM, endowing the LLM with visual comprehension capabilities in an efficient manner. Instead of the enormous image-text training data required by previous MLLMs, we find that TOMGPT can be well tuned with a smaller yet diverse set of GPT-generated free-form text data, as we establish a semantic connection between the LLM and the pre-trained vision-language model. A quantitative evaluation is conducted on both MME and LVLM, which are recently released and widely used MLLM benchmarks. The experiments reveal that TOMGPT achieves reliable performance compared to numerous models trained on large amounts of image-text pair data. Case studies are also presented, demonstrating TOMGPT's broad understanding and dialogue capabilities across diverse image categories.
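As a rough illustration of the text-only training strategy described in the abstract: because CLIP-style models place paired images and captions near each other in a shared embedding space, a caption's CLIP text feature can stand in for the (unavailable) image feature while a small projection module is trained to map that space into the LLM's input-embedding space. The sketch below is a minimal, hypothetical PyTorch rendering of this idea, not the authors' released code; the names (Projector, training_step, clip_text_encoder), the dimensions, and the assumption that llm and tokenizer follow the Hugging Face transformers causal-LM interface are all illustrative.

import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps a CLIP-space feature to a fixed number of LLM soft tokens.
    (Hypothetical module; dimensions are placeholders.)"""
    def __init__(self, clip_dim=512, llm_dim=4096, num_tokens=32):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, num_tokens * llm_dim),
        )
        self.num_tokens, self.llm_dim = num_tokens, llm_dim

    def forward(self, clip_feat):                     # clip_feat: (B, clip_dim)
        out = self.proj(clip_feat)                    # (B, num_tokens * llm_dim)
        return out.view(-1, self.num_tokens, self.llm_dim)

def training_step(projector, clip_text_encoder, llm, tokenizer, captions, answers):
    """One text-only step: the caption's CLIP *text* feature is used as a proxy
    for the image feature; only the projector is trained (CLIP and the LLM are
    assumed frozen, i.e. their parameters have requires_grad=False)."""
    with torch.no_grad():
        # clip_text_encoder is a placeholder callable returning (B, clip_dim) features
        proxy_feat = clip_text_encoder(captions)
    soft_tokens = projector(proxy_feat)               # (B, T, llm_dim)

    target_ids = tokenizer(answers, return_tensors="pt", padding=True).input_ids
    target_emb = llm.get_input_embeddings()(target_ids)          # (B, L, llm_dim)
    inputs_embeds = torch.cat([soft_tokens, target_emb], dim=1)  # prefix + answer

    # Standard next-token loss; -100 masks the soft-token prefix from supervision.
    ignore = torch.full(soft_tokens.shape[:2], -100, dtype=torch.long)
    labels = torch.cat([ignore, target_ids], dim=1)
    loss = llm(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()
    return loss

At inference time, under the same assumptions, the projector would instead receive CLIP image-encoder features, which land in roughly the same region of the coupled space, so the LLM can consume visual content without ever having been trained on image-text pairs.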