Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Subjects: Process (computing), Computer Science, Automated Reasoning, Management Science, Natural Language Processing, Artificial Intelligence, Programming Languages, Engineering
Authors
Liangchen Luo, Zhiwen Yu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi
Source
Journal: Cornell University - arXiv · Cited by: 1
Identifier
DOI: 10.48550/arxiv.2406.06592
Abstract

Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a novel divide-and-conquer style Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data. This algorithm swiftly identifies the first error in the Chain of Thought (CoT) with binary search and balances the positive and negative examples, thereby ensuring both efficiency and quality. As a result, we are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM). Utilizing this fully automated process supervision alongside the weighted self-consistency algorithm, we have enhanced the instruction-tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark, a 36% relative improvement from the 51% base model performance. Additionally, the entire process operates without any human intervention, making our method both financially and computationally cost-effective compared to existing methods.
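The abstract's core idea — binary-searching the reasoning chain, with Monte Carlo rollouts as the correctness signal, to locate the first erroneous step — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: `completer` (samples an LLM completion from a step prefix) and `checker` (verifies the final answer) are hypothetical stand-ins, and the sketch assumes the empty prefix is solvable while the full chain yields a wrong answer.

```python
def mc_value(steps, k, completer, checker, num_rollouts=8):
    """Monte Carlo estimate of a prefix's quality: the fraction of
    completions sampled from steps[:k] that reach a correct answer."""
    wins = sum(checker(completer(steps[:k])) for _ in range(num_rollouts))
    return wins / num_rollouts

def first_error_step(steps, completer, checker, num_rollouts=8):
    """Binary search for the earliest step whose prefix can no longer be
    completed to a correct answer (Monte Carlo value drops to zero).

    Invariant: a prefix of length lo still admits correct completions;
    a prefix of length hi does not.
    """
    lo, hi = 0, len(steps)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if mc_value(steps, mid, completer, checker, num_rollouts) > 0:
            lo = mid  # prefix still recoverable; error lies later
        else:
            hi = mid  # prefix already doomed; error lies at or before mid
    return hi  # 1-based index of the first erroneous step

# Toy demo: steps 1-4 are sound, step 5 introduces an unrecoverable error.
toy_steps = ["s1", "s2", "s3", "s4", "bad", "s6"]
toy_completer = lambda prefix: "ok" if "bad" not in prefix else "wrong"
toy_checker = lambda answer: 1 if answer == "ok" else 0
print(first_error_step(toy_steps, toy_completer, toy_checker))  # -> 5
```

Compared with naive per-step Monte Carlo estimation, which needs rollouts at every one of the n steps, this localizes the first error with rollouts at only about log2(n) prefixes, which is the efficiency gain the abstract attributes to the divide-and-conquer design.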