Olympic Games
Pairwise comparison
Berry
Computer science
Mathematics
Mathematical optimization
Artificial intelligence
Mathematics education
Biology
Plants
Authors
Di Zhang,Jianbo Wu,Jingdi Lei,Tong Che,Jiatong Li,Xie Tong,Xiaoshui Huang,Shufei Zhang,Marco Pavone,Yuqiang Li,Wanli Ouyang,D.C. Zhou
Source
Journal: Cornell University - arXiv
Date: 2024-10-03
Identifier
DOI:10.48550/arxiv.2410.02884
Abstract
This paper presents an advanced mathematical problem-solving framework, LLaMA-Berry, for enhancing the mathematical reasoning ability of Large Language Models (LLMs). The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path and uses a pairwise reward model to evaluate different paths globally. By leveraging the self-critique and rewriting capabilities of LLMs, Self-Refine applied to MCTS (SR-MCTS) overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms by fostering a more efficient exploration of solution spaces. A Pairwise Preference Reward Model (PPRM), inspired by Reinforcement Learning from Human Feedback (RLHF), is then used to model pairwise preferences between solutions, employing an Enhanced Borda Count (EBC) method to synthesize these preferences into a global ranking score and thereby find better answers. This approach addresses the challenges of scoring variability and non-independent distributions in mathematical reasoning tasks. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability compared to existing methods such as ToT and rStar, particularly on complex Olympiad-level benchmarks, including GPQA, AIME24, and AMC23.
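The abstract's core aggregation idea, turning pairwise preferences between candidate solutions into a global ranking, can be illustrated with a plain Borda count. This is a minimal sketch under stated assumptions: the paper's Enhanced Borda Count (EBC) and its learned PPRM are not reproduced here, and the `prefer` predicate below (favoring shorter strings) is a hypothetical stand-in for the reward model's pairwise judgment.

```python
def borda_scores(candidates, prefer):
    """Aggregate pairwise preferences into per-candidate scores.

    prefer(a, b) -> True if candidate `a` is preferred over `b`.
    Each candidate earns one point per pairwise win (plain Borda count,
    not the paper's EBC variant).
    """
    scores = {c: 0 for c in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if prefer(a, b):
                scores[a] += 1
            else:
                scores[b] += 1
    return scores


# Toy usage: three candidate "solutions"; the stand-in preference
# favors the shorter string, mimicking a pairwise reward model's verdicts.
solutions = ["step-by-step proof", "short proof", "guess"]
scores = borda_scores(solutions, lambda a, b: len(a) < len(b))
ranking = sorted(solutions, key=scores.__getitem__, reverse=True)
# "guess" wins both of its comparisons and ranks first.
```

A global score of this kind lets the search compare solution paths found in different parts of the tree, which per-step scoring alone cannot do.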