Figure 3: We study how M* performance varies with the number of step-level candidates. We use Llama-2-13B as the base model and beam search (BS) as the search algorithm.
Figure 4: Scaling laws of the Llama-2 and Llama-3 model families on the MATH dataset. All results are taken from their original sources. We use the SciPy toolkit with a logarithmic function to compute the fitted curves.
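As a rough illustration of this fitting procedure, the sketch below fits a logarithmic curve with `scipy.optimize.curve_fit`. The data points and the exact functional form are placeholders, not the values plotted in Figure 4.

```python
# Minimal sketch of fitting a logarithmic scaling curve with SciPy.
# The (size, accuracy) pairs below are hypothetical placeholders.
import numpy as np
from scipy.optimize import curve_fit

def log_curve(x, a, b):
    # Logarithmic form: accuracy ≈ a * ln(model size) + b
    return a * np.log(x) + b

sizes = np.array([7.0, 13.0, 70.0])      # model sizes in billions of parameters (placeholder)
accuracy = np.array([5.0, 8.0, 20.0])    # MATH accuracy in percent (placeholder)

params, _ = curve_fit(log_curve, sizes, accuracy)
a, b = params

# Evaluate the fitted curve on a dense grid, e.g. for plotting.
grid = np.linspace(sizes.min(), sizes.max(), 100)
fitted = log_curve(grid, a, b)
print(f"fitted: accuracy ≈ {a:.2f} * ln(size) + {b:.2f}")
```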
Table 2: Average number of tokens generated by different methods when answering a question.

4. Conclusion

This paper introduced MindStar (M*), a novel search-based reasoning framework for enhancing the reasoning capability of pre-trained large language models. By framing reasoning tasks as search problems and leveraging a process-supervised reward model, M* effectively navigates the space of reasoning trees and identifies near-optimal paths. Incorporating ideas from beam search and Levin tree search further improves search efficiency and guarantees that the best reasoning path is found within bounded computational complexity. Extensive experimental results show that M* substantially improves the reasoning ability of open-source models, achieving performance comparable to much larger closed-source models while greatly reducing model size and computational cost. These findings highlight the considerable potential of shifting computational resources from fine-tuning to inference-time search, opening new avenues for future research on efficient reasoning-enhancement techniques.
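For illustration only, the sketch below shows what reward-guided beam search over step-level reasoning candidates can look like. The generator, reward model, termination check, and hyperparameters (`generate_candidates`, `reward`, `is_final`, `BEAM_WIDTH`, `CANDIDATES_PER_STEP`) are hypothetical stand-ins, not the M* implementation or its Levin-tree-search variant.

```python
# Toy sketch of process-reward-guided beam search over reasoning steps.
# All components below are placeholders assumed for illustration.
from dataclasses import dataclass, field

BEAM_WIDTH = 2            # number of partial reasoning paths kept per level (assumed)
CANDIDATES_PER_STEP = 3   # step-level candidates sampled per path (assumed)
MAX_DEPTH = 4             # maximum number of reasoning steps (assumed)

@dataclass
class Path:
    steps: list = field(default_factory=list)
    score: float = 0.0

def generate_candidates(question, path, n):
    """Placeholder for sampling n next-step candidates from an LLM."""
    return [f"step {len(path.steps) + 1} (variant {i})" for i in range(n)]

def reward(question, steps):
    """Placeholder for a process-supervised reward model scoring a partial path."""
    return -0.1 * len(steps)  # toy score only

def is_final(step):
    """Placeholder check for whether a step contains the final answer."""
    return False

def beam_search(question):
    beams = [Path()]
    for _ in range(MAX_DEPTH):
        expanded = []
        for path in beams:
            for step in generate_candidates(question, path, CANDIDATES_PER_STEP):
                new_steps = path.steps + [step]
                expanded.append(Path(new_steps, reward(question, new_steps)))
        # Keep only the top-scoring partial paths at this level.
        beams = sorted(expanded, key=lambda p: p.score, reverse=True)[:BEAM_WIDTH]
        if any(is_final(p.steps[-1]) for p in beams):
            break
    return beams[0]

if __name__ == "__main__":
    best = beam_search("What is 2 + 2?")
    print(best.steps, best.score)
```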