OpenAI o1-style chain of thought comes to open-source models: the first success stories

Published by 机器之心 (Synced) on 2024-09-17

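At midnight Beijing time on September 13, OpenAI released the o1 series of models with strong reasoning performance. Since then, researchers have been trying to uncover the techniques behind o1's performance and to reproduce it. OpenAI, for its part, has taken steps to discourage prying: several users report that after trying to coax an o1 model into revealing its thought process, they received warnings from OpenAI threatening account bans.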


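Even so, within just three or four days, researchers were already claiming to have reproduced or developed reasoning techniques that perform comparably to o1, and more than one of them at that!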

Llamaberry: teaching AI to think like a smart person

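Llamaberry was proposed by Martin Bowling. He is also the developer of RAGMiner.dev, which helps users effortlessly convert websites into formats such as Markdown, XML, or JSON for use in RAG and LLM applications. He publishes the Llamaberry template on Replit, a platform that uses AI to turn ideas into code.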

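The core idea behind Llamaberry is to use chain of thought (CoT) to carry out reasoning. The name is a nod to Strawberry, the codename associated with the o1 model.

Hugging Face address: https://huggingface.co/spaces/martinbowling/Llamaberry

What is chain of thought? Bowling offers an analogy in his blog post: "Chain-of-thought reasoning is like giving the AI a notebook in which to show its work. Rather than simply giving an answer, it walks us through its thought process."

Llamaberry teaches an AI to think a problem through, the way a human expert would when tackling a hard problem.

Specifically, Llamaberry is an implementation of a multi-turn chain-of-thought reasoning system, built on the Llama 3.1 70B model running on Groq.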

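Multi-turn reasoning is the key

Multi-turn reasoning, as the name suggests, has the model think through several passes before giving an answer, rather than answering in one shot. By analogy, it is like watching a chef take a dish from prep to plating step by step, instead of microwaving a ready-made meal.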

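Here is an example:

Turn 1: The AI makes a first attempt at solving the problem.

Turn 2: The AI reviews its first attempt and tries to improve or refine its reasoning.

Turn 3: Another round of reflection and improvement.

Synthesis: Finally, all of this thinking is combined into a coherent, well-reasoned answer.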

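The diagram below illustrates this multi-turn process:

[Figure: diagram of the multi-turn reasoning process]

As shown, the output of each turn becomes the input to the next, so the AI can keep refining its thinking at every stage. In the end, all of this thinking is distilled into a well-reasoned final answer. It is like watching a berry ripen!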

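How it is implemented

The following walks through Llamaberry's implementation details.

1. Setting the stage

First, we need to set some ground rules for the AI assistant. The code is as follows: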

initial_system_prompt = """You are an AI assistant capable of detailed, step-by-step thinking. When presented with a question or problem, break down your thought process into clear, logical steps. For each step, explain your reasoning. Conclude with a final answer. Use the following markdown structure:

## Reasoning
1. [First step]
   **Explanation:** [Detailed explanation of this step]
2. [Second step]
   **Explanation:** [Detailed explanation of this step]
...

## Answer
[Final answer]

Be comprehensive and show your reasoning clearly."""

This is the recipe handed to our AI chef. It tells the model to break its thinking down step by step, explain each step, and present everything in Markdown.
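Because the prompt fixes the output layout, a reply can in principle be split back into its steps and final answer. The following is a minimal sketch of such a parser, assuming the model follows the "## Reasoning" / "## Answer" structure exactly; `parse_turn` is a hypothetical helper and not part of Llamaberry.

```python
import re

EXPLANATION_TAG = "**Explanation:**"

def parse_turn(markdown: str) -> dict:
    """Split one reply into its numbered steps and its final answer.

    Assumes the reply follows the layout requested by the system prompt.
    """
    section = None
    steps, answer_lines = [], []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A level-2 heading switches the section we are reading.
            section = line[3:].strip().lower()
            continue
        if not line:
            continue
        if section == "reasoning":
            numbered = re.match(r"\d+\.\s+(.*)", line)
            if numbered:
                steps.append({"title": numbered.group(1), "explanation": ""})
            elif line.startswith(EXPLANATION_TAG) and steps:
                # Attach the explanation to the most recent step.
                steps[-1]["explanation"] = line[len(EXPLANATION_TAG):].strip()
        elif section == "answer":
            answer_lines.append(line)
    return {"steps": steps, "answer": " ".join(answer_lines)}

sample = """## Reasoning
1. Count the letters
   **Explanation:** Go through the word one character at a time.
2. Tally the matches
   **Explanation:** Keep a running total.

## Answer
3"""

parsed = parse_turn(sample)
print(len(parsed["steps"]), "steps; answer:", parsed["answer"])
```

A real reply may drift from the requested layout, so a production parser would need to tolerate missing sections.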

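2. The thinking process

In each reasoning turn, the AI is asked to think about the problem. After the first turn, though, it is also asked to think about what it has already thought. It is like asking a friend, "Hey, remember what you said earlier? Let's think about it again."

Here is how each turn is generated: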

from typing import Optional

# followup_system_prompt and call_llm are used here but not shown in this article.
async def generate_turn(query: str, previous_turns: Optional[list] = None) -> str:
    # The first turn has no earlier reasoning to build on.
    is_first_turn = previous_turns is None or len(previous_turns) == 0
    if is_first_turn:
        messages = [{
            "role": "system",
            "content": initial_system_prompt
        }, {
            "role": "user",
            "content": query
        }]
    else:
        # Later turns see every earlier turn, joined into one context block.
        previous_content = "\n\n".join(previous_turns)
        messages = [{
            "role": "system",
            "content": followup_system_prompt
        }, {
            "role": "user",
            "content":
            f"Original Query: {query}\n\nPrevious Turns:\n{previous_content}\n\nProvide the next turn of reasoning."
        }]

    return await call_llm(messages)

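This function is the heart of multi-turn reasoning. It determines whether we are on the first turn or a later one and builds the prompt accordingly. For later turns, it includes all previous reasoning turns in the context so the AI can build on its earlier thinking.

3. Synthesizing the result

After three turns of thinking, the AI is asked to review all of its reasoning and arrive at a sound final answer. It is like having experts discuss a problem and then write up a final report.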

# synthesis_prompt and call_llm are used here but not shown in this article.
async def synthesize_turns(query: str, turns: list) -> str:
    # Label each turn so the model can tell the rounds apart.
    turns_text = "\n\n".join(
        [f"Turn {i+1}:\n{turn}" for i, turn in enumerate(turns)])
    messages = [{
        "role": "system",
        "content": synthesis_prompt
    }, {
        "role": "user",
        "content":
        f"Original Query: {query}\n\nTurns of Reasoning:\n{turns_text}"
    }]
    return await call_llm(messages)

This synthesis step is crucial: it is where the AI distills everything it has thought into a final answer.
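To see how the three steps fit together, here is a minimal end-to-end sketch. It is not Llamaberry's actual driver: `run_multi_turn` and `echo_llm` are hypothetical names, the system prompts are omitted for brevity, and the model call is an injected coroutine so the flow can be run offline without an API key.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Message = Dict[str, str]
LLM = Callable[[List[Message]], Awaitable[str]]

async def run_multi_turn(query: str, llm: LLM, num_turns: int = 3) -> dict:
    """Run num_turns reasoning turns, then one synthesis call."""
    turns: List[str] = []
    for _ in range(num_turns):
        if not turns:
            # First turn: the model only sees the question.
            content = query
        else:
            # Later turns: the model also sees every earlier turn.
            previous = "\n\n".join(turns)
            content = (f"Original Query: {query}\n\nPrevious Turns:\n"
                       f"{previous}\n\nProvide the next turn of reasoning.")
        turns.append(await llm([{"role": "user", "content": content}]))

    # Synthesis: hand all labeled turns back for a final answer.
    turns_text = "\n\n".join(
        f"Turn {i + 1}:\n{turn}" for i, turn in enumerate(turns))
    final = await llm([{
        "role": "user",
        "content": f"Original Query: {query}\n\nTurns of Reasoning:\n{turns_text}",
    }])
    return {"turns": turns, "answer": final}

async def echo_llm(messages: List[Message]) -> str:
    # Hypothetical stand-in for a real model: it reports how many
    # earlier replies (marked with "[turn]") appear in its context.
    seen = messages[-1]["content"].count("[turn]")
    return f"[turn] context held {seen} earlier turn(s)"

result = asyncio.run(run_multi_turn("What is 2 + 2?", echo_llm))
for turn in result["turns"]:
    print(turn)
print(result["answer"])
```

Swapping `echo_llm` for a coroutine that calls a hosted model would leave the control flow unchanged, which is the point of injecting it.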

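Single-turn vs. multi-turn reasoning

In his blog post, Martin Bowling briefly compares single-turn and multi-turn reasoning.

He uses a simple example, asking the AI: "What are the potential impacts of artificial intelligence on the job market?"

Single-turn reasoning: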

AI: Artificial intelligence will likely have significant impacts on the job market, including:
1. Automation of routine tasks
2. Creation of new job roles in AI development and maintenance
3. Potential job displacement in certain industries
4. Increased demand for skills in data analysis and machine learning

While AI may eliminate some jobs, it will also create new opportunities and increase productivity in many fields.

Multi-turn reasoning with Llamaberry:

Turn 1:

AI: Let's approach this step-by-step:

1. Identify current AI capabilities
   Explanation: AI can perform data analysis, pattern recognition, and automate routine tasks.

2. Analyze vulnerable job sectors
   Explanation: Manufacturing, customer service, and data entry are likely to be affected.

3. Consider new job opportunities
   Explanation: AI development, data science, and AI ethics consultation may see growth.

4. Examine potential economic effects
   Explanation: Increased productivity might lead to economic growth, but wealth distribution could be a challenge.

Initial Answer: AI will likely cause significant shifts in the job market, with both job losses in some sectors and new opportunities in others.

Turn 2:

AI: Let's refine our thinking:

1. Reassess the timeline of AI impact
   Explanation: The effects won't be immediate but will unfold over years or decades.

2. Consider the adaptability of the workforce
   Explanation: Retraining and education will be crucial for workers to transition to new roles.

3. Analyze potential societal changes
   Explanation: The nature of work itself might change, with more emphasis on creativity and emotional intelligence.

4. Examine global differences
   Explanation: AI's impact will vary across different countries and economies.

Refined Answer: AI's impact on the job market will be complex and gradual, requiring proactive measures in education and economic policy to ensure a smooth transition.

Turn 3:

AI: Let's delve deeper:

1. Explore the concept of human-AI collaboration
   Explanation: Many jobs may evolve to involve working alongside AI rather than being replaced by it.

2. Consider the ethical implications
   Explanation: Issues like algorithmic bias in hiring processes need to be addressed.

3. Analyze the potential for new economic models
   Explanation: Universal Basic Income or reduced working hours might become more prominent discussions.

4. Examine historical parallels
   Explanation: Previous technological revolutions, like the Industrial Revolution, can offer insights into potential outcomes.

Further Refined Answer: While AI will undoubtedly transform the job market, its impact will likely be nuanced, involving collaboration between humans and AI, ethical considerations, and potentially new economic paradigms.

Synthesized answer:

AI: After careful consideration across multiple turns of reasoning, we can synthesize our thoughts on the potential impacts of artificial intelligence on the job market:

1. Job Transformation: AI will lead to significant changes in the job market, but these changes will be gradual and complex. While some jobs may be automated, many others will be transformed to involve collaboration between humans and AI. Industries like manufacturing, customer service, and data entry are likely to see the most immediate impacts.

2. New Opportunities: The rise of AI will create new job categories, particularly in fields like AI development, data science, AI ethics consultation, and human-AI interaction design. There will likely be an increased demand for skills in creativity, emotional intelligence, and complex problem-solving that AI cannot easily replicate.

3. Economic and Social Implications: The integration of AI into the workforce has the potential to increase productivity and economic growth. However, it also raises concerns about wealth distribution and may necessitate discussions about new economic models, such as Universal Basic Income or reduced working hours.

4. Education and Adaptation: The changing job market will require a strong emphasis on continuous learning and adaptation. Educational systems and workforce training programs will need to evolve to prepare people for the new reality of working alongside AI.

5. Ethical Considerations: As AI becomes more prevalent in hiring and work processes, addressing issues like algorithmic bias and ensuring fair opportunities for all workers will be crucial.

6. Global Variations: The impact of AI on the job market will not be uniform across the globe. Different countries and economies will experience varying effects based on their economic structures, technological readiness, and policy responses.

7. Historical Context: While AI represents a significant shift, it's important to view it in the context of previous technological revolutions. Like the Industrial Revolution, the AI revolution will likely lead to short-term disruptions but potentially long-term benefits if managed properly.

In conclusion, while AI will undoubtedly have profound impacts on the job market, these impacts will be multifaceted and evolve over time. The key to navigating this transition successfully lies in proactive measures in education, economic policy, and ethical governance, ensuring that the benefits of AI are broadly shared while mitigating potential negative effects.

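As shown, the multi-turn approach yields a more nuanced and comprehensive analysis. It considers the question from multiple angles, refines its thinking, and ends with a more thoughtful and detailed answer.

Bowling says this kind of multi-turn reasoning has major advantages in depth of thought, self-improvement, transparency, and flexibility. It also uses Markdown formatting, which makes the results easier to read and to feed into downstream processes.

As a result, this technique could bring AI into more real-world applications, such as medical diagnosis, legal analysis, scientific research, business strategy, and education. Of course, when using multi-turn reasoning systems like Llamaberry, one should keep their limitations and potential ethical issues in mind, including bias, computational cost, over-reliance, and privacy.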

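How do you use Llamaberry?

Llamaberry is also easy to use: a few clicks give you your own multi-turn reasoning system. The steps are as follows:

  1. Go to Replit and open this link to get the Llamaberry template: https://replit.com/@MartinBowling/Llamaberry-Powered-By-Groq?v=1

  2. Fork the template: click the Fork button to create your own copy of the Llamaberry project.

  3. Get your Groq API key: sign up for a Groq account and obtain an API key.

  4. Set up the environment: in your forked Replit project, find the "Secrets" tab. Add a new secret with the key GROQ_API_KEY and your Groq API key as the value.

  5. Run the project: click the Run button at the top of the Replit interface. This launches the Llamaberry app.

  6. Start experimenting: once the app is running, you will see a Gradio interface. Enter a question there to see Llamaberry's multi-turn reasoning in action, with the output rendered as clean Markdown.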
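Replit exposes secrets to a running project as environment variables, so step 4 amounts to making GROQ_API_KEY readable from the process environment. A minimal sketch of a defensive lookup follows; `load_groq_api_key` is a hypothetical helper for illustration, not code taken from the template.

```python
import os
from typing import Mapping

def load_groq_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Groq API key, failing early with a clear message."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it under Secrets in your Replit fork.")
    return key

# Usage with an explicit mapping, so the example runs anywhere.
print(load_groq_api_key({"GROQ_API_KEY": "example-key"}))
```

Failing at startup with a clear message is usually easier to diagnose than an authentication error on the first request.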

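With Llamaberry covered, let's look at another project that claims to implement o1-like reasoning chains: g1.

g1: o1-like reasoning chains

The g1 project comes from Benjamin Klieger, a researcher at Groq. Accordingly, g1 is also built on Groq and also uses the Llama 3.1 70B model.

Unlike Llamaberry's multi-turn chain-of-thought reasoning, g1's strategy combines role-playing, chain-of-thought prompting, formatting, and several other prompting techniques. g1 is also open source.

Project address: https://github.com/bklieger-groq/g1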


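The developer claims that g1 correctly counts the number of Rs in "Strawberry" about 70% of the time, without any fine-tuning or few-shot techniques. Here is one of its runs:

[Figure: a sample g1 run]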

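According to Klieger, g1, like o1, gives an LLM the ability to "think" and solve logic problems that previously stumped leading models. Unlike o1, however, g1 shows all of its reasoning tokens openly. He also stresses the technical differences between the two: o1 uses large-scale reinforcement learning to carry out chain-of-thought reasoning, whereas g1 explores the potential of prompt engineering to help LLMs solve simple logic problems, so that existing open-source models can benefit from dynamic reasoning chains and an interface optimized for exploring them.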

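How g1 works

Powered by Llama 3.1 70B, g1 creates a dynamic chain of thought.

At each step, the LLM can choose either to continue with another reasoning step or to provide a final answer. Each step has a title and is visible to the user.

The system prompt also includes tips for the LLM. The prompting strategy is as follows: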

You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.

Example of a valid JSON response:
json
{
    "title": "Identifying Key Information",
    "content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
    "next_action": "continue"
}

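For a detailed explanation of the prompt, see the Prompt Breakdown section of the original project. We won't go through it all here and will only highlight a few examples, such as the instructions to "include exploration of alternative answers" and to "use at least 3 methods to derive the answer".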

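In this way, by combining chain of thought with strategies such as trying multiple methods, exploring alternative answers, questioning earlier draft solutions, and considering the LLM's limitations, the LLM's reasoning ability can be improved significantly.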
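The dynamic chain described above, where each JSON reply either continues or ends the chain, can be sketched as a simple loop. This is an illustration rather than g1's actual code: `run_reasoning_chain`, `scripted_llm`, and the `max_steps` cap of 10 are assumptions, and the model is replaced by scripted replies so the loop runs offline.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]

def run_reasoning_chain(question: str,
                        llm: Callable[[List[Message]], str],
                        max_steps: int = 10) -> List[dict]:
    """Collect reasoning steps until the model signals 'final_answer'."""
    messages: List[Message] = [{"role": "user", "content": question}]
    steps: List[dict] = []
    for _ in range(max_steps):  # cap guards against a chain that never ends
        raw = llm(messages)
        try:
            step = json.loads(raw)
            if not isinstance(step, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Malformed reply: record it and stop instead of looping on errors.
            step = {"title": "Unparseable step", "content": raw,
                    "next_action": "final_answer"}
        steps.append(step)
        # Feed the reply back so the next step can build on it.
        messages.append({"role": "assistant", "content": raw})
        if step.get("next_action") == "final_answer":
            break
    return steps

def scripted_llm(replies: List[str]) -> Callable[[List[Message]], str]:
    """Hypothetical stand-in for a model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda messages: next(remaining)

replies = [
    json.dumps({"title": "Spell the word", "content": "s-t-r-a-w-b-e-r-r-y",
                "next_action": "continue"}),
    json.dumps({"title": "Count again another way", "content": "r at 3, 8, 9",
                "next_action": "continue"}),
    json.dumps({"title": "Final answer", "content": "3",
                "next_action": "final_answer"}),
]
chain = run_reasoning_chain("How many Rs are in strawberry?", scripted_llm(replies))
for step in chain:
    print(step["title"])
```

Keeping each step as a separate titled object is what lets an interface display the reasoning as it unfolds.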

On the classic question of how many Rs are in "Strawberry", g1 helps Llama 3.1 70B reach about 70% accuracy without any training (n=10, "How many Rs are in strawberry?"). Without the prompting technique, Llama 3.1 70B scores 0% and ChatGPT-4o only 30%.
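For reference, the ground truth and the accuracy figure are easy to compute. The sketch below uses a made-up list of ten answers purely to show the arithmetic behind a figure like "70% (n=10)"; it is not the project's recorded data, and `count_letter` and `accuracy` are hypothetical helpers.

```python
from typing import Sequence

def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of a letter in a word."""
    return word.lower().count(letter.lower())

def accuracy(answers: Sequence[int], truth: int) -> float:
    """Fraction of answers that match the ground truth."""
    if not answers:
        raise ValueError("need at least one answer")
    return sum(1 for answer in answers if answer == truth) / len(answers)

truth = count_letter("Strawberry", "r")  # 3

# Made-up answers for illustration only, not the project's recorded outputs.
example_answers = [3, 3, 2, 3, 3, 3, 2, 3, 2, 3]
print(f"truth={truth}, accuracy={accuracy(example_answers, truth):.0%}")
```

With only ten trials, each run shifts the figure by ten percentage points, so the reported numbers are best read as rough indications.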

Another example is shown below: which is larger, 0.9 or 0.11?

[Figure: g1 working through the comparison]

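For detailed installation steps and code, see the original project.

Finally, it is worth mentioning that another developer has released multi1, a fork of g1 whose main improvement is support for multiple AI providers when creating o1-like reasoning chains. Interested readers can visit: https://github.com/tcsenpai/multi1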
