Decrypt Prompt Series 31. LLM Agents That Keep Learning from Experience

风雨中的小七, published 2024-06-11

Original URL: https://www.cnblogs.com/gogoSandy/p/18234318

Agent workflows fall roughly into two kinds: fixed, static workflows, and dynamic workflows in which the agent makes its own decisions.

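Examples of static-workflow agents include a trending-news tracking and push agent, or a daily new-paper summarization agent. Their strengths are that they are controllable, stable, and reproducible. Their weakness is that one workflow basically fits only one scenario, much like a factory assembly line.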

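Dynamic-workflow agents, also called autonomous agents, include AutoGPT and BabyAGI. They perceive the environment on their own, make decisions based on what they observe, act, then reflect on the result of the action and propose the next step. The upside is that they can "in theory" generalize to any scenario without a preset workflow abstracted from human experience. The downside is that they are uncontrollable, unstable, not reproducible, and limited in task completion rate, especially in specialized vertical domains.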

Two pressing problems limit the task completion rate of autonomous agents in vertical domains:

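Evolving the model's autonomous capability: failure is the mother of success, so how should the model reflect on and explore from failed task trajectories to raise its task completion rate step by step?
Acquiring the model's autonomous capability: how does the model first pick up the skills of a domain? Most earlier approaches rely on SFT, building domain samples by hand, or by hand plus a model, to teach the model part of the skill. Put bluntly, humans are still teaching the model hand in hand. Can this step be made autonomous, so that the model learns through trial and error? After all, humans also master new skills through experiment and exploration.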

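Both problems can in fact be addressed through self-reflection from past experience. The question then becomes how to obtain past experience, how to distill raw trajectories into usable experience, and how to use that experience in new inference. This chapter introduces three approaches to autonomous exploration, learning, and experience summarization: AppAgent, STE (Simulated Trial and Error), and AutoGuide.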

AppAgent

AppAgent: Multimodal Agents as Smartphone Users

https://github.com/mnotgod96/AppAgent

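AppAgent is an agent from Tencent's lab that interacts autonomously with Android phones. The overall approach resembles WebVoyager, covered in the previous chapter: it uses a multimodal large model together with SoM (Set-of-Mark) page-element segmentation to decide which page elements the model interacts with at each step. For the self-learning part, the paper has the model explore on its own up front to build a tool manual that helps it understand how each app is used, which raises the task completion rate at inference time. The paper evaluates on 9 Android apps.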

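How do we get a model to write an app manual on its own? Just as a person using a new tool refines their understanding of it through trial and error, so does the model here. The paper first generates a set of task instructions for the app, and for each instruction the model explores the app autonomously. At each step the model's input includes: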

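Descriptions of the 4 phone-interaction functions: tap, text input, long press, and swipe
The task description
A summary of the interaction history so far
A screenshot of the current app page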

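Each step's output includes:

Observation: a description of what the model sees in the screenshot
Thought: what to do next to complete the task
Action: a function call to one of the functions above, or FINISH
Summary: a summary of all past actions including the latest one, used as input to the next step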

The prompt for operating the app (details omitted) is as follows:

self_explore_task_template = """You are an agent that is trained to complete certain tasks on a smartphone. You will be 
given a screenshot of a smartphone app. The interactive UI elements on the screenshot are labeled with numeric tags 
starting from 1. 

You can call the following functions to interact with those labeled elements to control the smartphone:

1. tap(element: int)
(function description omitted)

2. text(text_input: str)
(function description omitted)

3. long_press(element: int)
(function description omitted)

4. swipe(element: int, direction: str, dist: str)
(function description omitted)

The task you need to complete is to <task_description>. Your past actions to proceed with this task are summarized as 
follows: <last_act>
Now, given the following labeled screenshot, you need to think and call the function needed to proceed with the task. 
Your output should include three parts in the given format:
Observation: <Describe what you observe in the image>
Thought: <To complete the given task, what is the next step I should do>
Action: <The function call with the correct parameters to proceed with the task. If you believe the task is completed or 
there is nothing to be done, you should output FINISH. You cannot output anything else except a function call or FINISH 
in this field.>
Summary: <Summarize your past actions along with your latest action in one or two sentences. Do not include the numeric 
tag in your summary>
You can only take one action at a time, so please directly call the function."""
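The reply format requested by this prompt is easy to parse mechanically. The following is a minimal sketch, not AppAgent's actual parser: `parse_step` and `parse_action` are hypothetical helper names, and the sketch assumes each field label starts its own line and that function arguments are simple literals.

```python
import re

FIELDS = ("Observation", "Thought", "Action", "Summary")

def parse_step(response: str) -> dict:
    """Split a model reply into the four labeled fields requested by the prompt."""
    pattern = r"^(Observation|Thought|Action|Summary):\s*"
    parts = re.split(pattern, response.strip(), flags=re.MULTILINE)
    # With one capture group, re.split yields [prefix, label, body, label, body, ...]
    fields = {label: body.strip() for label, body in zip(parts[1::2], parts[2::2])}
    missing = [name for name in FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return fields

def parse_action(action: str):
    """Turn 'tap(5)' into ('tap', ['5']) and 'FINISH' into ('FINISH', [])."""
    action = action.strip()
    if action == "FINISH":
        return "FINISH", []
    match = re.fullmatch(r"(\w+)\((.*)\)", action)
    if match is None:
        raise ValueError(f"unrecognized action: {action!r}")
    name, raw_args = match.groups()
    if name == "text":
        # text() takes one free-form string that may itself contain commas
        return name, [raw_args.strip().strip("\"'")]
    args = [a.strip().strip("\"'") for a in raw_args.split(",")] if raw_args.strip() else []
    return name, args
```

Arguments come back as strings, so a caller would still convert the element tag to an integer before dispatching to the device.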

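From the action sequences produced by this exploration, we record the page before and after each operation and have the large model summarize, from the change, what that operation actually does. This yields an interaction manual for each button in the app. Below is the documentation prompt for tap operations. If a button is used several times, the model keeps updating that button's documentation.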

tap_doc_template = """I will give you the screenshot of a mobile app before and after tapping the UI element labeled 
with the number <ui_element> on the screen. The numeric tag of each element is located at the center of the element. 
Tapping this UI element is a necessary part of proceeding with a larger task, which is to <task_desc>. Your task is to 
describe the functionality of the UI element concisely in one or two sentences. Notice that your description of the UI 
element should focus on the general function. For example, if the UI element is used to navigate to the chat window 
with John, your description should not include the name of the specific person. Just say: "Tapping this area will 
navigate the user to the chat window". Never include the numeric tag of the UI element in your description. You can use 
pronouns such as "the UI element" to refer to the element."""
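The create-or-refine behavior for a button's documentation can be sketched as a small store keyed by element and action. This is an illustration under assumptions: `update_docs` and the `Describe` callable are hypothetical names, the element id is assumed to be some stable identifier for the UI element, and the multimodal model call is injected so the sketch runs offline.

```python
from typing import Callable, Dict, Optional

# Hypothetical signature: (action, element_id, previous_doc) -> new description.
# A real implementation would send the before/after screenshots and the
# documentation prompt to a multimodal model.
Describe = Callable[[str, str, Optional[str]], str]

def update_docs(docs: Dict[str, Dict[str, str]], element_id: str, action: str,
                describe: Describe) -> None:
    """Create or refine the documentation entry for one UI element and action."""
    entry = docs.setdefault(element_id, {})
    previous = entry.get(action)  # None the first time this element is used
    entry[action] = describe(action, element_id, previous)
```

Passing the previous description back into the model is what lets repeated use of the same button refine its entry instead of overwriting it blindly.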

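After this exploration we have an operating manual for the buttons of each app. At inference time the model combines the current phone UI, the previously generated manual, the descriptions of the 5 interaction actions, and the action history to generate the next interaction.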
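One way to picture how the manual enters the inference prompt is to look up documentation only for the elements visible on the current screen. The helper below is a hypothetical sketch: `render_ui_docs`, the `visible` mapping from numeric tag to element id, and the output wording are all assumptions, not the paper's format.

```python
from typing import Dict

def render_ui_docs(docs: Dict[str, Dict[str, str]], visible: Dict[int, str]) -> str:
    """Render documentation lines for the tagged elements on the current screen.

    `visible` maps the numeric tag drawn on the screenshot to a stable element id;
    elements without a documentation entry are skipped.
    """
    lines = []
    for tag, element_id in sorted(visible.items()):
        for action, text in sorted(docs.get(element_id, {}).items()):
            lines.append(f"Element {tag} ({action}): {text}")
    return "\n".join(lines)
```

The returned text would be placed in the prompt next to the function descriptions and the action-history summary.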

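The paper shows that the manual built from autonomous exploration substantially improves task accuracy, coming close to manuals built from human demonstrations (Watching Demos) and to hand-written manuals (Manually Crafted).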

STE：Simulated Trial and Error

LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

https://github.com/microsoft/simulated-trial-and-error

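Where AppAgent helps the model teach itself front-end interaction, Microsoft's STE targets back-end API interaction. The model learns how to call an API through several rounds of exploratory API interaction, and the results of that exploration are then used, through in-context learning or SFT, to help the model use the API to complete tasks.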

STE uses the APIs from BMTools as its tool pool. The exploration phase consists of the following 3 steps.

Step 1, query generation: given the tool name and tool description, the model generates a question that can be answered with that API. The prompt is:

Your task is to answer the user's query as best you can. You have access to the following tools which you can use via API call to help with your response:

{api_descriptions}

Now you have the chance to explore the available APIs. You can do this by 1) synthesizing some natural user query that calling the API could help, and 2) trying to respond to the user query with the help of the APIs. Here, you can focus on queries that only require calling the API once.

Now, first input your synthesized user query. You should make the query natural - for example, try to avoid using the provided API descriptions or API names in the query, as the user does not know what APIs you have access to. Also try to make the query as specific as possible. Input just the user query alone; do NOT solve the query for now.

User Query:

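Step 2, tool-call reasoning: given the generated query and the tool description, the model follows the ReAct paradigm to produce a tool call. The reasoning text is parsed into an API call, the API is called, and the model reflects on the returned value. This can repeat up to 4 times, until the model judges that the API result answers the user's question. Each attempt uses the reasoning and observations of the previous attempts as context (the Short-Term Memory part of the paper's framework), which helps the model iterate and improve from its errors. The paper uses ChatGPT here, with the following prompt: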

Now, try to respond to the query using the available APIs.

The format you use the API is by specifying 1) Action: the API function name you'd like to call 2) Action Input: the input parameters of the API call in a json string format. The result of the API call will be returned starting with "Observation:". Remember that you should only perform a SINGLE action at a time, do NOT return a list of multiple actions.

Reminder:
1) the only values that should follow "Action:" are: {api_names}
2) use the following json string format for the API arguments:

Action Input:
{{
    "key_1": "value_1",
    ...
    "key_n": "value_n",
}}

Remember to ALWAYS use the following format:

Thought: you should always think about what to do next
Action: the API function name
Action Input: the input parameters of the API call in json string format
Observation: the return result of the API call. This is what I will provide you with; you do not need to repeat it in your response.
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the response to the user query

Begin! Remember that your response should never start with "Observation:" since that is what I will provide you with. Once you have enough information, please immediately use \nThought: I now know the final answer\nFinal Answer:

User Query (the same you just synthesized): {query}
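Turning a reply in this format into an API call takes a small parser. The sketch below is a hedged illustration rather than STE's own code: `parse_react_step` is a hypothetical name and `get_weather` in the usage note is a made-up API. One detail is taken from the prompt itself: its JSON example ends with a trailing comma, which strict JSON rejects, so the sketch strips such commas before decoding.

```python
import json
import re

def parse_react_step(text: str):
    """Return ('final', answer) or ('call', api_name, arguments)."""
    final = re.search(r"Final Answer:\s*(.*)", text, flags=re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(\{.*\})", text, flags=re.DOTALL)
    if not action or not action_input:
        raise ValueError("reply has neither a final answer nor a complete API call")
    raw = action_input.group(1)
    # Drop a comma that directly precedes a closing brace or bracket.
    # Assumption: no string value contains that pattern; this is a sketch.
    raw = re.sub(r",\s*([}\]])", r"\1", raw)
    return ("call", action.group(1).strip(), json.loads(raw))
```

For a reply such as `Action: get_weather` followed by `Action Input: {"city": "Paris", }`, the function returns the API name and a decoded argument dictionary that the caller can pass to the real API, then feed the result back as the `Observation:` line.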

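Step 3, query expansion: for each API the paper runs 15 rounds of query generation and attempts. To make the queries more diverse, each new generation is conditioned on the queries generated so far and on whether the model successfully called the tool for each of them. This is the Long-Term Memory part of the paper's framework. Whether a query was executed successfully is also judged with a large-model prompt, this time using the stronger GPT4.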

Now you know a bit more about the API. You can synthesize another user query to explore the API a bit further and consolidate your understanding of the API, based on things that you discovered about this API. Again, just input the user query alone; do NOT solve the query for now.

User Query:

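For each API, the tool-call reasoning and query expansion steps (steps 2 and 3) are repeated 15 times, and the trajectory of every attempt is recorded for training or for later in-context learning.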
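The control flow of the exploration phase, with its two memories, can be sketched as a nested loop. This is a minimal sketch under stated assumptions: `explore_api` and its three injected callables are hypothetical stand-ins for the LLM prompts and the success judge, and detecting completion by the string "Final Answer:" is a simplification.

```python
from typing import Callable, Dict, List

def explore_api(api_name: str,
                gen_query: Callable[[str, List[Dict]], str],
                attempt: Callable[[str, str, List[str]], str],
                is_success: Callable[[str, List[str]], bool],
                episodes: int = 15, max_trials: int = 4) -> List[Dict]:
    """Run the query -> trial -> reflect loop for one API and return all episodes."""
    long_term: List[Dict] = []  # past queries and whether each one succeeded
    for _ in range(episodes):
        # The query generator sees the history so it can propose something new
        query = gen_query(api_name, long_term)
        short_term: List[str] = []  # trials and observations within this episode
        for _ in range(max_trials):
            step = attempt(api_name, query, short_term)
            short_term.append(step)
            if "Final Answer:" in step:  # the model judged that it can answer
                break
        long_term.append({"query": query, "trials": short_term,
                          "success": is_success(query, short_term)})
    return long_term
```

The returned list is the raw material for either use described in the text: SFT samples or retrieval candidates for in-context learning.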

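Here we focus only on the ICL approach, because it generalizes better and extends to new tools and scenarios faster. Unlike AppAgent above, the ICL context is not a tool manual generated during exploration but the model's own history of tool calls, which works like a set of worked examples. When a user asks a new question, the query embedding (SentenceBert) is used to retrieve the 15 most similar queries from the exploration phase, together with the model's final API calls for them, as context for tool reasoning.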
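The retrieval step amounts to ranking stored queries by embedding similarity. The sketch below is illustrative only: to stay dependency-free it swaps in a toy bag-of-words vector for the SentenceBert embedding named in the text, and `embed`, `cosine`, `retrieve_examples`, and the `query` field name are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_examples(query: str, memory: List[Dict], k: int = 15) -> List[Dict]:
    """Return the k stored exploration examples whose queries are closest to `query`."""
    q = embed(query)
    ranked = sorted(memory, key=lambda ex: cosine(q, embed(ex["query"])), reverse=True)
    return ranked[:k]
```

With a real embedding model, only `embed` would change; the ranking and the top-k cut stay the same.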

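For results, the paper compares several models under three settings: direct tool-call reasoning, using the exploration examples as context, and SFT on constructed samples. Small models still need fine-tuning to reach their highest task completion rate, whereas strong models such as GPT4 reach a good completion rate with ICL alone. Both SFT and ICL improve clearly over the baseline.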

AutoGuide

AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents

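AppAgent summarizes experience at the level of individual button operations to form a tool manual, and retrieves the manual for the current tool at inference time. STE uses raw experience directly and retrieves relevant past trajectories as context. AutoGuide instead contrasts successful and failed trajectories and summarizes experience at the level of the state at each step, then retrieves the relevant states and their guidelines as context at inference time. Put simply, AppAgent is a user manual, STE is a casebook of operations, and AutoGuide is a usage guide.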

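To build and use the guide, AutoGuide has three core modules: state summarization (State Summarization), guideline extraction (Guideline Extraction), and guideline retrieval. The paper designs different state-summarization and extraction prompts for different agent scenarios. Here we again use the WebArena dataset from the web-agent discussion in the previous chapter as the example and walk through the first two modules.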

State Summarization

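The state summarization module summarizes the state the model is in from its planning trajectory (Thought + Action). Concretely, given one successful and one failed trajectory for the same task, it locates the time step T at which the two trajectories first take different actions, and feeds the steps before T (t < T) to a state-summarization prompt.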

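For example, if the two trajectories of a task diverge at Action1, the observations and actions before Action1 (the current trajectory) are the input to state summarization. The resulting state here would be "You are on the List of forum Page".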
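Locating the divergence point T is a simple pairwise scan over the two trajectories. A minimal sketch, assuming each step is a `(thought, action)` pair and that actions are compared as plain strings; `first_divergence` and `shared_prefix` are hypothetical names.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (thought, action)

def first_divergence(success: List[Step], failure: List[Step]) -> int:
    """Index T of the first step whose action differs between the two trajectories."""
    for t, (good, bad) in enumerate(zip(success, failure)):
        if good[1] != bad[1]:
            return t
    # No differing step in the overlap: one trajectory is a prefix of the other
    return min(len(success), len(failure))

def shared_prefix(success: List[Step], failure: List[Step]) -> List[Step]:
    """Steps before T, which serve as the input to state summarization."""
    return success[:first_divergence(success, failure)]
```

The steps at index T itself, one from each trajectory, are the contrasting pair from which a guideline can later be drawn.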

Guideline Extraction

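Once the state is known, the next step is to generate a guideline for acting in that state. The inputs are again the successful and failed trajectories, plus the state summary from the previous module, passed to a guideline-extraction prompt.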

In the same example, for the state "You are on the List of forum Page", the extracted guideline is
"if you want to navigate to a specific forum, you can click on the link that exactly matches the forum name you are looking for."

As guidelines keep being generated per state, the paper also uses a large-model prompt to merge similar states. The final result is a dictionary {state: guidelines}.
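Accumulating guidelines under merged states can be sketched as follows. This is an assumption-laden illustration: `add_guideline` is a hypothetical name, and the `match_state` callable stands in for the large-model prompt that decides whether a new state is equivalent to an existing one.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical matcher: returns the existing state equivalent to `state`, or None.
MatchState = Callable[[str, List[str]], Optional[str]]

def add_guideline(guides: Dict[str, List[str]], state: str, guideline: str,
                  match_state: MatchState) -> None:
    """File a guideline under an equivalent existing state, or under a new one."""
    key = match_state(state, list(guides)) or state
    bucket = guides.setdefault(key, [])
    if guideline not in bucket:  # skip verbatim duplicates
        bucket.append(guideline)
```

Because the matcher is consulted before a key is created, two differently worded summaries of the same page end up sharing one entry.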

Apply Guideline at Test

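With the states and their guidelines in hand, each inference step first runs the State Summarization module on the current state, then looks for a similar state among the stored guidelines, using the same large-model prompt as the state deduplication and merging above. All guidelines attached to the matched state are retrieved. If there are too many, a further selection prompt keeps only the Top-K, and the next thought and action are then inferred with those Top-K guidelines as context.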
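The test-time flow described above can be sketched as one function. Everything here is hypothetical scaffolding: `guidelines_for_step` and the injected `summarize`, `match_state`, and `select_top_k` callables stand in for the three large-model prompts, and the default `k=3` is an arbitrary illustration, not a value from the paper.

```python
from typing import Callable, Dict, List, Optional

def guidelines_for_step(trajectory: List[str],
                        guides: Dict[str, List[str]],
                        summarize: Callable[[List[str]], str],
                        match_state: Callable[[str, List[str]], Optional[str]],
                        select_top_k: Callable[[str, List[str], int], List[str]],
                        k: int = 3) -> List[str]:
    """Summarize the current state, match it to a stored state, return up to k guidelines."""
    state = summarize(trajectory)
    key = match_state(state, list(guides))
    if key is None:
        return []  # no stored state is similar enough, so act without guidelines
    candidates = guides[key]
    if len(candidates) <= k:
        return candidates
    return select_top_k(state, candidates, k)
```

The returned guidelines would be appended to the agent's prompt before it produces the next Thought and Action.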

Other related papers

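The three papers above cover the main current directions in autonomous learning. There are other related papers in this area with broadly similar ideas, and interested readers can look into them: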

A Survey on Self-Evolution of Large Language Models
Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution
Empowering Large Language Model Agents through Action Learning
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
OS-COPILOT: TOWARDS GENERALIST COMPUTER AGENTS WITH SELF-IMPROVEMENT
LLAMA RIDER: SPURRING LARGE LANGUAGE MODELS TO EXPLORE THE OPEN WORLD
PAST AS A GUIDE: LEVERAGING RETROSPECTIVE LEARNING FOR PYTHON CODE COMPLETION
ExpeL: LLM Agents Are Experiential Learners

For a fuller survey of LLM papers, fine-tuning and pretraining data and frameworks, and AIGC applications, head over to GitHub >> DecryptPrompt
