The Best AI Translation Workflow: The World's Most Faithful, Expressive, and Elegant Translation

By 米开朗基杨, published 2024-07-23

Professor Andrew Ng has proposed a reflective-translation workflow for large language models (LLMs), published as GitHub - andrewyng/translation-agent. The workflow is:

  1. Prompt an LLM to translate a text from source_language to target_language;
  2. Have the LLM reflect on the translation and produce constructive suggestions for improvement;
  3. Use those suggestions to improve the translation.
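
The three steps above can be sketched as a small pipeline. Everything here is an illustrative sketch, not code from the project: `llm` is a stand-in for whatever chat-completion call you use.

```javascript
// Reflective translation in three steps: translate, reflect, improve.
// `llm` is any function (prompt) => completion string — a placeholder here.
function reflectiveTranslate(llm, sourceLang, targetLang, sourceText) {
  const translation1 = llm(
    `Translate this ${sourceLang} text to ${targetLang}:\n${sourceText}`
  );
  const reflection = llm(
    `Suggest improvements for this ${targetLang} translation of the ` +
    `${sourceLang} text below.\nSource: ${sourceText}\nTranslation: ${translation1}`
  );
  const translation2 = llm(
    `Rewrite the translation, applying these suggestions.\n` +
    `Source: ${sourceText}\nTranslation: ${translation1}\nSuggestions: ${reflection}`
  );
  return translation2;
}
```

Each call feeds the previous call's output forward, which is exactly what the FastGPT workflow below wires up node by node.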

This is a relatively new approach to AI translation: the LLM critiques and revises its own output, which yields noticeably better results.

The project also shows how to split long texts into chunks and run reflective translation on each chunk separately, bypassing the LLM's token limit and enabling efficient, high-quality, one-click translation of long documents.

The project further improves precision by letting you specify a country or region for the target language, distinguishing, say, American from British English. It also suggests optimizations that may improve results further, such as building a glossary for terms the LLM was not trained on (or terms with multiple accepted translations) to make the output even more accurate.
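
A glossary can be injected into the prompt as a simple lookup table. The function and format below are hypothetical, my own sketch of the idea rather than anything from the project:

```javascript
// Build a glossary section to prepend to the translation prompt, so the
// LLM renders listed terms consistently. The wording and format are made
// up for illustration.
function buildGlossarySection(glossary) {
  const lines = Object.entries(glossary).map(
    ([term, rendering]) => `- "${term}" must be translated as "${rendering}"`
  );
  return lines.length > 0
    ? `Use this glossary:\n${lines.join('\n')}\n\n`
    : '';
}
```

In FastGPT, the same effect can be achieved more dynamically with the knowledge-base feature discussed at the end of this article.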

All of this can be reproduced with a FastGPT workflow. This article walks you through replicating Andrew Ng's translation-agent in FastGPT, step by step.

Single-Chunk Reflective Translation

Let's start with the simple case: translating a single chunk of text that fits within the LLM's token limit.

Initial Translation

First, have the LLM produce an initial translation of the source chunk:

A "text concatenation" module takes three parameters, the source language, the target language, and the source text, builds the prompt from them, and passes it to the LLM for a first-pass translation.

Prompt:

This is an {{source_lang}} to {{target_lang}} translation, please provide the {{target_lang}} translation for this text.
Do not provide any explanations or text apart from the translation.
{{source_lang}}: {{source_text}}

{{target_lang}}:
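
In FastGPT the {{...}} placeholders are filled in by the text-concatenation module. Outside FastGPT, the same interpolation is a one-liner; this is a minimal sketch of the idea, not FastGPT's implementation:

```javascript
// Replace {{name}} placeholders in a prompt template with given values.
// Unknown placeholders are left untouched.
function fillTemplate(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? String(vars[name]) : match
  );
}
```

The same helper applies to every prompt in this article, since they all share the {{variable}} convention.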

Reflection

Next, have the LLM review the initial translation from step one and propose revisions; this is the reflection step.

Prompt:

Your task is to carefully read a source text and a translation from {{source_lang}} to {{target_lang}}, and then give constructive criticism and helpful suggestions to improve the translation.
The final style and tone of the translation should match the style of {{target_lang}} colloquially spoken in {{country}}.

The source text and initial translation, delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT> and <TRANSLATION></TRANSLATION>, are as follows:

<SOURCE_TEXT>
{{source_text}}
</SOURCE_TEXT>

<TRANSLATION>
{{translation_1}}
</TRANSLATION>

When writing suggestions, pay attention to whether there are ways to improve the translation's
(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying {{target_lang}} grammar, spelling and punctuation rules, and ensuring there are no unnecessary repetitions),
(iii) style (by ensuring the translations reflect the style of the source text and take into account any cultural context),
(iv) terminology (by ensuring terminology use is consistent and reflects the source text domain; and by only ensuring you use equivalent idioms in {{target_lang}}).

Write a list of specific, helpful and constructive suggestions for improving the translation.
Each suggestion should address one specific part of the translation.
Output only the suggestions and nothing else.

This prompt takes five parameters: the source text, the initial translation, the source language, the target language, and the country/region qualifier. With these, the LLM produces a substantial list of revision suggestions that set up the improvement step.

Improved Translation

Prompt:

Your task is to carefully read, then edit, a translation from {{source_lang}} to {{target_lang}}, taking into
account a list of expert suggestions and constructive criticisms.

The source text, the initial translation, and the expert linguist suggestions are delimited by XML tags <SOURCE_TEXT></SOURCE_TEXT>, <TRANSLATION></TRANSLATION> and <EXPERT_SUGGESTIONS></EXPERT_SUGGESTIONS> as follows:

<SOURCE_TEXT>
{{source_text}}
</SOURCE_TEXT>

<TRANSLATION>
{{translation_1}}
</TRANSLATION>

<EXPERT_SUGGESTIONS>
{{reflection}}
</EXPERT_SUGGESTIONS>

Please take into account the expert suggestions when editing the translation. Edit the translation by ensuring:

(i) accuracy (by correcting errors of addition, mistranslation, omission, or untranslated text),
(ii) fluency (by applying {{target_lang}} grammar, spelling and punctuation rules and ensuring there are no unnecessary repetitions),
(iii) style (by ensuring the translations reflect the style of the source text),
(iv) terminology (inappropriate for context, inconsistent use), or
(v) other errors.

Output only the new translation and nothing else.

With the initial translation and the reflection feedback generated, we feed both into a third LLM call, which produces a noticeably higher-quality final translation.

Execution Results

Since I plan to reuse this reflective translation later, I packaged it as a plugin; below, I simply invoke that plugin to run it. The result:

I picked a random passage from Harry Potter.

The reflective translation is clearly better than the initial pass. The reflection output was:

Long-Text Reflective Translation

With single-chunk reflective translation in hand, splitting and looping let us extend it to long texts, i.e., multiple chunks.

The overall logic: first count the tokens in the input text. If the count is within the configured limit, call the single-chunk reflective translation directly; if it exceeds the limit, split the text into appropriately sized chunks and run reflective translation on each chunk.

There are two reasons for splitting into chunks:

1. Most LLMs cap their output at around 4k tokens, so a single response cannot contain more than roughly 4k tokens of translated text.

2. Shorter inputs reduce the hallucinations that overly long inputs tend to cause.
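
The dispatch logic described above can be sketched as follows. `countTokens`, `splitIntoChunks`, and `reflectiveTranslate` are placeholders for the modules built in the rest of this section:

```javascript
// Route a text to single-chunk or multi-chunk reflective translation
// depending on its token count. The three helpers are injected so this
// sketch stays self-contained.
function translateLong(text, tokenLimit, { countTokens, splitIntoChunks, reflectiveTranslate }) {
  if (countTokens(text) <= tokenLimit) {
    return reflectiveTranslate(text); // short text: one pass is enough
  }
  // long text: translate each chunk, then join the results
  return splitIntoChunks(text, tokenLimit).map(reflectiveTranslate).join('');
}
```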

Counting Tokens

First, I use a "Laf function" module to count the tokens in the input text.

Laf functions are trivially easy to use: create an application on the Laf cloud platform, install the tiktoken dependency, and paste in the following code:

const { Tiktoken } = require("tiktoken/lite");
const cl100k_base = require("tiktoken/encoders/cl100k_base.json");

interface IRequestBody {
  str: string
}

interface RequestProps extends IRequestBody {
  systemParams: {
    appId: string,
    variables: string,
    histories: string,
    cTime: string,
    chatId: string,
    responseChatItemId: string
  }
}

interface IResponse {
  message: string;
  tokens: number;
}

export default async function (ctx: FunctionContext): Promise<IResponse> {
  const { str = "" }: RequestProps = ctx.body

  const encoding = new Tiktoken(
    cl100k_base.bpe_ranks,
    cl100k_base.special_tokens,
    cl100k_base.pat_str
  );
  const tokens = encoding.encode(str);
  encoding.free();

  return {
    message: 'ok',
    tokens: tokens.length
  };
}

Back in FastGPT, click "Sync parameters", wire the source text into the function, and it returns the token count.

Calculating the Chunk Size

This step involves no third-party packages, just some arithmetic, so a "code execution" module is enough:

function main({tokenCount, tokenLimit}){

  // Number of chunks needed so that no chunk exceeds the limit
  const numChunks = Math.ceil(tokenCount / tokenLimit);

  // Distribute tokens evenly across the chunks. Rounding up is safe:
  // tokenCount <= numChunks * tokenLimit, so the result never exceeds
  // tokenLimit. (Adding the remainder back separately, as a naive
  // version might, can overshoot the limit.)
  const chunkSize = Math.ceil(tokenCount / numChunks);

  return {chunkSize};
}

With the code above, we get a sensible chunk size that stays within the token limit.
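
As a sanity check, the even-distribution arithmetic can be traced with concrete numbers. This standalone helper uses `Math.ceil` so that no chunk can exceed the limit:

```javascript
// Evenly sized chunks: numChunks = ceil(tokenCount / tokenLimit)
// guarantees ceil(tokenCount / numChunks) <= tokenLimit.
function chunkSizeFor(tokenCount, tokenLimit) {
  const numChunks = Math.ceil(tokenCount / tokenLimit);
  return Math.ceil(tokenCount / numChunks);
}

// 4500 tokens, limit 1000 -> 5 chunks of at most 900 tokens each.
// 1999 tokens, limit 1000 -> 2 chunks of at most 1000 tokens each.
```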

Splitting the Source Text into Chunks

Given the chunk size and the source text, we write another Laf function that calls langchain's textsplitters package to split the text. The code:

import cloud from '@lafjs/cloud'
import { TokenTextSplitter } from "@langchain/textsplitters";

interface IRequestBody {
  text: string
  chunkSize: number
}

interface RequestProps extends IRequestBody {
  systemParams: {
    appId: string,
    variables: string,
    histories: string,
    cTime: string,
    chatId: string,
    responseChatItemId: string
  }
}

interface IResponse {
  output: string[];
}

export default async function (ctx: FunctionContext): Promise<IResponse> {
  const { text = '', chunkSize = 1000 }: RequestProps = ctx.body;

  const splitter = new TokenTextSplitter({
    encodingName: "gpt2",
    chunkSize: Number(chunkSize),
    chunkOverlap: 0,
  });

  const initialChunks = await splitter.splitText(text);
  console.log(initialChunks)

  // Sentence delimiters for different languages; the lookbehind split
  // keeps each delimiter attached to the sentence it ends, instead of
  // dropping the punctuation as a plain character-class split would
  const sentenceDelimiters = /(?<=[。!?.!?])/;

  // Post-process each initial chunk so that chunks end on sentence boundaries
  const output = [];
  let currentChunk = initialChunks[0];

  for (let i = 1; i < initialChunks.length; i++) {
    const sentences = initialChunks[i].split(sentenceDelimiters);
    if (sentences.length > 0) {
      currentChunk += sentences[0]; // append the first sentence to the current chunk
      output.push(currentChunk.trim()); // push the completed chunk to the output
      currentChunk = sentences.slice(1).join(''); // remaining sentences start the next chunk
    }
  }

  // push the final chunk to the output
  if (currentChunk.trim().length > 0) {
    output.push(currentChunk.trim());
  }

  console.log(output);
  return {
    output
  }
}

We now have the text split into chunks; from here, the process mirrors single-chunk reflective translation.

Multi-Chunk Translation

We can't quite reuse the single-chunk reflective translation plugin directly here, because the prompts need to handle surrounding context (alternatively, you could modify the earlier plugin to accept extra parameters).

The details are similar to before: swap in slightly different prompts and add some trivial data handling. The overall setup looks like this.
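
One way to handle context, borrowed from translation-agent's multichunk prompts, is to show the model the full text and mark only the current chunk for translation. The helper below is my own sketch of that idea; the `<TRANSLATE_THIS>` tag follows the source project, but treat the exact wording as an illustration:

```javascript
// Build a multi-chunk translation prompt: include the surrounding chunks
// as context and mark only the current chunk for translation.
function buildChunkPrompt(chunks, i, sourceLang, targetLang) {
  const before = chunks.slice(0, i).join('');
  const after = chunks.slice(i + 1).join('');
  return (
    `Translate only the part delimited by <TRANSLATE_THIS></TRANSLATE_THIS> ` +
    `from ${sourceLang} to ${targetLang}. Use the rest only as context.\n\n` +
    `${before}<TRANSLATE_THIS>${chunks[i]}</TRANSLATE_THIS>${after}`
  );
}
```

Giving the model the neighboring chunks keeps terminology and pronoun references consistent across chunk boundaries.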

Multi-Chunk Initial Translation

Multi-Chunk Reflection

Multi-Chunk Improved Translation

Looping

The key part of long-text reflective translation is looping the reflective translation over all the chunks.

FastGPT workflows support routing an edge back to an earlier node, so a simple judgment function can decide whether to stop or continue.

JS code:

function main({chunks, currentChunk}){
    const findIndex = chunks.findIndex((item) => item === currentChunk)

    return {
        isEnd: chunks.length - 1 === findIndex,
        i: findIndex + 1,
    }
}

In other words, we check whether the chunk being processed is the last one, and loop until it is. With that, long-text reflective translation is complete.
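
Tracing the loop-control function with three chunks makes the stopping condition concrete (the function body matches the module above):

```javascript
// Loop control: find the current chunk, report whether it is the last,
// and return the index of the next chunk to process.
function main({chunks, currentChunk}){
    const findIndex = chunks.findIndex((item) => item === currentChunk)

    return {
        isEnd: chunks.length - 1 === findIndex,
        i: findIndex + 1,
    }
}

// With chunks ['a', 'b', 'c']: processing 'b' is not the end and the
// next index is 2; processing 'c' ends the loop.
```

Note that `findIndex` matches chunks by value, so two identical chunks would confuse the loop; passing the chunk index through the workflow instead would be more robust.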

Execution Results

First, fill in the global settings:

Then paste in the text to translate. I chose a full chapter of Harry Potter in the original English; its length in tokens, as counted by OpenAI's tokenizer, is shown below:

The actual output:

As you can see, the result is perfectly readable.

Further Tuning

Prompt Tuning

The system prompts in the source project are fairly terse. A more thorough prompt can push the LLM toward better translations and raise quality further. For example, we can use chain-of-thought (CoT) prompting to make the LLM lay out its reasoning explicitly and systematically, showing the full thought process behind the translation.

For example, the initial-translation prompt can be replaced with the following:

# Role: Senior Translation Expert

## Background:
You are a seasoned translation expert, fluent in both {{source_lang}} and {{target_lang}}, and especially skilled at rendering {{source_lang}} texts into smooth, readable {{target_lang}}. You have led many large translation projects, and your translations are widely praised.

## Attention:
- Always follow the principles of faithfulness, expressiveness, and elegance, with expressiveness foremost
- The translation must match {{target_lang}} idiom: plain, clear, and fluent
- Avoid overly literary phrasing and obscure allusions
- Proper nouns and technical terms may be kept as-is or transliterated where appropriate

## Constraints:
- Strictly follow the four-round process: literal translation, free translation, review, final version
- Be faithful to the source: accurate, with nothing omitted or distorted
- Pay attention to context and avoid translating the same passage twice

## Goals:
- Produce a high-quality {{target_lang}} translation of the {{source_lang}} source through the four-round process
- Convey the source meaning accurately in plain, readable, natural-sounding language
- Use common idioms and popular internet expressions in moderation to make the translation approachable
- On top of the literal translation, provide at least 2 free translations in different styles to choose from

## Skills:
- Mastery of both {{source_lang}} and {{target_lang}}, with a solid linguistic foundation and extensive translation experience
- Skilled at converting {{source_lang}} phrasing into natural, idiomatic {{target_lang}}
- Keen awareness of how contemporary {{target_lang}} is evolving and of current language trends

## Workflow:
1. Round one, literal translation: render the source faithfully, word by word, omitting nothing
2. Round two, free translation: based on the literal version, rewrite the text in plain, fluent {{target_lang}}, providing at least 2 versions in different styles
3. Round three, review: scrutinize the translation, eliminate deviations and gaps, and make it more natural and readable
4. Round four, final version: pick the best candidate, polish it repeatedly, and deliver a concise, fluent translation suited to a general audience

## OutputFormat:
- Before each round, state the round's focus under a 【Thinking】 label
- After each round, present the translation under a 【Translation】 label
- Present the final version inside a \`\`\` code block, with nothing after the closing \`\`\`

## Suggestions:
- In the literal round, stay faithful to the source, but don't cling to word-for-word rendering
- In the free round, express the original meaning in the plainest possible {{target_lang}}
- In the review round, focus on whether the translation reads like natural {{target_lang}} and is easy to understand
- In the final round, use common sayings and internet expressions in moderation to make the translation relatable
- Exploit the flexibility of {{target_lang}}: express the same content in different ways to improve readability

This yields a more accurate, higher-quality initial translation. We then need one more node to extract the round-four final version from the output:

The JS code:

function main({data1}){
    const result = data1.split("```").filter(item => !!item.trim())

    if(result[result.length-1]) {
        return {
            result: result[result.length-1]
        }
    }

    return {
        result: 'No translation content captured'
    }
}
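
A standalone trace of the extraction logic shows why the prompt insists that nothing follow the closing fence (the helper name `extractFinal` is mine):

```javascript
// Extract the final translation: split the LLM output on ``` fences and
// keep the last non-empty segment. Any text after the closing fence
// would be returned instead — hence the prompt's "nothing after the
// closing ```" rule.
function extractFinal(data1) {
  const result = data1.split("```").filter(item => !!item.trim());
  return result[result.length - 1] ?? '';
}
```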

The reflection and improvement steps can likewise be given sharper prompts, for example:

Prompt:

# Role: Senior Translation Expert

## Background:
You are a seasoned reviewer of translation quality, fluent in both {{source_lang}} and {{target_lang}}, and especially skilled at rendering {{source_lang}} texts into smooth, readable {{target_lang}}. You have proofread and reviewed many translated articles and can offer incisive critiques of a translation.

## Attention:
- The translation should follow the principles of faithfulness, expressiveness, and elegance, with expressiveness foremost
- The translation must match {{target_lang}} idiom: plain, clear, and fluent
- The translation should avoid overly literary phrasing and obscure allusions

## Constraints:
- The translation must be faithful to the source: accurate, with nothing omitted or distorted
- Suggestions must be concrete, actionable, and to the point
- Give suggestions for each passage in as much detail as possible

## Goals:
- You will receive a {{source_lang}} source text and its initial translation; give your suggestions for improving the translation
- Assess each passage in as much detail as possible; suggest changes where needed, and do not force changes where none are needed
- The translation should convey the source meaning accurately in plain, readable, natural-sounding language
- Use common idioms and popular internet expressions in moderation to make the translation approachable

## Skills:
- Mastery of both {{source_lang}} and {{target_lang}}, with a solid linguistic foundation and extensive translation experience
- Skilled at converting {{source_lang}} phrasing into natural, idiomatic {{target_lang}}
- Keen awareness of how contemporary {{target_lang}} is evolving and of current language trends

Let's look at the final result, using a passage from a technical article as the test:

In February of 1992, the development of Windows 3.1 was nearing a close, and the Windows team was trying to figure out what their next steps would be. By the 5th of March, the team knew that they’d be focusing on desktops, laptops, mobile, and pen with NT taking servers and workstations. The team also knew that they needed to address three major areas: UI, hardware support, networking.

There was a ton of stuff being worked on at this time (and through the rest of the 1990s) within Microsoft. Just within the Systems group (as distinct from the Apps group) Janus would release on the 6th of April as Windows 3.1, Astro would release in March of 1993 as MS-DOS 6.0, Winball would release in October of 1992 as Windows for Workgroups 3.1, Jaguar while being worked on at this time would never see an independent release (more on that in a bit), and then came the next windows projects: Cougar, Panther, Rover, NT, and Cairo. Cougar was a project to build a fully 32 bit Windows kernel, evolving the Windows 3.x 386 mode kernel for 386-class and higher machines. Panther was a project to port the win32 API to this new kernel. Rover was a project to make a mobile computing version of Cougar/Panther. The NT project was Microsoft’s first steps into a dedicated workstation and server release of Windows, and it would release in July of 1993. Cairo was a project for the next major release of NT, and it would mirror many of the changes to Windows from Cougar/Panther (and the reverse is also true). This system comprised of Cougar and Panther was known as Chicago. The Cougar portion of this system was vital to making a more stable and robust Windows. Beyond being a fully 32 bit protected-mode system, this new kernel would feature dynamically loaded and unloaded protected-mode device drivers. This system would also be threaded and fully support any MS-DOS program running from Windows (where previously in Windows 2 and 3, programs that wrote directly to video RAM would require Windows to terminate and stay resident, one side effect being that in really big Command and Conquer maps, the memory space of Windows would be overwritten and as a result Windows would not restore on exit).

These moves were huge for Chicago and for Microsoft more generally. When Chicago was taking shape in 1992, MS-DOS was still Microsoft’s bread and butter. Brad Silverberg was relatively new to Microsoft, but he had a very strong background. He had worked at Apple on the Lisa, and he had worked at Borland. By early 1992, he was the project leader of Chicago and the SVP of Microsoft’s personal systems division. In an internal Microsoft memo Silverberg said:

    Lest anyone be confused, ms-dos is the the bedrock product of the company, accounting for a very major portion of Microsoft’s profits (ie, stock price). Further, it is under strong competitive pressures (I am more inclined to say “under attack”) from DR-DOS and IBM. We must protect this franchise with our lives. Short term, that means continued aggressive marketing plans. In addition, it also means we need to get yearly product releases out so we put the other guys on a treadmill, rather than be put on the treadmill. As a result, we are going to release a new version of MS-DOS this year, chock full of new goodies, while we move with full-speed toward cougar.

That new MS-DOS release was MS-DOS 6 mentioned earlier. The most visible and important new “goodies” referenced by Silverberg were disk defragmentation, disk compression, anti-virus, a new backup system, and file transfer tools. MS-DOS 6 was released in March of 1993 with updates being pushed until June of 1994.

I bring this up to try and portray where Microsoft and the industry were at this time. IBM compatible computers outnumbered all other computers by nearly 80 million units. MS-DOS or a compatible DOS system was installed on almost all of them (with OS/2 or Linux being rare). Most software on these computers ran in 16 bit real mode. Most hardware was configured with dip switches, and the config had to match that setting exactly. Loading a driver required knowledge of autoexec and load-high tools. Windows 3 was a huge success, and Windows 3.1 was an even greater success. Despite these successes and the resultant changes in Microsoft’s future plans, MS-DOS was still the market leader in PC operating systems by a very wide margin. Windows 3x did ameliorate some problems, but the old systems remained dominant. Due to this, Microsoft absolutely needed to ensure that MS-DOS was still part of their future despite having a more technically advanced system in NT. Adding to this, most computers that home users were purchasing were incapable of providing a good experience with NT. Chicago needed to provide the best experience possible for win16, win32, and MS-DOS applications on modest hardware, and it needed to be a noticeable improvement over Windows 3. If Microsoft failed in either case, they would be yielding ground to Digital Research or to IBM.

Ultimately, the need for backwards compatibility meant that some 16 bit code remained in Chicago. Without this, the backwards compatibility wouldn’t have been as good. In hindsight, given that IBM’s OS/2 could run DOS and Windows software, this was a very good decision on the part of Microsoft.

Chicago was structured in a way that is similar to Windows for Workgroups 3.1 (386 enhanced), but is far more refined. There are a large number of virtual device drivers (VxDs) running in 32 bit protected mode alongside virtual DOS machines (VDMs) running in a virtual real mode. These virtual device drivers are used for real physical hardware, for emulating devices for virtual machines, and for providing services to other software. Three of these VxDs comprise the very heart of Chicago: Virtual Machine Manager (VMM32.VXD), Configuration Manager (CONFIGMG), Installable Filesystem Manager (IFM). VMM32 is essentially the Chicago kernel. It handles memory management, event handling, interrupt handling, device driver loading and initialization, the creation of virtual machines, and the scheduling. CONFIGMG handles plug and play. IFM coordinates filesystem access, provides a disk buffer, and provides a 32 bit protected mode I/O access system. This bypasses MS-DOS entirely and was first seen 386 Windows 3 releases.

The translated output:

Impressive!

From now on, whatever article you want translated, however long it is, you can hand it straight to this translation expert, go about your business, and come back a little later to collect a polished translation. Who can compete with that?

Other Tuning

Qualifier tuning, for instance: the source project already demonstrates this by adding a country/region qualifier, and in my tests it genuinely improves results.

Thanks to the flexibility of LLMs, different prompts yield different translations, so special qualifiers are an easy way to get targeted, more precise output.

For terms beyond the LLM's knowledge, FastGPT's knowledge-base feature can supply the missing context, further rounding out the translation bot.

Conclusion

The next article will introduce an even more powerful agent: a subtitle reflective-translation expert.

What can it do? Say you have an English subtitle file, however long. Copy its entire contents, hand it to the subtitle translation expert, go about your business, and come back a little later to collect polished bilingual Chinese-English subtitles. Who can compete with that?


Finally, a bonus: I've shared the complete workflow for this translation expert, free for the taking: Long-Text Reflective Translation Expert workflow.
