利用BLEU進行機器翻譯檢測(Python-NLTK-BLEU評分方法)

振宇要低調發表於2018-08-03

雙語評估替換分數(簡稱BLEU)是一種對生成語句進行評估的指標。完美匹配的得分為1.0,而完全不匹配則得分為0.0。這種評分標準是為了評估自動機器翻譯系統的預測結果而開發的,具備了以下一些優點:

  1. 計算速度快,計算成本低。
  2. 容易理解。
  3. 與具體語言無關。
  4. 已被廣泛採用。

BLEU評分是由Kishore Papineni等人在他們2002年的論文BLEU a Method for Automatic Evaluation of Machine Translation中提出的。BLEU計算的原理是計算待評價譯文和一個或多個參考譯文間的距離。距離是文字間n元相似度的平均,n=1,2,3(更高的值似乎無關緊要)。也就是說,如果待選譯文和參考譯文的2元(連續詞對)或3元相似度較高,那麼該譯文的得分就較高。

我們是翻譯眾包業務,對於我們的應用場景,如何得知譯員是否有參考機器翻譯引擎就成了一個比較重要的問題。我提出的基本思路是:

  1. 在多個翻譯網站上翻譯原文,得到一組機器翻譯評測集,以下的例子中就是一段原文通過百度、有道翻譯之後,組織了一個機器翻譯評測集
  2. 將譯員翻譯出來的譯文,作為待評測資料,計算其與機器翻譯評測集的BLEU值(使用NLTK中提供的BLEU評分方法)
  3. 值越高,表明匹配度越高,則譯員參考機器翻譯或者直接拷貝機器翻譯的可能性就越高,此時需要專案經理介入。

以下是示例:

  1、原文

新譯星將代表四達時代集團在展覽會上閃亮登場,屆時我們將從新譯星所開展的業務、具備的優勢、成功案例等多個維度進行介紹,讓您更加全面的瞭解新譯星。我們擁有穩定的全職國際化團隊,能夠確保守時、高效的完成翻譯和配音,並通過至臻完善的質量控制和專案管理體系進行全方位把控,提供翻譯、配音、字幕製作、後期製作、播出以及收視率調查等一條龍服務。

  2、人工翻譯

New Transtar will present itself at the Exhibition on behalf of StarTimes, and we will give a comprehensive introduction of ourselves, including the current services we offer, the advantages we hold, and the projects we have completed, to help you understand us more. New Transtar boasts of an international team of professionals and is capable of providing fast and quality-guaranteed services including translating, dubbing, subtitle making, post-production, broadcasting and collecting of viewership ratings, thanks to our strict, streamlined and developed quality control and project management system.

  3、百度翻譯

The new translator will stand on the exhibition on behalf of the four times group at the exhibition. We will introduce the new star's business, the advantages and the successful cases, so that you can understand the new translator more comprehensively. We have a stable full-time international team that ensures punctual, efficient translation and dubbing, and provides a full range of control through the perfect quality control and project management system, providing a one-stop service for translation, dubbing, subtitle production, post production, broadcasting, and ratings surveys.

  4、有道翻譯

The new translator star will represent sida times group in the exhibition, when we will introduce the new translator star's business, advantages, successful cases and other dimensions, so that you can have a more comprehensive understanding of the new translator star. We have a stable full-time international team, which can ensure timely and efficient translation and dubbing. Through perfect quality control and project management system, we provide translation, dubbing, subtitle production, post-production, broadcasting and rating survey.

  5、用百度翻譯和有道翻譯組織機器翻譯評測集

[['The', 'new', 'translator', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', 'cases', 'so', 'that', 'you', 'can', 'understand', 'the', 'new', 'translator', 'more', 'comprehensively', 'We', 'have', 'a', 'stable', 'full-time', 'international', 'team', 'that', 'ensures', 'punctual', 'efficient', 'translation', 'and', 'dubbing', 'and', 'provides', 'a', 'full', 'range', 'of', 'control', 'through', 'the', 'perfect', 'quality', 'control', 'and', 'project', 'management', 'system', 'providing', 'a', 'one-stop', 'service', 'for', 'translation', 'dubbing', 'subtitle', 'production', 'post', 'production', 'broadcasting', 'and', 'ratings', 'surveys'],['The', 'new', 'translator', 'star', 'will', 'represent', 'sida', 'times', 'group', 'in', 'the', 'exhibition', 'when', 'we', 'will', 'introduce', 'the', 'new', 'translator', 'star`s', 'business', 'advantages', 'successful', 'cases', 'and', 'other', 'dimensions', 'so', 'that', 'you', 'can', 'have', 'a', 'more', 'comprehensive', 'understanding', 'of', 'the', 'new', 'translator', 'star', 'We', 'have', 'a', 'stable', 'full-time', 'international', 'team', 'which', 'can', 'ensure', 'timely', 'and', 'efficient', 'translation', 'and', 'dubbing', 'Through', 'perfect', 'quality', 'control', 'and', 'project', 'management', 'system', 'we', 'provide', 'translation', 'dubbing', 'subtitle', 'production', 'post-production', 'broadcasting', 'and', 'rating', 'survey']]

  6、用人工翻譯組織待檢測資料

['New', 'Transtar', 'will', 'present', 'itself', 'at', 'the', 'Exhibition', 'on', 'behalf', 'of', 'StarTimes', 'and', 'we', 'will', 'give', 'a', 'comprehensive', 'introduction', 'of', 'ourselves', 'including', 'the', 'current', 'services', 'we', 'offer', 'the', 'advantages', 'we', 'hold', 'and', 'the', 'projects', 'we', 'have', 'completed', 'to', 'help', 'you', 'understand', 'us', 'more', 'New', 'Transtar', 'boasts', 'of', 'an', 'international', 'team', 'of', 'professionals', 'and', 'is', 'capable', 'of', 'providing', 'fast', 'and', 'quality-guaranteed', 'services', 'including', 'translating', 'dubbing', 'subtitle', 'making', 'post-production', 'broadcasting', 'and', 'collecting', 'of', 'viewership', 'ratings', 'thanks', 'to', 'our', 'strict', 'streamlined', 'and', 'developed', 'quality', 'control', 'and', 'project', 'management', 'system']

  7、首先測試人工翻譯產出的譯文與機器翻譯評測集之間的BLEU值,得到結果為0.119115465241,如下

[root@host-10-0-251-156 ~]# python
Python 2.7.5 (default, Apr 11 2018, 07:36:10)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.translate.bleu_score import sentence_bleu
>>>
>>> reference=[['The', 'new', 'translator', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', 'cases', 'so', 'that', 'you', 'can', 'understand', 'the', 'new', 'translator', 'more', 'comprehensively', 'We', 'have', 'a', 'stable', 'full-time', 'international', 'team', 'that', 'ensures', 'punctual', 'efficient', 'translation', 'and', 'dubbing', 'and', 'provides', 'a', 'full', 'range', 'of', 'control', 'through', 'the', 'perfect', 'quality', 'control', 'and', 'project', 'management', 'system', 'providing', 'a', 'one-stop', 'service', 'for', 'translation', 'dubbing', 'subtitle', 'production', 'post', 'production', 'broadcasting', 'and', 'ratings', 'surveys'],['The', 'new', 'translator', 'star', 'will', 'represent', 'sida', 'times', 'group', 'in', 'the', 'exhibition', 'when', 'we', 'will', 'introduce', 'the', 'new', 'translator', 'star`s', 'business', 'advantages', 'successful', 'cases', 'and', 'other', 'dimensions', 'so', 'that', 'you', 'can', 'have', 'a', 'more', 'comprehensive', 'understanding', 'of', 'the', 'new', 'translator', 'star', 'We', 'have', 'a', 'stable', 'full-time', 'international', 'team', 'which', 'can', 'ensure', 'timely', 'and', 'efficient', 'translation', 'and', 'dubbing', 'Through', 'perfect', 'quality', 'control', 'and', 'project', 'management', 'system', 'we', 'provide', 'translation', 'dubbing', 'subtitle', 'production', 'post-production', 'broadcasting', 'and', 'rating', 'survey']]
>>>
>>> candidate=['New', 'Transtar', 'will', 'present', 'itself', 'at', 'the', 'Exhibition', 'on', 'behalf', 'of', 'StarTimes', 'and', 'we', 'will', 'give', 'a', 'comprehensive', 'introduction', 'of', 'ourselves', 'including', 'the', 'current', 'services', 'we', 'offer', 'the', 'advantages', 'we', 'hold', 'and', 'the', 'projects', 'we', 'have', 'completed', 'to', 'help', 'you', 'understand', 'us', 'more', 'New', 'Transtar', 'boasts', 'of', 'an', 'international', 'team', 'of', 'professionals', 'and', 'is', 'capable', 'of', 'providing', 'fast', 'and', 'quality-guaranteed', 'services', 'including', 'translating', 'dubbing', 'subtitle', 'making', 'post-production', 'broadcasting', 'and', 'collecting', 'of', 'viewership', 'ratings', 'thanks', 'to', 'our', 'strict', 'streamlined', 'and', 'developed', 'quality', 'control', 'and', 'project', 'management', 'system']
>>>
>>> score = sentence_bleu(reference, candidate)
>>> print score
0.119115465241
>>>

  8、其次我們稍微改動以下百度翻譯出來的譯文,並測試其與機器翻譯評測集之間的BLEU值,得到結果0.875629670466,如下:

    8.1稍微改動之後的百度翻譯

New Transtar will stand on the exhibition on behalf of the four times group at the exhibition. We will introduce the new star's business, the advantages and the successful cases, so that you can understand the new translator more comprehensively. We have a stable full-time international team that ensures punctual, efficient translation and dubbing, and provides a full range of control through the perfect quality control and project management system, providing a one-stop service for translation, dubbing, subtitle production, streamlined and developed quality control and project management system.

    8.2用改動之後的百度翻譯作為待評測資料

['New', 'Transtar', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', 'cases', 'so', 'that', 'you', 'can', 'understand', 'the', 'new', 'translator', 'more', 'comprehensively', 'We', 'have', 'a', 'stable', 'full-time', 'international', 'team', 'that', 'ensures', 'punctual', 'efficient', 'translation', 'and', 'dubbing', 'and', 'provides', 'a', 'full', 'range', 'of', 'control', 'through', 'the', 'perfect', 'quality', 'control', 'and', 'project', 'management', 'system', 'providing', 'a', 'one-stop', 'service', 'for', 'translation', 'dubbing', 'subtitle', 'production', 'streamlined', 'and', 'developed', 'quality', 'control', 'and', 'project', 'management', 'system']

    8.3BLEU計算

>>> candidate_baidu=['New', 'Transtar', 'will', 'stand', 'on', 'the', 'exhibition', 'on', 'behalf', 'of', 'the', 'four', 'times', 'group', 'at', 'the', 'exhibition', 'We', 'will', 'introduce', 'the', 'new', 'star`s', 'business', 'the', 'advantages', 'and', 'the', 'successful', 'cases', 'so', 'that', 'you', 'can', 'understand', 'the', 'new', 'translator', 'more', 'comprehensively', 'We', 'have', 'a', 'stable', 'full-time', 'international', 'team', 'that', 'ensures', 'punctual', 'efficient', 'translation', 'and', 'dubbing', 'and', 'provides', 'a', 'full', 'range', 'of', 'control', 'through', 'the', 'perfect', 'quality', 'control', 'and', 'project', 'management', 'system', 'providing', 'a', 'one-stop', 'service', 'for', 'translation', 'dubbing', 'subtitle', 'production', 'streamlined', 'and', 'developed', 'quality', 'control', 'and', 'project', 'management', 'system']
>>> score_baidu = sentence_bleu(reference, candidate_baidu)
>>> print score_baidu
0.875629670466
>>>

  9、由上面示例可看到,當待評測譯文非常接近(也就是說該譯員參考了機器翻譯或直接進行的拷貝)機器翻譯評測集中的資料時,BLEU值會升高。不過至於高到什麼程度才需要專案經理介入,這就需要在實際專案中不斷的摸索了。

 

相關文章