[Natural Language Processing] Chatbots: From First Steps to Applications

Posted by LHBlog on 2018-07-08

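I. Introduction

On Wikipedia, a bot is a computer program or script, together with the account it logs in with, that helps editors carry out large volumes of automated, high-speed, mechanical, or tedious editing work.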

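II. Details

1. The simplest kind is a rule-based chatbot.

That is, it matches the input against question-and-answer sentences prepared in advance in a corpus. It works at a grade-school level: you ask a known question and get the canned answer.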

import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']

# Questions that mean "How are you?"
question = ['How are you?', 'How are you doing?']
# Possible answers, e.g. "I'm fine"
responses = ['Okay', "I'm fine"]

# Run the bot
while True:
    userInput = input(">>> ")
    if userInput in greetings:
        # Pick a fresh random greeting on every turn
        print(random.choice(greetings))
    elif userInput in question:
        # Pick a fresh random answer on every turn
        print(random.choice(responses))
    # Stop when the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

Output:

>>> hi
hey
>>> how are u
I did not understand what you said
>>> how are you
I did not understand what you said
>>> how are you?
I did not understand what you said
>>> How are you?
I'm fine
>>> bye
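
The session above rejects "how are you?" only because of letter case, and "how are you" only because of a missing question mark. One possible remedy, sketched here under assumptions, is to normalize the input before comparing it against the rules. The helper names `normalize` and `reply` are hypothetical and not part of the original script.

```python
import random
import string

# Rules stored in normalized form: lowercase, no punctuation
GREETINGS = {"hola", "hello", "hi", "hey"}
QUESTIONS = {"how are you", "how are you doing"}
RESPONSES = ["Okay", "I'm fine"]


def normalize(text):
    """Lowercase the text, drop ASCII punctuation, and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def reply(user_input):
    """Return the bot's reply for one line of input, or None to end the chat."""
    cleaned = normalize(user_input)
    if cleaned in GREETINGS:
        return random.choice(sorted(GREETINGS))
    if cleaned in QUESTIONS:
        return random.choice(RESPONSES)
    if cleaned == "bye":
        return None
    return "I did not understand what you said"
```

With this sketch, "how are you?", "How are you" and "HOW ARE YOU?" all reach the same rule, while "how are u" still falls through to the fallback message because it is a different string.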

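2. Upgrade I:

Clearly, a rule like this is too crude. We need somewhat better "precise matching", for example using keywords to work out the intent of a sentence (intents).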

# word_tokenize needs NLTK's tokenizer data; if it is missing, download it once,
# e.g. nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab')
from nltk import word_tokenize
import random

# Greetings the bot recognizes
greetings = ['hola', 'hello', 'hi', 'Hi', 'hey!', 'hey']

# Keywords for the "holiday" topic
question = ['break', 'holiday', 'vacation', 'weekend']
# Possible answers for the holiday topic
responses = ['It was nice! I went to Paris', "Sadly, I just stayed at home"]

# Run the bot
while True:
    userInput = input(">>> ")
    # Tokenize the input to see which words it contains
    cleaned_input = word_tokenize(userInput)
    # Compare against the keywords to decide which topic the input belongs to
    if not set(cleaned_input).isdisjoint(greetings):
        # Pick a fresh random greeting on every turn
        print(random.choice(greetings))
    elif not set(cleaned_input).isdisjoint(question):
        # Pick a fresh random answer on every turn
        print(random.choice(responses))
    # Stop when the user says "bye"
    elif userInput == 'bye':
        break
    else:
        print("I did not understand what you said")

Output:

>>> hi
hey
>>> how was your holiday?
It was nice! I went to Paris
>>> wow, amazing!
I did not understand what you said
>>> bye

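As you may have noticed, this is still "exact matching" at the text level. The mainstream research direction today is matching at the semantic level. For example, "I'm so hungry" and "it's mealtime" should both mean that it is time to eat. At this level you need embedding methods such as word vectors, which later lessons will cover.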
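
To give a rough feel for semantic matching, here is a minimal sketch that averages word vectors and compares sentences by cosine similarity. Everything in it is an assumption made for illustration: the three-dimensional vectors in `TOY_VECTORS` are written by hand rather than trained, and the names `sentence_vector`, `cosine` and `guess_intent` are hypothetical. A real system would load trained embeddings instead.

```python
import math

# Hypothetical hand-written "embeddings", for illustration only.
# A real system would load trained vectors such as word2vec or GloVe.
TOY_VECTORS = {
    "hungry": (0.9, 0.1, 0.0),
    "lunch": (0.8, 0.2, 0.1),
    "dinner": (0.85, 0.15, 0.05),
    "eat": (0.9, 0.0, 0.1),
    "holiday": (0.1, 0.9, 0.1),
    "vacation": (0.05, 0.95, 0.1),
    "trip": (0.1, 0.8, 0.2),
}

# One example phrase per intent (also an assumption)
INTENT_EXAMPLES = {
    "food": "eat lunch",
    "holiday": "holiday trip",
}


def sentence_vector(text):
    """Average the vectors of the known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def guess_intent(text):
    """Return the intent whose example phrase is closest to the input, or None."""
    query = sentence_vector(text)
    if query is None:
        return None
    return max(
        INTENT_EXAMPLES,
        key=lambda intent: cosine(query, sentence_vector(INTENT_EXAMPLES[intent])),
    )
```

In this toy setup "i am hungry" lands on the food intent even though it shares no word with the example phrase "eat lunch", which is the point of matching on vectors rather than on strings.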

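3. Upgrade II:

Being able to chatter is not enough; the bot needs a body of knowledge before it can solve users' problems. We can build such a knowledge base on top of various databases and then look up answers by searching it. The simplest example is a graph stored in a plain Python dict (an adjacency list) that serves as a "map". With this map we can find a route from one place to another and return it to the user as the answer.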

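# Build a database for the target domain
# Here we use a plain Python dict as a graph (adjacency list)
graph = {'上海': ['蘇州', '常州'],
         '蘇州': ['常州', '鎮江'],
         '常州': ['鎮江'],
         '鎮江': ['常州'],
         '鹽城': ['南通'],
         '南通': ['常州']}

# Define how to find a path from A to B (depth-first search)
def find_path(start, end, path=None):
    # Avoid a mutable default argument; build a new list on every call
    path = (path or []) + [start]
    if start == end:
        return path
    if start not in graph:
        return None
    for node in graph[start]:
        # Skip nodes already on the path to avoid cycles
        if node not in path:
            newpath = find_path(node, end, path)
            if newpath:
                return newpath
    return None

print(find_path('上海', "鎮江"))

Output:

['上海', '蘇州', '常州', '鎮江']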
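
Note that the depth-first search returns the first path it finds, which has three hops here even though two-hop routes exist. If the bot should prefer the fewest hops, a breadth-first search is one option. The sketch below assumes the same map; the functions `shortest_path` and `answer_route`, and the wording of the reply, are hypothetical additions rather than part of the original code.

```python
from collections import deque

# Same map as in the article, stored as an adjacency list
graph = {'上海': ['蘇州', '常州'],
         '蘇州': ['常州', '鎮江'],
         '常州': ['鎮江'],
         '鎮江': ['常州'],
         '鹽城': ['南通'],
         '南通': ['常州']}


def shortest_path(graph, start, end):
    """Breadth-first search: return a path with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def answer_route(graph, start, end):
    """Turn the search result into a sentence the chatbot can send back."""
    path = shortest_path(graph, start, end)
    if path is None:
        return "Sorry, I could not find a route from {} to {}.".format(start, end)
    return "Route: " + " -> ".join(path)
```

Because the edges are one-way, a question such as the route from 鎮江 to 上海 has no answer in this map, and the bot falls back to the apology.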

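The same knowledge-graph approach can also be built with logic programming, for example Prolog, which anyone who studied AI in the last century would have learned, or PyKE, a Prolog-style tool for Python. These let you build a complex logic network from which information is easy to extract, so that you do not have to hand-code every piece of it: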

son_of(bruce, thomas, norma)
son_of(fred_a, thomas, norma)
son_of(tim, thomas, norma)
daughter_of(vicki, thomas, norma)
daughter_of(jill, thomas, norma)
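
To illustrate how new information can be derived from such facts instead of being coded by hand, here is a small sketch in plain Python. It does not use PyKE's API. It also assumes the argument order is (child, father, mother), and the helpers `children_of` and `siblings_of` are hypothetical.

```python
# The facts above as plain tuples: (relation, child, father, mother).
# The argument order is an assumption made for this sketch.
FACTS = [
    ("son_of", "bruce", "thomas", "norma"),
    ("son_of", "fred_a", "thomas", "norma"),
    ("son_of", "tim", "thomas", "norma"),
    ("daughter_of", "vicki", "thomas", "norma"),
    ("daughter_of", "jill", "thomas", "norma"),
]


def children_of(parent):
    """List everyone who has `parent` as father or mother."""
    return [child for _, child, father, mother in FACTS
            if parent in (father, mother)]


def siblings_of(person):
    """Derive siblings: people who share both parents with `person`."""
    parents = {(father, mother) for _, child, father, mother in FACTS
               if child == person}
    return [child for _, child, father, mother in FACTS
            if (father, mother) in parents and child != person]
```

No sibling fact is stored anywhere; `siblings_of("vicki")` is computed from the parent facts alone. A logic engine automates this kind of derivation through rules.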

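4. Upgrade III:

Every industry has a front end and a back end, and AI is no exception. The algorithms discussed here all run on the back end. To ship a dependable product, many projects also need a simple, easy-to-use front end. For example, here we use Google's APIs to write a voice assistant like Jarvis, Tony's assistant in Iron Man. Let's start with the simplest version, one that only speaks: gTTS (Google Text-to-Speech API) converts text to audio.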

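from gtts import gTTS
import os

# The text means: "Hello, I am your personal assistant. My name is Pepper."
tts = gTTS(text='您好,我是您的私人助手,我叫小辣椒', lang='zh-tw')
# Save the synthesized speech to an MP3 file
tts.save("hello.mp3")
# Play the file with the external mpg321 command-line player (must be installed)
os.system("mpg321 hello.mp3")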

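Similarly, now that we have text-to-speech, we can also use Google's API to recognize what the user says, so that Jarvis can listen and then speak its replies:

(Note: this requires several libraries on your machine: SpeechRecognition, PyAudio, and PySpeech.)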

 
import speech_recognition as sr
from time import ctime
import time
import os
from gtts import gTTS

# Speak the assistant's reply out loud
def speak(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("audio.mp3")
    # Requires the external mpg321 command-line player
    os.system("mpg321 audio.mp3")

# Record what you say
def recordAudio():
    # Capture speech from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)

    # Convert the audio to text with the Google API
    data = ""
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

    return data

# Built-in conversation skills (rules)
def jarvis():

    while True:

        data = recordAudio()

        if "how are you" in data:
            speak("I am fine")

        if "what time is it" in data:
            speak(ctime())

        if "where is" in data:
            # Keep data as a string so the "bye" check below stays a substring match
            words = data.split(" ")
            # Assumes the utterance starts with "where is", so the third word is the place
            location = words[2]
            speak("Hold on Tony, I will show you where " + location + " is.")
            # macOS only: open the place in Safari; the trailing & runs the command in the background
            os.system("open -a Safari https://www.google.com/maps/place/" + location + "/&")

        if "bye" in data:
            speak("bye bye")
            break

# Initialize
time.sleep(2)
speak("Hi Tony, what can I do for you?")

# Start
jarvis()

Output:

Hi Tony, what can I do for you?
You said: how are you
I am fine
You said: what time is it now
Fri Apr  7 18:16:54 2017
You said: where is London
Hold on Tony, I will show you where London is.
You said: ok bye bye
bye bye
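
The rules in `jarvis()` are hard to try out without a microphone. One way to make them testable, sketched here as an assumption rather than as the author's design, is to move the keyword logic into a pure function that takes the recognized text and returns the replies. The names `handle` and `maps_url` are hypothetical; the sketch also keeps multi-word place names and URL-encodes them.

```python
from time import ctime
from urllib.parse import quote


def maps_url(location):
    """Build a Google Maps URL for a place name, URL-encoding spaces and the like."""
    return "https://www.google.com/maps/place/" + quote(location)


def handle(text):
    """Apply the keyword rules to one utterance.

    Returns (replies, keep_running): the sentences to speak, and False
    once the user has said "bye".
    """
    replies = []
    if "how are you" in text:
        replies.append("I am fine")
    if "what time is it" in text:
        replies.append(ctime())
    if "where is" in text:
        # Take everything after "where is" so multi-word places survive
        location = text.split("where is", 1)[1].strip()
        if location:
            replies.append("Hold on Tony, I will show you where " + location + " is.")
    if "bye" in text:
        replies.append("bye bye")
        return replies, False
    return replies, True
```

A voice loop could then call `handle(recordAudio())`, pass each reply to `speak`, and stop when the second value is False. The same function could serve a text-only front end unchanged.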

Voice is not the only front end. Our chatbot can also be integrated into messaging platforms such as WeChat, Slack, Facebook Messenger, and so on.
