Python voice chatbot, compatible with both Linux and Raspberry Pi

Posted by 路易十四 on 2016-08-19

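Project overview: Baidu Voice handles both speech-to-text recognition (Chinese) and speech synthesis, and the conversation itself is handled by Tuling Robot (圖靈機器人). Recording uses the pyaudio module on Linux. On the Raspberry Pi pyaudio had compatibility problems, so recording is done with arecord instead. The final code is about 150 lines and is published on GitHub: https://github.com/luyishisi/python_yuyinduihua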

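0. Contents:

1: Environment setup
2: Baidu speech synthesis and recognition
3: Tuling Robot
4: Audio analysis with pyaudio on Linux
5: Recording with arecord on the Raspberry Pi
6: Putting it all together on Linux
7: Main bugs explained
8: Source code for the Raspberry Pi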

1. Environment setup

This step is critical: most of the problems later on turned out to be environment incompatibilities.

1.1: Linux version

# -*- coding: utf-8 -*-
from pyaudio import PyAudio, paInt16
import numpy as np
from datetime import datetime
import wave
import time
import urllib, urllib2, pycurl
import base64
import json
import os
import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )

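This environment is the easiest one to set up. All it takes is

an install command such as apt-get install python-wave*. Installing a module mostly comes down to finding the right install command; I usually take a word that is bound to appear in the package name and append * as a wildcard.

If you can't work out how to install a module, search for it; it isn't hard. You also need mpg123 for playback.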
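
Before going further it can help to check what is still missing. The sketch below is not part of the original project: it is written for Python 3 (the listings in this post are Python 2), and the function name missing_requirements is made up for illustration. It only reports which third-party modules and command-line tools cannot be found.

```python
import importlib.util
import shutil


def missing_requirements(modules, commands):
    """Return (missing_modules, missing_commands) from the two name lists."""
    missing_modules = [m for m in modules
                       if importlib.util.find_spec(m) is None]
    missing_commands = [c for c in commands if shutil.which(c) is None]
    return missing_modules, missing_commands


if __name__ == "__main__":
    # Module and tool names taken from the imports and commands in this post.
    mods, cmds = missing_requirements(
        ["pyaudio", "numpy", "pycurl"], ["mpg123", "arecord"])
    print("missing modules:", mods)
    print("missing commands:", cmds)
```

Anything reported here still has to be installed with the package manager as described above.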

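1.2: Raspberry Pi version

If you run into the pyaudio errors described in this post, give up on that route right away: record from the command line instead and stop fighting with pyaudio.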

## Update the packages first
sudo apt-get update
sudo apt-get upgrade
## Install the required tools
sudo apt-get -y install alsa-utils alsa-tools alsa-tools-gui alsamixergui

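Main tools used

To adjust the speaker volume from the terminal, just run alsamixer. This matters: the capture volume of your recording device is set here too, and you can see at a glance whether your sound card has a problem.

The recording device I used: https://item.taobao.com/item.htm?spm=a1z10.5-c.w4002-3667091491.40.mktumv&id=41424706506

Recording is done with the arecord command.

arecord and aplay are the command-line recording and playback tools for the ALSA sound driver.
arecord is the command-line recorder for ALSA. It supports several file formats and multiple sound cards.
aplay is the command-line player. It supports several file formats.

Command format (worth reading carefully; the three options mainly used here are -d, -f and -r):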

       arecord [flags] [filename]
       aplay [flags] [filename [filename]] ...
Options:
       -h, --help               Show help.
       --version                Print version information.
       -l, --list-devices       List all sound cards and digital audio devices.
       -L, --list-pcms          List all defined PCMs.
       -D, --device=NAME        Select the PCM device by name.
       -q, --quiet              Quiet mode.
       -t, --file-type TYPE     File type (voc, wav, raw or au).
       -c, --channels=#         Number of channels.
       -f, --format=FORMAT      Sample format. Formats include: S8  U8  S16_LE  S16_BE  U16_LE
              U16_BE  S24_LE S24_BE U24_LE U24_BE S32_LE S32_BE U32_LE U32_BE
              FLOAT_LE  FLOAT_BE  FLOAT64_LE  FLOAT64_BE   IEC958_SUBFRAME_LE
              IEC958_SUBFRAME_BE MU_LAW A_LAW IMA_ADPCM MPEG GSM
       -r, --rate=#<Hz>         Sampling rate in Hz.
       -d, --duration=#         Duration in seconds.
       -s, --sleep-min=#        Minimum sleep time.
       -M, --mmap               mmap stream.
       -N, --nonblock           Non-blocking mode.
       -B, --buffer-time=#      Buffer duration in microseconds.
       -v, --verbose            Show the PCM structure and setup.
       -I, --separate-channels  Write one file per channel.

Examples:

       aplay -c 1 -t raw -r 22050 -f mu_law foobar
	Plays the raw file foobar as 22050 Hz, mono, 8-bit, mu_law.

       arecord -d 10 -f cd -t wav -D copy foobar.wav
	Records foobar.wav for 10 seconds in CD quality, using the PCM named "copy".

2: Baidu speech synthesis and recognition

This part is not hard. The test code is below.

# Speech recognition test
#encoding=utf-8
import wave
import urllib, urllib2, pycurl
import base64
import json
## get access token by api key & secret key
## fill in your own apiKey and secretKey to obtain the token
def get_token():
    apiKey = "Ll0c53MSac6GBOtpg22ZSGAU"
    secretKey = "44c8af396038a24e34936227d4a19dc2"

    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey;

    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):
    print (buf)

## post audio to server
def use_cloud(token):
    fp = wave.open('2.wav', 'rb')
    ## a speech clip that has already been recorded
    nf = fp.getnframes()
    f_len = nf * 2
    audio_data = fp.readframes(nf)

    cuid = "7519663" # your product id
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]

    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url)) #curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header)   #must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform() #pycurl.perform() has no return val

if __name__ == "__main__":
    token = get_token()
    # fetch the token
    use_cloud(token)
    # send the audio; the result is printed inside the callback
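
The listing above computes the body length as nf * 2, which holds only for mono 16-bit audio. As a hedged alternative, the Python 3 sketch below derives the two header lines from the WAV file itself. The helper names pcm_payload and write_silence are made up for illustration; the header strings simply mirror the ones in the listing.

```python
import os
import tempfile
import wave


def pcm_payload(path):
    """Return (raw_pcm_bytes, header_lines) for a WAV file.

    The length comes from the bytes actually read, so it stays correct
    for any sample width or channel count.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    headers = [
        "Content-Type: audio/pcm; rate=%d" % rate,
        "Content-Length: %d" % len(frames),
    ]
    return frames, headers


def write_silence(path, seconds=1, rate=8000):
    """Write a mono 16-bit WAV of silence, only to have something to read."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_silence(demo)
    data, headers = pcm_payload(demo)
    print(len(data), headers)
```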

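3: Tuling Robot

Official site: http://www.tuling123.com/

Test code for the Tuling Robot part

This is easy. Register on the site, then use the key and API they give you. All that remains is pulling the text out of the JSON response.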

# -*- coding: utf-8 -*-
import urllib
import json

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

if __name__ == '__main__':

    key = '05ba411481c8cfa61b91124ef7389767'
    api = 'http://www.tuling123.com/openapi/api?key=' + key + '&info='
    while True:
        info = raw_input('Me: ')
        request = api + urllib.quote(info)
        response = getHtml(request)
        dic_json = json.loads(response)
        print 'Robot: '.decode('utf-8') + dic_json['text']
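
The two steps here are building the query URL and extracting the text field. Here is a minimal Python 3 sketch of both, without any network access. The function names are made up for illustration, and the only response field assumed is text, which is the one the listing above reads.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://www.tuling123.com/openapi/api"


def build_request_url(key, info):
    """Return the query URL with key and info percent-encoded as UTF-8."""
    return API_BASE + "?" + urlencode({"key": key, "info": info})


def extract_reply(response_body):
    """Pull the 'text' field out of a JSON response body (str or bytes)."""
    if isinstance(response_body, bytes):
        response_body = response_body.decode("utf-8")
    return json.loads(response_body).get("text", "")
```

Percent-encoding matters as soon as the input contains Chinese characters, spaces or an ampersand, which would otherwise corrupt the query string.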

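4: Audio analysis with pyaudio on Linux

On an ordinary PC this part is easy as long as the environment is basically sound. The code lives in the full source; here is a short explanation.

The code below is an excerpt pulled out of the full source to make the idea easier to follow.

pa is the PyAudio object. Each block read from its stream gives the current amplitude, and once the peak exceeds 2000 recording starts. There is also an additional 5-second limit.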

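import time
import numpy as np
from pyaudio import PyAudio, paInt16

NUM_SAMPLES = 2000      # size of pyAudio's internal buffer block
SAMPLING_RATE = 8000    # sampling rate
LEVEL = 1500            # amplitude threshold for saving sound
COUNT_NUM = 20          # record when COUNT_NUM samples within NUM_SAMPLES exceed LEVEL
SAVE_LENGTH = 8         # minimum recording length: SAVE_LENGTH * NUM_SAMPLES samples
t = 0                   # 1 while a recording is in progress
# open the audio input
pa = PyAudio()
stream = pa.open(format=paInt16, channels=1, rate=SAMPLING_RATE, input=True,
                frames_per_buffer=NUM_SAMPLES)
while True:
    string_audio_data = stream.read(NUM_SAMPLES)
    # convert the raw data to an array
    audio_data = np.fromstring(string_audio_data, dtype=np.short)
    # count the samples greater than LEVEL
    large_sample_count = np.sum( audio_data > LEVEL )

    temp = np.max(audio_data)
    if temp > 2000 and t == 0:
        t = 1  # start recording
        print "Signal detected, recording starts, timing five seconds"
        begin = time.time()
        print temp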
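
The threshold test can be tried without a microphone or numpy. The sketch below is a Python 3 illustration using only the standard library, not the project's code. It reuses the LEVEL and COUNT_NUM values from the excerpt, and the function names are made up.

```python
from array import array
import sys

LEVEL = 1500      # same threshold as the excerpt above
COUNT_NUM = 20    # same count as the excerpt above


def block_stats(block_bytes):
    """Return (peak, loud_count) for a block of 16-bit little-endian samples."""
    samples = array("h")
    samples.frombytes(block_bytes)
    if sys.byteorder == "big":
        samples.byteswap()          # the data is little-endian on the wire
    peak = max(samples) if samples else 0
    loud = sum(1 for s in samples if s > LEVEL)
    return peak, loud


def should_keep(block_bytes):
    """True when more than COUNT_NUM samples in the block exceed LEVEL."""
    return block_stats(block_bytes)[1] > COUNT_NUM
```

The peak is what triggers the start of a recording (peak > 2000 in the excerpt), while the loud-sample count decides whether a block is worth saving.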

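5: Recording with arecord on the Raspberry Pi

This section mostly collects reference notes. If the command below runs successfully on the Raspberry Pi, you are set; the rest is material gathered along the way.

sudo arecord -D "plughw:1,0" -d 5 f1.wav

What the options mean: -D selects the device. An external device is plughw:1,0 and an internal one is plughw:0,0; the Raspberry Pi has no built-in recording hardware, so there is no internal device. -d 5 records for 5 seconds; without it, recording continues until you press Ctrl+C. The resulting file is named f1.wav.

Baidu's speech service requires 16-bit samples, so -f has to be set as well.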
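
Putting the options together, the command can be assembled as an argument list before it is handed to the shell. This is a Python 3 sketch; the function name arecord_command is made up, and the default values are the ones used elsewhere in this post (plughw:1,0, S16_LE, 8000 Hz, 5 seconds).

```python
import shlex


def arecord_command(path, device="plughw:1,0", seconds=5,
                    sample_format="S16_LE", rate=8000):
    """Build the arecord argument list for a fixed-length recording."""
    return [
        "arecord",
        "-D", device,          # which PCM device to record from
        "-f", sample_format,   # 16-bit little-endian by default
        "-r", str(rate),       # sampling rate in Hz
        "-d", str(seconds),    # stop after this many seconds
        path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without ALSA.
    print(" ".join(shlex.quote(arg) for arg in arecord_command("f1.wav")))
```

On a machine with ALSA the list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems.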

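Notes on the PCM formats:

These are all ways of expressing the range of a PCM sample. The minimum values are equivalent to one another, as are the maximum values, and everything in between scales proportionally, so every format can be mapped onto the range -1 to 1.

S8: signed 8 bits, = char, range -128 to 127
U8: unsigned 8 bits, = unsigned char, range 0 to 255
S16_LE: little endian signed 16 bits, = short, range -32768 to 32767
S16_BE: big endian signed 16 bits, = byte-swapped short (PPC), range -32768 to 32767
U16_LE: little endian unsigned 16 bits, = unsigned short, range 0 to 65535
U16_BE: big endian unsigned 16 bits, = byte-swapped unsigned short (PPC), range 0 to 65535
There are also S24_LE, S32_LE and so on. They are all ways of representing numbers, and PCM can use any of them.
Of the values above, all the minimums (-128, 0, -32768, -32768, 0, 0) mean the same thing as far as PCM is concerned: the minimum level, which can be quantized to floating-point -1. All the maximums likewise map to floating-point 1, and the other values convert proportionally.

PCMU is G.711 mu-law audio and PCMA is G.711 A-law audio (arecord's MU_LAW and A_LAW formats); the names do not stand for unsigned and signed PCM.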
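
The proportional mapping onto -1 to 1 can be written as a single linear formula. This is a small Python 3 sketch; the table and function name are made up for illustration and cover only a few of the formats listed above.

```python
# Value ranges for a few linear PCM formats (byte order does not change the range).
RANGES = {
    "S8": (-128, 127),
    "U8": (0, 255),
    "S16_LE": (-32768, 32767),
    "U16_LE": (0, 65535),
}


def to_unit_range(value, fmt):
    """Map a sample linearly so the format's minimum is -1.0 and maximum is 1.0."""
    lo, hi = RANGES[fmt]
    return (value - lo) / (hi - lo) * 2.0 - 1.0
```

Endianness only affects how the bytes are decoded into an integer; once the integer is known, the same formula applies to the _LE and _BE variants.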

Listing the sound cards

cat /proc/asound/cards

cat /proc/asound/modules
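
The card index shown by the first command is the number that goes into plughw:N,0. The Python 3 sketch below extracts it from text in the usual layout of /proc/asound/cards. The sample text is made up and will differ on a real machine, and the function names are hypothetical.

```python
import re

# Made-up sample in the usual layout of /proc/asound/cards.
SAMPLE_CARDS = (
    " 0 [ALSA           ]: bcm2835 - bcm2835 ALSA\n"
    "                      bcm2835 ALSA\n"
    " 1 [Device         ]: USB-Audio - USB PnP Sound Device\n"
    "                      USB PnP Sound Device at usb, full speed\n"
)

# Matches only the first line of each entry: index, [id], description.
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[\s*(.*?)\s*\]:\s*(.+?)\s*$")


def parse_cards(text):
    """Return [(index, card_id, description), ...] from cards-style text."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            cards.append((int(match.group(1)), match.group(2), match.group(3)))
    return cards


def plughw_name(index, device=0):
    """Format an ALSA device string such as plughw:1,0."""
    return "plughw:%d,%d" % (index, device)
```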

6: Putting it all together on Linux

The source code follows; the explanations are in the comments.

# -*- coding: utf-8 -*-
from pyaudio import PyAudio, paInt16
import numpy as np
from datetime import datetime
import wave
import time
import urllib, urllib2, pycurl
import base64
import json
import os
import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )
# global state
save_count = 0
save_buffer = []
t = 0
sum = 0
time_flag = 0
flag_num = 0
filename = ''
duihua = '1'

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def get_token():
    apiKey = "Ll0c53MSac6GBOtpg22ZSGAU"
    secretKey = "44c8af396038a24e34936227d4a19dc2"
    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey;
    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):  # handle the result returned by Baidu speech recognition
    global duihua
    print "raw response string"
    print (buf)
    a = json.loads(buf)
    print type(a)
    if a['err_msg']=='success.':
        # a['result'][0] holds the recognized sentence
        duihua = a['result'][0].encode('utf-8')
        print duihua

def use_cloud(token):  # send the recording for recognition
    fp = wave.open(filename, 'rb')
    nf = fp.getnframes()
    f_len = nf * 2
    audio_data = fp.readframes(nf)
    cuid = "7519663" # product id
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]

    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url)) #curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header)   #must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform() #pycurl.perform() has no return val

# save the contents of data to a WAV file named filename
def save_wave_file(filename, data):
    wf = wave.open(filename, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(SAMPLING_RATE)
    wf.writeframes("".join(data))
    wf.close()


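NUM_SAMPLES = 2000      # size of pyAudio's internal buffer block
SAMPLING_RATE = 8000    # sampling rate
LEVEL = 1500            # amplitude threshold for saving sound
COUNT_NUM = 20          # record when COUNT_NUM samples within NUM_SAMPLES exceed LEVEL
SAVE_LENGTH = 8         # minimum recording length: SAVE_LENGTH * NUM_SAMPLES samples

# open the audio input (PyAudio object)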
pa = PyAudio()
stream = pa.open(format=paInt16, channels=1, rate=SAMPLING_RATE, input=True,
                frames_per_buffer=NUM_SAMPLES)


token = get_token()  # fetch the token
key = '05ba411481c8cfa61b91124ef7389767'  # key and API settings
api = 'http://www.tuling123.com/openapi/api?key=' + key + '&info='

while True:
    # read NUM_SAMPLES samples
    string_audio_data = stream.read(NUM_SAMPLES)
    # convert the raw data to an array
    audio_data = np.fromstring(string_audio_data, dtype=np.short)
    # count the samples greater than LEVEL
    large_sample_count = np.sum( audio_data > LEVEL )

    temp = np.max(audio_data)
    if temp > 2000 and t == 0:
        t = 1  # start recording
        print "Signal detected, recording starts, timing five seconds"
        begin = time.time()
        print temp
    if t:
        print np.max(audio_data)
        if np.max(audio_data)<1000:
            sum += 1
            print sum
        end = time.time()
        if end-begin>5:
            time_flag = 1
            print "Five seconds are up, getting ready to stop"
        # if more than COUNT_NUM samples are loud, keep at least SAVE_LENGTH blocks
        if large_sample_count > COUNT_NUM:
            save_count = SAVE_LENGTH
        else:
            save_count -= 1

        if save_count < 0:
            save_count = 0

        if save_count > 0:
            # stash the data to be saved in save_buffer
            save_buffer.append(string_audio_data )
        else:
            # write save_buffer to a WAV file named after a running counter
            #if  time_flag:
            if len(save_buffer) > 0  or time_flag:
                #filename = datetime.now().strftime("%Y-%m-%d_%H_%M_%S") + ".wav"  # originally the timestamp was the name
                filename = str(flag_num)+".wav"
                flag_num += 1

                save_wave_file(filename, save_buffer)
                save_buffer = []
                t = 0
                sum =0
                time_flag = 0
                print filename, "saved, running speech recognition"
                use_cloud(token)
                print duihua
                info = duihua
                duihua = ""
                request = api + urllib.quote(info)
                response = getHtml(request)
                dic_json = json.loads(response)

                #print 'Robot: '.decode('utf-8') + dic_json['text']  # the awkward part here is character encoding
                #huida = ' '.decode('utf-8') + dic_json['text']
                a = dic_json['text']
                print type(a)
                unicodestring = a

                # convert Unicode to a plain Python byte string with "encode"
                utf8string = unicodestring.encode("utf-8")

                print type(utf8string)
                print str(a)
                url = "http://tsn.baidu.com/text2audio?tex="+urllib.quote(utf8string)+"&lan=zh&per=0&pit=1&spd=7&cuid=7519663&ctp=1&tok="+token
                os.system('mpg123 "%s"'%(url))  # play it with mpg123

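7: Main bugs explained

Here are the main pitfalls. Apart from the environment, they are Chinese text encoding and parsing the response object. In the source code, Baidu speech recognition returns a dictionary in which some values are plain strings and others are arrays. First read the err_msg string to confirm it is success., then read the Chinese text from the result array.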
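
That order of checks can be sketched in a few lines of Python 3. The function name first_result is made up; the only fields assumed are err_msg and result, the two the source code reads, and any other error message in a real response is not known here.

```python
import json


def first_result(response_body):
    """Return the first recognized sentence, or None if recognition failed.

    The err_msg string is checked first; only then is the result list indexed.
    """
    if isinstance(response_body, bytes):
        response_body = response_body.decode("utf-8")
    data = json.loads(response_body)
    if data.get("err_msg") != "success.":
        return None
    results = data.get("result") or []
    return results[0] if results else None
```

Returning None on failure lets the caller skip the chat request instead of sending an empty question.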

The other bug is Chinese encoding. Handle it like this:

import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )

# and also

#print 'Robot: '.decode('utf-8') + dic_json['text']
#huida = ' '.decode('utf-8') + dic_json['text']
a = dic_json['text']
print type(a)
unicodestring = a

# convert Unicode to a plain Python byte string with "encode"
utf8string = unicodestring.encode("utf-8")

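After porting to the Raspberry Pi, the main problem was the arecord command failing with a file-or-directory-not-found error. That means the wrong sound card was selected. The same goes for recordings that come out too quiet: use alsamixer to pick the right card and set the levels.

Then there is recognition quality. Baidu has its own requirements, so the recording must be set to 16 bit. After that, play the recording back and check whether the volume is too high or the sound is rough and distorted. It is best to test each stage separately.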
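
Besides listening to the file, the same checks can be made numerically. The Python 3 sketch below reads a WAV file and reports its sample width, rate, channel count and peak level. The function name inspect_wav is made up, and what counts as too quiet or too loud is left to the reader.

```python
import struct
import wave


def inspect_wav(path):
    """Return (bits_per_sample, rate, channels, peak) for a 16-bit WAV.

    peak is the largest absolute sample value as a fraction of full scale.
    """
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    if width != 2:
        raise ValueError("expected 16-bit samples, got %d-bit" % (width * 8))
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return width * 8, rate, channels, peak
```

A peak close to 0 suggests the capture volume in alsamixer is too low, and a peak at or near 1.0 suggests clipping.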

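8: Source code for the Raspberry Pi

pyaudio threw so many errors that I gave up and worked around it: recording is done with the arecord command, invoked from Python. The code is also much shorter, at the cost of losing real-time processing of the audio signal.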

# -*- coding: utf-8 -*-
import wave
import time
import urllib, urllib2, pycurl
import base64
import json
import os
import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )

save_count = 0
save_buffer = []
t = 0
sum = 0
time_flag = 0
flag_num = 0
filename = '2.wav'
duihua = '1'

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def get_token():
    apiKey = "Ll0c53MSac6GBOtpg22ZSGAU"
    secretKey = "44c8af396038a24e34936227d4a19dc2"
    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey;
    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):  # handle the result returned by Baidu speech recognition
    global duihua
    print "raw response string"
    print (buf)
    a = json.loads(buf)
    print type(a)
    if a['err_msg']=='success.':
        # a['result'][0] holds the recognized sentence
        duihua = a['result'][0].encode('utf-8')
        print duihua

def use_cloud(token):
    fp = wave.open(filename, 'rb')
    nf = fp.getnframes()
    f_len = nf * 2
    audio_data = fp.readframes(nf)
    cuid = "7519663" # product id
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]

    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url)) #curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header)   #must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform() #pycurl.perform() has no return val

# save the contents of data to a WAV file named filename
def save_wave_file(filename, data):
    wf = wave.open(filename, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(SAMPLING_RATE)
    wf.writeframes("".join(data))
    wf.close()

token = get_token()
key = '05ba411481c8cfa61b91124ef7389767'
api = 'http://www.tuling123.com/openapi/api?key=' + key + '&info='

while(True):
    os.system('arecord -D "plughw:1,0" -f S16_LE -d 5 -r 8000 /home/luyi/yuyinduihua/2.wav')
    use_cloud(token)
    print duihua
    info = duihua
    duihua = ""
    request = api + urllib.quote(info)
    response = getHtml(request)
    dic_json = json.loads(response)

    a = dic_json['text']
    print type(a)
    unicodestring = a

    # convert Unicode to a plain Python byte string with "encode"
    utf8string = unicodestring.encode("utf-8")

    print type(utf8string)
    print str(a)
    url = "http://tsn.baidu.com/text2audio?tex="+urllib.quote(utf8string)+"&lan=zh&per=0&pit=1&spd=7&cuid=7519663&ctp=1&tok="+token
    os.system('mpg123 "%s"'%(url))
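
The loop above is a fixed pipeline: record, recognize, ask the chatbot, speak. As a hedged illustration of that structure, and not the author's code, the Python 3 sketch below takes each stage as a callable so the stages can be tested separately, as suggested in section 7. All names are made up.

```python
def conversation_turn(record, recognize, chat, speak):
    """Run one record -> recognize -> reply -> speak cycle.

    Each argument is a callable, so the real arecord/Baidu/Tuling calls
    can be swapped for stand-ins while testing.
    Returns the reply text, or None when nothing was recognized.
    """
    audio_path = record()
    heard = recognize(audio_path)
    if not heard:
        return None          # skip the chat request on empty recognition
    reply = chat(heard)
    speak(reply)
    return reply


if __name__ == "__main__":
    # Stand-in stages, only to show the call shape.
    print(conversation_turn(
        record=lambda: "2.wav",
        recognize=lambda path: "hello",
        chat=lambda text: "you said: " + text,
        speak=lambda text: None,
    ))
```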
