痞子衡 Embedded: The Making of the Speech Processing Tool Jays-PySPEECH (5) - Implementing Speech Recognition (SpeechRecognition, PocketSphinx 0.1.15)

Posted by 痞子衡 on 2017-06-18

Original URL: https://www.cnblogs.com/henjay724/p/9576670.html

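Hi everyone, I'm 痞子衡, a rascal who is serious about technology. Today I'd like to walk you through how speech recognition is implemented in the speech processing tool Jays-PySPEECH.

Speech recognition is the core feature of Jays-PySPEECH. It is built on the SpeechRecognition library together with the CMU Sphinx engine, and this article explains how the two are put to work in Jays-PySPEECH.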

1. Introduction to SpeechRecognition

SpeechRecognition is a Python library for speech recognition, created by Anthony Zhang (Uberi). It was first released in 2014 and has been updated continuously since; Jays-PySPEECH uses SpeechRecognition 3.8.1.
Its official pages are:

SpeechRecognition home page: https://github.com/Uberi/speech_recognition

SpeechRecognition installation: https://pypi.org/project/SpeechRecognition/

SpeechRecognition does not recognize speech by itself; it calls third-party speech recognition engines to do the work. It supports a good number of them, namely the following 8:

CMU Sphinx (works offline)

Google Speech Recognition

Google Cloud Speech API

Wit.ai

Microsoft Bing Voice Recognition

Houndify API

IBM Speech to Text

Snowboy Hotword Detection (works offline)

Whichever engine you choose, the calling interface in SpeechRecognition is the same. Let's take the sample audio_transcribe.py, which transcribes an audio file to text, to see how the library is used. An excerpt follows:

import speech_recognition as sr

# Path of the audio file to transcribe (english.wav)
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")

# Create a Recognizer object and read the data from the audio file (english.wav)
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# Recognize the audio with the CMU Sphinx engine
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

# Recognize the audio with the Microsoft Bing Voice Recognition engine
BING_KEY = "INSERT BING API KEY HERE"  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
try:
    print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))

# Recognize the audio with other engines
# ... ...

Doesn't SpeechRecognition look remarkably easy to use? That simplicity is exactly its strength. More examples are in the project's examples folder: https://github.com/Uberi/speech_recognition/tree/master/examples
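To make the "same interface for every engine" point concrete, here is a minimal sketch. It relies only on the recognize_<engine> naming seen in the excerpt (recognize_sphinx, recognize_bing). The transcribe helper and the FakeRecognizer stand-in are hypothetical names of mine, not part of SpeechRecognition, and the stand-in exists only so the snippet runs without audio files or API keys.

```python
def transcribe(recognizer, audio, engine, **options):
    """Call recognizer.recognize_<engine>(audio, **options) and return the text."""
    method = getattr(recognizer, "recognize_" + engine, None)
    if method is None:
        raise ValueError("unsupported engine: " + engine)
    return method(audio, **options)


class FakeRecognizer(object):
    """Hypothetical stand-in for sr.Recognizer, used only to exercise transcribe()."""

    def recognize_sphinx(self, audio, language="en-US"):
        return "sphinx heard %s in %s" % (audio, language)


if __name__ == "__main__":
    # Switching engines only changes the engine name and its keyword options.
    print(transcribe(FakeRecognizer(), "<audio>", "sphinx", language="zh-CN"))
```

With the real library you would pass sr.Recognizer() and the audio object returned by r.record(source) instead of the stand-in.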

1.1 Choosing the CMU Sphinx engine

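As noted above, SpeechRecognition has no recognition capability of its own, so we need to install a speech recognition engine for it. For Jays-PySPEECH I chose CMU Sphinx, which works offline.
CMU Sphinx is an open-source speech recognition engine developed at Carnegie Mellon University. It works offline and supports multiple languages (English, Chinese, French and others). Its official pages are:

CMU Sphinx home page: https://cmusphinx.github.io/

CMU Sphinx downloads: https://sourceforge.net/projects/cmusphinx/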

Jays-PySPEECH is developed in Python, so it cannot use CMU Sphinx directly. Not to worry: Dmitry Prazdnichnov wrote a Python interface to it, published as the pocketsphinx package (called PocketSphinx below). Its official pages are:

PocketSphinx home page: https://github.com/bambocher/pocketsphinx-python

PocketSphinx installation: https://pypi.org/project/pocketsphinx/

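SpeechRecognition and PocketSphinx were already installed in the first article of this Jays-PySPEECH series (environment setup). On my machine they live in \speech_recognition and \pocketsphinx under C:\tools_mcu\Python27\Lib\site-packages. With these two packages in place, the engine is ready.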
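If you want to confirm where the two packages ended up on your own machine, a small check like the following may help. It is a sketch that assumes Python 3 (importlib.util.find_spec is not available in the Python 2.7 setup described here), and find_package_dir is a hypothetical helper name.

```python
import importlib.util
import os


def find_package_dir(name):
    """Return the folder of an importable top-level package, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    for package in ("speech_recognition", "pocketsphinx"):
        print(package, "->", find_package_dir(package) or "not installed")
```

The lookup does not import the package, so it only tells you whether Python can locate it, not whether it works.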

1.2 Adding a Chinese language pack to PocketSphinx

By default PocketSphinx recognizes only US English: the directory C:\tools_mcu\Python27\Lib\site-packages\speech_recognition\pocketsphinx-data contains just an en-US folder. Let's see what is inside it:

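\pocketsphinx-data\en-US
                        \acoustic-model                     -- acoustic model
                                       \feat.params           -- feature parameters of the HMM model
                                       \mdef                  -- model definition file
                                       \means                 -- means of the Gaussian mixture model
                                       \mixture_weights       -- mixture weights
                                       \noisedict             -- noise (non-speech) dictionary
                                       \sendump               -- mixture weights dumped from the acoustic model
                                       \transition_matrices   -- state transition matrices of the HMM model
                                       \variances             -- variances of the Gaussian mixture model
                        \language-model.lm.bin              -- language model
                        \pronounciation-dictionary.dict     -- pronunciation dictionary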

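Does this pile of files look daunting? Their roles follow from how the CMU Sphinx engine performs recognition, which we won't go into here. As users of the API, all we need to care about is how to add another language pack (Chinese, for example) to CMU Sphinx.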
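As a rough aid, the top-level layout shown above can be checked with a few lines of Python. This is only a sketch: missing_entries is a hypothetical helper, and it looks solely for the three top-level entries listed above (the acoustic-model folder, language-model.lm.bin and pronounciation-dictionary.dict), not for the individual acoustic model files.

```python
import os

# Top-level entries of a language folder, paired with the check each must pass.
REQUIRED_ENTRIES = (
    ("acoustic-model", os.path.isdir),
    ("language-model.lm.bin", os.path.isfile),
    ("pronounciation-dictionary.dict", os.path.isfile),
)


def missing_entries(language_dir):
    """Return the names of required entries that are absent from language_dir."""
    return [name for name, exists in REQUIRED_ENTRIES
            if not exists(os.path.join(language_dir, name))]
```

Calling missing_entries on the en-US folder of a working installation should return an empty list; any names it does return point at what still has to be added.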
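Adding a language starts with the language pack data. The CMU Sphinx site offers downloads for 12 major languages (https://sourceforge.net/projects/cmusphinx/files/Acoustic_and_Language_Models/). Since Jays-PySPEECH must recognize Chinese, we need the following three files under \Mandarin: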

\Mandarin
         \zh_broadcastnews_16k_ptm256_8000.tar.bz2  -- acoustic model
         \zh_broadcastnews_64000_utf8.DMP           -- language model
         \zh_broadcastnews_utf8.dic                 -- pronunciation dictionary

With the Chinese language pack data in hand, follow the steps given in Notes on using PocketSphinx, which I summarize as follows:

1. Create a zh-CN folder under \speech_recognition\pocketsphinx-data.

2. Extract zh_broadcastnews_16k_ptm256_8000.tar.bz2 and put all the files it contains into the \zh-CN\acoustic-model folder.

3. Rename zh_broadcastnews_utf8.dic to pronounciation-dictionary.dict and put it in the \zh-CN folder.

4. Use the SphinxBase tools to convert zh_broadcastnews_64000_utf8.DMP to language-model.lm.bin and put it in the \zh-CN folder.
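The first three steps above are plain file operations, so they can be scripted. The sketch below assumes Python 3 and uses only the standard library. install_language_pack and its parameters are hypothetical names, and the archive is assumed to be a regular .tar.bz2 whose files may sit inside a top-level folder. The last step is not covered because it needs the SphinxBase converter.

```python
import os
import shutil
import tarfile


def install_language_pack(data_dir, language, acoustic_archive, dictionary_file):
    """Create the language folder, unpack the acoustic model, copy the dictionary."""
    language_dir = os.path.join(data_dir, language)
    acoustic_dir = os.path.join(language_dir, "acoustic-model")
    # Creates the language folder and its acoustic-model subfolder in one go.
    os.makedirs(acoustic_dir, exist_ok=True)

    # Copy every regular file from the archive straight into acoustic-model,
    # dropping any folder levels the archive may contain.
    with tarfile.open(acoustic_archive, "r:bz2") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            target = os.path.join(acoustic_dir, os.path.basename(member.name))
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)

    # Copy the dictionary under the file name used in the en-US folder.
    shutil.copyfile(dictionary_file,
                    os.path.join(language_dir, "pronounciation-dictionary.dict"))
    return language_dir
```

A call might look like install_language_pack(data_dir, "zh-CN", "zh_broadcastnews_16k_ptm256_8000.tar.bz2", "zh_broadcastnews_utf8.dic"), where data_dir is the pocketsphinx-data folder of your own installation.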

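The SphinxBase tools mentioned in step 4 have to be built from source. Download the code from https://github.com/cmusphinx/sphinxbase, open the solution \sphinxbase\sphinxbase.sln in Visual Studio 2010 (or later) and run Rebuild All. The following 6 tools then appear under \sphinxbase\bin\Release\x64: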

\sphinxbase\bin\Release\x64
                            \sphinx_cepview.exe
                            \sphinx_fe.exe
                            \sphinx_jsgf2fsg.exe
                            \sphinx_lm_convert.exe
                            \sphinx_pitch.exe
                            \sphinx_seg.exe

The tool we need is sphinx_lm_convert.exe, which performs the conversion that produces language-model.lm.bin. The commands are:

PS C:\tools_mcu\sphinxbase\bin\Release\x64> .\sphinx_lm_convert.exe -i .\zh_broadcastnews_64000_utf8.DMP -o language-model.lm -ofmt arpa

Current configuration:
[NAME]          [DEFLT] [VALUE]
-case
-help           no      no
-i                      .\zh_broadcastnews_64000_utf8.DMP
-ifmt
-logbase        1.0001  1.000100e+00
-mmap           no      no
-o                      language-model.lm
-ofmt                   arpa

INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(70): No \data\ mark in LM file
INFO: ngram_model_trie.c(445): Trying to read LM in dmp format
INFO: ngram_model_trie.c(527): ngrams 1=63944, 2=16600781, 3=20708460
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie

PS C:\tools_mcu\sphinxbase\bin\Release\x64> .\sphinx_lm_convert.exe -i .\language-model.lm -o language-model.lm.bin

Current configuration:
[NAME]          [DEFLT] [VALUE]
-case
-help           no      no
-i                      .\language-model.lm
-ifmt
-logbase        1.0001  1.000100e+00
-mmap           no      no
-o                      language-model.lm.bin
-ofmt

INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(193): LM of order 3
INFO: ngram_model_trie.c(195): #1-grams: 63944
INFO: ngram_model_trie.c(195): #2-grams: 16600781
INFO: ngram_model_trie.c(195): #3-grams: 20708460
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
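If you prefer to script the two conversions shown above, the calls can be wrapped as follows. The flags (-i, -o, -ofmt arpa) are the ones that appear in the configuration dumps above. The function names and the use of subprocess are assumptions of this sketch, and nothing is executed unless run_conversion is called with the path of a real sphinx_lm_convert.exe.

```python
import subprocess


def lm_convert_commands(converter, dmp_file, arpa_file, bin_file):
    """Return the two sphinx_lm_convert calls as argument lists."""
    return [
        # First call: DMP -> ARPA text format
        [converter, "-i", dmp_file, "-o", arpa_file, "-ofmt", "arpa"],
        # Second call: ARPA -> language-model.lm.bin (no -ofmt, as in the log above)
        [converter, "-i", arpa_file, "-o", bin_file],
    ]


def run_conversion(converter, dmp_file, arpa_file, bin_file):
    """Run both calls in order; raises CalledProcessError if either one fails."""
    for command in lm_convert_commands(converter, dmp_file, arpa_file, bin_file):
        subprocess.check_call(command)
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain spaces.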

2. Implementing speech recognition in Jays-PySPEECH

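Implementing speech recognition in code is straightforward: call the speech_recognition API directly. At present only the CMU Sphinx engine is implemented, and only Chinese and English are supported. In Jays-PySPEECH the work is done in audioSpeechRecognition(), the callback of the "ASR" button on the GUI. Once the user has chosen the settings (language and ASR engine) and clicked "ASR", audioSpeechRecognition() runs. The code is as follows: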

import os
import speech_recognition

class mainWin(win.speech_win):

    def getLanguageSelection(self):
        languageType = self.m_choice_lang.GetString(self.m_choice_lang.GetSelection())
        if languageType == 'Mandarin Chinese':
            languageType = 'zh-CN'
            languageName = 'Chinese'
        else: # languageType == 'US English':
            languageType = 'en-US'
            languageName = 'English'
        return languageType, languageName

    def audioSpeechRecognition( self, event ):
        if os.path.isfile(self.wavPath):
            # Create the speech_recognition recognizer object asrObj
            asrObj = speech_recognition.Recognizer()
            # Read the speech content from the wav file
            with speech_recognition.AudioFile(self.wavPath) as source:
                speechAudio = asrObj.record(source)
            self.m_textCtrl_asrttsText.Clear()
            # Get the language of the speech (English/Chinese)
            languageType, languageName = self.getLanguageSelection()
            engineType = self.m_choice_asrEngine.GetString(self.m_choice_asrEngine.GetSelection())
            if engineType == 'CMU Sphinx':
                try:
                    # Call recognize_sphinx to perform the recognition
                    speechText = asrObj.recognize_sphinx(speechAudio, language=languageType)
                    # Show the recognition result in the asrttsText text box
                    self.m_textCtrl_asrttsText.write(speechText)
                    self.statusBar.SetStatusText("ASR Conversation Info: Successfully")
                    # Write the recognition result to the specified file
                    fileName = self.m_textCtrl_asrFileName.GetLineText(0)
                    if fileName == '':
                        fileName = 'asr_untitled1.txt'
                    asrFilePath = os.path.join(os.path.dirname(os.path.abspath(os.path.dirname(__file__))), 'conv', 'asr', fileName)
                    # The with block closes the file even if the write fails
                    with open(asrFilePath, 'wb') as asrFileObj:
                        asrFileObj.write(speechText)
                except speech_recognition.UnknownValueError:
                    self.statusBar.SetStatusText("ASR Conversation Info: Sphinx could not understand audio")
                except speech_recognition.RequestError as e:
                    self.statusBar.SetStatusText("ASR Conversation Info: Sphinx error; {0}".format(e))
            else:
                self.statusBar.SetStatusText("ASR Conversation Info: Unavailable ASR Engine")
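The parts of the callback that do not touch the GUI can be tried out on their own. The sketch below assumes Python 3 and pulls out two pieces: the mapping from the language choice to the language folder name, and saving the result to a file. language_code and save_transcript are hypothetical names. Writing the file as UTF-8 text is my assumption for Python 3 (the original, running on Python 2.7, writes with mode 'wb'), chosen so that Chinese results are stored correctly.

```python
import os

# GUI choice text -> language folder name under pocketsphinx-data
LANGUAGE_CODES = {"Mandarin Chinese": "zh-CN", "US English": "en-US"}


def language_code(selection):
    """Map the GUI choice to a language folder name, defaulting to en-US."""
    return LANGUAGE_CODES.get(selection, "en-US")


def save_transcript(text, output_dir, file_name=""):
    """Write text as UTF-8 into output_dir and return the path of the file."""
    # An empty file name falls back to the same default as the callback.
    path = os.path.join(output_dir, file_name or "asr_untitled1.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

Keeping these pieces free of wx calls makes them easy to test without starting the GUI.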

That wraps up my introduction to the speech recognition implementation in the speech processing tool Jays-PySPEECH. Where's the applause~~~
