痞子衡 Embedded: The Birth of the Speech Processing Tool Jays-PySPEECH (3) - Audio Display Implementation (Matplotlib, NumPy 1.15.0)

Posted by 痞子衡 on 2017-06-11

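  Hello everyone, I'm 痞子衡, a "rogue" who takes technology seriously. Today I'll cover how the audio display of the speech processing tool Jays-PySPEECH was implemented.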

  Audio display is one of the main features of Jays-PySPEECH, and it is built on Matplotlib and NumPy. In this post 痞子衡 explains how it is implemented.

1. The SciPy Stack

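  SciPy is a collection of Python tools for scientific computing, and the name also refers to a Python library in its own right. The SciPy stack consists mainly of six core Python libraries (NumPy, the SciPy library, Matplotlib, IPython, SymPy and pandas); both Matplotlib and NumPy, which Jays-PySPEECH uses, belong to it.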

(Figure: the core libraries of the SciPy stack)

1.1 NumPy

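  NumPy is the most fundamental package for scientific computing in Python and is mainly used for array and matrix operations. It is an open-source project and one of the Sponsored Projects maintained by the NumFOCUS organization. Jays-PySPEECH uses NumPy 1.15.0.
  The official homepage of the NumPy library is as follows: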

  For a quick start with NumPy, see https://docs.scipy.org/doc/numpy/user/quickstart.html

1.2 Matplotlib

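  Matplotlib is a high-quality 2D plotting library for Python, originally created by John Hunter. It is also an open-source project and, likewise, one of the Sponsored Projects maintained by NumFOCUS. Jays-PySPEECH uses Matplotlib 2.2.3.
  The official homepage of the Matplotlib library is as follows: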

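  Matplotlib's plotting capabilities are very powerful, but for everyday use there is no need to read its official documentation cover to cover. It ships a large number of example programs, all collected at https://matplotlib.org/gallery/index.html; we only need to find an example that meets our needs and adapt it slightly. Below is the simplest example, a sine wave: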


import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()

fig.savefig("test.png")
plt.show()

2. Audio Display Implementation in Jays-PySPEECH

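  The audio display in Jays-PySPEECH comes down to four parts: selecting a .wav file, reading the .wav file, plotting the .wav waveform, and adding a cursor. The final result in Jays-PySPEECH is shown in the figure below; 痞子衡 will walk through the implementation details one by one.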

(Figure: the final audio display of Jays-PySPEECH)

2.1 Selecting a .wav File

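  Selecting a .wav file relies mainly on the genericDirCtrl control in wxPython. We used genericDirCtrl to create an object named m_genericDirCtrl_audioDir and called its SetFilter() method so that only .wav files are listed. We also bound an event handler, viewAudio(), to m_genericDirCtrl_audioDir; it fires when a .wav file listed in m_genericDirCtrl_audioDir is selected. Inside viewAudio(), the GetFilePath() method returns the path of the selected .wav file.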

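# Note: `win` is the GUI layout module that defines speech_win; its import is not shown in this post
class mainWin(win.speech_win):

    def __init__(self, parent):
        win.speech_win.__init__(self, parent)
        # ...
        # List only .wav files in the directory control
        self.m_genericDirCtrl_audioDir.SetFilter("Audio files (*.wav)|*.wav")

    # Event handler bound to m_genericDirCtrl_audioDir; fires when a .wav file is selected
    def viewAudio(self, event):
        self.wavPath = self.m_genericDirCtrl_audioDir.GetFilePath()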
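  The string passed to SetFilter() follows the wxWidgets wildcard convention: a description and a pattern separated by "|", where one entry may list several patterns separated by ";". As a GUI-free illustration only, the hypothetical helpers parse_wx_wildcard() and passes_filter() below (they are not wxPython APIs) mimic how such a string selects files; the real control does its own matching, which may differ, for example in case sensitivity.

```python
import fnmatch


def parse_wx_wildcard(wildcard):
    """Split a wx-style wildcard "desc|pattern|desc|pattern" into (desc, [patterns]) pairs.

    Hypothetical helper for illustration; not part of wxPython.
    """
    parts = wildcard.split("|")
    pairs = []
    for desc, pattern in zip(parts[0::2], parts[1::2]):
        # One filter entry may list several patterns separated by ';'
        pairs.append((desc, pattern.split(";")))
    return pairs


def passes_filter(filename, wildcard, index=0):
    """Return True if filename matches any pattern of the selected filter entry."""
    _, patterns = parse_wx_wildcard(wildcard)[index]
    return any(fnmatch.fnmatch(filename, p) for p in patterns)


AUDIO_FILTER = "Audio files (*.wav)|*.wav"
print(passes_filter("hello.wav", AUDIO_FILTER))  # True
print(passes_filter("hello.mp3", AUDIO_FILTER))  # False
```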

2.2 Reading a .wav File

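  Reading a .wav file relies mainly on wave, a module from Python's standard library, and the third-party NumPy library. 痞子衡 created a class named wavCanvasPanel and defined a readWave(self, wavPath, wavInfo) method in it: wavPath is the path of the .wav file to read, and wavInfo is the GUI status bar object used to display information about the file that was read.
  In wavCanvasPanel.readWave(), 痞子衡 first uses the wave module to get all of the .wav file's parameters and all of its PCM data, then uses NumPy to reorganize the PCM data by channel so it is easy to plot later. One point about this reorganization deserves special mention: NumPy's fromstring() has no native support for the int24 type (3 bytes per sample), so 痞子衡 wrote his own fromstring() for such non-standard sample widths.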
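  To make the int24 point concrete, here is a minimal vectorized sketch of a different technique from the byte-copy fromstring() used in the tool: it assembles each 3-byte sample into an integer and then sign-extends it. It assumes little-endian signed 24-bit samples (as stored in WAV files) and a byte count that is a multiple of 3; the function name int24_le_to_int32() is made up for this example.

```python
import numpy as np


def int24_le_to_int32(raw):
    """Decode packed little-endian signed 24-bit PCM bytes into an int32 array."""
    # One row of 3 bytes per sample, widened to int32 so the shifts below do not overflow
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
    # Assemble the three bytes (least significant first) into an unsigned 24-bit value
    value = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    # Sign-extend: values with bit 23 set represent negative samples
    value[value >= 0x800000] -= 0x1000000
    return value


# Three samples: 1, -1 and the most negative int24 value
raw = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])
print(int24_le_to_int32(raw).tolist())  # [1, -1, -8388608]
```

  Working on whole columns avoids a Python-level loop per byte position, at the cost of an int32 intermediate copy of the data.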

import os
import wave

import numpy
import wx

class wavCanvasPanel(wx.Panel):

    def fromstring(self, wavData, alignedByte):
        # Decode little-endian signed PCM samples of a non-native width (e.g. int24, alignedByte=3) into int64
        if alignedByte <= 8:
            src = numpy.frombuffer(wavData, dtype='>i1')
            dest = numpy.zeros(len(wavData) // alignedByte, numpy.dtype('>i8'))
            # Byte i of each sample goes to byte (alignedByte-1-i) of a big-endian int64, so the
            # sample's MSB becomes the int64's MSB and the sign is preserved
            for i in range(alignedByte):
                dest.view(dtype='>i1')[alignedByte-1-i::8] = src[i::alignedByte]
            # The sample now sits in the top bytes; an arithmetic right shift restores its real value
            return True, dest.astype(numpy.int64) >> (8 * (8 - alignedByte))
        else:
            return False, wavData

    def readWave(self, wavPath, wavInfo):
        if os.path.isfile(wavPath):
            # Open the wav file to get wave data and parameters
            wavFile = wave.open(wavPath, "rb")
            wavParams = wavFile.getparams()
            wavChannels = wavParams[0]
            wavSampwidth = wavParams[1]
            wavFramerate = wavParams[2]
            wavFrames = wavParams[3]
            # getparams() reports the sample rate in Hz and the sample width in bytes
            wavInfo.SetStatusText('Opened Audio Info = ' +
                                  'Channels:' + str(wavChannels) +
                                  ', SampWidth:' + str(wavSampwidth) + ' Byte' +
                                  ', SampRate:' + str(wavFramerate) + ' Hz' +
                                  ', FormatTag:' + wavParams[4])
            wavData = wavFile.readframes(wavFrames)
            wavFile.close()
            # Map the sample width (in bytes) to a NumPy dtype
            if wavSampwidth == 1:
                dtype = numpy.uint8    # 8-bit WAV PCM is unsigned
            elif wavSampwidth == 2:
                dtype = numpy.int16
            elif wavSampwidth == 3:
                dtype = None           # int24 has no native NumPy dtype
            elif wavSampwidth == 4:
                dtype = numpy.int32    # the wave module only reads integer PCM
            else:
                return 0, 0, 0
            if dtype is not None:
                # frombuffer() replaces the deprecated binary mode of numpy.fromstring()
                retData = numpy.frombuffer(wavData, dtype=dtype)
            else:
                # Implement int24 manually
                status, retData = self.fromstring(wavData, 3)
                if not status:
                    return 0, 0, 0
            # Transpose the wav data if wave has multiple channels: one row per channel
            if wavChannels != 1:
                retData = retData.reshape(-1, wavChannels).T
            # Calculate and arange wave time
            retTime = numpy.arange(0, wavFrames) * (1.0 / wavFramerate)
            retChannels = wavChannels
            return retChannels, retData, retTime
        else:
            return 0, 0, 0
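  To see the per-channel reorganization in isolation, here is a GUI-free sketch that writes a tiny stereo int16 WAV into memory with the standard wave module and reads it back the way readWave() does: read the raw frames, view them with frombuffer(), reshape to (frames, channels), then transpose so each row holds one channel. The helper names make_stereo_wav() and read_channels() are hypothetical, and only 16-bit samples are handled.

```python
import io
import wave

import numpy as np


def make_stereo_wav(left, right, rate=8000):
    """Write two int16 channels into an in-memory WAV file and return its bytes."""
    frames = np.column_stack([left, right]).astype('<i2')  # interleaved L, R, L, R, ...
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)  # 2 bytes per sample -> int16
        w.setframerate(rate)
        w.writeframes(frames.tobytes())
    return buf.getvalue()


def read_channels(wav_bytes):
    """Return (channels, data, time) in the same layout readWave() produces."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        nch, width = w.getnchannels(), w.getsampwidth()
        rate, nframes = w.getframerate(), w.getnframes()
        raw = w.readframes(nframes)
    if width != 2:
        raise ValueError("this sketch only handles 16-bit samples")
    data = np.frombuffer(raw, dtype='<i2')
    if nch != 1:
        # One row per frame, one column per channel; .T gives one row per channel
        data = data.reshape(-1, nch).T
    time = np.arange(nframes) / float(rate)
    return nch, data, time


nch, data, time = read_channels(make_stereo_wav([0, 100, 200], [0, -100, -200]))
print(nch, data.tolist())  # 2 [[0, 100, 200], [0, -100, -200]]
print(time.tolist())
```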

2.3 Plotting the .wav Waveform

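  Plotting the .wav waveform is the core feature. In the wavCanvasPanel class 痞子衡 implemented a showWave(self, wavPath, wavInfo) method, which is called from viewAudio(), the event handler of the GUI control m_genericDirCtrl_audioDir.
  In wavCanvasPanel.showWave(), 痞子衡 first calls readWave() to get the reorganized PCM data from the .wav file, then uses the add_axes() method of Matplotlib's Figure class to create one axes per channel and plot each channel's PCM data in turn, together with supporting information such as axis ticks and labels. Because the Panel in the GUI dedicated to waveform display is only 720*360 pixels, 痞子衡 limited the display to at most the first 8 channels of a .wav file.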

import wx
import matplotlib
from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas
from matplotlib.figure import Figure

MAX_AUDIO_CHANNEL = 8
# unit: pixel (divided by dpi below to get the figure size in inches)
PLOT_PANEL_WIDTH = 720
PLOT_PANEL_HEIGHT = 360
# unit: fraction of the figure (0 to 1)
PLOT_AXES_WIDTH_TITLE = 0.05
PLOT_AXES_HEIGHT_LABEL = 0.075

class wavCanvasPanel(wx.Panel):

    def __init__(self, parent):
        wx.Panel.__init__(self, parent)
        dpi = 60
        width = PLOT_PANEL_WIDTH / dpi
        height = PLOT_PANEL_HEIGHT / dpi
        self.wavFigure = Figure(figsize=[width,height], dpi=dpi, facecolor='#404040')
        self.wavCanvas = FigureCanvas(self, -1, self.wavFigure)
        self.wavSizer = wx.BoxSizer(wx.VERTICAL)
        self.wavSizer.Add(self.wavCanvas, 1, wx.EXPAND|wx.ALL)
        self.SetSizerAndFit(self.wavSizer)
        self.wavAxes = [None] * MAX_AUDIO_CHANNEL

    def readWave(self, wavPath, wavInfo):
        # ... (see section 2.2)
        pass

    def showWave(self, wavPath, wavInfo):
        self.wavFigure.clear()
        waveChannels, waveData, waveTime = self.readWave(wavPath, wavInfo)
        if waveChannels != 0:
            # Note: only show max supported channel if actual channel > max supported channel
            if waveChannels > MAX_AUDIO_CHANNEL:
                waveChannels = MAX_AUDIO_CHANNEL
            # Plot the waveform of each channel in sequence
            for i in range(waveChannels):
                left = PLOT_AXES_HEIGHT_LABEL
                bottom = (1.0 / waveChannels) * (waveChannels - 1 - i) + PLOT_AXES_HEIGHT_LABEL
                height = 1.0 / waveChannels - (PLOT_AXES_WIDTH_TITLE + PLOT_AXES_HEIGHT_LABEL)
                width = 1 - left - 0.05
                self.wavAxes[i] = self.wavFigure.add_axes([left, bottom, width, height], facecolor='k')
                self.wavAxes[i].set_prop_cycle(color='#00F279', lw=[1])
                self.wavAxes[i].set_xlabel('time (s)', color='w')
                self.wavAxes[i].set_ylabel('value', color='w')
                if waveChannels == 1:
                    data = waveData
                else:
                    data = waveData[i]
                self.wavAxes[i].plot(waveTime, data)
                self.wavAxes[i].grid()
                self.wavAxes[i].tick_params(labelcolor='w')
                self.wavAxes[i].set_title('Audio Channel ' + str(i), color='w')
        # Note!!!: draw() must be called if figure has been cleared once
        self.wavCanvas.draw()

class mainWin(win.speech_win):

    def __init__(self, parent):
        win.speech_win.__init__(self, parent)
        self.wavPanel = wavCanvasPanel(self.m_panel_plot)
        # ...

    def viewAudio(self, event):
        self.wavPath = self.m_genericDirCtrl_audioDir.GetFilePath()
        self.wavPanel.showWave(self.wavPath, self.statusBar)
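  The add_axes() rectangles in showWave() are plain arithmetic in figure-fraction coordinates (0 to 1), so they can be checked without a GUI. The sketch below lifts that arithmetic into a hypothetical helper, channel_axes_rects() (not part of the tool), and also shows why a 720*360-pixel panel at dpi=60 becomes a 12*6-inch figure.

```python
PLOT_AXES_WIDTH_TITLE = 0.05    # fraction of the figure reserved above each axes for its title
PLOT_AXES_HEIGHT_LABEL = 0.075  # fraction reserved below each axes for the x label (also the left margin)


def channel_axes_rects(n_channels):
    """Return the [left, bottom, width, height] rectangles showWave() passes to add_axes()."""
    rects = []
    for i in range(n_channels):
        left = PLOT_AXES_HEIGHT_LABEL
        bottom = (1.0 / n_channels) * (n_channels - 1 - i) + PLOT_AXES_HEIGHT_LABEL
        height = 1.0 / n_channels - (PLOT_AXES_WIDTH_TITLE + PLOT_AXES_HEIGHT_LABEL)
        width = 1 - left - 0.05
        rects.append([left, bottom, width, height])
    return rects


# Figure size in inches = panel size in pixels / dpi
dpi = 60
print(720 / dpi, 360 / dpi)  # 12.0 6.0
for rect in channel_axes_rects(2):
    print([round(v, 4) for v in rect])
# [0.075, 0.575, 0.875, 0.375]
# [0.075, 0.075, 0.875, 0.375]
```

  Because the title and label margins are fixed fractions of the whole figure, every extra channel shrinks the plot area: with the maximum of 8 channels the computed height is 1/8 - 0.125, essentially zero. Scaling the margins with the channel count might be worth considering if many channels must be shown.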

2.4 Adding a Cursor

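  Cursor tracking is not an essential feature, but it makes the software look much more polished. 痞子衡 created a class named wavCursor to implement it. Its key part is the moveMouse method, which is registered as a 'motion_notify_event' handler through the mpl_connect() method of FigureCanvasWxAgg, once for each channel's axes.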

import numpy

MAX_AUDIO_CHANNEL = 8

class wavCursor(object):
    def __init__(self, ax, x, y):
        self.ax = ax
        self.vline = ax.axvline(color='r', alpha=1)
        self.hline = ax.axhline(color='r', alpha=1)
        self.marker, = ax.plot([0],[0], marker="o", color="crimson", zorder=3)
        self.x = x
        self.y = y
        self.xlim = self.x[-1]  # timestamp of the last sample
        self.text = ax.text(0.7, 0.9, '', bbox=dict(facecolor='red', alpha=0.5))

    def moveMouse(self, event):
        if not event.inaxes:
            return
        x, y = event.xdata, event.ydata
        if x > self.xlim:
            x = self.xlim
        index = numpy.searchsorted(self.x, [x])[0]
        x = self.x[index]
        y = self.y[index]
        # Pass sequences: newer Matplotlib releases deprecate scalars in set_xdata()/set_ydata()
        self.vline.set_xdata([x, x])
        self.hline.set_ydata([y, y])
        self.marker.set_data([x],[y])
        self.text.set_text('x=%1.2f, y=%1.2f' % (x, y))
        self.text.set_position((x,y))
        self.ax.figure.canvas.draw_idle()

class wavCanvasPanel(wx.Panel):
    def __init__(self, parent):
        # ...
        self.wavAxes = [None] * MAX_AUDIO_CHANNEL
        # Cursor objects, one per channel
        self.wavCursor = [None] * MAX_AUDIO_CHANNEL

    def showWave(self, wavPath, wavInfo):
        # ...
        if waveChannels != 0:
            # ...
            for i in range(waveChannels):
                # ...
                self.wavAxes[i].set_title('Audio Channel ' + str(i), color='w')
                # Instantiate a cursor object and register its moveMouse() handler with mpl_connect()
                self.wavCursor[i] = wavCursor(self.wavAxes[i], waveTime, data)
                self.wavCanvas.mpl_connect('motion_notify_event', self.wavCursor[i].moveMouse)
        # ...
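  The core of moveMouse() is clamping x and looking it up with numpy.searchsorted(). The hypothetical helper snap_to_sample() below isolates that step so it can be checked without a canvas. Note that searchsorted() with its default side='left' returns the first sample at or after x, so the cursor snaps forward rather than to the nearest sample.

```python
import numpy as np


def snap_to_sample(x_data, y_data, x):
    """Snap a mouse x position to a sample, mirroring the lookup in wavCursor.moveMouse()."""
    # Clamp to the last timestamp so searchsorted() never returns len(x_data)
    x = min(x, x_data[-1])
    # side='left' (the default): index of the first sample whose time is >= x
    index = np.searchsorted(x_data, [x])[0]
    return float(x_data[index]), float(y_data[index])


t = np.arange(5) * 0.5  # 0.0, 0.5, 1.0, 1.5, 2.0 seconds
y = np.array([0, 10, 20, 30, 40])
print(snap_to_sample(t, y, 0.7))   # (1.0, 20.0)
print(snap_to_sample(t, y, 99.0))  # (2.0, 40.0)
```

  Also, since showWave() calls mpl_connect() again on every redraw and never removes earlier handlers, one option (an assumption, not what the tool does) is to keep the connection ids that mpl_connect() returns and pass them to mpl_disconnect() before the figure is cleared.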

  That wraps up 痞子衡's tour of the audio display implementation in the speech processing tool Jays-PySPEECH. Where's the applause~~~
