Spectral subtrative 譜減法降噪 python 程式碼

ColorLSu_forever發表於2020-12-24

原文網址 : https://blog.csdn.net/weixin_38747502/article/details/111602031

Python

最近在看speech enhancement 內容，看完譜減法部分後，在網上找相應的程式碼來看，然後將MATLAB程式碼轉成Python程式碼，順便學習一下Python的使用。

譜減法的基礎實現：

論文《Enhancement of speech corrupted by acoustic noise》提出的實現：

演算法流程如下：

效果如下：

這是一段火車站附近的錄音，噪聲比較平穩；設定的VAD閾值是3dB，一般應用上設定的是6dB

雖然floor值的存在填充了頻譜中一定的 “波谷”，但可以看到還是有一些音樂噪聲的存在。

程式碼段

Python程式碼如下：

#!/usr/bin/env python
# encoding: utf-8
'''
@author: Kolor
@license: 
@contact: colorsu1922@163.com
@software: pycharm
@file: spec_sub.py
@time: 2020/12/21 23:44
@desc:
  Implements the basic power spectral subtraction algorithm [1].

  Usage:  specsub(noisyFile, outputFile)

         noisyFile - noisy speech file in .wav format
         outputFile - enhanced output file in .wav format

   Algorithm uses the first 5 frames to estimate the noise psd, and
   then uses a very simple VAD algorithm for updating the noise psd.
   Alternatively, the VAD in the file ss_rdc.m (this folder) can be used.

  References:
   [1] Berouti, M., Schwartz, M., and Makhoul, J. (1979). Enhancement of speech
       corrupted by acoustic noise. Proc. IEEE Int. Conf. Acoust., Speech,
       Signal Processing, 208-211.

 Author: Philipos C. Loizou

 Copyright (c) 2006 by Philipos C. Loizou
 $Revision: 0.0 $  $Date: 10/09/2006 $
-------------------------------------------------------------------------
'''
from typing import Type

import librosa
import numpy as np
import sys
import matplotlib.pyplot as plt
import soundfile as sf


def rh_spec_sub(infile, outfile):
    print(np.__version__)
    x, samp_rate = librosa.load(infile, sr=8000, mono=True)
    # sys.exit("pause")
    # =============== Initialize variables ===============

    # Frame size in samples
    frame_len = int(np.floor(20 * samp_rate / 1000))
    if np.remainder(frame_len, 2) == 1:
        frame_len = frame_len + 1

    # window overlap in percent of frame size
    overlap_percent = 50
    len1 = int(np.floor(frame_len * overlap_percent / 100))
    len2 = int(frame_len - len1)

    # VAD threshold in dB SNR
    vad_thresh = 3
    # power exponent
    alpha = 2.0
    FLOOR = 0.002
    G = 0.9

    # define window
    win = np.hanning(frame_len+1)[0:frame_len]
    win_gain = len2 / np.sum(win)  # normalization gain for overlap+add with 50% overlap

    # Noise magnitude calculations - assuming that the first 5 frames is noise/silence

    NFFT = int(2 * 2 ** next_pow2(frame_len))
    noise_mean = np.zeros(NFFT)
    j = 0
    for k in range(0, 5):
        windowed = np.multiply(win, x[j:j+frame_len])
        noise_mean = np.add(noise_mean, np.abs(np.fft.fft(windowed, NFFT)))
        j = j + frame_len

    noise_mu = noise_mean / 5

    # --- allocate memory and initialize various variables
    k = 0
    img = 1j  # np.sqrt(-1)
    x_old = np.zeros(len1)
    Nframs = int(np.floor(len(x)/len2) - 1)
    xfinal = np.zeros(Nframs * len2)

    # ===============================  Start Processing ==================================
    for n in range(0, Nframs):
        insign = np.multiply(win, x[k:k+frame_len])
        spec = np.fft.fft(insign, NFFT)
        sig = np.abs(spec)  # compute the magnitude

        # save the noisy phase information
        theta = np.angle(spec)

        SNR_seg = 10*np.log10(np.linalg.norm(sig, 2)**2 / np.linalg.norm(noise_mu, 2)**2)

        if alpha == 1.0:
            beta = berouti1(SNR_seg)
        else:
            beta = berouti(SNR_seg)

        # &&&&&&&&&&&&&&&
        sub_speech = sig ** alpha - beta * noise_mu ** alpha
        diffw = sub_speech - FLOOR * noise_mu ** alpha

        # floor negative component
        z = np.argwhere(diffw < 0)
        if len(z) is not 0:
            sub_speech[z] = FLOOR * noise_mu[z] ** alpha

        # --- implement a simple VAD detector - -------------
        if SNR_seg < vad_thresh:   # Update noise spectrum
            noise_temp = G * noise_mu ** alpha + (1 - G) * sig ** alpha
            noise_mu = noise_temp ** (1 / alpha)  # new noise spectrum

        # to ensure conjugate symmetry for real reconstruction
        sub_speech[int(NFFT/2) + 1: NFFT] = np.flipud(sub_speech[1: int(NFFT/2)])

        x_phase = (sub_speech**(1/alpha)) * (np.cos(theta) + img * (np.sin(theta)))

        # take the iFFT
        xi = np.real(np.fft.ifft(x_phase))
        # plt.plot(xi)
        # plt.show()
        # --- Overlap and add ---------------
        xfinal[k:k+len2] = x_old + xi[0:len1]
        x_old = xi[len1:frame_len]
        k = k + len2

    # write output
    sf.write(outfile, win_gain * xfinal, samp_rate, 'PCM_16')



def berouti1(snr):
    beta = 0
    if -5.0 <= snr <= 20:
        beta = 3 - snr * 2 / 20
    elif snr < -5.0:
        beta = 4
    elif snr > 20:
        beta = 1

    return beta


def berouti(snr):
    beta = 0
    if -5.0 <= snr <= 20:
        beta = 4 - snr * 3 / 20
    elif snr < -5.0:
        beta = 5
    elif snr > 20:
        beta = 1

    return beta


def next_pow2(n):
    return np.ceil(np.log2(np.abs(n)))


def load_audio(filename, trace=0):
    """
    load wav file using audioread.
    This is not available in python x,y.
    """
    data = np.array([])
    with audioread.audio_open(filename) as af:
        trace_n = af.channels
        if trace >= trace_n:
            print('number of traces in file is', trace_n)
            quit()
        nsamp = np.ceil(af.samplerate * af.duration)
        print(f"nsamp =%d" % nsamp)
        data = np.ascontiguousarray(np.zeros(nsamp, 1))
        index = 0
        for buffer in af:
            full_data = np.fromstring(buffer).reshape(-1, af.channels)
            n = full_data.shape[0]
            if index + n > len(data):
                n = len(data) - index
            if n > 0:
                data[index:index + n] = full_data[:n, trace]
                index += n
            else:
                break

    return af.samplerate, data / 2.0 ** 15, 'a.u.'

上面程式碼是根據speech enhancement中的MATLAB程式碼改寫的，僅用作學習用途。

原始的MATLAB程式碼如下：

function specsub(filename,outfile)

%  Implements the basic power spectral subtraction algorithm [1].
% 
%  Usage:  specsub(noisyFile, outputFile)
%           
%         noisyFile - noisy speech file in .wav format
%         outputFile - enhanced output file in .wav format
%
%   Algorithm uses the first 5 frames to estimate the noise psd, and 
%   then uses a very simple VAD algorithm for updating the noise psd.
%   Alternatively, the VAD in the file ss_rdc.m (this folder) can be used.
%
%  References:
%   [1] Berouti, M., Schwartz, M., and Makhoul, J. (1979). Enhancement of speech 
%       corrupted by acoustic noise. Proc. IEEE Int. Conf. Acoust., Speech, 
%       Signal Processing, 208-211.
%
% Author: Philipos C. Loizou
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $  $Date: 10/09/2006 $
%-------------------------------------------------------------------------

if nargin<2
   fprintf('Usage: specsub noisyfile.wav outFile.wav \n\n');
   return;
end

[x,Srate]=audioread(filename);


% =============== Initialize variables ===============
%

len=floor(20*Srate/1000); % Frame size in samples
if rem(len,2)==1, len=len+1; end;
PERC=50; % window overlap in percent of frame size
len1=floor(len*PERC/100);
len2=len-len1; 


Thres=3; % VAD threshold in dB SNRseg 
alpha=2.0; % power exponent
FLOOR=0.002;
G=0.9;

win=hanning(len, 'periodic'); %tukey(len,PERC);  % define window
winGain=len2/sum(win); % normalization gain for overlap+add with 50% overlap

% Noise magnitude calculations - assuming that the first 5 frames is noise/silence
%
nFFT=2*2^nextpow2(len);
noise_mean=zeros(nFFT,1);
j=1;
for k=1:5
   windowed = win.*x(j:j+len-1);
   noise_mean=noise_mean+abs(fft(win.*x(j:j+len-1),nFFT));
   j=j+len;
end
noise_mu=noise_mean/5;

%--- allocate memory and initialize various variables
   
k=1;
img=sqrt(-1);
x_old=zeros(len1,1);
Nframes=floor(length(x)/len2)-1;
xfinal=zeros(Nframes*len2,1);

%===============================  Start Processing ==================================
%
for n=1:Nframes 
   t = x(k:k+len - 1);
   insign=win.*x(k:k+len-1);     %Windowing  
   spec=fft(insign,nFFT);     %compute fourier transform of a frame
   sig=abs(spec); % compute the magnitude
   
   %save the noisy phase information 
   theta=angle(spec);  
   
   SNRseg=10*log10(norm(sig,2)^2/norm(noise_mu,2)^2);
   
   if alpha==1.0
      beta=berouti1(SNRseg);
   else
     beta=berouti(SNRseg);
  end
   
   
   %&&&&&&&&&
   sub_speech=sig.^alpha - beta*noise_mu.^alpha;
   diffw = sub_speech-FLOOR*noise_mu.^alpha;
   
   % Floor negative components
   z=find(diffw <0);  
   if~isempty(z)
      sub_speech(z)=FLOOR*noise_mu(z).^alpha;
   end
   
   
   % --- implement a simple VAD detector --------------
   %
   if (SNRseg < Thres)   % Update noise spectrum
      noise_temp = G*noise_mu.^alpha+(1-G)*sig.^alpha;   
      noise_mu=noise_temp.^(1/alpha);   % new noise spectrum
   end
   
   
   sub_speech(nFFT/2+2:nFFT)=flipud(sub_speech(2:nFFT/2));  % to ensure conjugate symmetry for real reconstruction
  
   x_phase=(sub_speech.^(1/alpha)).*(cos(theta)+img*(sin(theta)));
  
   
   % take the IFFT 
   xi=real(ifft(x_phase));
       
   % --- Overlap and add ---------------
   xfinal(k:k+len2-1)=x_old+xi(1:len1);
   x_old=xi(1+len1:len);
  
  

 k=k+len2;
end
%========================================================================================


% wavwrite(winGain*xfinal,Srate,16,outfile);
audiowrite(outfile, winGain*xfinal,Srate);

%--------------------------------------------------------------------------

function a=berouti1(SNR)

if SNR>=-5.0 & SNR<=20
   a=3-SNR*2/20;
else
   
  if SNR<-5.0
   a=4;
  end

  if SNR>20
    a=1;
  end
  
end

function a=berouti(SNR)

if SNR>=-5.0 & SNR<=20
   a=4-SNR*3/20; 
else
   
  if SNR<-5.0
   a=5;
  end

  if SNR>20
    a=1;
  end
  
end

譜圖（Spectral Graph Theory）理解（2）
2018-09-19
Graph Theory
Python圖片驗證碼降噪 — 8鄰域降噪
2019-02-01
Python
譜範數正則（Spectral Norm Regularization）的理解
2018-10-30
ORM
Spectral Matting的python實現
2018-10-25
Python
程式碼整潔之道之做減法
2021-04-08
做減法
2024-08-30
經過4次優化我把python程式碼耗時減少95%
2021-11-12
優化Python
主動降噪，通話降噪及AI降噪之辨
2021-02-10
AI
如何減小微信小程式程式碼包大小
2021-11-05
微信小程式
JavaScript - 減法運算子
2018-10-21
JavaScript
python程式碼打包exe程式
2021-09-28
Python
數學-錯位相減法
2024-09-08
浮點數加減法
2024-12-04
高精度-高精度減法
2024-06-07
七個不一樣的Python程式碼寫法，讓你寫出一手漂亮的程式碼
2018-09-05
Python
通過 PHP 與 Python 程式碼對比淺析語法差異
2019-07-09
PHPPython
通過PHP與Python程式碼對比淺析語法差異
2019-07-19
PHPPython
減少程式碼中該死的 if else 巢狀
2018-12-17
巢狀
JS 加減乘除尤其是減法精度問題
2024-04-11
JS
Python-程式語法 - Python註釋&基本函式
2024-08-18
Python函式
Optional簡化空值判斷，減少程式碼中的if-else程式碼塊
2020-11-26
Python程式py程式碼加密方法
2024-11-18
Python加密
python常用程式碼整理
2024-03-25
Python
逆向常用python程式碼
2024-03-26
Python
python神祕程式碼
2020-11-15
Python
詳解 Python 的二元算術運算，為什麼說減法只是語法糖？
2020-09-10
Python
Python程式碼混淆工具，Python原始碼保密、加密、混淆
2024-02-05
Python原始碼加密
優雅地減少redux請求樣板程式碼
2019-03-02
Redux
論減少程式碼中return語句的騷操作
2020-10-17
如何減小ABAP業務程式碼的複雜度
2019-05-16
複雜度
Python 萬能程式碼模版：爬蟲程式碼篇
2022-08-25
Python爬蟲
python 安全編碼&程式碼審計
2020-08-19
Python
在python程式中呼叫java程式碼
2022-08-26
PythonJava
Python 系列：如何提高 python 程式程式碼的健壯性
2024-05-21
Python
Python 兩個字串相減
2020-12-15
Python字串
curl轉python requests程式碼
2018-08-16
Python
Python 程式碼混淆工具概述
2024-04-01
Python
Python一些程式碼
2024-08-05
Python

Spectral subtrative 譜減法降噪 python 程式碼

譜減法的基礎實現：

程式碼段

相關文章