Spectral subtrative 譜減法降噪 python 程式碼
最近在看speech enhancement 內容,看完譜減法部分後,在網上找相應的程式碼來看,然後將MATLAB程式碼轉成Python程式碼,順便學習一下Python的使用。
譜減法的基礎實現:
論文《Enhancement of speech corrupted by acoustic noise》提出的實現:
演算法流程如下:
效果如下:
這是一段火車站附近的錄音,噪聲比較平穩;設定的VAD閾值是3dB,一般應用上設定的是6dB
雖然floor值的存在填充了頻譜中一定的 “波谷”, 但可以看到還是有一些音樂噪聲的存在。
程式碼段
Python程式碼如下:
-
#!/usr/bin/env python # encoding: utf-8 ''' @author: Kolor @license: @contact: colorsu1922@163.com @software: pycharm @file: spec_sub.py @time: 2020/12/21 23:44 @desc: Implements the basic power spectral subtraction algorithm [1]. Usage: specsub(noisyFile, outputFile) noisyFile - noisy speech file in .wav format outputFile - enhanced output file in .wav format Algorithm uses the first 5 frames to estimate the noise psd, and then uses a very simple VAD algorithm for updating the noise psd. Alternatively, the VAD in the file ss_rdc.m (this folder) can be used. References: [1] Berouti, M., Schwartz, M., and Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 208-211. Author: Philipos C. Loizou Copyright (c) 2006 by Philipos C. Loizou $Revision: 0.0 $ $Date: 10/09/2006 $ ------------------------------------------------------------------------- ''' from typing import Type import librosa import numpy as np import sys import matplotlib.pyplot as plt import soundfile as sf def rh_spec_sub(infile, outfile): print(np.__version__) x, samp_rate = librosa.load(infile, sr=8000, mono=True) # sys.exit("pause") # =============== Initialize variables =============== # Frame size in samples frame_len = int(np.floor(20 * samp_rate / 1000)) if np.remainder(frame_len, 2) == 1: frame_len = frame_len + 1 # window overlap in percent of frame size overlap_percent = 50 len1 = int(np.floor(frame_len * overlap_percent / 100)) len2 = int(frame_len - len1) # VAD threshold in dB SNR vad_thresh = 3 # power exponent alpha = 2.0 FLOOR = 0.002 G = 0.9 # define window win = np.hanning(frame_len+1)[0:frame_len] win_gain = len2 / np.sum(win) # normalization gain for overlap+add with 50% overlap # Noise magnitude calculations - assuming that the first 5 frames is noise/silence NFFT = int(2 * 2 ** next_pow2(frame_len)) noise_mean = np.zeros(NFFT) j = 0 for k in range(0, 5): windowed = np.multiply(win, x[j:j+frame_len]) noise_mean = np.add(noise_mean, np.abs(np.fft.fft(windowed, NFFT))) j = j + frame_len noise_mu = noise_mean / 5 # --- allocate memory and initialize various variables k = 0 img = 1j # np.sqrt(-1) x_old = np.zeros(len1) Nframs = int(np.floor(len(x)/len2) - 1) xfinal = np.zeros(Nframs * len2) # =============================== Start Processing ================================== for n in range(0, Nframs): insign = np.multiply(win, x[k:k+frame_len]) spec = np.fft.fft(insign, NFFT) sig = np.abs(spec) # compute the magnitude # save the noisy phase information theta = np.angle(spec) SNR_seg = 10*np.log10(np.linalg.norm(sig, 2)**2 / np.linalg.norm(noise_mu, 2)**2) if alpha == 1.0: beta = berouti1(SNR_seg) else: beta = berouti(SNR_seg) # &&&&&&&&&&&&&&& sub_speech = sig ** alpha - beta * noise_mu ** alpha diffw = sub_speech - FLOOR * noise_mu ** alpha # floor negative component z = np.argwhere(diffw < 0) if len(z) is not 0: sub_speech[z] = FLOOR * noise_mu[z] ** alpha # --- implement a simple VAD detector - ------------- if SNR_seg < vad_thresh: # Update noise spectrum noise_temp = G * noise_mu ** alpha + (1 - G) * sig ** alpha noise_mu = noise_temp ** (1 / alpha) # new noise spectrum # to ensure conjugate symmetry for real reconstruction sub_speech[int(NFFT/2) + 1: NFFT] = np.flipud(sub_speech[1: int(NFFT/2)]) x_phase = (sub_speech**(1/alpha)) * (np.cos(theta) + img * (np.sin(theta))) # take the iFFT xi = np.real(np.fft.ifft(x_phase)) # plt.plot(xi) # plt.show() # --- Overlap and add --------------- xfinal[k:k+len2] = x_old + xi[0:len1] x_old = xi[len1:frame_len] k = k + len2 # write output sf.write(outfile, win_gain * xfinal, samp_rate, 'PCM_16') def berouti1(snr): beta = 0 if -5.0 <= snr <= 20: beta = 3 - snr * 2 / 20 elif snr < -5.0: beta = 4 elif snr > 20: beta = 1 return beta def berouti(snr): beta = 0 if -5.0 <= snr <= 20: beta = 4 - snr * 3 / 20 elif snr < -5.0: beta = 5 elif snr > 20: beta = 1 return beta def next_pow2(n): return np.ceil(np.log2(np.abs(n))) def load_audio(filename, trace=0): """ load wav file using audioread. This is not available in python x,y. """ data = np.array([]) with audioread.audio_open(filename) as af: trace_n = af.channels if trace >= trace_n: print('number of traces in file is', trace_n) quit() nsamp = np.ceil(af.samplerate * af.duration) print(f"nsamp =%d" % nsamp) data = np.ascontiguousarray(np.zeros(nsamp, 1)) index = 0 for buffer in af: full_data = np.fromstring(buffer).reshape(-1, af.channels) n = full_data.shape[0] if index + n > len(data): n = len(data) - index if n > 0: data[index:index + n] = full_data[:n, trace] index += n else: break return af.samplerate, data / 2.0 ** 15, 'a.u.'
上面程式碼是根據speech enhancement中的MATLAB程式碼改寫的,僅用作學習用途。
原始的MATLAB程式碼如下:
function specsub(filename,outfile)
% Implements the basic power spectral subtraction algorithm [1].
%
% Usage: specsub(noisyFile, outputFile)
%
% noisyFile - noisy speech file in .wav format
% outputFile - enhanced output file in .wav format
%
% Algorithm uses the first 5 frames to estimate the noise psd, and
% then uses a very simple VAD algorithm for updating the noise psd.
% Alternatively, the VAD in the file ss_rdc.m (this folder) can be used.
%
% References:
% [1] Berouti, M., Schwartz, M., and Makhoul, J. (1979). Enhancement of speech
% corrupted by acoustic noise. Proc. IEEE Int. Conf. Acoust., Speech,
% Signal Processing, 208-211.
%
% Author: Philipos C. Loizou
%
% Copyright (c) 2006 by Philipos C. Loizou
% $Revision: 0.0 $ $Date: 10/09/2006 $
%-------------------------------------------------------------------------
if nargin<2
fprintf('Usage: specsub noisyfile.wav outFile.wav \n\n');
return;
end
[x,Srate]=audioread(filename);
% =============== Initialize variables ===============
%
len=floor(20*Srate/1000); % Frame size in samples
if rem(len,2)==1, len=len+1; end;
PERC=50; % window overlap in percent of frame size
len1=floor(len*PERC/100);
len2=len-len1;
Thres=3; % VAD threshold in dB SNRseg
alpha=2.0; % power exponent
FLOOR=0.002;
G=0.9;
win=hanning(len, 'periodic'); %tukey(len,PERC); % define window
winGain=len2/sum(win); % normalization gain for overlap+add with 50% overlap
% Noise magnitude calculations - assuming that the first 5 frames is noise/silence
%
nFFT=2*2^nextpow2(len);
noise_mean=zeros(nFFT,1);
j=1;
for k=1:5
windowed = win.*x(j:j+len-1);
noise_mean=noise_mean+abs(fft(win.*x(j:j+len-1),nFFT));
j=j+len;
end
noise_mu=noise_mean/5;
%--- allocate memory and initialize various variables
k=1;
img=sqrt(-1);
x_old=zeros(len1,1);
Nframes=floor(length(x)/len2)-1;
xfinal=zeros(Nframes*len2,1);
%=============================== Start Processing ==================================
%
for n=1:Nframes
t = x(k:k+len - 1);
insign=win.*x(k:k+len-1); %Windowing
spec=fft(insign,nFFT); %compute fourier transform of a frame
sig=abs(spec); % compute the magnitude
%save the noisy phase information
theta=angle(spec);
SNRseg=10*log10(norm(sig,2)^2/norm(noise_mu,2)^2);
if alpha==1.0
beta=berouti1(SNRseg);
else
beta=berouti(SNRseg);
end
%&&&&&&&&&
sub_speech=sig.^alpha - beta*noise_mu.^alpha;
diffw = sub_speech-FLOOR*noise_mu.^alpha;
% Floor negative components
z=find(diffw <0);
if~isempty(z)
sub_speech(z)=FLOOR*noise_mu(z).^alpha;
end
% --- implement a simple VAD detector --------------
%
if (SNRseg < Thres) % Update noise spectrum
noise_temp = G*noise_mu.^alpha+(1-G)*sig.^alpha;
noise_mu=noise_temp.^(1/alpha); % new noise spectrum
end
sub_speech(nFFT/2+2:nFFT)=flipud(sub_speech(2:nFFT/2)); % to ensure conjugate symmetry for real reconstruction
x_phase=(sub_speech.^(1/alpha)).*(cos(theta)+img*(sin(theta)));
% take the IFFT
xi=real(ifft(x_phase));
% --- Overlap and add ---------------
xfinal(k:k+len2-1)=x_old+xi(1:len1);
x_old=xi(1+len1:len);
k=k+len2;
end
%========================================================================================
% wavwrite(winGain*xfinal,Srate,16,outfile);
audiowrite(outfile, winGain*xfinal,Srate);
%--------------------------------------------------------------------------
function a=berouti1(SNR)
if SNR>=-5.0 & SNR<=20
a=3-SNR*2/20;
else
if SNR<-5.0
a=4;
end
if SNR>20
a=1;
end
end
function a=berouti(SNR)
if SNR>=-5.0 & SNR<=20
a=4-SNR*3/20;
else
if SNR<-5.0
a=5;
end
if SNR>20
a=1;
end
end
相關文章
- 譜圖(Spectral Graph Theory)理解(2)Graph Theory
- 譜聚類(spectral clustering)原理總結聚類
- 譜範數正則(Spectral Norm Regularization)的理解ORM
- Python圖片驗證碼降噪 — 8鄰域降噪Python
- 程式碼整潔之道之做減法
- Spectral Matting的python實現Python
- python語法-測試程式碼Python
- 做減法
- 從一個加減法運算程式碼理解特殊運算子的過載
- 主動降噪,通話降噪及AI降噪之辨AI
- 如何減小微信小程式程式碼包大小微信小程式
- 使用MVVM減少控制器程式碼實戰(減少56%)MVVM
- JavaScript - 減法運算子JavaScript
- 經過4次優化我把python程式碼耗時減少95%優化Python
- ACM 分數加減法ACM
- 高精度-高精度減法
- 數學-錯位相減法
- JS 加減乘除 尤其是減法精度問題JS
- 編寫良好的程式碼:如何減少程式碼的認知負荷
- MSSQL中的日期減價法SQL
- SQL中時間的加減法SQL
- [轉載]Python兵器譜Python
- 如何減小ABAP業務程式碼的複雜度複雜度
- 減少程式碼中該死的 if else 巢狀巢狀
- 縮減程式碼和資源(Shrink Your Code and Resources)
- Optional簡化空值判斷,減少程式碼中的if-else程式碼塊
- 『TensorFlow』讀書筆記_降噪自編碼器筆記
- 詳解 Python 的二元算術運算,為什麼說減法只是語法糖?Python
- 優雅地減少redux請求樣板程式碼Redux
- 論減少程式碼中return語句的騷操作
- 減少C++程式碼編譯時間的方法C++編譯
- golang time 時間的加減法Golang
- JSF的加減法與SeamJS
- 十六進位制減法計算
- Python 兩個字串相減Python字串
- python程式碼打包exe程式Python
- 七個不一樣的Python程式碼寫法,讓你寫出一手漂亮的程式碼Python
- 通過PHP與Python程式碼對比淺析語法差異PHPPython