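Classifying and Visualizing Musical Pitch with the K-means Clustering Algorithm

Posted by ggspeed on 2016-03-12

The Galvanize data science curriculum covers a range of machine learning topics that are popular among data scientists in the tech industry, but the skills students gain at Galvanize are not limited to the most popular industry applications. For example, audio signal and music analysis is rarely discussed in the Galvanize Data Science Immersive, yet it is an interesting application of machine learning concepts. Borrowing topics from the Galvanize curriculum, this tutorial shows how to classify and visualize musical pitches from a recording using the K-means clustering algorithm. The approach uses the following Python packages: NumPy/SciPy, Scikit-learn, and Plotly.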

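What is K-means clustering?

K-means clustering is a common technique for grouping related items in an unlabeled data set. Given a value of k, the algorithm assigns each data point to the cluster whose center is nearest, partitioning the whole data set into k groups. K-means has a wide range of applications, such as identifying efficient locations for cell towers or choosing clothing sizes for a manufacturer. This tutorial shows how to apply k-means to classify audio by pitch.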
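
To make the assignment step concrete, here is a minimal NumPy sketch of the usual iteration (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. The function name simple_kmeans, the toy data, and the choice of starting centers are assumptions made for illustration; the tutorial itself relies on Scikit-learn's implementation.

```python
import numpy as np

def simple_kmeans(points, initial_centroids, n_iterations=10):
    """Toy k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = np.asarray(initial_centroids, dtype=float).copy()
    for _ in range(n_iterations):
        # distances[i, j] = Euclidean distance from point i to centroid j
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid where it is
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Two well-separated toy groups in 2-D (hypothetical data)
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = simple_kmeans(points, initial_centroids=points[[0, 3]])
print(labels)
print(centroids)
```

A fixed number of iterations keeps the sketch short; a practical implementation would typically stop once the labels no longer change and would try several random starting points.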

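A brief primer on pitch

A musical note is a superposition of sine waves of different frequencies, and identifying the pitch of a note means identifying the frequencies of the sine waves that sound most prominent.

The simplest note consists of a single sine wave:

Its power spectrum, which plots the magnitude of each component frequency, shows a single frequency for the waveform above.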
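
As a rough illustration of this idea, the sketch below synthesizes a pure sine wave and locates the peak of its magnitude spectrum. The sample rate, duration, and 440 Hz test frequency are arbitrary assumptions, chosen so that the tone falls exactly on one FFT bin.

```python
import numpy as np

sample_rate = 8000          # samples per second (assumed)
duration = 0.5              # seconds
frequency = 440.0           # Hz, an arbitrary test tone

t = np.arange(int(sample_rate * duration)) / sample_rate
wave = np.sin(2 * np.pi * frequency * t)

# rfft returns only the non-negative frequencies of a real-valued signal
magnitudes = np.abs(np.fft.rfft(wave))
frequencies = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)

peak_frequency = frequencies[magnitudes.argmax()]
print(peak_frequency)
```

Because nearly all of the energy sits in a single bin, the spectrum of a pure tone is one sharp spike.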

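Sounds produced by common instruments are made up of many sine-wave components, so they sound more complex than the pure sine wave shown above. The same note (E3) played on a guitar has a waveform that looks like this:

Its power spectrum shows a larger set of component frequencies: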
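
A crude way to imitate this is to add a few integer multiples of a fundamental frequency. The sketch below is only a stand-in for a real guitar: the harmonic amplitudes are invented for illustration, and the 164 Hz fundamental (close to E3) is an assumption chosen so that every harmonic lands on an exact FFT bin.

```python
import numpy as np

sample_rate = 8000
duration = 0.5
fundamental = 164.0   # Hz, close to E3; chosen to fall on an exact FFT bin
# (harmonic number, relative amplitude) pairs; illustrative values only
harmonics = [(1, 1.0), (2, 0.6), (3, 0.3), (4, 0.15)]

t = np.arange(int(sample_rate * duration)) / sample_rate
note = sum(a * np.sin(2 * np.pi * n * fundamental * t) for n, a in harmonics)

magnitudes = np.abs(np.fft.rfft(note))
frequencies = np.fft.rfftfreq(len(note), d=1.0 / sample_rate)

# The four largest peaks sit at the fundamental and its integer multiples
strongest = np.sort(frequencies[np.argsort(magnitudes)[-4:]])
print(strongest)
```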

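K-means can use the power spectra of sample audio segments to classify the segments by pitch. Given a set of power spectra measured at n different frequencies, k-means groups the spectra so that the Euclidean distance in n-dimensional space from each spectrum to the center of its group is minimized.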
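
Written out, the quantity k-means tries to minimize is usually stated as the within-cluster sum of squared Euclidean distances. The symbols are notation introduced here for illustration: x_i is the spectrum of segment i, C_j is the j-th cluster, mu_j is its center, and f indexes the n frequencies.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\lVert x_i - \mu_j \rVert = \sqrt{\sum_{f=1}^{n} \left( x_{i,f} - \mu_{j,f} \right)^2}
```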

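Creating a data set from a recording with NumPy/SciPy

This tutorial uses a short recording containing 3 different pitches, each played on a guitar for 2 seconds.

SciPy's wavfile module makes it easy to convert a .wav file into a NumPy array.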

import numpy as np
import scipy.io.wavfile as wav

filename = 'Guitar - Major Chord - E Gsharp B.wav'
# wav.read returns the sample rate and a NumPy array containing each audio sample from the .wav file
sample_rate, recording = wav.read(filename)
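
The tutorial does not say whether its recording is mono or stereo. For a stereo file, wav.read returns a two-dimensional array with one column per channel, which the later steps are not written for. The sketch below, using a synthetic tone and a hypothetical temporary file name, shows one possible way to collapse the channels to a single signal.

```python
import os
import tempfile

import numpy as np
import scipy.io.wavfile as wav

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate          # one second of samples
tone = (0.5 * np.sin(2 * np.pi * 220.0 * t) * 32767).astype(np.int16)
stereo = np.column_stack((tone, tone))            # shape (n_samples, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test_tone.wav')  # hypothetical file name
    wav.write(path, sample_rate, stereo)
    rate, data = wav.read(path)

# Average the channels so later steps see a 1-D signal
mono = data.mean(axis=1) if data.ndim == 2 else data
print(rate, data.shape, mono.shape)
```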

The recording should be split into short segments so that the pitch of each segment can be classified independently.

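def split_recording(recording, segment_length, sample_rate):
    # number of samples per segment; must be an integer to be used as a slice index
    samples_per_segment = int(segment_length * sample_rate)
    segments = []
    index = 0
    # keep only full-length segments so every power spectrum has the same size
    while index + samples_per_segment <= len(recording):
        segment = recording[index:index + samples_per_segment]
        segments.append(segment)
        index += samples_per_segment
    return segments

segment_length = .5 # length in seconds
segments = split_recording(recording, segment_length, sample_rate)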
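
For a one-dimensional (mono) recording, the same split can also be expressed as a single reshape. This is an alternative sketch, not the tutorial's method; the function name split_into_equal_segments is made up here, and the integer ramp merely stands in for real audio samples.

```python
import numpy as np

def split_into_equal_segments(recording, segment_length, sample_rate):
    """Return a 2-D array with one full-length segment per row."""
    samples_per_segment = int(segment_length * sample_rate)
    n_segments = len(recording) // samples_per_segment
    # Trim any leftover samples so the length divides evenly, then reshape
    trimmed = recording[:n_segments * samples_per_segment]
    return trimmed.reshape(n_segments, samples_per_segment)

sample_rate = 8000
recording = np.arange(6 * sample_rate + 123)   # stand-in for 6 s of audio plus a short tail
segments = split_into_equal_segments(recording, 0.5, sample_rate)
print(segments.shape)
```

Six seconds of audio cut into 0.5-second pieces gives 12 rows, which matches the 12 segments discussed later in the tutorial.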

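The power spectrum of each segment can be obtained with the Fourier transform, which converts waveform data from the time domain to the frequency domain. The following code shows how to compute it with NumPy's Fourier transform module (np.fft).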

def calculate_normalized_power_spectrum(recording, sample_rate):
    # np.fft.fft returns the discrete Fourier transform of the recording
    fft = np.fft.fft(recording)
    number_of_samples = len(recording)
    # sample_length is the length of each sample in seconds
    sample_length = 1./sample_rate
    # fftfreq is a convenience function which returns the list of frequencies measured by the fft
    frequencies = np.fft.fftfreq(number_of_samples, sample_length)
    positive_frequency_indices = np.where(frequencies>0)
    # positive frequencies returned by the fft
    frequencies = frequencies[positive_frequency_indices]
    # magnitudes of each positive frequency in the recording
    magnitudes = np.abs(fft[positive_frequency_indices])
    # some segments are louder than others, so normalize each segment
    magnitudes = magnitudes / np.linalg.norm(magnitudes)
    return frequencies, magnitudes
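
The normalization step is what makes loudness irrelevant to the clustering. The sketch below checks this on two synthetic tones that differ only in amplitude; normalized_magnitudes is a compact stand-in written for this example (it uses np.fft.rfft rather than the function above), and the 110 Hz tone is an arbitrary choice.

```python
import numpy as np

def normalized_magnitudes(signal):
    magnitudes = np.abs(np.fft.rfft(signal))[1:]   # drop the 0 Hz bin
    return magnitudes / np.linalg.norm(magnitudes)

sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate       # 0.5 s
quiet = 0.1 * np.sin(2 * np.pi * 110.0 * t)
loud = 5.0 * np.sin(2 * np.pi * 110.0 * t)

same_shape = np.allclose(normalized_magnitudes(quiet), normalized_magnitudes(loud))
print(same_shape)
```

Both spectra become unit-length vectors pointing in the same direction, so k-means sees them as the same point.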

A couple of helper functions create an empty NumPy array and fill it with the power spectra of our segments.

def create_power_spectra_array(segment_length, sample_rate):
    # one column per positive frequency measured by the FFT of a single segment
    number_of_samples_per_segment = int(segment_length * sample_rate)
    time_per_sample = 1./sample_rate
    frequencies = np.fft.fftfreq(number_of_samples_per_segment, time_per_sample)
    positive_frequencies = frequencies[frequencies>0]
    # start with zero rows; one row is appended per segment
    power_spectra_array = np.empty((0, len(positive_frequencies)))
    return power_spectra_array

def fill_power_spectra_array(splits, power_spectra_array, fs):
    # fs is the sample rate in Hz
    filled_array = power_spectra_array
    for segment in splits:
        freqs, mags = calculate_normalized_power_spectrum(segment, fs)
        # stack this segment's normalized spectrum as a new row
        filled_array = np.vstack((filled_array, mags))
    return filled_array

power_spectra_array = create_power_spectra_array(segment_length, sample_rate)
power_spectra_array = fill_power_spectra_array(segments, power_spectra_array, sample_rate)
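
When all segments have the same length, the same matrix can be built in one pass by running the FFT along each row. This is an alternative sketch under that assumption, not the tutorial's code; spectra_matrix is a hypothetical name and the random data only stands in for real segments.

```python
import numpy as np

def spectra_matrix(segments, sample_rate):
    """Build the (n_segments, n_positive_frequencies) matrix of normalized spectra."""
    segments = np.asarray(segments, dtype=float)      # shape (n_segments, n_samples)
    frequencies = np.fft.fftfreq(segments.shape[1], 1.0 / sample_rate)
    positive = frequencies > 0
    magnitudes = np.abs(np.fft.fft(segments, axis=1))[:, positive]
    # Normalize each row to unit length, as is done per segment above
    return magnitudes / np.linalg.norm(magnitudes, axis=1, keepdims=True)

rng = np.random.default_rng(0)
segments = rng.normal(size=(12, 4000))   # random stand-in for 12 half-second segments at 8 kHz
spectra = spectra_matrix(segments, sample_rate=8000)
print(spectra.shape)
```

Each row is one training example and each column one frequency, which is the layout Scikit-learn expects.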

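"power_spectra_array" is our training data set: it contains one power spectrum for each 0.5-second segment of the recording.

Running k-means with Scikit-learn

Scikit-learn has an easy-to-use implementation of k-means. Our audio sample contains 3 different pitches, so we set k to 3.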

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, max_iter=1000, n_init=100)
# fit the model, then assign each segment to its nearest cluster center
kmeans.fit(power_spectra_array)
predictions = kmeans.predict(power_spectra_array)
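
Without the original .wav file, the whole pipeline can still be tried on synthetic audio. The sketch below is an assumption-laden stand-in: it replaces the guitar with three artificial notes (a fundamental plus two weaker overtones with invented amplitudes), uses an 8 kHz sample rate, and fixes random_state only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

sample_rate = 8000
segment_samples = sample_rate // 2                 # 0.5 s segments
t = np.arange(2 * sample_rate) / sample_rate       # each note lasts 2 s

def synthetic_note(fundamental):
    # fundamental plus two weaker overtones; amplitudes are illustrative
    return (np.sin(2 * np.pi * fundamental * t)
            + 0.5 * np.sin(2 * np.pi * 2 * fundamental * t)
            + 0.25 * np.sin(2 * np.pi * 3 * fundamental * t))

recording = np.concatenate([synthetic_note(f) for f in (82.41, 103.83, 123.47)])
segments = recording.reshape(-1, segment_samples)  # 12 rows of 4000 samples

# Normalized magnitude spectrum of every segment (0 Hz bin dropped)
magnitudes = np.abs(np.fft.rfft(segments, axis=1))[:, 1:]
spectra = magnitudes / np.linalg.norm(magnitudes, axis=1, keepdims=True)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(spectra)
print(labels)
```

The label values themselves are arbitrary; what matters is that the four segments of each note share a label.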

"predictions" is a NumPy array containing the cluster label (an arbitrary integer) of each of the 12 audio segments.

print(predictions)
# => [2 2 2 2 0 0 0 0 1 1 1 1]

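This array shows that consecutive audio segments were correctly grouped together, matching what one hears when listening to the recording.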
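
Since the label numbers are arbitrary, one simple way to check the grouping is to collapse the array into runs of equal labels. The sketch below does this with itertools.groupby on the labels shown above; it is an illustrative check, not part of the original tutorial.

```python
from itertools import groupby

predictions = [2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1]   # the labels printed above

# Collapse consecutive equal labels into (label, run_length) pairs
runs = [(label, len(list(group))) for label, group in groupby(predictions)]
print(runs)   # [(2, 4), (0, 4), (1, 4)]

segment_length = 0.5  # seconds per segment, as in the tutorial
for label, count in runs:
    print('cluster', label, 'lasts', count * segment_length, 'seconds')
```

Three runs of four segments each correspond to 2 seconds per cluster, which agrees with the 2 seconds for which each pitch was played.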

Visualizing the results with Plotly

To better understand the predictions, we plot the power spectrum of each segment, colored according to its k-means cluster.

# plotly.graph_objects is Plotly's current Python figure API
import plotly.graph_objects as go

# find x-values for plot (positive frequencies, matching the columns of power_spectra_array)
number_of_samples = int(segment_length*sample_rate)
sample_length = 1./sample_rate
frequencies = np.fft.fftfreq(number_of_samples, sample_length)
frequencies = frequencies[frequencies>0]

# create plot
traces = []
for pitch_id, color in enumerate(['red','blue','green']):
    for power_spectrum in power_spectra_array[predictions == pitch_id]:
        trace = go.Scatter(x=frequencies[0:500],
                           y=power_spectrum[0:500],
                           mode='lines',
                           showlegend=False,
                           line=dict(shape='linear',
                                     color=color,
                                     width=1))
        traces.append(trace)
layout = go.Layout(xaxis=dict(title='Frequency (Hz)'),
                   yaxis=dict(title='Amplitude (normalized)'),
                   title='Power Spectra of Sample Audio Segments')
fig = go.Figure(data=traces, layout=layout)
# fig.show() renders inline in a Jupyter notebook
fig.show()

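In the figure below, each thin colored line is the power spectrum of one of the 12 audio segments from the sample .wav file. The color of each line indicates the pitch cluster predicted by k-means. The blue, green, and red spectra peak at 82.41 Hz (E), 103.83 Hz (G#), and 123.47 Hz (B) respectively, which are the notes in the audio sample. The strongest frequencies in the sample are the low ones, so only the lowest 500 frequencies measured by the FFT (fast Fourier transform) are included in the plot.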
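
The note names attached to these peak frequencies can be checked with the standard equal-temperament relation, in which A4 = 440 Hz corresponds to MIDI note 69 and each semitone is a factor of 2^(1/12). The helper below is a small sketch written for this check; the function name nearest_note is made up, and sharps are used for all accidentals.

```python
import math

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def nearest_note(frequency_hz):
    """Name of the equal-tempered note nearest to a frequency (A4 = 440 Hz)."""
    midi_number = round(69 + 12 * math.log2(frequency_hz / 440.0))
    octave = midi_number // 12 - 1
    return NOTE_NAMES[midi_number % 12] + str(octave)

for peak in (82.41, 103.83, 123.47):
    print(peak, nearest_note(peak))
```

The three peaks map to E2, G#2, and B2, the notes of an E major chord, consistent with the file name of the recording.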

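Plotting the amplitudes of the two strongest overtones shared by the 3 sampled pitches makes the natural clustering apparent.

Learn More at Galvanize!

K-means is one of many machine learning topics covered in the Galvanize Data Science Immersive. If you are interested, you can learn more here.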
