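"Transcribing by ear" (扒譜, literally "picking out the score") means working out the notation of a piece of music, step by step, by listening to a recording or watching a performance video. It is a common way to learn and play music, used by hobbyists, performers, and students alike.
While transcribing, you listen closely, identify and write down each note, chord, and rhythm, and refine the result through trial and error until the score is accurate. For anyone who has no official sheet music, or who wants to learn one specific piece, it is an effective approach.
The goal is to understand and perform the piece better and to learn its techniques, structure, and expression from it. Without music-theory training, however, it is hard to pick out notes and pitches by ear. In this post we use SOME, an open-source project from openvpi, to transcribe a song starting directly from an MP3 file and turn it into a MIDI file.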
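Transcription ultimately means mapping each sung pitch to a note. MIDI numbers notes from 0 to 127 in equal temperament, with A4 (440 Hz) as note 69 and one step per semitone, so a frequency f corresponds to m = 69 + 12 · log2(f / 440). The helpers below are an illustrative sketch of that mapping; their names are hypothetical and they are not part of SOME.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69 + 12 * math.log2(freq_hz / 440.0)


def midi_to_name(note: int) -> str:
    """Name a MIDI note in scientific pitch notation (MIDI 60 = C4)."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


if __name__ == "__main__":
    # A4, middle C and E4: a slightly off-pitch singer would give fractional values
    for f in (440.0, 261.63, 329.63):
        m = hz_to_midi(f)
        print(f"{f:7.2f} Hz -> MIDI {m:6.2f} -> {midi_to_name(round(m))}")
```

Rounding the fractional value to the nearest integer is the simplest way to snap a sung pitch to a note; a model like SOME has to make that decision over time, which is what makes transcription harder than a single conversion.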
Project Setup
First, clone the project:
git clone https://github.com/openvpi/SOME.git
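Enter the project directory (git creates it as SOME; the lowercase name only works on case-insensitive file systems such as Windows):
cd SOME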
Next, download the project's pretrained models:
https://pan.baidu.com/s/1lVQcKP7ijTELslJNgoDqkQ?pwd=odsm
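Put the 2stems model in the project's pretrained_models directory.
Put the ckpt model in the project's ckpt directory.
If the ckpt and pretrained_models directories do not exist, create them manually.
The result should look like this: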
├───ckpt
│       config.yaml
│       model_ckpt_steps_104000_simplified.ckpt
│
├───pretrained_models
│   └───2stems
│           ._checkpoint
│           checkpoint
│           model.data-00000-of-00001
│           model.index
│           model.meta
With that, the project is configured.
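If you prefer to script this step, the sketch below creates the directories used in this walkthrough (including output, which is needed later) and reports which of the files from the listing above are still missing. It is a hypothetical helper, not part of SOME, and assumes you run it from the project root; the `._checkpoint` entry looks like macOS metadata, so it is deliberately not checked.

```python
from pathlib import Path

# Model files expected by this walkthrough, taken from the directory listing above.
EXPECTED_FILES = [
    "ckpt/config.yaml",
    "ckpt/model_ckpt_steps_104000_simplified.ckpt",
    "pretrained_models/2stems/checkpoint",
    "pretrained_models/2stems/model.data-00000-of-00001",
    "pretrained_models/2stems/model.index",
    "pretrained_models/2stems/model.meta",
]


def prepare_layout(root: str = ".") -> list[str]:
    """Create ckpt/, pretrained_models/2stems/ and output/ under root if missing,
    then return the expected model files that are still absent."""
    base = Path(root)
    for directory in ("ckpt", "pretrained_models/2stems", "output"):
        (base / directory).mkdir(parents=True, exist_ok=True)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]
```

Calling `prepare_layout(".")` from the project root returns an empty list once every downloaded file is in place; any names it returns still need to be copied from the download.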
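Separating Vocals from the Backing Track
Transcription targets the vocal line, so we first need Spleeter to isolate it. For details on Spleeter, see my earlier post "Free vocal and accompaniment separation with the AI library Spleeter (Python 3.10)"; for brevity it is not covered again here.
Run: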
spleeter separate -p spleeter:2stems -o ./output ./test.mp3
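The 2stems model used here is the one downloaded above into the project's pretrained_models directory. By default Spleeter looks for models in a pretrained_models directory under the current working directory, so run the command from the project root.
If the output directory does not exist, create it manually; test.mp3 is the song you want to transcribe.
Spleeter then writes the backing track (accompaniment.wav) and the vocals (vocals.wav) to the project's output directory: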
├───output
│   └───test
│           accompaniment.wav
│           vocals.wav
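The same separation step can be driven from Python. Spleeter also offers a Python API (`spleeter.separator.Separator`), but the sketch below simply shells out to the `spleeter` CLI with exactly the command shown above, and computes where the stems should appear from the `<output>/<input name>/<stem>.wav` pattern visible in the listing. The helper names are hypothetical, and Spleeter must be installed and on your PATH.

```python
import subprocess
from pathlib import Path


def spleeter_command(input_audio: str, output_dir: str = "./output") -> list[str]:
    """Build the same 2-stem separation command used above."""
    return ["spleeter", "separate", "-p", "spleeter:2stems", "-o", output_dir, input_audio]


def expected_stems(input_audio: str, output_dir: str = "./output") -> dict[str, Path]:
    """Stems land in <output_dir>/<input file name without extension>/<stem>.wav."""
    stem_dir = Path(output_dir) / Path(input_audio).stem
    return {name: stem_dir / f"{name}.wav" for name in ("vocals", "accompaniment")}


def separate(input_audio: str, output_dir: str = "./output") -> Path:
    """Run Spleeter from the project root and return the path of the vocal track."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(spleeter_command(input_audio, output_dir), check=True)
    vocals = expected_stems(input_audio, output_dir)["vocals"]
    if not vocals.is_file():
        raise FileNotFoundError(vocals)
    return vocals
```

`separate("./test.mp3")` should return `output/test/vocals.wav`, matching the tree above; `check=True` makes a failed Spleeter run raise instead of silently continuing.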
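Vocal Denoising
The separated vocals often still contain residual noise, such as bleed from the mix, which can degrade the conversion.
Here we use noisereduce to clean them up: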
pip install noisereduce
The denoising code:
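import numpy as np
from scipy.io import wavfile
import noisereduce as nr
# load data; scipy returns shape (frames,) for mono or (frames, channels) for stereo
rate, data = wavfile.read("./output/test/vocals.wav")
# work in float32 so the rewritten file is a valid float WAV (samples in [-1, 1])
if np.issubdtype(data.dtype, np.integer):
    data = data.astype(np.float32) / np.iinfo(data.dtype).max
# perform noise reduction; noisereduce expects (channels, frames), hence the transposes
reduced_noise = nr.reduce_noise(y=data.T, sr=rate).T
# overwrite the vocal track with the denoised version
wavfile.write("./output/test/vocals.wav", rate, reduced_noise.astype(np.float32))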
Running it denoises vocals.wav and overwrites the file in place.
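noisereduce works by spectral gating: it estimates a noise threshold for each frequency band and attenuates the time-frequency bins that fall below it. As a much simpler illustration of the gating idea (not what noisereduce does internally), the sketch below mutes whole short frames whose RMS level is far below the loudest frame. The function name and the 20 ms / -40 dB defaults are illustrative assumptions.

```python
import numpy as np


def noise_gate(signal: np.ndarray, rate: int, frame_ms: float = 20.0,
               threshold_db: float = -40.0) -> np.ndarray:
    """Zero out frames whose RMS is more than |threshold_db| dB below the loudest frame.

    signal: float samples, shape (frames,) or (frames, channels) as returned by scipy.
    """
    x = np.asarray(signal, dtype=np.float64)
    out = x.copy()
    if x.size == 0:
        return out
    frame_len = max(1, int(rate * frame_ms / 1000))
    n_frames = int(np.ceil(len(x) / frame_len))
    # RMS of each frame, pooling all channels together
    rms = np.array([np.sqrt(np.mean(x[i * frame_len:(i + 1) * frame_len] ** 2))
                    for i in range(n_frames)])
    peak = rms.max()
    if peak == 0:
        return out
    # level of each frame relative to the loudest one, in dB
    level_db = 20 * np.log10(np.maximum(rms, 1e-12) / peak)
    for i in np.flatnonzero(level_db < threshold_db):
        out[i * frame_len:(i + 1) * frame_len] = 0.0
    return out
```

Applied to the float data from the script above, `noise_gate(data, rate)` silences the near-silent gaps between phrases but leaves noise under the singing untouched. That limitation is exactly why a per-frequency method like noisereduce's is preferable for vocals.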
Transcription (WAV to MIDI)
Then run the conversion:
python infer.py --model ./ckpt/model_ckpt_steps_104000_simplified.ckpt --wav ./output/test/vocals.wav
The program prints:
python infer.py --model ./ckpt/model_ckpt_steps_104000_simplified.ckpt --wav ./output/test/vocals.wav
accumulate_grad_batches: 1, audio_sample_rate: 44100, binarization_args: {'num_workers': 0, 'shuffle': True}, binarizer_cls: preprocessing.MIDIExtractionBinarizer, binary_data_dir: data/some_ds_fixmel_spk3_aug8/binary,
clip_grad_norm: 1, dataloader_prefetch_factor: 2, ddp_backend: nccl, ds_workers: 4, finetune_ckpt_path: None,
finetune_enabled: False, finetune_ignored_params: [], finetune_strict_shapes: True, fmax: 8000, fmin: 40,
freezing_enabled: False, frozen_params: [], hop_size: 512, log_interval: 100, lr_scheduler_args: {'min_lr': 1e-05, 'scheduler_cls': 'lr_scheduler.scheduler.WarmupLR', 'warmup_steps': 5000},
max_batch_frames: 80000, max_batch_size: 8, max_updates: 10000000, max_val_batch_frames: 10000, max_val_batch_size: 1,
midi_extractor_args: {'attention_drop': 0.1, 'attention_heads': 8, 'attention_heads_dim': 64, 'conv_drop': 0.1, 'dim': 512, 'ffn_latent_drop': 0.1, 'ffn_out_drop': 0.1, 'kernel_size': 31, 'lay': 8, 'use_lay_skip': True}, midi_max: 127, midi_min: 0, midi_num_bins: 128, midi_prob_deviation: 1.0,
midi_shift_proportion: 0.0, midi_shift_range: [-6, 6], model_cls: modules.model.Gmidi_conform.midi_conforms, num_ckpt_keep: 5, num_sanity_val_steps: 1,
num_valid_plots: 300, optimizer_args: {'beta1': 0.9, 'beta2': 0.98, 'lr': 0.0001, 'optimizer_cls': 'torch.optim.AdamW', 'weight_decay': 0}, pe: rmvpe, pe_ckpt: pretrained/rmvpe/model.pt, permanent_ckpt_interval: 40000,
permanent_ckpt_start: 200000, pl_trainer_accelerator: auto, pl_trainer_devices: auto, pl_trainer_num_nodes: 1, pl_trainer_precision: 32-true,
pl_trainer_strategy: auto, raw_data_dir: [], rest_threshold: 0.1, sampler_frame_count_grid: 6, seed: 114514,
sort_by_len: True, task_cls: training.MIDIExtractionTask, test_prefixes: None, train_set_name: train, units_dim: 80,
units_encoder: mel, units_encoder_ckpt: pretrained/contentvec/checkpoint_best_legacy_500.pt, use_buond_loss: True, use_midi_loss: True, val_check_interval: 4000,
valid_set_name: valid, win_size: 2048
| load 'model' from 'ckpt\model_ckpt_steps_104000_simplified.ckpt'.
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.66it/s]
MIDI file saved at: 'output\test\vocals.mid'
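The log shows the MIDI file being written next to the input WAV with a .mid suffix. If you have separated several songs under output, a small wrapper can run the same command on each vocals.wav. This is a minimal sketch that assumes it runs from the SOME project root and uses only the `--model` and `--wav` flags shown above; the helper names and the output-path rule (inferred from the single log line above) are assumptions, not part of SOME.

```python
import subprocess
import sys
from pathlib import Path

CKPT = "./ckpt/model_ckpt_steps_104000_simplified.ckpt"


def infer_command(wav: Path, ckpt: str = CKPT) -> list[str]:
    """Build the infer.py call shown above for one WAV file, using the current interpreter."""
    return [sys.executable, "infer.py", "--model", ckpt, "--wav", str(wav)]


def expected_midi(wav: Path) -> Path:
    """In the log above, the MIDI file is saved next to the WAV with a .mid suffix."""
    return wav.with_suffix(".mid")


def transcribe_all(output_dir: str = "./output") -> list[Path]:
    """Run infer.py on every <output_dir>/*/vocals.wav and return the expected MIDI paths."""
    results = []
    for wav in sorted(Path(output_dir).glob("*/vocals.wav")):
        subprocess.run(infer_command(wav), check=True)
        results.append(expected_midi(wav))
    return results
```

Using `sys.executable` instead of a bare `python` makes sure infer.py runs in the same environment (for example, the same virtualenv) as the wrapper.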
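The converted MIDI file (the vocal melody, which plays back as piano) is saved in the output directory. Double-click it to play it, or play it from code: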
''' pg_midi_sound101.py
play midi music files (also mp3 files) using pygame
tested with Python273/331 and pygame192 by vegaseat
'''
import pygame as pg

def play_music(music_file):
    '''
    stream music with mixer.music module in blocking manner
    this will stream the sound from disk while playing
    '''
    clock = pg.time.Clock()
    try:
        pg.mixer.music.load(music_file)
        print("Music file {} loaded!".format(music_file))
    except pg.error:  # pygame is imported as pg, so "pygame.error" would raise NameError
        print("File {} not found! {}".format(music_file, pg.get_error()))
        return
    pg.mixer.music.play()
    # check if playback has finished
    while pg.mixer.music.get_busy():
        clock.tick(30)

# pick a midi or MP3 music file you have in the working folder
# or give full pathname, e.g. r"D:\work\YiJianBaPu\output\test\vocals.mid"
music_file = "./output/test/vocals.mid"
#music_file = "Drumtrack.mp3"
freq = 44100     # audio CD quality
bitsize = -16    # signed 16 bit (a negative value means signed samples)
channels = 2     # 1 is mono, 2 is stereo
buffer = 2048    # number of samples (experiment to get right sound)
pg.mixer.init(freq, bitsize, channels, buffer)
# optional volume 0 to 1.0
pg.mixer.music.set_volume(0.8)
try:
    play_music(music_file)
except KeyboardInterrupt:
    # if user hits Ctrl/C then exit
    # (works only in console mode)
    pg.mixer.music.fadeout(1000)
    pg.mixer.music.stop()
    raise SystemExit
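Besides listening, you may want to check which notes SOME actually extracted. Libraries such as mido can read MIDI files; to stay dependency-free, the sketch below parses a Standard MIDI File with only the standard library and lists every note-on event as (tick, note number, velocity). It is a simplified reader for illustration, with hypothetical function names: it skips meta and SysEx events, handles running status, and ignores tempo, so ticks are not converted to seconds.

```python
import struct


def read_vlq(data: bytes, pos: int) -> tuple[int, int]:
    """Read a MIDI variable-length quantity; return (value, new position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def list_notes(data: bytes) -> list[tuple[int, int, int]]:
    """Return (absolute tick, note number, velocity) for every note-on with velocity > 0."""
    if data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    (header_len,) = struct.unpack(">I", data[4:8])
    _fmt, ntracks, _division = struct.unpack(">HHH", data[8:14])
    pos = 8 + header_len
    notes = []
    for _ in range(ntracks):
        chunk_type, length = struct.unpack(">4sI", data[pos:pos + 8])
        pos += 8
        end = pos + length
        if chunk_type != b"MTrk":
            pos = end  # skip unknown chunk types
            continue
        tick, status = 0, 0
        while pos < end:
            delta, pos = read_vlq(data, pos)
            tick += delta
            if data[pos] & 0x80:  # explicit status byte
                event = data[pos]
                pos += 1
                if event < 0xF0:
                    status = event  # only channel messages set running status
            else:
                event = status  # running status: reuse the previous channel status
            if event == 0xFF:  # meta event: type byte, length, payload
                pos += 1
                size, pos = read_vlq(data, pos)
                pos += size
            elif event in (0xF0, 0xF7):  # SysEx: length, payload
                size, pos = read_vlq(data, pos)
                pos += size
            else:
                n_data = 1 if (event & 0xF0) in (0xC0, 0xD0) else 2
                if (event & 0xF0) == 0x90 and data[pos + 1] > 0:
                    notes.append((tick, data[pos], data[pos + 1]))
                pos += n_data
        pos = end
    return notes
```

For example, reading the file produced above with `open("./output/test/vocals.mid", "rb").read()` and passing the bytes to `list_notes` gives the extracted melody as note numbers, which you can compare against what you hear.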
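Conclusion
I forked the original project, added the vocal separation and denoising steps, and bundled the pretrained models. It is shared here for everyone: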
https://github.com/v3ucn/YiJianBaPu