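Bert-vits2 v2.2 was recently released on schedule. The main change in v2.2 is that the emotion model has been replaced with the multimodal CLAP model: inference now accepts a text prompt or an audio prompt to guide stylized synthesis, which gives the synthesized voice more emotional character. The release also introduces a new preprocessing WebUI that makes the workflow more approachable.
For more details, see the official Bert-vits2 release page: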
https://github.com/fishaudio/Bert-VITS2/releases/tag/v2.2
Meanwhile, the FastAPI-based inference web UI project has also been adapted to Bert-vits2 v2.2. Its repository is:
https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI
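In this article we use these two projects to clone an English voice model, miko, of the Genshin Impact character Yae Miko.
New base model and emotion model for Bert-vits2-v2.2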
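First, clone the official Bert-vits2 project at the v2.2 tag:
git clone -b v2.2 https://github.com/fishaudio/Bert-VITS2.git
Note that this checks out the v2.2 tag: the official repository is updated constantly, so the main branch may contain bugs.
Enter the project directory:
cd Bert-VITS2
Install the dependencies:
pip3 install -r requirements.txt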
Then download the new base model and emotion model from:
https://openi.pcl.ac.cn/Stardust_minus/Bert-VITS2/modelmanage/show_model
Place the new emotion model, clap-htsat-fused, in the project's emotional directory. The structure looks like this:
E:\work\Bert-VITS2-v22\emotional>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
├───clap-htsat-fused
│ .gitattributes
│ config.json
│ merges.txt
│ preprocessor_config.json
│ pytorch_model.bin
│ README.md
│ special_tokens_map.json
│ tokenizer.json
│ tokenizer_config.json
│ vocab.json
│
└───wav2vec2-large-robust-12-ft-emotion-msp-dim
.gitattributes
config.json
LICENSE
preprocessor_config.json
pytorch_model.bin
README.md
vocab.json
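Note that wav2vec2-large-robust-12-ft-emotion-msp-dim is the emotion model from Bert-vits2-v2.1 and must be kept as well. For details, see the earlier article "Bert-vits2 V210: Recreating Ma Dugong's Voice in Practice (Python 3.10)"; they are not repeated here.
At this point, the new models are configured.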
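Before moving on, it can help to confirm that both model folders are in place. The following is a minimal sketch, not part of the official project: the file names come from the directory listing above, and treating exactly these three files as the minimum is an assumption.

```python
from pathlib import Path

# File names are taken from the directory listing above. Checking for exactly
# these files is an assumption, not an official requirement of Bert-VITS2.
REQUIRED = {
    "clap-htsat-fused": [
        "config.json", "preprocessor_config.json", "pytorch_model.bin"],
    "wav2vec2-large-robust-12-ft-emotion-msp-dim": [
        "config.json", "preprocessor_config.json", "pytorch_model.bin"],
}


def missing_model_files(emotional_dir):
    """Return the relative paths of expected model files that are absent."""
    root = Path(emotional_dir)
    missing = []
    for model_name, file_names in REQUIRED.items():
        for file_name in file_names:
            if not (root / model_name / file_name).is_file():
                missing.append(f"{model_name}/{file_name}")
    return missing


if __name__ == "__main__":
    # Run from the project root, where the emotional directory lives.
    problems = missing_model_files("emotional")
    if problems:
        print("missing:", ", ".join(problems))
    else:
        print("all expected model files are present")
```

An empty result only means the files exist; it does not verify that the downloads are complete or uncorrupted.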
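Training the Bert-vits2-v2.2 model
First download the training set. We use the English voice lines of the Genshin Impact character Yae Miko as an example. The dataset is available at: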
https://github.com/AI-Hobbyist/Genshin_Datasets
Then create the miko character directory:
mkdir miko
Name the voice annotation file esd.list and place it in the miko directory.
Also place the segmented audio clips in the raw directory.
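A malformed annotation line is an easy thing to miss before preprocessing. The sketch below assumes each line of esd.list has four fields separated by "|" (audio file, speaker, language, text). This article does not spell out that layout, so treat it as an assumption and adjust the parser if your file differs. The sample line in the usage section is made up.

```python
def parse_esd_line(line):
    """Split one annotation line into four fields (assumed layout)."""
    # maxsplit=3 keeps any further '|' characters inside the text field.
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 '|'-separated fields, got {len(parts)}: {line!r}")
    wav, speaker, language, text = parts
    return {"wav": wav, "speaker": speaker, "language": language, "text": text}


def find_bad_lines(lines):
    """Return (line_number, message) pairs for lines that fail to parse."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            parse_esd_line(line)
        except ValueError as err:
            bad.append((number, str(err)))
    return bad


if __name__ == "__main__":
    # Hypothetical sample content, for illustration only.
    sample = ["clip_0001.wav|miko|EN|Hello there.", "broken line"]
    for number, message in find_bad_lines(sample):
        print(f"line {number}: {message}")
```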
Finally, create the miko/configs/config.json configuration file:
{
"train": {
"log_interval": 50,
"eval_interval": 50,
"seed": 42,
"epochs": 1000,
"learning_rate": 0.0002,
"betas": [
0.8,
0.99
],
"eps": 1e-09,
"batch_size": 6,
"fp16_run": false,
"lr_decay": 0.99995,
"segment_size": 16384,
"init_lr_ratio": 1,
"warmup_epochs": 0,
"c_mel": 45,
"c_kl": 1.0,
"skip_optimizer": false,
"freeze_ZH_bert": false,
"freeze_JP_bert": false,
"freeze_EN_bert": false
},
"data": {
"training_files": "data/miko/train.list",
"validation_files": "data/miko/val.list",
"max_wav_value": 32768.0,
"sampling_rate": 44100,
"filter_length": 2048,
"hop_length": 512,
"win_length": 2048,
"n_mel_channels": 128,
"mel_fmin": 0.0,
"mel_fmax": null,
"add_blank": true,
"n_speakers": 1,
"cleaned_text": true,
"spk2id": {
"miko": 0
}
},
"model": {
"use_spk_conditioned_encoder": true,
"use_noise_scaled_mas": true,
"use_mel_posterior_encoder": false,
"use_duration_discriminator": true,
"inter_channels": 192,
"hidden_channels": 192,
"filter_channels": 768,
"n_heads": 2,
"n_layers": 6,
"kernel_size": 3,
"p_dropout": 0.1,
"resblock": "1",
"resblock_kernel_sizes": [
3,
7,
11
],
"resblock_dilation_sizes": [
[
1,
3,
5
],
[
1,
3,
5
],
[
1,
3,
5
]
],
"upsample_rates": [
8,
8,
2,
2,
2
],
"upsample_initial_channel": 512,
"upsample_kernel_sizes": [
16,
16,
8,
2,
2
],
"n_layers_q": 3,
"use_spectral_norm": false,
"gin_channels": 256
},
"version": "2.2"
}
Note "version": "2.2" here: the version number must be the latest, v2.2.
Adjust the other parameters as appropriate for your hardware.
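The two points above, checking the version field and tuning parameters for the local machine, can be scripted. This is a minimal sketch under stated assumptions: check_config and with_train_overrides are hypothetical helper names, and the only checks made are the version string and that every speaker id fits inside n_speakers.

```python
import copy


def check_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    if cfg.get("version") != "2.2":
        problems.append(f"version is {cfg.get('version')!r}, expected '2.2'")
    data = cfg.get("data", {})
    n_speakers = data.get("n_speakers", 0)
    for name, idx in data.get("spk2id", {}).items():
        # Every speaker id must be a valid index below n_speakers.
        if not 0 <= idx < n_speakers:
            problems.append(
                f"speaker {name!r} has id {idx}, outside 0..{n_speakers - 1}")
    return problems


def with_train_overrides(cfg, **overrides):
    """Return a copy of cfg with some 'train' values replaced."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg.setdefault("train", {}).update(overrides)
    return new_cfg


if __name__ == "__main__":
    # A cut-down version of the config above, for illustration.
    cfg = {
        "version": "2.2",
        "train": {"batch_size": 6, "fp16_run": False},
        "data": {"n_speakers": 1, "spk2id": {"miko": 0}},
    }
    print(check_config(cfg))
    # Example: lower the batch size for a GPU with less memory.
    smaller = with_train_overrides(cfg, batch_size=4)
    print(smaller["train"]["batch_size"], cfg["train"]["batch_size"])
```

To apply it to the real file, load miko/configs/config.json with json.load, pass the result through these helpers, and write it back with json.dump.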
Then launch the preprocessing page:
python3 webui_preprocess.py
Open http://127.0.0.1:7860/ in a browser:
Follow the steps on the page; it is simple and convenient.
When preprocessing is done, run the training command:
python3 train_ms.py
The trained models are saved in the data/miko/models directory, structured as follows:
E:\work\Bert-VITS2-v22\Data\miko\models>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│ DUR_0.pth
│ DUR_100.pth
│ DUR_150.pth
│ DUR_50.pth
│ D_0.pth
│ D_100.pth
│ D_150.pth
│ D_50.pth
│ events.out.tfevents.1702457087.ly.13044.0
│ events.out.tfevents.1702458207.ly.12416.0
│ githash
│ G_0.pth
│ G_100.pth
│ G_150.pth
│ G_50.pth
│ train.log
│
└───eval
events.out.tfevents.1702457087.ly.13044.1
events.out.tfevents.1702458207.ly.12416.1
This completes the training stage.
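The listing above is sorted alphabetically, so DUR_50.pth appears after DUR_150.pth even though it is older. A small helper that compares the numeric suffix can pick the newest file of each kind. This is a minimal sketch that assumes the number in each file name grows as training proceeds; latest_checkpoints is a hypothetical name.

```python
import re
from pathlib import Path

# Matches names such as G_150.pth, D_50.pth and DUR_100.pth.
CKPT_PATTERN = re.compile(r"^(G|D|DUR)_(\d+)\.pth$")


def latest_checkpoints(file_names):
    """Map each prefix (G, D, DUR) to its file with the highest number."""
    best = {}
    for name in file_names:
        match = CKPT_PATTERN.match(name)
        if match is None:
            continue  # skip logs, tfevents files and so on
        prefix, step = match.group(1), int(match.group(2))
        if prefix not in best or step > best[prefix][0]:
            best[prefix] = (step, name)
    return {prefix: name for prefix, (_, name) in best.items()}


if __name__ == "__main__":
    # The directory name follows the listing above; adjust it as needed.
    models_dir = Path("Data/miko/models")
    if models_dir.is_dir():
        print(latest_checkpoints(p.name for p in models_dir.iterdir()))
```

The generator file (G_*.pth) is normally the one loaded for inference; the D and DUR files matter mainly when resuming training.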
Inference with the Bert-vits2-v2.2 model
For inference we use the web page from the Bert-vits2-UI project. Clone the web project:
git clone https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI
Place the Web project in the root directory of Bert-vits2-v2.2. The directory structure is as follows:
E:\work\Bert-VITS2-v22_lilith\Web>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│ index.html
│
├───assets
│ index-21bc6a28.css
│ index-402c0217.js
│
└───img
helps1.png
helps2.png
Hiyori.ico
It contains the main page, the stylesheet, and the JS file, and is based on Hiyori.
Then launch the inference page:
python3 server_fastapi.py
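Load the model and run inference.
In addition, you can run inference through the FastAPI endpoints; in other words, sending an HTTP request returns the synthesized audio. The endpoint parameters are as follows: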
{
"openapi": "3.1.0",
"info": {
"title": "FastAPI",
"version": "0.1.0"
},
"paths": {
"/": {
"get": {
"summary": "Index",
"operationId": "index__get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/voice": {
"post": {
"summary": "Voice",
"description": "Voice endpoint. To upload reference audio, use POST requests only.",
"operationId": "voice_voice_post",
"parameters": [
{
"name": "model_id",
"in": "query",
"required": true,
"schema": {
"type": "integer",
"description": "Model ID",
"title": "Model Id"
},
"description": "Model ID"
},
{
"name": "speaker_name",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Speaker name",
"title": "Speaker Name"
},
"description": "Speaker name"
},
{
"name": "speaker_id",
"in": "query",
"required": false,
"schema": {
"type": "integer",
"description": "Speaker ID; provide either this or speaker_name",
"title": "Speaker Id"
},
"description": "Speaker ID; provide either this or speaker_name"
},
{
"name": "sdp_ratio",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "SDP/DP mixing ratio",
"default": 0.2,
"title": "Sdp Ratio"
},
"description": "SDP/DP mixing ratio"
},
{
"name": "noise",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "Emotion",
"default": 0.2,
"title": "Noise"
},
"description": "Emotion"
},
{
"name": "noisew",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "Phoneme length",
"default": 0.9,
"title": "Noisew"
},
"description": "Phoneme length"
},
{
"name": "length",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "Speaking rate",
"default": 1,
"title": "Length"
},
"description": "Speaking rate"
},
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Language",
"title": "Language"
},
"description": "Language"
},
{
"name": "auto_translate",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "Automatic translation",
"default": false,
"title": "Auto Translate"
},
"description": "Automatic translation"
},
{
"name": "auto_split",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "Automatic splitting",
"default": false,
"title": "Auto Split"
},
"description": "Automatic splitting"
},
{
"name": "emotion",
"in": "query",
"required": false,
"schema": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string"
},
{
"type": "null"
}
],
"description": "emo",
"title": "Emotion"
},
"description": "emo"
}
],
"requestBody": {
"required": true,
"content": {
"multipart/form-data": {
"schema": {
"$ref": "#/components/schemas/Body_voice_voice_post"
}
}
}
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
},
"get": {
"summary": "Voice",
"description": "Voice endpoint",
"operationId": "voice_voice_get",
"parameters": [
{
"name": "text",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "Input text",
"title": "Text"
},
"description": "Input text"
},
{
"name": "model_id",
"in": "query",
"required": true,
"schema": {
"type": "integer",
"description": "Model ID",
"title": "Model Id"
},
"description": "Model ID"
},
{
"name": "speaker_name",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Speaker name",
"title": "Speaker Name"
},
"description": "Speaker name"
},
{
"name": "speaker_id",
"in": "query",
"required": false,
"schema": {
"type": "integer",
"description": "Speaker ID; provide either this or speaker_name",
"title": "Speaker Id"
},
"description": "Speaker ID; provide either this or speaker_name"
},
{
"name": "sdp_ratio",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "SDP/DP mixing ratio",
"default": 0.2,
"title": "Sdp Ratio"
},
"description": "SDP/DP mixing ratio"
},
{
"name": "noise",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "Emotion",
"default": 0.2,
"title": "Noise"
},
"description": "Emotion"
},
{
"name": "noisew",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "Phoneme length",
"default": 0.9,
"title": "Noisew"
},
"description": "Phoneme length"
},
{
"name": "length",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "Speaking rate",
"default": 1,
"title": "Length"
},
"description": "Speaking rate"
},
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Language",
"title": "Language"
},
"description": "Language"
},
{
"name": "auto_translate",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "Automatic translation",
"default": false,
"title": "Auto Translate"
},
"description": "Automatic translation"
},
{
"name": "auto_split",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "Automatic splitting",
"default": false,
"title": "Auto Split"
},
"description": "Automatic splitting"
},
{
"name": "emotion",
"in": "query",
"required": false,
"schema": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string"
},
{
"type": "null"
}
],
"description": "emo",
"title": "Emotion"
},
"description": "emo"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/info": {
"get": {
"summary": "Get Loaded Models Info",
"description": "Get information about loaded models",
"operationId": "get_loaded_models_info_models_info_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/models/delete": {
"get": {
"summary": "Delete Model",
"description": "Delete the specified model",
"operationId": "delete_model_models_delete_get",
"parameters": [
{
"name": "model_id",
"in": "query",
"required": true,
"schema": {
"type": "integer",
"description": "ID of the model to delete",
"title": "Model Id"
},
"description": "ID of the model to delete"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/add": {
"get": {
"summary": "Add Model",
"description": "Add the specified model. The same model path may be added more than once without using additional memory.",
"operationId": "add_model_models_add_get",
"parameters": [
{
"name": "model_path",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "Path of the model to add",
"title": "Model Path"
},
"description": "Path of the model to add"
},
{
"name": "config_path",
"in": "query",
"required": false,
"schema": {
"type": "string",
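"description": "Path of the config file for the model to add; if omitted, ./config.json or ../config.json is used",
"title": "Config Path"
},
"description": "Path of the config file for the model to add; if omitted, ./config.json or ../config.json is used"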
},
{
"name": "device",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Device used for inference",
"default": "cuda",
"title": "Device"
},
"description": "Device used for inference"
},
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Default language of the model",
"default": "ZH",
"title": "Language"
},
"description": "Default language of the model"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/get_unloaded": {
"get": {
"summary": "Get Unloaded Models Info",
"description": "Get models that are not loaded",
"operationId": "get_unloaded_models_info_models_get_unloaded_get",
"parameters": [
{
"name": "root_dir",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Root directory to search",
"default": "Data",
"title": "Root Dir"
},
"description": "Root directory to search"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/get_local": {
"get": {
"summary": "Get Local Models Info",
"description": "Get all local models",
"operationId": "get_local_models_info_models_get_local_get",
"parameters": [
{
"name": "root_dir",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Root directory to search",
"default": "Data",
"title": "Root Dir"
},
"description": "Root directory to search"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/status": {
"get": {
"summary": "Get Status",
"description": "Get the machine's runtime status",
"operationId": "get_status_status_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/tools/translate": {
"get": {
"summary": "Translate",
"description": "Translate",
"operationId": "translate_tools_translate_get",
"parameters": [
{
"name": "texts",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "Text to translate",
"title": "Texts"
},
"description": "Text to translate"
},
{
"name": "to_language",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "Target language for translation",
"title": "To Language"
},
"description": "Target language for translation"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/tools/random_example": {
"get": {
"summary": "Random Example",
"description": "Get a random audio clip with its text for comparison. The audio is chosen at random from the local directory.",
"operationId": "random_example_tools_random_example_get",
"parameters": [
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Language to use; if omitted, a random one is returned",
"title": "Language"
},
"description": "Language to use; if omitted, a random one is returned"
},
{
"name": "root_dir",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "Root directory to search",
"default": "Data",
"title": "Root Dir"
},
"description": "Root directory to search"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/tools/get_audio": {
"get": {
"summary": "Get Audio",
"operationId": "get_audio_tools_get_audio_get",
"parameters": [
{
"name": "path",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "Local audio path",
"title": "Path"
},
"description": "Local audio path"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
}
},
"components": {
"schemas": {
"Body_voice_voice_post": {
"properties": {
"text": {
"type": "string",
"title": "Text"
},
"reference_audio": {
"type": "string",
"format": "binary",
"title": "Reference Audio"
}
},
"type": "object",
"required": [
"text"
],
"title": "Body_voice_voice_post"
},
"HTTPValidationError": {
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"type": "array",
"title": "Detail"
}
},
"type": "object",
"title": "HTTPValidationError"
},
"ValidationError": {
"properties": {
"loc": {
"items": {
"anyOf": [
{
"type": "string"
},
{
"type": "integer"
}
]
},
"type": "array",
"title": "Location"
},
"msg": {
"type": "string",
"title": "Message"
},
"type": {
"type": "string",
"title": "Error Type"
}
},
"type": "object",
"required": [
"loc",
"msg",
"type"
],
"title": "ValidationError"
}
}
}
}
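To show how the GET /voice endpoint from the schema above might be called, here is a minimal client sketch using only the Python standard library. The base URL and port are assumptions (use whatever address server_fastapi.py reports on startup), build_voice_url and synthesize are hypothetical helper names, and the code assumes the response body is the raw audio.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address: replace with the host and port your server prints.
BASE_URL = "http://127.0.0.1:8000"


def build_voice_url(text, model_id, base_url=BASE_URL, **params):
    """Build a GET /voice URL from the query parameters in the schema."""
    query = {"text": text, "model_id": model_id}
    for key, value in params.items():
        if value is None:
            continue  # leave optional parameters out entirely
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase for booleans
        query[key] = value
    return f"{base_url}/voice?{urlencode(query)}"


def synthesize(text, model_id, out_path, **params):
    """Request synthesized speech and save the response body to out_path."""
    with urlopen(build_voice_url(text, model_id, **params)) as response:
        audio = response.read()  # assumed to be the audio bytes
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path


# Example call (requires a running server with a model loaded as id 0):
# synthesize("Hello there.", 0, "miko.wav", speaker_name="miko",
#            language="EN", sdp_ratio=0.2, noise=0.2, noisew=0.9, length=1)
```

Uploading a reference audio file goes through POST /voice with a multipart/form-data body instead, as the schema notes, and is not covered by this sketch.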
Finally, here is an all-in-one package for local training and inference with Bert-vits2-v2.2:
https://pan.baidu.com/s/1OVX9seRwZR6bZ-xsE_nRLg?pwd=v3uc
Shared here for everyone to enjoy.