Speech Synthesis with Coqui TTS

Posted by chyun2011 on 2024-12-03

About the Tool

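Coqui TTS is a high-performance deep learning library for text-to-speech (speech synthesis). It provides pretrained models covering more than 1100 languages, tools for training new models and fine-tuning existing ones, and utilities for analyzing datasets. The XTTS-v2 model supports 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko) and Hindi (hi).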
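Because the language ids are short strings that are easy to mistype (for example `zh` instead of `zh-cn`), it can help to check them before starting a long synthesis run. The following is a minimal sketch: the id list is copied from the `--list_language_idx` output shown later in this post, while `normalize_language` and the `zh` alias are hypothetical helpers of this sketch and not part of Coqui TTS.

```python
# Language ids as printed by `tts ... --list_language_idx` for XTTS-v2.
XTTS_V2_LANGUAGES = [
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "hu", "ko", "ja", "hi",
]

# Hypothetical convenience alias (an assumption of this sketch, not a Coqui TTS feature).
ALIASES = {"zh": "zh-cn"}


def normalize_language(code: str) -> str:
    """Return a supported XTTS-v2 language id, or raise ValueError."""
    candidate = code.strip().lower()
    candidate = ALIASES.get(candidate, candidate)
    if candidate not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language id: {code!r}")
    return candidate


if __name__ == "__main__":
    print(normalize_language("zh"))
    print(normalize_language("EN"))
```

Failing early with a `ValueError` is a design choice of the sketch: it keeps a typo from surfacing only after the model has been loaded.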

Installation

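  1. Install conda; see the post "Python environment setup"
  2. conda create -n coqui: create the virtual environment
  3. conda activate coqui: activate the virtual environment
  4. conda install python=3.9.20: install Python (>= 3.9, < 3.12)
  5. pip install pypinyin: dependency for synthesizing Chinese speech
  6. pip install numpy: dependency
  7. pip install sounddevice: audio playback
  8. pip install TTS: install Coqui TTS
  9. If installing TTS fails with the error Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools", you can install its fork instead: pip install coqui-tts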
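After the installation steps, a quick check can confirm that the interpreter falls inside the supported range and that the packages can be found. This is only a sketch using the standard library; the function names `python_version_supported` and `missing_packages` are hypothetical, and the package list simply mirrors the steps above.

```python
import sys
from importlib.util import find_spec


def python_version_supported(version=sys.version_info) -> bool:
    """True when the interpreter is >= 3.9 and < 3.12, the range given in step 4."""
    return (3, 9) <= tuple(version[:2]) < (3, 12)


def missing_packages(names):
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    # Module names as imported in Python; "TTS" is the import name used by Coqui TTS.
    print("Missing packages:", missing_packages(["pypinyin", "numpy", "sounddevice", "TTS"]))
```

Run it inside the activated `coqui` environment; an empty "Missing packages" list suggests the installs succeeded, though it does not prove that each package imports without error.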

Trying It Out

  1. List the supported languages:
    • tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --list_language_idx
    • When prompted with "Otherwise, I agree to the terms of the non-commercial CPML: https://coqui.ai/cpml", enter Y to accept the license terms
    • The supported languages are printed:
    Available language ids: (Set --language_idx flag to one of these values to use the multi-lingual model.
    ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja', 'hi']
    
  2. List the available speakers:
    • tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --list_speaker_idx
  3. Synthesis examples
  • Synthesize Chinese speech (the language id is zh-cn, as in the list above)
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "國家糧食和物資儲備局29日釋出資料顯示:截至目前,全國累計收購秋糧1.2億噸,收購進度快於上年,收購工作進展順利。" --speaker_idx "Ana Florence" --language_idx zh-cn --use_cuda true
  • Synthesize with a reference voice file (pass --speaker_wav instead of --speaker_idx, so that the voice is taken from the file)
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "國家糧食和物資儲備局29日釋出資料顯示:截至目前,全國累計收購秋糧1.2億噸,收購進度快於上年,收購工作進展順利。" --language_idx zh-cn --speaker_wav e:/source.mp3 --use_cuda true
  • Synthesize English speech
tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "TTS is a library for advanced Text-to-Speech generation. TTS models that are not released open-source. They are here to show the potential. Models prefixed with a dot (.Jofish .Abe and .Janice) are real human voices." --speaker_idx "Ana Florence" --language_idx en --use_cuda true
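The commands above differ only in a few arguments, so they can be assembled programmatically. The sketch below builds the argument list for the `tts` CLI using only flags that appear in this post; `build_tts_command` is a hypothetical helper. Requiring exactly one of a built-in speaker or a reference file is a choice made in this sketch, not something the CLI is known to enforce.

```python
import shlex

MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_tts_command(text, language, speaker=None, speaker_wav=None, use_cuda=False):
    """Build the argument list for the `tts` CLI (hypothetical helper)."""
    # Sketch-level rule: choose either a built-in speaker or a reference file.
    if (speaker is None) == (speaker_wav is None):
        raise ValueError("pass exactly one of speaker or speaker_wav")
    cmd = ["tts", "--model_name", MODEL, "--text", text, "--language_idx", language]
    if speaker is not None:
        cmd += ["--speaker_idx", speaker]
    else:
        cmd += ["--speaker_wav", speaker_wav]
    if use_cuda:
        cmd += ["--use_cuda", "true"]
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command("Hello from Coqui TTS.", "en", speaker="Ana Florence")
    # Print a shell-quoted form for inspection.
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with text that contains spaces or punctuation.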

Synthesizing from a Program

# -*- coding: UTF-8 -*-
import torch
from TTS.api import TTS
import numpy as np
import sounddevice as sd
import soundfile as sf
from datetime import datetime


device = "cuda" if torch.cuda.is_available() else "cpu"

# List the available models
print(TTS().list_models())

print("Model initialization started:", datetime.now())

# tts_models/multilingual/multi-dataset/xtts_v2 is the model identifier
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

print("Model initialization finished:", datetime.now())


# Read the text from a file
with open('demo.txt','r',encoding='utf-8') as source_file:
    content = source_file.read()

print("Text loaded:", datetime.now())

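# Reference audio file whose voice will be imitated
source_wav = 'source.mp3'
generated_voice = 'generated_voice.wav'
# Generate speech from the text; 'zh-cn' is the Chinese id listed by --list_language_idx
wav = np.asarray(tts.tts(text=content, speaker_wav=source_wav, language="zh-cn"), dtype=np.float32)
# Play the audio; XTTS-v2 produces 24 kHz output
rate = 24000
sd.play(wav, rate)
# Wait for playback to finish
sd.wait()
# Save to a file
sf.write(generated_voice, wav, rate)

# Convert the text to speech and save it straight to a file
tts.tts_to_file(text=content, speaker_wav=source_wav, language="zh-cn", file_path="example.wav")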
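
The program above prints `datetime.now()` around each stage to show how long model loading and text reading take. As an alternative, a small context manager can report elapsed time directly. This is a sketch using only the standard library; `timed` is a hypothetical helper, and the `time.sleep` call merely stands in for a slow step such as loading the model.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, report=print):
    """Report how long the enclosed block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed stages are still timed.
        report(f"{label}: {time.perf_counter() - start:.2f} s")


if __name__ == "__main__":
    with timed("model initialization"):
        time.sleep(0.1)  # placeholder for a slow step
```

Wrapping the model construction or the `tts.tts(...)` call in `with timed(...)` would give one line per stage instead of raw timestamps.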
