Kaldi Chinese speech recognition: a walkthrough of the THCHS-30 model-training code and its configuration parameters
Monophone
Training the monophone model.
# Flat start and monophone training, with delta-delta features.
# This script applies cepstral mean normalization (per speaker).
# monophone: train the monophone model
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono || exit 1;
#test monophone model
local/thchs-30_decode.sh --mono true --nj $n "steps/decode.sh" exp/mono data/mfcc &
Usage of train_mono.sh:
echo "Usage: steps/train_mono.sh [options] <data-dir> <lang-dir> <exp-dir>"
echo " e.g.: steps/train_mono.sh data/train.1k data/lang exp/mono"
echo "main options (for others, see top of script file)"
The parameters below train the baseline monophone HMM: training runs for num_iters=40 iterations, and the data are realigned at each iteration listed in realign_iters.
# Begin configuration section.
nj=4
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
num_iters=40 # Number of iterations of training
max_iter_inc=30 # Last iter to increase #Gauss on.
totgauss=1000 # Target #Gaussians.
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38";
config= # name of config file.
stage=-4
power=0.25 # exponent to determine number of gaussians from occurrence counts
norm_vars=false # deprecated, prefer --cmvn-opts "--norm-vars=false"
cmvn_opts= # can be used to add extra options to cmvn.
# End configuration section.
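All of these options can be overridden on the command line (they are parsed by utils/parse_options.sh). As a hypothetical example, a quicker run with a smaller Gaussian budget and fewer iterations, writing to an illustrative exp/mono_small directory, could look like:
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" \
  --totgauss 500 --num-iters 20 \
  data/mfcc/train data/lang exp/mono_small || exit 1;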
thchs-30_decode.sh tests the monophone model. Internally it uses mkgraph.sh to build the complete decoding network and write it out as a finite-state transducer, and finally runs decode.sh, which takes the language model and the test data as input and computes the WER.
#decode word
utils/mkgraph.sh $opt data/graph/lang $srcdir $srcdir/graph_word || exit 1;
$decoder --cmd "$decode_cmd" --nj $nj $srcdir/graph_word $datadir/test $srcdir/decode_test_word || exit 1
#decode phone
utils/mkgraph.sh $opt data/graph_phone/lang $srcdir $srcdir/graph_phone || exit 1;
$decoder --cmd "$decode_cmd" --nj $nj $srcdir/graph_phone $datadir/test_phone $srcdir/decode_test_phone || exit 1
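Once a decode directory such as exp/mono/decode_test_word exists, each of its wer_* files holds the WER for one language-model weight; the standard idiom for picking the best uses the stock utils/best_wer.sh helper:
# Report the best WER over all LM weights for the word-level decode:
grep WER exp/mono/decode_test_word/wer_* | utils/best_wer.sh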
#monophone_ali
steps/align_si.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono exp/mono_ali || exit 1;
# Computes training alignments using a model with delta or
# LDA+MLLT features.
# If you supply the "--use-graphs true" option, it will use the training
# graphs from the source directory (where the model is). In this
# case the number of jobs must match with the source directory.
echo "usage: steps/align_si.sh <data-dir> <lang-dir> <src-dir> <align-dir>"
echo "e.g.: steps/align_si.sh data/train data/lang exp/tri1 exp/tri1_ali"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --nj <nj> # number of parallel jobs"
echo " --use-graphs true # use graphs in src-dir"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
Triphone
Train context-dependent triphone models, taking the monophone alignments as input.
#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
#test tri1 model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri1 data/mfcc &
The configuration in train_deltas.sh is shown below; in the call above, the positional arguments 2000 and 10000 are the number of decision-tree leaves and the total number of Gaussians, respectively.
# Begin configuration.
stage=-4 # This allows restarting partway through when something went wrong.
config=
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
beam=10
careful=false
retry_beam=40
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # controls build-tree's final bottom-up clustering of leaves
norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=true"
# use the option --cmvn-opts "--norm-means=false"
cmvn_opts=
delta_opts=
context_opts= # use "--context-width=5 --central-position=2" for quinphone
# End configuration.
echo "Usage: steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>"
echo "e.g.: steps/train_deltas.sh 2000 10000 data/train_si84_half data/lang exp/mono_ali exp/tri1"
LDA_MLLT
Transform the features with LDA and MLLT, training a triphone model on top of the LDA+MLLT features.
LDA+MLLT refers to the way we transform the features after computing the MFCCs: we splice across several frames, reduce the dimension (to 40 by default) using Linear Discriminant Analysis (LDA), and then later estimate, over multiple iterations, a diagonalizing transform known as MLLT (also called the global semi-tied covariance, STC, transform).
For details, see http://kaldi-asr.org/doc/transform.html
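Concretely, assuming the default 13-dimensional MFCCs and the splice context used below (--left-context=3 --right-context=3, i.e. 7 frames), the spliced features are 13 × 7 = 91-dimensional, and the LDA stage reduces them to the dim=40 set in the configuration.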
#triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;
#lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
#test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &
The configuration in train_lda_mllt.sh is as follows:
# Begin configuration.
cmd=run.pl
config=
stage=-5
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
mllt_iters="2 4 6 12";
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
dim=40
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
randprune=4.0 # This is approximately the ratio by which we will speed up the
# LDA and MLLT calculations via randomized pruning.
splice_opts=
cluster_thresh=-1 # controls build-tree's final bottom-up clustering of leaves
norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=false"
cmvn_opts=
context_opts= # use "--context-width=5 --central-position=2" for quinphone.
# End configuration.
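At both training and decode time, the LDA+MLLT features are produced by the same three-stage pipeline: per-speaker CMVN, frame splicing, then the learned transform final.mat. A simplified sketch of what train_lda_mllt.sh and decode.sh construct internally (paths follow the commands above):
apply-cmvn --utt2spk=ark:data/mfcc/train/utt2spk \
    scp:data/mfcc/train/cmvn.scp scp:data/mfcc/train/feats.scp ark:- | \
  splice-feats --left-context=3 --right-context=3 ark:- ark:- | \
  transform-feats exp/tri2b/final.mat ark:- ark:-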
SAT
Perform speaker adaptive training using feature-space maximum likelihood linear regression (fMLLR).
This does Speaker Adapted Training (SAT), i.e. it trains on fMLLR-adapted features. It can be done on top of either LDA+MLLT or delta+delta-delta features. If there are no transforms supplied in the alignment directory, it will estimate transforms itself before building the tree (and in any case, it estimates transforms a number of times during training).
#lda_mllt_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" --use-graphs true data/mfcc/train data/lang exp/tri2b exp/tri2b_ali || exit 1;
#sat
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;
#test tri3b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc &
The specific configuration of train_sat.sh is as follows:
# Begin configuration section.
stage=-5
exit_stage=-100 # you can use this to require it to exit at the
# beginning of a specific stage. Not all values are
# supported.
fmllr_update_type=full
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
context_opts= # e.g. set this to "--context-width 5 --central-position 2" for quinphone.
realign_iters="10 20 30";
fmllr_iters="2 4 6 12";
silence_weight=0.0 # Weight on silence in fMLLR estimation.
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
power=0.2 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # controls build-tree's final bottom-up clustering of leaves
phone_map=
train_tree=true
tree_stats_opts=
cluster_phones_opts=
compile_questions_opts=
# End configuration section.
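The step that distinguishes train_sat.sh from the earlier scripts is per-speaker fMLLR estimation: posteriors are derived from the alignments, silence is down-weighted (silence_weight=0.0 above), and one transform is estimated per speaker. A simplified sketch of that pipeline, where $feats stands for the base feature pipeline and the archive number is illustrative:
silphonelist=$(cat data/lang/phones/silence.csl)
ali-to-post "ark:gunzip -c exp/tri3b/ali.1.gz|" ark:- | \
  weight-silence-post 0.0 $silphonelist exp/tri3b/final.mdl ark:- ark:- | \
  gmm-est-fmllr --fmllr-update-type=full \
    --spk2utt=ark:data/mfcc/train/spk2utt \
    exp/tri3b/final.mdl "$feats" ark:- ark:exp/tri3b/trans.1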
decode_fmllr.sh is a decoding script that does fMLLR; it can work on top of delta+delta-delta or LDA+MLLT features.
# There are 3 models involved potentially in this script,
# and for a standard, speaker-independent system they will all be the same.
# The "alignment model" is for the 1st-pass decoding and to get the
# Gaussian-level alignments for the "adaptation model" the first time we
# do fMLLR. The "adaptation model" is used to estimate fMLLR transforms
# and to generate state-level lattices. The lattices are then rescored
# with the "final model".
#
# The following table explains where we get these 3 models from.
# Note: $srcdir is one level up from the decoding directory.
#
# Model               Default source:                  Override option:
#
# "alignment model"   $srcdir/final.alimdl             --alignment-model <model>
#                     (or $srcdir/final.mdl if alimdl absent)
# "adaptation model"  $srcdir/final.mdl                --adapt-model <model>
# "final model"       $srcdir/final.mdl                --final-model <model>
train_quick.sh trains a model on top of existing features (no feature-space learning of any kind is done). The script initializes the model (i.e., the GMMs) from the previous system's model. That is: for each state in the current model (after tree building), it chooses the closest state in the old model, judging similarity by the overlap of counts in the tree stats.
#sat_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri3b exp/tri3b_ali || exit 1;
#quick
steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;
#test tri4b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc &
The configuration of train_quick.sh:
# Begin configuration..
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 15"; # Only realign twice.
num_iters=20 # Number of iterations of training
maxiterinc=15 # Last iter to increase #Gauss on.
batch_size=750 # batch size to use while compiling graphs... memory/speed tradeoff.
beam=10 # alignment beam.
retry_beam=40
stage=-5
cluster_thresh=-1 # controls build-tree's final bottom-up clustering of leaves
# End configuration section.
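Since every stage above leaves its word-level decode in exp/*/decode_test_word, the same best_wer.sh idiom shown earlier can report the WER progression from mono through tri4b in one loop:
for d in exp/*/decode_test_word; do
  [ -d "$d" ] && grep WER "$d"/wer_* 2>/dev/null | utils/best_wer.sh
done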