Merlin basic concepts

May 5, 2020

Merlin basics

Glossary

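Front end
vocoder: voice coder, which analyses speech into parameters and synthesises speech from them
RBM: Restricted Boltzmann Machine
bap: band aperiodicity
ASR: Automatic Speech Recognition
AM: Acoustic Model
LM: Language Model
HMM: Hidden Markov Model; the output sequence is the sequence of feature vectors describing the speech, and the state sequence represents the corresponding text
HTS: HMM-based Speech Synthesis System, a speech synthesis toolkit
HTK: Hidden Markov Model Toolkit, a speech recognition toolkit
autoencoder
SPTK: Speech Signal Processing Toolkit
SPSS: Statistical Parametric Speech Synthesis
pitch: how high or low a sound is, expressed through its (fundamental) frequency
Timbre: tone colour
Zero Crossing Rate: the rate at which the signal changes sign
Volume: loudness
sil: silence
syllable
intonation: tone, intonation, the rise and fall of speech
POS: part of speech
mgc: Mel-Generalized Cepstral representation
mcep: mel-cepstrum
mcc: mel cepstral coefficients
mfcc: Mel Frequency Cepstral Coefficients
LSP: Line Spectral Pair
Naming conventions for multi-phone units:
monophone: a single phone
biphone / diphone: two phones
triphone: three phones
quadphone: four phones
utterance: a stretch of speech
ToBI (Tones and Break Indices): a prosodic annotation system for English
CD-DNN-HMM (Context-Dependent DNN-HMM)
frontend: the part of a TTS system that transforms plain text into a linguistic representation is called a frontend
.wpa: word to phonetic alphabet
.cmp: composed acoustic features
.scp: script file, a list of files to process (e.g. passed to HTK tools with -S)
.mlf: master label file
.pam: phonetic alphabets to model
.mgc: mel-generalized cepstral features
.lf0: log f0, a representation of pitch (pitch is expressed through the fundamental frequency)
.utt: the linguistic representation of the text output by Festival, from which the full-context training labels are produced
.cfg
initial and final: the initial consonant and the final (rime) of a Mandarin syllable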
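The monophone/biphone/triphone naming above follows the HTK convention. The left context is written with `-` and the right context with `+`, so a triphone is `l-c+r`. The sketch below is a minimal illustration in plain Python 3 that builds such labels from a phone sequence. The function name `context_labels` is hypothetical and not part of Merlin or HTK. Edge phones simply drop the missing neighbour. Real HTS/Merlin full-context labels carry many more fields, such as quinphone identity and syllable, word and phrase positions.

```python
from typing import List


def context_labels(phones: List[str], context: str = "triphone") -> List[str]:
    """Build HTK-style context-dependent labels from a phone sequence.

    monophone: "c"
    biphone (right context only): "c+r"
    triphone: "l-c+r"
    Phones at the edges omit the missing neighbour. Illustrative sketch only.
    """
    labels = []
    for i, centre in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        if context == "monophone":
            labels.append(centre)
        elif context == "biphone":
            labels.append(f"{centre}+{right}" if right else centre)
        elif context == "triphone":
            label = centre
            if left:
                label = f"{left}-{label}"   # left context joined with '-'
            if right:
                label = f"{label}+{right}"  # right context joined with '+'
            labels.append(label)
        else:
            raise ValueError(f"unsupported context type: {context}")
    return labels
```

For example, `context_labels(["sil", "k", "ae", "t", "sil"], "triphone")` returns `["sil+k", "sil-k+ae", "k-ae+t", "ae-t+sil", "t-sil"]`.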

AM Acoustic Model
ACR Absolute Category Rating
ASR Automatic Speech Recognition
CART Classification and Regression Tree
CCR Comparison Category Rating
CFHMM Continuous F0 HMM, a continuous fundamental-frequency model
CMLLR Constrained Maximum Likelihood Linear Regression
CMOS Comparison Mean Opinion Score
CORC Correlation Coefficient
CR Command-Response
CSMAPLR Constrained Structural Maximum A Posteriori Linear Regression
DBN Dynamic Bayesian Network
DCR Degradation Category Rating
DCT Discrete Cosine Transform
DMOS Degradation Mean Opinion Score
ED Emotion Dependent
EM Expectation Maximization
F0 Fundamental Frequency
GMM Gaussian Mixture Model
GTD Global Tied Distribution
HMM Hidden Markov Model
HNR Harmonics-to-Noise Ratio
HSS HMM-based Speech Synthesis
HSMM Hidden Semi-Markov Model
HTK Hidden Markov Model Toolkit
HTS HMM-based Speech Synthesis System
LPC Linear Prediction Coefficient
MAP Maximum A Posteriori
MCD Mel-Cepstral Distortion
MDL Minimum Description Length
MDS Multi-Dimensional Scaling
MGCC Mel-Generalized Cepstral Coefficient
MLI Maximum Likelihood Increase
MLSA Mel Log Spectral Approximation
MLLR Maximum Likelihood Linear Regression
MLPG Maximum Likelihood Parameter Generation
MOS Mean Opinion Score
MSD Multi-Space Distribution
PiTAR Pitch Target Realisation
PM Prosodic Model
RMSE Root Mean Square Error
SA Speaker Adaptation
SI Speaker Independent
SMAP Structural Maximum A Posteriori
SMAPLR Structural Maximum A Posteriori Linear Regression
SPTK Speech Signal Processing Toolkit
SSM Supra-Segmental Model
SSML Speech Synthesis Markup Language
TA Target Approximation
ToBI Tones and Break Indices
TTS Text-To-Speech
VC Voice Conversion
VFS Vector Field Smoothing
VPR Voice Print Recognition
VTLN Vocal Tract Length Normalization
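MCD, RMSE and CORC are the usual objective measures for comparing generated and natural speech. The NumPy sketch below is a minimal illustration under stated assumptions:

- MCD uses the common definition MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames, with c0 (energy) excluded.
- F0 RMSE (in Hz) and the correlation are computed only over frames that are voiced in both signals.
- Both inputs are assumed to be time-aligned already.

Merlin's own evaluation code may differ in details, such as which coefficients it includes. The function names are hypothetical.

```python
import numpy as np


def mel_cepstral_distortion(ref: np.ndarray, gen: np.ndarray) -> float:
    """Mean MCD in dB between two time-aligned (T, D) mel-cepstral arrays.

    Column 0 (c0, energy) is excluded, as is common practice.
    """
    diff = ref[:, 1:] - gen[:, 1:]
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))


def f0_rmse_and_corr(ref_f0: np.ndarray, gen_f0: np.ndarray):
    """F0 RMSE (Hz) and Pearson correlation over frames voiced in both tracks.

    Unvoiced frames are assumed to be marked with f0 == 0.
    """
    voiced = (ref_f0 > 0) & (gen_f0 > 0)
    r, g = ref_f0[voiced], gen_f0[voiced]
    rmse = float(np.sqrt(np.mean((r - g) ** 2)))
    corr = float(np.corrcoef(r, g)[0, 1])
    return rmse, corr
```

Restricting F0 metrics to jointly voiced frames keeps voicing errors out of the pitch error. Those errors are usually reported separately as a V/UV error rate.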

MTTS Merlin/Mandarin Text-to-Speech Document

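Merlin itself only provides the core of a TTS system: the acoustic model, normalization of the acoustic and linguistic features, and training of and generation from the neural-network acoustic model. The other components come from external tools:
Front-end text processing (frontend)
Festival
Festvox
HTS
HTK
Vocoder
WORLD
SPTK
MagPhase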
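Merlin handles feature normalization itself. To our understanding, by default it:

- min-max normalizes the input linguistic features into [0.01, 0.99];
- mean-variance normalizes the output acoustic features.

The NumPy sketch below illustrates both steps. The function names are hypothetical and are not Merlin's classes. In practice the statistics are computed on the training set only and reused for validation, test and synthesis. At synthesis time the network outputs are de-normalized before they are passed to the vocoder.

```python
import numpy as np


def min_max_normalise(x: np.ndarray, lo: float = 0.01, hi: float = 0.99) -> np.ndarray:
    """Scale each column of x (frames x dims) linearly into [lo, hi]."""
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns would otherwise divide by zero
    return lo + (hi - lo) * (x - col_min) / col_range


def mean_var_normalise(y: np.ndarray):
    """Standardise each column of y; also return the stats needed to undo it."""
    mean = y.mean(axis=0)
    std = y.std(axis=0)
    std[std == 0] = 1.0
    return (y - mean) / std, mean, std


def mean_var_denormalise(y_norm: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Invert mean_var_normalise, e.g. on network outputs before vocoding."""
    return y_norm * std + mean
```

Keeping the inputs inside [0.01, 0.99] rather than [0, 1] avoids saturating values at the exact ends of the range. This is one plausible reason for the choice, not a documented one.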

Installing Merlin

# Install Merlin in a Python 2.7 environment
# A dedicated conda environment for Merlin is recommended
conda create -n merlin python=2.7
conda activate merlin
sudo apt-get install csh cmake realpath autotools-dev automake
pip install numpy scipy matplotlib lxml theano bandmat
git clone https://github.com/CSTR-Edinburgh/merlin.git
cd merlin/tools
./compile_tools.sh

Run the Merlin demo

$ cd ~/merlin/egs/slt_arctic/s1
$ ./run_demo.sh

Understanding the Merlin source code

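A corpus contains text and audio files. The text must first be converted by the front end into data the neural network can accept. This step is fairly involved and differs from language to language. The audio files are converted by a vocoder (STRAIGHT in this case) into vocoder parameters, which are then used to train the neural network. These parameters include mgc (mel-generalized cepstral coefficients), f0 (fundamental frequency) and bap (band aperiodicity).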
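Before training, each per-frame vocoder stream (e.g. mgc, lf0, bap) is typically stacked with its first- and second-order dynamic (delta and delta-delta) features. The result is a single composed vector per frame, which is what a .cmp file stores. The sketch below makes three assumptions:

- the windows are [-0.5, 0, 0.5] and [1, -2, 1], which we believe are Merlin's defaults;
- edge frames are repeated at the boundaries;
- within each stream the order is static, delta, delta-delta.

The helper names are hypothetical.

```python
import numpy as np

DELTA_WIN = np.array([-0.5, 0.0, 0.5])  # first-order (velocity) window
ACC_WIN = np.array([1.0, -2.0, 1.0])    # second-order (acceleration) window


def apply_window(x: np.ndarray, win: np.ndarray) -> np.ndarray:
    """Apply a 3-tap window along time to a (T, D) array, repeating edge frames."""
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    # padded[:-2], padded[1:-1], padded[2:] hold frames t-1, t, t+1
    return win[0] * padded[:-2] + win[1] * padded[1:-1] + win[2] * padded[2:]


def compose_cmp(*streams: np.ndarray) -> np.ndarray:
    """Concatenate static, delta and delta-delta features of each (T, D_i) stream.

    The result has shape (T, sum(3 * D_i)) and dtype float32.
    """
    parts = []
    for s in streams:
        parts.extend([s, apply_window(s, DELTA_WIN), apply_window(s, ACC_WIN)])
    return np.hstack(parts).astype(np.float32)
```

For example, `compose_cmp(mgc, lf0, bap)` stacks three streams this way. The lf0 stream would normally be interpolated through unvoiced frames first. Assuming the features are stored as headerless float32 arrays, a composed matrix can be written with `cmp.tofile(path)`. It can be read back with `np.fromfile(path, dtype=np.float32).reshape(-1, dim)` when its dimension `dim` is known.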

Merlin: An Open Source Neural Network Speech Synthesis System

merlin

Basics of the Merlin algorithm

GitHub: merlin

https://github.com/CSTR-Edinburgh/merlin

front-end text processor: Festival

URL: http://www.cstr.ed.ac.uk/projects/festival/

vocoder: STRAIGHT or WORLD
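Whichever vocoder is used, its F0 track typically marks unvoiced frames with 0, and the .lf0 stream stores the natural log of F0. A common recipe, which to our understanding Merlin follows, has three steps:

1. Write a large negative sentinel for unvoiced frames in the .lf0 file.
2. Before training, linearly interpolate log F0 across the unvoiced regions.
3. Add a separate binary voiced/unvoiced (V/UV) flag.

This gives the network a continuous target. The sentinel value -1e10 and the function names below are assumptions for illustration.

```python
import numpy as np

UNVOICED = -1.0e10  # assumed sentinel for unvoiced frames in .lf0 data


def f0_to_lf0(f0: np.ndarray) -> np.ndarray:
    """Natural log of F0 for voiced frames; sentinel for unvoiced frames (f0 == 0)."""
    lf0 = np.full(f0.shape, UNVOICED)
    voiced = f0 > 0
    lf0[voiced] = np.log(f0[voiced])
    return lf0


def interpolate_lf0(lf0: np.ndarray):
    """Linearly interpolate lf0 through unvoiced frames; also return a V/UV flag.

    Frames before the first / after the last voiced frame are held at that
    voiced value (this is how np.interp extrapolates).
    """
    voiced = lf0 > UNVOICED
    vuv = voiced.astype(np.float32)
    if not voiced.any():
        return np.zeros_like(lf0), vuv
    frames = np.arange(len(lf0))
    continuous = np.interp(frames, frames[voiced], lf0[voiced])
    return continuous, vuv
```

At synthesis time the predicted V/UV flag decides which frames get an F0 value. The interpolated values in unvoiced frames are discarded.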

install

$ bash tools/compile_tools.sh        #SPTK, WORLD
$ bash tools/compile_other_speech_tools.sh         #speech tools, festival and festvox
$ bash tools/compile_htk.sh <HTK_USERNAME> <HTK_PASSWORD>         #align labels; Merlin requires HTK, so pass your own HTK account credentials
$ pip install numpy 
$ pip install -r requirements.txt
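HTK is installed above to align labels. It writes label files in which each line is `start end label`, with times in units of 100 ns. Converting those times into frame indices gives the per-phone (or per-state) durations that are used to expand linguistic features to frame level. The sketch below assumes a 5 ms frame shift, i.e. 50,000 HTK units per frame. `parse_htk_label` is a hypothetical helper, not a Merlin function.

```python
from typing import List, Tuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def parse_htk_label(text: str, frame_shift_ms: float = 5.0) -> List[Tuple[int, int, str]]:
    """Parse 'start end label' lines and convert the times to frame indices."""
    units_per_frame = HTK_UNITS_PER_SECOND * frame_shift_ms / 1000.0
    segments = []
    for line in text.strip().splitlines():
        # maxsplit=2 keeps full-context labels (which contain no spaces) intact
        start, end, label = line.split(maxsplit=2)
        segments.append((int(round(int(start) / units_per_frame)),
                         int(round(int(end) / units_per_frame)),
                         label))
    return segments
```

The duration of each segment in frames is then `end - start`. For example, a line `3100000 3950000 k` becomes `(62, 79, "k")`, which is 17 frames.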

example: Getting started with the Merlin Speech Synthesis Toolkit

URL: https://jrmeyer.github.io/tts/2017/02/14/Installing-Merlin.html

Tutorial

URL: http://www.speech.zone/courses/one-off/merlin-interspeech2017
CMU_ARCTIC databases: https://cstr-edinburgh.github.io/merlin/getting-started/slt-arctic-voice/


Docker images: merlinDnn

Learning Merlin basics: https://mtts.readthedocs.io/zh_CN/latest/#

Written by AI拉呱
