


 2021-11-06 23:15:42  

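Abstract


This thesis studies vocoder methods in end-to-end systems. It first analyzes traditional vocoders and the cutting-edge techniques for building vocoders, then designs a Tacotron Griffin-Lim vocoder and a WaveNet vocoder for synthesizing speech and evaluates both in simulation experiments. The Griffin-Lim vocoder is a special case: it reconstructs the waveform from the spectrum without phase information, using an iterative algorithm that estimates the phase from the relationship between frames. Concretely, the phase is recovered from the linear spectrum, and the waveform is then restored by the inverse short-time Fourier transform (ISTFT). The end-to-end pipeline is Encoder -> Attention -> Decoder -> Post-processing -> Griffin-Lim, which outputs the sound waveform. WaveNet is a CNN-based autoregressive model over audio samples. It uses causal convolutions to predict future samples from past samples, and dilated convolutions so that the receptive field grows with the number of layers; residual and skip connections speed up training. Once trained, WaveNet generates the speech waveform. The mel spectrum is supplied as input to add semantic information. Because the number of samples does not match the length of the mel spectrum, the mel spectrum must be length-aligned, which is done by directly replicating its frames up to the sample length. The experiments show that speech synthesized by the WaveNet-based vocoder is more natural and fluent but is generated more slowly, whereas Griffin-Lim generates speech faster but with lower naturalness and fluency.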
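The iterative phase estimation described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the helper names `stft`, `istft`, and `griffin_lim`, the Hann window, the frame size `n_fft=256`, the hop `hop=64`, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np


def stft(signal, n_fft, hop):
    """Short-time Fourier transform; returns shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def istft(spec, n_fft, hop):
    """Inverse STFT by windowed overlap-add with least-squares normalization."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window
    n_frames = frames.shape[0]
    length = n_fft + (n_frames - 1) * hop
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        signal[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes (edges).
    return signal / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_fft=256, hop=64, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # No phase is available, so start from a random phase guess.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Go to the time domain with the current phase estimate ...
        signal = istft(magnitude * phase, n_fft, hop)
        # ... then re-analyze and keep only the phase. Overlapping frames
        # constrain each other, which is what refines the estimate.
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)
```

Each iteration keeps the known magnitude fixed and replaces only the phase, so the result drifts toward a signal whose overlapping frames agree with one another. More iterations generally cost more time in exchange for a closer magnitude match.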




This paper investigates speech synthesis technology. It first reviews how humans produce speech and the physical model of speech generation built on that process, then the two main synthesis approaches that follow from it: waveform concatenation and statistical parametric speech synthesis. Statistical parametric synthesis is mainly based on hidden Markov models, with deep learning generally used for acoustic modeling; acoustic modeling and the vocoder are its key technologies.

Main research: This paper studies vocoder methods in end-to-end systems. It first analyzes traditional vocoders and the cutting-edge techniques for building vocoders, then designs a Tacotron Griffin-Lim vocoder and a WaveNet vocoder for synthesizing speech and evaluates both in simulation experiments. The Griffin-Lim vocoder is a special case: it reconstructs the waveform from the spectrum without phase information, using an iterative algorithm that estimates the phase from the relationship between frames. Concretely, the phase is recovered from the linear spectrum, and the waveform is then restored by the inverse short-time Fourier transform (ISTFT). The end-to-end pipeline is Encoder -> Attention -> Decoder -> Post-processing -> Griffin-Lim, which outputs the sound waveform. WaveNet is a CNN-based autoregressive model over audio samples. It uses causal convolutions to predict future samples from past samples, and dilated convolutions so that the receptive field grows with the number of layers; residual and skip connections speed up training. Once trained, WaveNet generates the speech waveform. The mel spectrum is supplied as input to add semantic information. Because the number of samples does not match the length of the mel spectrum, the mel spectrum must be length-aligned, which is done by directly replicating its frames up to the sample length.
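The WaveNet ingredients named above (causal convolution, dilation, and mel-frame replication) can be sketched in a few lines of NumPy. This is an illustration only, not the thesis code: the function names, the kernel size of 2, and the `hop_length` parameter (samples per mel frame) are assumptions.

```python
import numpy as np


def receptive_field(dilations, kernel_size=2):
    """Number of input samples one output of the stacked layers can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    any x[s] with s > t (causality).
    """
    y = np.zeros_like(x, dtype=float)
    for k, w in enumerate(weights):
        shift = k * dilation
        if shift == 0:
            y += w * x
        else:
            y[shift:] += w * x[:-shift]
    return y


def upsample_mel(mel, hop_length):
    """Repeat each mel frame hop_length times along the time axis.

    mel has shape (n_frames, n_mels); the result has shape
    (n_frames * hop_length, n_mels), one conditioning row per audio sample.
    """
    return np.repeat(mel, hop_length, axis=0)
```

Under these assumptions, doubling the dilation at every layer makes the receptive field grow exponentially with depth: ten layers with dilations 1, 2, ..., 512 cover 1024 samples. Frame replication is the simplest way to align lengths; it leaves the conditioning signal constant within each frame.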

Research results: The experiments show that speech synthesized by the WaveNet-based vocoder is more natural and fluent but is generated more slowly, whereas Griffin-Lim generates speech faster but with lower naturalness and fluency.

Features: The paper reviews speech synthesis methods and traditional vocoder methods, then builds a WaveNet vocoder based on convolutional neural networks and compares it with the Griffin-Lim vocoder.

Key Words: Speech synthesis; Vocoder; CNN; Griffin-Lim; WaveNet

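Contents

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research Status at Home and Abroad

1.3 Research Methods and Techniques

1.4 Structure of the Thesis

Chapter 2 Speech Synthesis and Vocoder Methods

2.1 Overview of Speech Synthesis Technology

2.2 The Speech Waveform Generation Process

2.3 Common Speech Synthesis Methods

2.3.1 Waveform Concatenation Synthesis

2.3.2 Statistical Parametric Synthesis

2.4 Analysis of Mainstream Vocoders

2.4.1 Linear Predictive Analysis Synthesis

2.4.2 Formant Synthesis

2.4.3 STRAIGHT Analysis Synthesis

2.4.4 Griffin-Lim Synthesis

2.5 Chapter Summary

Chapter 3 WaveNet-Based Vocoder

3.1 Overview of Vocoder Development

3.2 The WaveNet Model

3.2.1 Overview of Convolutional Neural Networks

3.2.2 Dilated Causal Convolution

3.2.3 Gated Activation Function and Residual Network Structure

3.3 Building a Vocoder on the WaveNet Model

3.4 Experiments and Result Analysis

3.4.1 Experimental Setup and Procedure

3.4.2 Analysis of Experimental Results

3.5 Chapter Summary

Chapter 4 Summary and Outlook

4.1 Summary

4.2 Outlook

References

Acknowledgements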

Chapter 1 Introduction

1.1 Research Background and Significance


1.2 Research Status at Home and Abroad




