


 2021-03-15 20:02:34  

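Abstract (Chinese)
Abstract
1 Introduction
1.1 Research purpose and significance
1.2 State of research in China and abroad
1.3 Main content and structure
2 Basic principles of speech recognition
2.1 Feature extraction
2.2 Speech modeling
2.3 Traditional speech recognition models
2.4 Artificial neural networks (ANN)
2.5 Chapter summary
3 Acoustic models and deep neural networks
3.1 Recurrent neural networks
3.2 Connectionist temporal classification
3.3 Time delay neural networks
3.4 Chapter summary
4 Experiments and analysis of results
4.1 Setting up the experimental environment
4.2 Preparation
4.3 Model training
4.4 Recognition results and analysis
5 Summary and outlook
References
Acknowledgements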






Over the past decade, speech recognition technology has advanced by leaps and bounds. Under noisy, far-field, and other adverse conditions, machine recognition accuracy has reached 97%, exceeding human-level recognition. The traditional GMM-HMM framework became a bottleneck in speech recognition tasks, and it struggles to achieve breakthroughs on long-sequence recognition. The emergence of deep learning has not only sped up the training of speech recognition models on large amounts of data but has also brought new progress in recognizing free-form conversational speech.

Building on a discussion of the traditional MFCC feature extraction process, GMM-HMM, and the basic deep neural network model DNN-HMM, this design introduces the pre-training plus fine-tuning approach to training deep neural networks and presents two acoustic models: an end-to-end model combining CTC with LSTM, and TDNN. In addition, it proposes a data preprocessing method for recognizing bilingual data. It then describes data preparation and the Kaldi speech recognition toolkit, and builds the training environment for these acoustic models on Kaldi. The experimental results show that, on the same speech recognition task, the recognition rate with the bilingual training set is 1%-2% lower than with the pure Chinese training set. On Chinese and bilingual long-sequence recognition tasks, the deep acoustic model combining LSTM and TDNN improves the recognition rate by 25.75% over the traditional GMM-HMM model and by 2.85% over the DNN-HMM model.

Keywords: Speech Recognition; Deep Learning; Bilingual; LSTM; TDNN



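Research on speech recognition technology began in the 1950s and, carried by the surge of interest in deep learning and artificial intelligence, has improved markedly over the past two decades [1]. Represented today by products such as Apple's Siri and iFLYTEK's voice input method, it is regarded by many experts and scholars as one of the ten most promising fields of the 21st century. As a branch of pattern recognition, speech recognition is now combined with deep learning, and automatic speech recognition (ASR), in which a computer automatically transcribes speech into text, has become a sought-after research direction in China and abroad. This design trains five acoustic models: GMM-HMM (Gaussian Mixture Model-Hidden Markov Model), DNN-HMM (Deep Neural Network-Hidden Markov Model), LSTM (Long Short-Term Memory network), TDNN (Time Delay Neural Network), and a combination of TDNN and LSTM. Each model is trained on the Kaldi platform with pure Chinese data and with bilingual speech data, and the recognition rates of the five models on the two kinds of data are compared on the test results to determine their relative strengths and weaknesses.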
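The excerpt does not say how the recognition rate is computed. As a rough illustration only, the sketch below assumes it is one minus an edit-distance-based error rate over tokens (characters or words), which is a common way to score ASR output. The function names `edit_distance`, `error_rate`, and `recognition_rate` are hypothetical and do not come from the thesis or from Kaldi.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def recognition_rate(ref, hyp):
    """Assumed definition: 1 - error rate (negative if the hypothesis has many insertions)."""
    return 1.0 - error_rate(ref, hyp)


if __name__ == "__main__":
    reference = ["a", "b", "c", "d"]   # placeholder tokens, not thesis data
    hypothesis = ["a", "b", "x", "d"]
    print(recognition_rate(reference, hypothesis))
```

Scoring every model against the same reference transcripts with one such function keeps the comparison between the pure Chinese and bilingual test sets consistent.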

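On the theoretical side, this design discusses the speech recognition process and the backpropagation algorithm used in the DNN-HMM acoustic model, the greedy layer-wise training algorithm for deep neural networks, and the network structure of an LSTM optimized with the CTC (Connectionist Temporal Classification) algorithm. On the experimental side, it details the model training process based on nnet3 in Kaldi in three parts: data preparation, feature extraction, and model training.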
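To make the CTC idea concrete, here is a minimal sketch of the standard CTC collapsing rule (merge consecutive repeated labels, then drop blanks), together with a simple best-path decoder on top of it. It is not the thesis's Kaldi implementation. The blank index `BLANK = 0` and the names `ctc_collapse` and `greedy_decode` are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank symbols.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def greedy_decode(frame_scores, blank=BLANK):
    """Best-path decoding: take the highest-scoring label per frame, then collapse.

    frame_scores is a list of per-frame score lists, one score per label.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    # A blank between two identical labels keeps them as separate outputs.
    print(ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
```

The blank symbol is what lets a network such as an LSTM emit one label per frame without a frame-level alignment: repeated frames of the same label merge into one output, while a blank in between preserves a genuine repetition.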
