Research on Sentiment Analysis Based on Deep Neural Networks (Graduation Thesis)
2022-01-08 22:00:50
Total thesis length: 25,067 characters
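Abstract (Chinese)
As network technology develops, the number of Internet users keeps growing, and their daily online activity produces a large volume of text. The sentiment expressed in this text reflects public opinion about events, so text sentiment analysis has great potential value in areas such as public opinion monitoring and commercial promotion.
Sentiment analysis of text starts with preprocessing: information that is useless for the analysis is removed, and each sentence is split into a sequence of meaningful words. This thesis uses the jieba word segmentation tool to segment the text and merges several stop-word lists to remove stop words.
Next, the preprocessed sequences are converted into word vectors. This thesis uses word2vec to vectorize the preprocessed text; word2vec not only maps words to vectors but also represents the relationships between words through those vectors.
Then a deep neural network model is built to analyze the sentiment of the text. This thesis uses a BiLSTM model to model and classify online text, using binary classification to label each review as positive or negative. LSTM effectively alleviates the vanishing-gradient problem that RNN models suffer when trained with backpropagation through time, and it is widely used in sentiment analysis. BiLSTM adds a reverse LSTM layer to the original LSTM so that it captures information in both the forward and backward directions, which gives it an advantage in feature extraction; using BiLSTM raises accuracy by about 1% over LSTM.
With a plain BiLSTM, it is hard to capture relationships across the sequence when the sequence is very long. This thesis proposes adding an attention mechanism to the BiLSTM model to directly compute how important each element of the sequence is to the sentiment polarity of the text, which improves the accuracy of feature extraction.
This thesis compares the BiLSTM model with attention against a conventional LSTM model and a BiLSTM model. The experiments show that introducing an attention mechanism into the BiLSTM model and giving more weight to important features effectively improves the accuracy of the trained model: accuracy is 2.5% higher than the conventional LSTM model and about 2% higher than the BiLSTM model.
Keywords: deep neural network; attention mechanism; bidirectional long short-term memory network; sentiment analysis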
Research on Sentiment Analysis Based on Deep Neural Networks
Abstract
With the development of network technology, the number of Internet users keeps growing, and their daily online activity produces a large volume of text. The sentiment expressed in this text reflects public opinion about events, so text sentiment analysis has great potential value in areas such as public opinion monitoring and commercial promotion.
Sentiment analysis of text starts with preprocessing: information that is useless for the analysis is removed, and each sentence is split into a sequence of meaningful words. This thesis uses the jieba word segmentation tool to segment the text and merges several stop-word lists to remove stop words.
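The stop-word step can be illustrated with a minimal sketch. It assumes the text has already been segmented (with jieba installed, the tokens could come from `jieba.lcut(text)`); the helper names `load_stopwords` and `remove_stopwords` and the two small word lists are hypothetical and are not taken from the thesis.

```python
def load_stopwords(*stopword_lists):
    """Merge several stop-word lists into one set, skipping blank entries."""
    merged = set()
    for words in stopword_lists:
        merged.update(w.strip() for w in words if w.strip())
    return merged


def remove_stopwords(tokens, stopwords):
    """Keep tokens that are non-empty and not in the stop-word set."""
    return [t for t in tokens if t.strip() and t not in stopwords]


# Tokens as a segmenter might return them for "这部电影的剧情非常精彩";
# with jieba installed they could come from jieba.lcut(text).
tokens = ["这部", "电影", "的", "剧情", "非常", "精彩"]

# Two small hypothetical lists standing in for the merged stop-word tables.
list_a = ["的", "了", "这部"]
list_b = ["的", "是", ""]

stopwords = load_stopwords(list_a, list_b)
cleaned = remove_stopwords(tokens, stopwords)
print(cleaned)
```

Merging the lists into a set removes duplicate entries across tables and makes each membership check constant time.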
Next, the preprocessed sequences are converted into word vectors. This thesis uses word2vec to vectorize the preprocessed text; word2vec not only maps words to vectors but also represents the relationships between words through those vectors.
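One common way to read a relationship between two words off their vectors is cosine similarity. The sketch below uses three made-up 3-dimensional vectors purely for illustration; real word2vec vectors are learned from a corpus and typically have a hundred or more dimensions, and the values here are assumptions, not thesis results.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, NOT trained embeddings.
toy_vectors = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "terrible": [-0.7, 0.6, 0.1],
}

sim_close = cosine_similarity(toy_vectors["good"], toy_vectors["great"])
sim_far = cosine_similarity(toy_vectors["good"], toy_vectors["terrible"])
print(sim_close > sim_far)
```

With these toy values, the two positive words point in nearly the same direction, while the negative word points away from them.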
Then a deep neural network model is built to analyze the sentiment of the text. This thesis uses a BiLSTM model to model and classify online text, using binary classification to label each review as positive or negative. LSTM effectively alleviates the vanishing-gradient problem that RNN models suffer when trained with backpropagation through time, and it is widely used in sentiment analysis. BiLSTM adds a reverse LSTM layer to the original LSTM so that it captures information in both the forward and backward directions, which gives it an advantage in feature extraction; using BiLSTM raises accuracy by about 1% over LSTM.
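The bidirectional idea can be sketched without any deep learning library. To stay short, the example swaps in a scalar tanh recurrent cell for the LSTM cell, so it shows only how forward and backward passes are combined, not the LSTM gating; the function names and weight values are assumptions chosen for illustration.

```python
import math


def run_rnn(inputs, w_in, w_rec):
    """Run a scalar tanh recurrent cell over inputs; return every hidden state."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def run_bidirectional(inputs, fwd=(0.5, 0.8), bwd=(0.4, 0.7)):
    """Pair each position's forward state with its backward state.

    The two directions use separate (hypothetical) weights, as the two
    directions of a BiLSTM do.
    """
    forward = run_rnn(inputs, *fwd)
    # Process the reversed sequence, then flip back so positions line up.
    backward = run_rnn(inputs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


inputs = [1.0, 0.0, 0.0, -1.0]
states = run_bidirectional(inputs)
for position, (f, b) in enumerate(states):
    print(position, round(f, 3), round(b, 3))
```

At position 0 the forward state has seen only the first input, while the backward state has already seen the whole sequence, which is why the pair carries context from both sides.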
With a plain BiLSTM, it is hard to capture relationships across the sequence when the sequence is very long. This thesis proposes adding an attention mechanism to the BiLSTM model to directly compute how important each element of the sequence is to the sentiment polarity of the text, which improves the accuracy of feature extraction.
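A minimal sketch of attention pooling over per-position hidden states follows. It assumes dot-product scoring against a single query vector, which is one common choice; the abstract does not state the thesis's scoring function, and the hidden states and query below are made-up numbers rather than model outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each hidden state by its softmax-normalized dot product with query."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the hidden states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for a 3-token sentence (2 dimensions each).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]  # stands in for a learned parameter in a real model

weights, context = attention_pool(hidden_states, query)
print([round(w, 3) for w in weights])
```

The weights sum to 1, and the position whose hidden state aligns best with the query dominates the context vector that would be passed on to the classifier.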
This thesis compares the BiLSTM model with attention against a conventional LSTM model and a BiLSTM model. The experiments show that introducing an attention mechanism into the BiLSTM model and giving more weight to important features effectively improves the accuracy of the trained model: accuracy is 2.5% higher than the conventional LSTM model and about 2% higher than the BiLSTM model.
Key words: deep neural network; attention mechanism; bidirectional long short-term memory; sentiment analysis
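Contents
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Research background of sentiment analysis
1.2 Research significance of sentiment analysis
1.3 Research status in China and abroad
1.3.1 Research status of sentiment analysis
1.3.2 Research status of deep learning
1.4 Research content of this thesis
1.5 Key work
1.6 Organization of the thesis
Chapter 2 Text Preprocessing and Feature Extraction
2.1 Text preprocessing
2.1.1 Word segmentation
2.1.2 Part-of-speech tagging
2.1.3 Stop-word removal and text cleaning
2.2 Text representation and feature extraction
2.2.1 Bag-of-words model
2.2.2 Word vector model
2.3 Chapter summary
Chapter 3 Deep Neural Network Model Based on the Attention Mechanism
3.1 Deep learning techniques
3.1.1 Feedforward neural network
3.1.2 Convolutional neural network
3.1.3 Recurrent neural network
3.1.4 Long short-term memory network
3.1.5 Bidirectional long short-term memory network
3.2 Attention mechanism
3.2.1 Soft attention
3.2.2 Self-attention
3.3 Chapter summary
Chapter 4 Implementation of the BiLSTM-ATT Model
4.1 Experimental details
4.2 Corpus
4.3 Basic experimental workflow
4.4 Text preprocessing
4.5 Text vectorization
4.6 Model design
4.7 Evaluation criteria
4.8 Experimental results
Chapter 5 Summary and Outlook
5.1 Summary
5.2 Outlook
References
Acknowledgments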
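Chapter 1 Introduction
1.1 Research background of sentiment analysis
As the Internet becomes ever more widespread, social media platforms such as Weibo, WeChat, Douban, and Tieba have gained more and more users. A growing number of Internet users post comments on social media and express their own views. Take Weibo as an example: since Sina launched it in August 2009, its user base has grown steadily. According to Weibo's financial report for the fourth quarter of 2019, monthly active users reached 516 million in December 2019, a net increase of about 54 million over the same period of the previous year, and daily active users reached 222 million, an increase of about 22 million over the same period of the previous year. WeChat's monthly active users exceeded 1.1 billion.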