Design of an NLP-Based Sensor Selection Consultation System: Graduation Thesis
2021-11-05 19:38:04
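Abstract
As the volume of product-variety data grows explosively, users need new ways to search. This thesis designs a sensor selection consultation system based on natural language processing (NLP). Built on a sensor-domain knowledge base stored as a knowledge graph, the system uses NLP to analyze the intent behind a user's query and, according to the user's needs (such as a sensor model or a particular parameter), returns the best answer to the query.
This thesis completes the design and implementation of the NLP-based sensor selection consultation system. The main work is as follows: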
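(1) Sensor-related data and information were collected from the Internet and organized into data tables, and a small-scale sensor knowledge graph was built on the neo4j database. Text corpora for the sensor domain were collected and prepared as the basis for training the neural network models.
(2) The application of neural network models to NLP was studied, including how a CRF layer and word2vec word vectors affect the performance of a BiLSTM model on named entity recognition. On the test set, the BiLSTM, BiLSTM-CRF, and word2vec-BiLSTM-CRF models achieved F1 scores of 64.61%, 97.71%, and 94.38%, respectively. The CNN and BiLSTM networks were also compared on text classification: after training, both models reached a recognition accuracy above 90% for every category on the test set, with BiLSTM slightly outperforming CNN. The final system uses the BiLSTM-CRF model for named entity recognition and the BiLSTM model for text classification.
(3) A rule-based slot filling method was used to implement simple context linking and question recommendation. These functions support multi-turn questioning, and when the system cannot clearly understand the user's intent, they return a number of recommended questions or hints to the user.
(4) A Web front-end page was written in HTML5 to serve as the client and provide the human-computer interaction interface, while a Python server integrates the functions of all other modules. The front end and back end establish a connection through the WebSocket handshake and exchange information over it. A user enters a question on the Web page, the question is sent to the server for processing, and the answer is returned to the Web page in natural language. In 100 simulated multi-turn question-answering sessions, 60% of the ratings were "completely useful", 32% were "partially useful", and 3% were "completely useless", which is a good result.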
Keywords: natural language processing; knowledge graph; question answering system; sensor selection
Abstract
In an era of explosive growth in product-variety data, people need new ways to search. This thesis designs a sensor selection consultation system based on natural language processing technology. Built on a sensor-domain knowledge base in the form of a knowledge graph, the system uses natural language processing to understand the user's query intent and, according to the user's needs (such as a sensor model or a particular parameter), returns the best answer to the query.
This project completes the design and implementation of the sensor selection consultation system based on natural language processing technology. The main work includes the following:
- Sensor-related data and information were collected from the Internet and organized into data tables, and a small-scale sensor knowledge graph was built on the neo4j database. Text corpora for the sensor field were collected and prepared to provide the basis for neural network model training.
- The application of neural network models to natural language processing was studied, including the influence of a CRF layer and word2vec word vectors on the named entity recognition performance of the BiLSTM model. On the test set, the F1 scores of BiLSTM, BiLSTM-CRF, and word2vec-BiLSTM-CRF were 64.61%, 97.71%, and 94.38%, respectively. In addition, the CNN and BiLSTM networks were compared on text classification tasks. When tested on the test set after training, both models reached a recognition accuracy above 90% for every category, with the BiLSTM model slightly better than the CNN model. The final system uses the BiLSTM-CRF model for named entity recognition and the BiLSTM model for text classification.
- A rule-based slot filling method was used to implement simple context linking and question recommendation.
- A Web front-end page was written in HTML5 to serve as the client and provide the human-computer interaction interface, and a Python server integrates the functions of all other modules. The front end and back end exchange information after a WebSocket handshake. Users enter questions on the Web page, the questions are sent to the server for processing, and the answers are returned to the Web page in natural language. In 100 simulated multi-turn question-answering sessions, 60% of the ratings were "completely useful", 32% were "partially useful", and 3% were "completely useless", which is a good result.
Keywords: natural language processing; knowledge graph; question answering system; sensor selection
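The rule-based slot filling with context linking and question recommendation summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The sensor names, attribute keywords, the `DialogueState` class, and the `LOOKUP` reply format are all hypothetical, and simple substring rules stand in for the BiLSTM-CRF entity recognizer and the BiLSTM question classifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical vocabularies for illustration only. In the thesis, entities come
# from a BiLSTM-CRF tagger and question types from a BiLSTM classifier.
SENSOR_MODELS = ("DHT11", "DS18B20")
ATTRIBUTE_KEYWORDS = {
    "range": "measurement_range",
    "accuracy": "accuracy",
    "voltage": "supply_voltage",
}


@dataclass
class DialogueState:
    """Slots remembered across turns of one conversation."""
    model: Optional[str] = None
    attribute: Optional[str] = None


def extract_slots(question: str) -> Dict[str, str]:
    """Find slot values in a single question using substring rules."""
    slots: Dict[str, str] = {}
    upper = question.upper()
    for model in SENSOR_MODELS:
        if model in upper:
            slots["model"] = model
            break
    lower = question.lower()
    for keyword, attribute in ATTRIBUTE_KEYWORDS.items():
        if keyword in lower:
            slots["attribute"] = attribute
            break
    return slots


def handle_turn(state: DialogueState, question: str) -> str:
    """Merge new slots into the state, then answer or recommend questions."""
    slots = extract_slots(question)
    # Context linking: a slot missing from this turn keeps its earlier value.
    state.model = slots.get("model", state.model)
    state.attribute = slots.get("attribute", state.attribute)
    if state.model is None:
        return "Which sensor model do you mean? For example: " + ", ".join(SENSOR_MODELS)
    if state.attribute is None:
        # Question recommendation: the intent is unclear, so suggest questions.
        suggestions = [f"What is the {k} of {state.model}?" for k in ATTRIBUTE_KEYWORDS]
        return "You could ask: " + " | ".join(suggestions)
    # Placeholder for the knowledge graph query a full system would run.
    return f"LOOKUP {state.model}.{state.attribute}"
```

In this sketch a follow-up such as "And its voltage?" reuses the sensor model from the previous turn, and a question with no recognizable attribute yields suggested questions instead of an answer.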
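Contents
Chapter 1 Introduction
1.1 Background and Significance of the Topic
1.2 State of Research on Question Answering Systems
1.3 Organization of the Thesis
Chapter 2 Overall Design of the Sensor Selection Consultation System
2.1 System Requirements and Functional Analysis
2.2 Technical Approach and Choices
2.2.1 Choice of Natural Language Processing Technology
2.2.2 Choice of Knowledge Base Technology
2.3 Overall System Framework Design
Chapter 3 Design of the Sensor Selection Consultation System
3.1 Data Collection Module Design
3.1.1 Design of the Sensor Data Collection Tables
3.1.2 Design of the Program for Producing the Question Classification Training Corpus
3.2 Data Storage Module Design
3.2.1 neo4j Database Table Structure Design
3.2.2 Design of the Program for Importing Excel Data into neo4j
3.3 Question Analysis Module Design
3.3.1 Named Entity Recognition Function Design
3.3.2 Question Classification Function Design
3.3.3 Context Linking and Question Recommendation Function Design
3.4 Front-End Interaction Module Design
Chapter 4 Implementation and Testing of the Sensor Selection Consultation System
4.1 Implementation and Results of the Data Collection Module
4.1.1 Collection of Sensor Data
4.1.2 Production of the Training Corpus
4.1.3 Implementation and Results of the Question Classification Training Corpus Generator
4.2 Implementation and Results of the Data Storage Module
4.3 Implementation and Results of the Question Analysis Module
4.4 System Testing and Analysis
Chapter 5 Conclusion
5.1 Summary
5.2 Limitations and Future Work
References
Acknowledgements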
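Chapter 1 Introduction
This chapter introduces the background and significance of the topic, the state of research on question answering systems, and the organization of this thesis. The background and significance section explains why the topic was chosen and what the work ultimately aims to achieve. The research status section divides question answering systems into three categories according to the source of their answers and discusses each in turn. Finally, the organization section outlines the structure of the thesis.
1.1 Background and Significance of the Topic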