


 2022-05-28 22:53:58  


Abstract

Public opinion is a concept from the theory of popular opinion; it is a composite reflection of what the public thinks. Online public opinion is the projection of social public opinion onto the Internet and reflects it directly. Traditional public opinion lives among the people, in their ideas and in everyday street conversation; the former is hard to capture and the latter is fleeting. It could only be gathered through open and covert field visits, opinion polls, and similar methods, which are inefficient and costly and yield small samples that are easily biased. With the development of the Internet, people increasingly express their views in digital form, so online public opinion can be collected conveniently through technical means such as automated web crawling, with high efficiency, faithful information (no human processing), and broad coverage.

The system is implemented in Java and performs public opinion analysis based on Sina Weibo. Weibo has become an information exchange platform that most users rely on in daily life, so the system focuses on what people post about everyday life: it captures, filters, and summarizes publicly released posts in order to analyze trending topics in society and people's livelihood.

This thesis describes the background and significance of the project and the objectives of the implementation. MyEclipse is used as the development tool. The system has three parts: a web crawler, word segmentation, and word frequency statistics. The crawler uses a depth-first strategy to fetch Weibo content and applies a simple filtering step. Word segmentation calls the Ansj segmenter to split text into words and tag each word with its part of speech. The word frequency module filters words by part of speech, counts occurrences, and extracts the most frequent words as keywords. Because the information is captured in real time, no database is used to maintain it and information security is not considered; reading and writing plain-text (.txt) files is more convenient.
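The word frequency step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual code: it assumes the segmenter's output has already been written out as `word/tag` tokens, and the class name `WordFrequencySketch`, the method names, and the choice to keep only tags beginning with `n` (nouns) or `v` (verbs) are all hypothetical. It does not call Ansj itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class WordFrequencySketch {
    // Assumption: only nouns ("n...") and verbs ("v...") are kept as keyword candidates.
    private static final Set<String> KEPT_TAG_PREFIXES =
            new HashSet<>(Arrays.asList("n", "v"));

    // Counts occurrences of each word whose part-of-speech tag passes the filter.
    // Each token is expected in the form "word/tag"; malformed tokens are skipped.
    static Map<String, Integer> countByTag(List<String> taggedTokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : taggedTokens) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0 || slash == token.length() - 1) {
                continue; // no word or no tag
            }
            String word = token.substring(0, slash);
            String tag = token.substring(slash + 1);
            if (KEPT_TAG_PREFIXES.contains(tag.substring(0, 1))) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to k words with the highest counts; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topKeywords(List<String> taggedTokens, int k) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countByTag(taggedTokens).entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int limit = Math.max(0, Math.min(k, entries.size()));
        return new ArrayList<>(entries.subList(0, limit));
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "weibo/n", "is/v", "a/r", "weibo/n",
                "hot/a", "topic/n", "weibo/n", "topic/n");
        for (Map.Entry<String, Integer> e : topKeywords(tokens, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Filtering by tag before counting keeps function words such as pronouns and particles out of the ranking, so the top entries are more likely to be topic-bearing words. In a real run the tokens would be Chinese words read from the segmentation output file rather than the English placeholders used here.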

Keywords: public opinion analysis; hot topics; information

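Contents

Abstract (Chinese)

Abstract (English)

Contents

Chapter 1 Introduction

1.1 Research background

1.2 Research objectives

1.3 Functions and technologies of the public opinion analysis system

1.4 Chapter overview

Chapter 2 Web page information acquisition

2.1 Introduction to crawler technology

2.2 Web search strategies (crawler fetching strategies)

2.3 Interaction between packages in the system

2.4 Weibo information acquisition workflow

2.5 Crawler data processing

Chapter 3 Natural language processing and web page analysis

3.1 Introduction to natural language and its processing techniques

3.2 Data segmentation workflow of the public opinion analysis system

3.3 Part-of-speech tagging

3.4 Processing of segmented data

Chapter 4 Word frequency statistics

4.1 Introduction to word frequency algorithms

4.2 Word counting and filtering

4.3 Word frequency data processing

Chapter 5 System design and implementation

5.1 Implementation of crawler data capture

5.2 Implementation of word segmentation

5.3 Implementation of word frequency statistics

5.4 Experimental data

Conclusion

References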









