


 2021-03-14 21:42:53  

Abstract





With the development of Internet technology, social interaction has become more efficient. Users encounter large amounts of data every day, drawn from many fields and from daily life, and can store it in the cloud or on portable hard drives. Enormous volumes of data are produced daily, and people are always searching this exploding mass of data for the content that interests them. As network technology advances and users store ever more information online, the challenge is to extract value from that information. Users expect to obtain more content from the Internet, and this demand gave rise to web crawler technology. With a web crawler, users can take the content they need from a web page and store it on their own devices.

The rise of social networks has attracted more and more users. People take part in discussions of news through social networks, so these networks generate huge amounts of data. In China, Sina Weibo met users' demand for sharing and exchange and grew into such a platform. When users discuss a subject, the discussion forms a topic, and Sina Weibo provides exactly this mechanism, which has produced a large number of topics. The hot topic list has arguably become a focal point for online headlines, and analyzing it reveals which topics interest users. Building on web crawler technology, this thesis collects data both by calling the Sina Weibo API and by simulating a login to Sina Weibo. Python code extracts topic lists of particular interest, together with information about the users who take part in the discussions, and a clustering algorithm from data mining is then applied to the data to find similar topics. The result reflects the topics users are interested in and can be used to recommend similar topics to them, making Sina Weibo's topic feature more useful.

Key Words: web crawler; social networks; simulated login; data mining; cluster analysis
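
The clustering step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual code: it assumes each topic is represented by the set of users who discussed it, measures similarity with Jaccard distance, and merges topics with single-linkage agglomerative clustering. The function names, the `topic_users` layout, the sample data, and the `threshold` value are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(topic_users, threshold):
    """Single-linkage agglomerative clustering of topics.

    topic_users: dict mapping topic name -> set of user ids (assumed layout).
    threshold: merging stops once the closest pair of clusters is farther
               apart than this distance (assumed stopping rule).
    """
    # Start with every topic in its own cluster.
    clusters = [[topic] for topic in sorted(topic_users)]

    def linkage(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(
            jaccard_distance(topic_users[x], topic_users[y])
            for x in c1
            for y in c2
        )

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), dist = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        # Merge cluster j into cluster i (i < j, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up sample data: four topics and the users who joined each discussion.
sample = {
    "topic_a": {"u1", "u2", "u3"},
    "topic_b": {"u2", "u3", "u4"},
    "topic_c": {"u8", "u9"},
    "topic_d": {"u9", "u10"},
}
result = agglomerative_clusters(sample, threshold=0.7)
print(result)
```

Topics that share many participants end up in the same cluster, which is one plausible basis for recommending similar topics to a user. Other distance measures or linkage rules would be equally reasonable choices.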

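Contents

Chapter 1 Introduction

1.1 Purpose and Significance of Social Network Data Analysis

1.2 Development and Current State of Research on Social Network Data Analysis in China and Abroad

1.3 Research Content of This Thesis

Chapter 2 Web Crawlers and Data Mining

2.1 Web Crawlers

2.1.1 General-Purpose Web Crawlers

2.1.2 Focused Web Crawlers

2.1.3 Problems Encountered During Crawling

2.2 Data Mining of Web Pages

2.2.1 Data Mining Functions

2.2.2 Cluster Analysis in Data Mining

2.2.3 Agglomerative Hierarchical Clustering

2.3 Chapter Summary

Chapter 3 Extracting Sina Topic Data

3.1 The Python Language

3.2 Sina's API

3.2.1 Applying for API Access

3.2.2 Calling the API

3.3 Simulated Login to Sina Weibo

3.3.1 Analyzing the Login Process

3.3.2 Simulating the Login

3.4 Data Extraction

3.4.1 Extracting Data from Static Pages

3.4.2 Extracting Data from Dynamic Pages

3.5 Chapter Summary

Chapter 4 Data Analysis Based on Social Networks

4.1 Data Extraction with the Web Crawler

4.2 Data Analysis and Processing

4.3 Chapter Summary

Chapter 5 Conclusions and Outlook

5.1 Summary of the Thesis Work

5.2 Outlook for Further Research

References

Acknowledgements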

Chapter 1 Introduction

1.1 Purpose and Significance of Social Network Data Analysis


