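Design and Implementation of a Python Crawler-Based Evaluation and Analysis System for Specific Product Information (Graduation Thesis)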
2021-07-13 00:26:29
Abstract (Chinese)
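In the twenty-first century, the internet has developed rapidly and is now widely used in every field. In recent years, e-commerce websites of all kinds have sprung up, and the many shopping sites built on models such as B2B and C2C give consumers more places to shop and more choices while gradually changing their shopping habits [1]. However, consumers cannot examine the physical product online, so the product evaluation systems that shopping sites provide serve as a useful reference. Major websites in China and abroad offer a variety of such evaluation systems, but these mostly reflect individual consumers' subjective impressions and do not clearly convey, in aggregate, a product's value to buyers. This limits their usefulness as a reference and can even give consumers misleading information that leads to unnecessary losses. As a result, many well-known e-commerce companies have turned to more specialized providers of product evaluation systems in order to offer consumers a more scientific evaluation system and to protect their corporate image. At the same time, the volume of data on the web is enormous, and the uncertainty and constant change of online information make it hard to manage, which has driven the rise of web crawler technology [2].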
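This thesis describes how to write a crawler in Python to evaluate and analyze information about specific products. The system aims to give those who need it comprehensive product information and a better basis for reference. To implement it, the project is divided into four parts: (1) a multi-threaded crawler written in Python, which efficiently fetches the specified information; (2) a database back end built on Python and MySQL, which stores and retrieves the fetched information; (3) a semantic analysis system written in Python, which analyzes and processes the information in the database; (4) a data display system written in Python, which presents the analysis results visually.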
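Because the crawler research in this thesis relies mainly on Python, the software project is expected to be built with PyCharm and MySQL and depends mainly on Python libraries. The various stages use urllib, which ships with Python, and the external library BeautifulSoup4. A third-party library such as pymysql may be used to read from and write to the MySQL database, and the data display interface is built mainly with Python's Tkinter library for GUI programming.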
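As a rough illustration of the first component, a multi-threaded fetch-and-parse step might look like the sketch below. It is not the thesis's implementation. The thesis names BeautifulSoup4 for parsing, but this sketch substitutes the standard-library `html.parser` so that it runs without extra packages. The `<p class="review">` markup, the function names `fetch`, `parse_reviews`, and `crawl`, and the worker count are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class ReviewParser(HTMLParser):
    """Collects the text of every <p class="review"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; match class="review" exactly.
        if tag == "p" and ("class", "review") in attrs:
            self._inside = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # Text inside nested inline tags such as <b> is appended as well.
        if self._inside:
            self.reviews[-1] += data


def fetch(url, timeout=10):
    """Download one page with urllib and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_reviews(html):
    """Return the non-empty, whitespace-trimmed review texts found in one page."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.reviews if text.strip()]


def crawl(urls, fetcher=fetch, workers=4):
    """Fetch pages on a thread pool, then parse each one: {url: [review, ...]}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = list(pool.map(fetcher, urls))
    return {url: parse_reviews(page) for url, page in zip(urls, pages)}
```

Passing the fetcher as a parameter keeps the threading and parsing logic testable without network access. The dictionary returned by `crawl` could then be written to MySQL through a driver such as pymysql.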
Keywords: Python, crawler technology, products, evaluation
Abstract
In recent years, with the rapid development and wide application of the internet, e-commerce sites of many kinds have grown rapidly. A large number of B2B and C2C shopping sites give consumers a broader selection. Because consumers need to understand a product well before buying, product evaluations have become an important reference. Although the major e-commerce sites in China and abroad provide product evaluation systems, these systems mostly record individual consumers' evaluations and personal shopping experiences, which cannot properly reflect the overall objective value of a product to consumers. Worse, this greatly limits their value as a reference and can mislead consumers into unnecessary losses. Therefore, more and more e-commerce companies are turning to professional providers of product evaluation systems in order to offer consumers a scientific evaluation system. In addition, web data is vast, and online information is mostly unorganized and constantly changing, which has driven the rise of web crawler technology.
In this thesis, we use Python to write a crawler program and to evaluate and analyze information about specific products. The system is designed to provide comprehensive product information to the customers who need it. To build it, the project is divided into four parts. The first part is a multi-threaded crawler written in Python, which fetches the specified information efficiently. The second part is a database back end based on Python and MySQL, which stores and retrieves the fetched information. The third part is a semantic analysis system, which analyzes the information in the database. The last part is a data display system written in Python, which presents the results of the analysis visually.
The crawler research in this thesis relies mainly on Python, so the software depends mainly on Python libraries. The various stages use Python's urllib library and the BeautifulSoup4 library. In addition, a library such as pymysql is likely to be used to read from and write to the MySQL database, and the data display interface is built mainly with Python's Tkinter library for GUI programming.
Keywords: Python, crawler technology, products, evaluation
Contents
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Structure of This Thesis
Chapter 2 Python Crawler Algorithms and Techniques
2.1 Introduction to Crawlers
2.2 Three Crawler Search Strategies
2.3 Classification of Crawlers
2.4 Introduction to Python
2.5 Main Technical Features of Python
Chapter 3 Implementing the Specific Product Information Evaluation and Analysis System with Crawler Algorithms
3.1 Fetching Web Pages with a Focused Crawler
3.2 Methods for Analyzing Web Pages with a Crawler
3.3 Methods for Analyzing Web Page Content with a Python Crawler
3.4 Database Management and Operations in Python
3.5 Implementation of the Product Evaluation and Analysis System
3.6 Implementation of the Product Display System
Chapter 4 System Testing
Chapter 5 Conclusions and Outlook