Development of a Network Information Platform Supporting Full-Text Retrieval (Graduation Thesis)
2021-03-19 21:31:10
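Abstract (Chinese Version)
Abstract
1 Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.3 Organization of the Thesis
2 Related Technologies
2.1 Overview of Full-Text Retrieval
2.2 About Lucene
2.2.1 Brief Introduction to Lucene
2.2.2 Lucene and Databases
2.2.3 Lucene Architecture and Related Packages
2.3 Chinese Word Segmenters for Lucene
2.4 The Struts2 Framework
2.5 Chapter Summary
3 System Analysis
3.1 System Overview
3.2 System Requirements
3.3 Chapter Summary
4 System Design
4.1 System Framework and Modularization
4.1.1 Overall Framework
4.1.2 Functional Structure Design
4.1.3 System Workflow
4.2 Detailed System Design
4.2.1 File Processing
4.2.2 Index Construction
4.2.3 Chinese Word Segmentation
4.2.4 Search Module
4.3 Database Design
4.4 Chapter Summary
5 System Implementation
5.1 Hardware and Software
5.2 Implementation of Core Functions
5.2.1 File Processing
5.2.2 Index Creation
5.2.4 Information Search
5.3 Database Implementation
5.4 User Interface
5.4.1 Login Interface
5.4.2 Search Interface
5.5 Chapter Summary
6 Summary and Outlook
6.1 Summary
6.2 Outlook
Acknowledgements
References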
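Abstract (Chinese Version)
In recent years, with the rapid development of computer and Internet technology, the amount of accessible information has grown steadily. This raises the problem of how to find the information one needs within an ever-growing volume of data. Many methods have been devised for this purpose, and among them the search engine is the most widely used: users enter keywords to retrieve web pages and other information, which greatly speeds up information lookup.
Search engine technology takes many forms, and full-text retrieval is one of its core technologies. Full-text retrieval invites comparison with database retrieval. The two differ in the type of data they handle: databases mainly target structured data, whereas full-text retrieval can search both structured and unstructured data. Because of this capability, full-text retrieval has developed rapidly in recent years and has been applied across many areas of the Web.
This thesis first introduces the background of full-text retrieval and the related technologies, and then discusses how to build a network information platform supporting full-text retrieval, moving from system analysis and design to the concrete implementation. The project is a Java Web project that uses the Struts2 framework and builds on Lucene to deliver a full-text retrieval system with Chinese support. The system can build indexes over documents in formats such as Word and PDF and thereby support full-text retrieval. It also supports user login as well as uploading, storing, and managing resources.
The thesis describes each important part of building a full-text retrieval system in detail. For the core modules, namely index creation, Chinese word segmentation, file processing, and index search, excerpts of source code illustrate how the key classes and methods are used. A more intuitive user interface provides the search service to users.
Finally, the full-text retrieval system was run and operated normally. Conclusions about the system and goals for further work are given in the Summary and Outlook chapter.
Keywords: full-text retrieval, Lucene, network platform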
Abstract
In recent years, with the rapid development of computer and Internet technology, the amount of information available has kept growing. The resulting challenge is how to find the information one needs in this ever-increasing volume. Of the many approaches devised for this purpose, the search engine is the most widely used: people enter keywords to retrieve web pages and other information, which greatly speeds up the search for information.
Search engine technology comes in many varieties, and full-text search is one of its core technologies. Full-text search naturally invites comparison with database search. The two differ in the type of data they handle: a database mainly targets structured data, whereas full-text search can retrieve both structured and unstructured data. Owing to this capability, full-text search has developed rapidly in recent years and has been applied to many aspects of the Web.
This thesis first introduces the background and related technologies of full-text retrieval, and then discusses how to construct a network information platform that supports full-text retrieval, proceeding from system analysis and design to the concrete implementation. The project is a Java Web project that uses the Struts2 framework and builds on Lucene to provide a full-text retrieval system that supports Chinese. The system can build indexes for Word, PDF, and other documents and thereby support full-text search. It also supports user login as well as uploading, storing, and managing resources.
The important parts of building a full-text search system are described in detail. For the core modules, namely index creation, Chinese word segmentation, file processing, and index search, excerpts of source code are used to illustrate how the important classes and methods are used. A more intuitive user interface provides the search service to users.
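The index-then-search workflow summarized above can be pictured with a small, self-contained Java sketch. It is not the thesis's Lucene code: the class `InvertedIndexSketch`, its methods, and the sample sentences are hypothetical names introduced here for illustration, and the tokenizer simply splits on non-alphanumeric characters, so it stands in for neither Lucene's analyzers nor a Chinese word segmenter. It only shows the general idea of mapping each term to the documents that contain it and answering a query by intersecting those document sets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy in-memory inverted index, not the thesis's Lucene code.
class InvertedIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> original text (the id is the position in this list)
    private final List<String> documents = new ArrayList<>();

    // Hypothetical tokenizer: lowercase, then split on anything that is
    // not a letter or digit. A real system would use a proper analyzer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Index one document's (already extracted) text and return its id.
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Return the ids of documents that contain every query term (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene builds an index over extracted text");
        index.add("A relational database stores structured rows");
        index.add("Full-text search over PDF and Word text uses an index");
        System.out.println(index.search("index text")); // prints [0, 2]
        System.out.println(index.search("database"));   // prints [1]
    }
}
```

In this sketch the text is assumed to have been extracted from the source file beforehand; in a system like the one described, that extraction step and the tokenization would be handled by dedicated document parsers and a Chinese-capable analyzer rather than by the simple split used here.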