Journal: International Journal of Computer Network and Information Security

A Full-text Website Search Engine Powered by Lucene and The Depth First Search Algorithm



Abstract

With the amount of text data available on the web growing rapidly, users' need to search that information is increasing dramatically. Full-text search engines and relational databases each have unique strengths as development tools, but their capabilities also overlap: both can store and update data, and both support searching it. Full-text systems are better at quickly searching high volumes of unstructured text for the presence of any word or combination of words. They provide rich text search capabilities and sophisticated relevancy ranking tools that order results by how well they match a potentially fuzzy search request. Relational databases, on the other hand, excel at storing and manipulating structured data: records made up of fields of specific types (text, integer, currency, etc.). They can do so with little or no redundancy. They support flexible search across multiple record types for specific field values, as well as strong tools for quickly and securely updating individual records. Because the web is a largely unstructured collection of documents that keeps growing in size, using an RDBMS to search this collection has become very costly. This paper describes the architecture, design and implementation of a prototype website search engine, powered by Lucene, that can search through any website. The approach involves developing a small-scale web crawler to gather information from the desired website. The gathered information is then converted to Lucene documents and stored in the index. The time taken to search the index is very short compared with the time a relational database takes to process a query.
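The pipeline the abstract describes (crawl a site depth first, turn each page into an indexable document, then answer queries from the index) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use Lucene: the `SITE` dictionary is a hypothetical in-memory stand-in for a real website, and `crawl_dfs`, `build_index` and `search` are illustrative names. The plain inverted index below only mimics the basic idea behind a Lucene index and does no relevancy ranking.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "website": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and parse links from the HTML.
SITE = {
    "/": ("Welcome to the Lucene search demo", ["/docs", "/about"]),
    "/about": ("About this full text search engine", ["/"]),
    "/docs": ("Lucene index and query documentation", ["/docs/crawler"]),
    "/docs/crawler": ("The crawler uses depth first search", ["/", "/about"]),
}


def crawl_dfs(start, fetch):
    """Visit every page reachable from `start` in depth-first order.

    `fetch(url)` must return a (text, links) pair. Returns a list of
    (url, text) pairs in the order the pages were visited.
    """
    visited, order, stack = set(), [], [start]
    while stack:
        url = stack.pop()
        if url in visited:
            continue
        visited.add(url)
        text, links = fetch(url)
        order.append((url, text))
        # Push links in reverse so the first link on a page is followed first.
        stack.extend(reversed([link for link in links if link not in visited]))
    return order


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, text in pages:
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


if __name__ == "__main__":
    pages = crawl_dfs("/", SITE.__getitem__)
    index = build_index(pages)
    print([url for url, _ in pages])
    print(search(index, "lucene search"))
```

A query only touches the posting sets of its own terms rather than scanning every document, which is the property that makes index lookups fast. A production system would delegate tokenization, storage and scoring to Lucene itself.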
