A Full-text Website Search Engine Powered by Lucene and The Depth First Search Algorithm

Modinat. A. Mabayoje; O. S. Oni; Olawale S. Adebayo

Abstract

With the amount of text data available on the web growing rapidly, the need for users to search that information is increasing dramatically. Full-text search engines and relational databases each have distinct strengths as development tools, but their capabilities also overlap: both can store and update data, and both support searching it. Full-text systems are better at quickly searching high volumes of unstructured text for the presence of any word or combination of words. They provide rich text search capabilities and sophisticated relevancy ranking tools for ordering results by how well they match a potentially fuzzy search request. Relational databases, on the other hand, excel at storing and manipulating structured data, that is, records made up of fields of specific types (text, integer, currency, etc.). They can do so with little or no redundancy. They support flexible search across multiple record types for specific field values, as well as strong tools for quickly and securely updating individual records. Because the web is a largely unstructured and ever-growing collection of documents, using an RDBMS to search it has become very costly. This paper describes the architecture, design and implementation of a prototype website search engine, powered by Lucene, that can search through any website. The approach involves developing a small-scale web crawler to gather information from the target website. The gathered information is then converted into Lucene documents and stored in the index. Searching the index takes very little time compared with how long a relational database takes to process a query.
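
The crawl, index, and search pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SITE` dictionary is a hypothetical in-memory website used in place of real HTTP fetching, and a plain Python inverted index stands in for Lucene, so none of the function names below correspond to Lucene's API. The crawler follows links depth first, as the paper's title suggests.

```python
from collections import defaultdict
import re

# Hypothetical in-memory "website": URL -> (page text, outgoing links).
# A real crawler would fetch and parse HTML instead.
SITE = {
    "/": ("Welcome to the home page", ["/about", "/docs"]),
    "/about": ("About the search engine project", ["/"]),
    "/docs": ("Lucene index documentation", ["/docs/crawler"]),
    "/docs/crawler": ("The crawler gathers pages for the index", ["/docs"]),
}


def crawl_depth_first(start):
    """Visit pages depth first and return the URLs in visit order."""
    visited, order, stack = set(), [], [start]
    while stack:
        url = stack.pop()
        if url in visited or url not in SITE:
            continue
        visited.add(url)
        order.append(url)
        # Push links in reverse so the first link is explored first.
        stack.extend(reversed(SITE[url][1]))
    return order


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(urls):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url in urls:
        for term in tokenize(SITE[url][0]):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query term (a simple AND query)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


pages = crawl_depth_first("/")
index = build_index(pages)
print(search(index, "crawler index"))
```

Because every term lookup is a dictionary access, a query never scans the page texts themselves, which is the general reason an inverted index answers text queries quickly. The sketch omits relevance ranking and fuzzy matching, both of which the abstract attributes to full-text systems.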

Bibliographic details

Source: International Journal of Computer Network and Information Security, 2013, Issue 3
Authors: Modinat. A. Mabayoje; O. S. Oni; Olawale S. Adebayo
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
