URL-Based Relevance-Ranking Approach to Facilitate Domain-Specific Crawling and Searching


Abstract

The WWW is a vast repository of all types of information known to mankind and is thus capable of serving the frequently varying needs of its users. Classifying and organizing webpages according to their domain or topic helps a search engine retrieve and return a set of fairly relevant pages to its users. This classification is generally done on the basis of a page's underlying text or content. This paper introduces a novel approach that tries to predict the relevance of a webpage to a domain not by downloading its content but on the basis of the web documents it is linked to. The approach offers advantages in cost and performance, because the most easily obtained and least expensive information available about a webpage is its uniform resource locator (URL) [1]. Since URLs serve as unique identifiers, they are assumed to be an important source of information about the content of a web page; the proposed approach therefore associates domain information with web pages based on their URLs.
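
The abstract does not spell out how a relevance score is computed, so the following is only a minimal sketch of the general idea of ranking pages by their URLs without downloading them. It is not the authors' method: the token-overlap score, the stop-token list, the function names `url_tokens`, `url_relevance`, and `rank_urls`, and the example URLs and domain terms are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Set
from urllib.parse import urlparse

# Hypothetical list of tokens that carry no topical signal in a URL.
STOP_TOKENS = {"www", "com", "org", "net", "http", "https", "html", "htm", "php", "index"}


def url_tokens(url: str) -> List[str]:
    """Split the host, path, and query of a URL into lowercase alphabetic tokens."""
    parts = urlparse(url)
    text = " ".join([parts.netloc, parts.path, parts.query]).lower()
    # Split on anything that is not a letter; drop short and stop tokens.
    return [t for t in re.split(r"[^a-z]+", text) if len(t) > 2 and t not in STOP_TOKENS]


def url_relevance(url: str, domain_terms: Set[str]) -> float:
    """Assumed score: fraction of URL tokens that appear in the domain vocabulary."""
    tokens = url_tokens(url)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in domain_terms)
    return hits / len(tokens)


def rank_urls(urls: Iterable[str], domain_terms: Set[str]) -> List[str]:
    """Order URLs from most to least relevant without fetching any page content."""
    return sorted(urls, key=lambda u: url_relevance(u, domain_terms), reverse=True)
```

With an assumed vocabulary such as `{"books", "fiction", "novel", "author"}`, calling `rank_urls` on a list of candidate URLs would place a URL like `https://www.example.com/books/fiction/novel-list.html` ahead of one whose tokens share nothing with the vocabulary. A crawler could use such an ordering to decide which links to visit first; a real system would also need the link-based evidence that the abstract mentions, which this sketch leaves out.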

Bibliographic record

Source: International conference on recent developments in science, engineering and technology | 2018 | xi 307 p. | 12 pages
Conference location:
Authors: Sonali Gupta; Komal Kumar Bhatia
Author affiliation:
Conference organizer:
Original format: PDF
Text language:
Chinese Library Classification: Computing technology, computer technology
Keywords: URL; Domain identification; Topic-specific Web page classification; Crawler; Search engine; Domain-specific Focused crawler; Hidden-web crawler

Similar literature

1. Heuristic-based strategy for Phishing prediction: A survey of URL-based approach [J]. Revoredo da Silva Carlo Marcelo, Feitosa Eduardo Luzeiro, Garcia Vinicius Cardoso. Computers & Security. 2020, Issue Jana
2. A Domain-Specific Concept-Based Searching System [J]. Tru H. Cao, Mai T. H. Ta, Tung Q. Luong. 電子情報通信学会技術研究報告. 人工知能と知識処理. Artificial Intelligence and Knowledge Based Processing. 2004, Issue 488
3. Crawling Strategies of Reverse Searching and Incremental Two-Level Site Prioritizing System [J]. Chinmai Daka, Julie Shabna S. Research Journal of Pharmaceutical, Biological and Chemical Sciences. 2016, Issue 4
4. URL-Based Relevance-Ranking Approach to Facilitate Domain-Specific Crawling and Searching [C]. Sonali Gupta, Komal Kumar Bhatia. International conference on recent developments in science, engineering and technology. 2018
5. A novel hybrid focused crawling algorithm to build domain-specific collections [D]. Chen, Yuxin. 2007
6. N-Glycans on EGF domain-specific O-GlcNAc transferase (EOGT) facilitate EOGT maturation and peripheral endoplasmic reticulum localization [O]. Sayad Md. Didarul Alam, Yohei Tsukamoto, Mitsutaka Ogawa, 2020
7. DSphere: A Source-Centric Approach to Crawling, Indexing and Searching the World Wide Web [O]. Bhuvan Bamba, Ling Liu, James Caverlee, 2013
