Conference: International Conference on Recent Developments in Science, Engineering and Technology

URL-Based Relevance-Ranking Approach to Facilitate Domain-Specific Crawling and Searching

Abstract

The World Wide Web (WWW) is a vast repository of every type of information known to mankind and can therefore serve the frequently varying needs of its users. Classifying and organizing webpages according to their domain or topic helps a search engine retrieve and return a set of fairly relevant pages to users. This classification is generally based on the pages' underlying text or content. This paper presents a novel approach that tries to predict the relevance of a webpage to a domain not by downloading its content but from the web documents it is linked to. The approach offers advantages in cost and performance, because the most easily obtained and least expensive information available about a webpage is its uniform resource locator (URL) [1]. Since URLs serve as unique identifiers, they are assumed to be an important source of information about the content of a webpage; the proposed approach therefore associates domain information with webpages based on their URLs.
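To illustrate the general idea of scoring a page by its URL alone, here is a minimal sketch. It is not the paper's method, which the abstract does not specify. It assumes a simple token-overlap score, and the names `url_tokens`, `url_relevance`, `rank_urls`, and the `domain_terms` vocabulary are all hypothetical.

```python
import re
from urllib.parse import urlparse


def url_tokens(url):
    """Split a URL's host and path into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    text = f"{parsed.netloc} {parsed.path}".lower()
    # Split on anything that is not a letter and drop empty pieces.
    return [t for t in re.split(r"[^a-z]+", text) if t]


def url_relevance(url, domain_terms):
    """Assumed score: fraction of URL tokens found in the domain vocabulary."""
    tokens = url_tokens(url)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in domain_terms)
    return hits / len(tokens)


def rank_urls(urls, domain_terms):
    """Order URLs from most to least relevant without fetching any page."""
    return sorted(urls, key=lambda u: url_relevance(u, domain_terms), reverse=True)
```

A crawler could call `rank_urls` on its frontier to decide which links to fetch first. A real system would likely need a richer vocabulary and a way to discount generic tokens such as `www` or `html`.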
