
UNSUPERVISED, AUTOMATED WEB HOST DYNAMICITY DETECTION, DEADLINK DETECTION AND PREREQUISITE PAGE DISCOVERY FOR SEARCH INDEXED WEB PAGES


Abstract

Automated crawling of page links associated with a previously crawled site domain involves computing the dynamicity of the site from totals of consecutive dead links, live links, and/or prerequisite pages encountered while crawling page links corresponding to the site. The degree to which links are crawled is optimized based on the site's dynamicity. Some pages can be retrieved only after another particular page (a prerequisite page) has been retrieved from the host, e.g., so that the prerequisite page can set a cookie. Prerequisite pages are determined from stored information about which pages were retrieved, during a previous crawl, before a given page was retrieved. Prerequisite pages are identified to a search system so that when a user clicks on the URL for the page, the request is redirected to the prerequisite page, which sets the cookie appropriately.
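The abstract does not give formulas, so the following is only a minimal Python sketch of how the two ideas might fit together. Everything named here is an assumption made for illustration: the `CrawlTotals` class, the `dynamicity` score (the share of links that were dead or needed a prerequisite), the `crawl_budget` policy (more dynamic hosts get more links recrawled), and `find_prerequisite` (treating the page fetched immediately before a URL in the previous crawl as its candidate prerequisite). The run of consecutive dead links is tracked but, for simplicity, not used in the score.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlTotals:
    """Per-host running totals (hypothetical structure, not from the patent)."""
    dead: int = 0
    live: int = 0
    prerequisite: int = 0
    consecutive_dead: int = 0      # length of the current run of dead links
    max_consecutive_dead: int = 0  # longest run seen so far

    def record(self, outcome: str) -> None:
        """Update totals for one crawled link: 'dead', 'live' or 'prerequisite'."""
        if outcome == "dead":
            self.dead += 1
            self.consecutive_dead += 1
            self.max_consecutive_dead = max(
                self.max_consecutive_dead, self.consecutive_dead
            )
        elif outcome == "live":
            self.live += 1
            self.consecutive_dead = 0
        elif outcome == "prerequisite":
            self.prerequisite += 1
            self.consecutive_dead = 0
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")


def dynamicity(totals: CrawlTotals) -> float:
    """Assumed score in [0, 1]: fraction of links that were dead or gated."""
    total = totals.dead + totals.live + totals.prerequisite
    if total == 0:
        return 0.0
    return (totals.dead + totals.prerequisite) / total


def crawl_budget(score: float, min_links: int = 10, max_links: int = 1000) -> int:
    """Assumed policy: scale the number of links to recrawl with the score."""
    return round(min_links + score * (max_links - min_links))


def find_prerequisite(url: str, previous_crawl_order: List[str]) -> Optional[str]:
    """Return the page fetched just before `url` in the previous crawl, if any."""
    try:
        index = previous_crawl_order.index(url)
    except ValueError:
        return None
    return previous_crawl_order[index - 1] if index > 0 else None


totals = CrawlTotals()
for outcome in ["live", "dead", "dead", "prerequisite", "live"]:
    totals.record(outcome)

score = dynamicity(totals)
budget = crawl_budget(score)
# Example URLs below are placeholders, not taken from the patent.
prereq = find_prerequisite(
    "http://example.com/cart",
    ["http://example.com/login", "http://example.com/cart"],
)
```

A search system using this sketch could store `prereq` alongside the indexed URL and send the user to it first, so that the cookie is set before the target page is requested.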

Bibliographic data

  • Publication number: IN2005KO00560A

  • Publication date: 2007-07-13

  • Original format: PDF

  • Application number: IN00560/KOL/2005

  • Filing date: 2005-06-28

  • Classification: G06F17/30

  • Country: IN

  • Date indexed: 2022-08-21 20:58:09
