Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
Abstract
Automated crawling of page links associated with a previously crawled site domain involves computing the dynamicity of the site from totals of continuous dead links, live links and/or prerequisite pages encountered while crawling page links corresponding to the site. The degree to which links are crawled is optimized based on the dynamicity of the site. Some pages require that another particular page (i.e., a prerequisite page) be retrieved from the host before a given page is retrieved, e.g., so that the prerequisite page can set a cookie. Prerequisite pages are determined from stored information about which pages were retrieved, during a previous crawl, before the page in question. Prerequisite pages are identified to a search system so that when a user clicks on the URL for the page, the request is redirected to the prerequisite page to set the cookie appropriately.
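The abstract does not give a formula for dynamicity or a rule for picking a prerequisite page, so the following Python is only an illustrative sketch under stated assumptions. It assumes dynamicity can be approximated as the share of crawl outcomes that were dead links or needed a prerequisite page, and that a prerequisite can be inferred as the page most often retrieved immediately before a given page in a previous crawl. The names `CrawlTotals`, `dynamicity`, `infer_prerequisite`, `resolve_click`, and the `min_share` threshold are hypothetical and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CrawlTotals:
    """Per-site counters accumulated while crawling (hypothetical structure)."""
    dead_links: int = 0
    live_links: int = 0
    prerequisite_pages: int = 0


def dynamicity(totals: CrawlTotals) -> float:
    """Assumed score in [0, 1]: fraction of outcomes that were dead links
    or prerequisite pages. The patent's actual formula is not stated."""
    total = totals.dead_links + totals.live_links + totals.prerequisite_pages
    if total == 0:
        return 0.0
    return (totals.dead_links + totals.prerequisite_pages) / total


def infer_prerequisite(predecessors: List[str], min_share: float = 0.8) -> Optional[str]:
    """Return the URL most often retrieved just before a page in a previous
    crawl, if it precedes the page in at least `min_share` of observations.
    Both the heuristic and the threshold are assumptions."""
    if not predecessors:
        return None
    url, count = Counter(predecessors).most_common(1)[0]
    return url if count / len(predecessors) >= min_share else None


def resolve_click(url: str, prerequisites: Dict[str, str]) -> str:
    """Send a click to the known prerequisite page, else to the page itself."""
    return prerequisites.get(url, url)
```

For example, with these assumptions, a site with 3 dead links, 6 live links and 1 prerequisite page gets `dynamicity(CrawlTotals(3, 6, 1))` of 0.4, and a page whose stored predecessor was `/login` in 9 of 10 earlier retrievals would be assigned `/login` as its prerequisite.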