Robots Exclusion and Guidance Protocol

Dajie Ge; Zhijun Ding


Abstract

With the rapid development of the Internet, general-purpose web crawlers have become increasingly unable to meet people's individual needs, because they are no longer efficient enough at fetching deep web pages. The many deep web pages on websites and the widespread use of Ajax make it difficult for general-purpose crawlers to fetch information quickly and efficiently. Building on the original Robots Exclusion Protocol (REP), this paper proposes a Robots Exclusion and Guidance Protocol (REGP) that integrates the independent, scattered extensions to the original Robots Protocol developed by major search engine companies. Our protocol extends the file format and command set of the REP, as well as two tags of the Sitemap Protocol. Through our protocol, websites can express their requirements for restricting and guiding visiting crawlers, and can give crawlers general-purpose fast access to deep web pages and Ajax pages, making it easy for crawlers to obtain the open data on websites effectively. Finally, this paper presents a specific application scenario in which both a website and a crawler operate with support from our protocol. A series of experiments is also conducted to demonstrate the efficiency of the proposed protocol.
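The abstract describes REGP only at a high level and does not give its extended syntax, so the sketch below does not attempt it. Instead, as a minimal illustration under stated assumptions, it shows the baseline a REGP-aware crawler would start from: parsing a site's robots.txt with Python's standard `urllib.robotparser`, honoring an original-REP `Disallow` rule, and reading two widely deployed search-engine extensions, `Crawl-delay` and `Sitemap`, of the scattered kind the abstract says REGP unifies. The robots.txt content, the host `www.example.com`, and the agent name `ExampleBot` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. Disallow comes from the
# original REP; Crawl-delay and Sitemap are common search-engine extensions
# (this is NOT REGP's own syntax, which the abstract does not specify).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network fetch."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    agent = "ExampleBot"  # hypothetical crawler name
    for url in (
        "https://www.example.com/index.html",
        "https://www.example.com/private/report.html",
    ):
        # can_fetch applies the exclusion rules that match this user agent
        print(url, "allowed" if rules.can_fetch(agent, url) else "disallowed")
    print("crawl delay (s):", rules.crawl_delay(agent))  # Crawl-delay extension
    print("sitemaps:", rules.site_maps())  # Sitemap extension (Python 3.8+)
```

Run as a script, this reports the first URL as allowed and the second as disallowed, a 5-second crawl delay, and the one listed sitemap URL. A guidance-oriented protocol such as REGP would add further directives on top of this exclusion baseline.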

Bibliographic Record

Source
Tsinghua Science and Technology (《清华大学学报（英文版）》) | 2016, Issue 6 | pp. 643-659 | 17 pages
Authors
Dajie Ge; Zhijun Ding
Affiliations

Department of Computer Science and Technology, Tongji University, Shanghai 201804, China (both authors)

Original format: PDF
Language: English