Venue: ACM/IEEE-CS Joint Conference on Digital Libraries

Access Patterns for Robots and Humans in Web Archives

Abstract

Although user access patterns on the live web are well understood, there has been no corresponding study of how users, both humans and robots, access web archives. Based on samples from the Internet Archive's public Wayback Machine, we propose a set of basic usage patterns: Dip (a single access), Slide (the same page at different archive times), Dive (different pages at approximately the same archive time), and Skim (lists of what pages are archived, i.e., TimeMaps). Robots are limited almost exclusively to Dips and Skims, but human accesses are more varied across all four types. Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of megabytes transferred. Robots almost always access TimeMaps (95% of accesses), but humans predominantly access the archived web pages themselves (82% of accesses). In terms of unique archived web pages, there is no overall preference for a particular time, but the recent past (within the last year) shows significant repeat accesses.
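
The four patterns named in the abstract can be illustrated with a minimal sketch that labels one session. This is not the authors' method: the `Access` record, the `classify_session` function, the 30-day tolerance for "approximately the same archive time", the order in which the rules are checked, and the `"Mixed"` fallback label are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Access:
    """One request in a session (hypothetical record layout)."""
    url: str                                 # original live-web URL that was requested
    archive_time: Optional[datetime] = None  # capture datetime; None for a TimeMap listing
    is_timemap: bool = False                 # True if the request was for a TimeMap


def classify_session(accesses: List[Access],
                     tolerance: timedelta = timedelta(days=30)) -> str:
    """Label a session as Dip, Skim, Slide, Dive, or Mixed.

    The rule order and the tolerance are assumptions, not values from the paper.
    """
    if not accesses:
        raise ValueError("session has no accesses")
    # Dip: a single access (checked first, so a lone TimeMap request is a Dip here).
    if len(accesses) == 1:
        return "Dip"
    # Skim: every request is a listing of what is archived (TimeMaps).
    if all(a.is_timemap for a in accesses):
        return "Skim"
    # Sessions that combine TimeMaps and archived pages are left unlabelled.
    if any(a.is_timemap for a in accesses):
        return "Mixed"
    times = [a.archive_time for a in accesses]
    if any(t is None for t in times):
        raise ValueError("archived-page access is missing its archive_time")
    urls = {a.url for a in accesses}
    # Slide: the same page at different archive times.
    if len(urls) == 1 and len(set(times)) > 1:
        return "Slide"
    # Dive: different pages at approximately the same archive time.
    if len(urls) > 1 and max(times) - min(times) <= tolerance:
        return "Dive"
    return "Mixed"
```

In the paper a single session may well contain several patterns in sequence; this sketch assigns only one label per session to keep the definitions easy to compare.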
