Proceedings of the 2011 ACM/IEEE Joint Conference on Digital Libraries

Archiving the Web using Page Changes Patterns: A Case Study

Abstract

A pattern is a model or template that summarizes and describes the behavior (or trend) of data that generally exhibits recurrent events. Patterns have received considerable attention in recent years and have been widely studied in the data mining field. Various pattern mining approaches have been proposed and applied to domains such as network monitoring, moving-object tracking, financial or medical data analysis, and scientific data processing. In these contexts, discovered patterns have proved useful for detecting anomalies, predicting data behavior (or trend), or, more generally, simplifying data processing and improving system performance. However, to the best of our knowledge, patterns have never been used in the context of web archiving. Web archiving is the process of continuously collecting and preserving portions of the World Wide Web for future generations. In this paper, we show how patterns of page changes can serve as useful tools for archiving web sites efficiently. We first define a pattern model that describes the changes of pages. Then, we present the strategy used to (i) extract the temporal evolution of page changes, (ii) discover patterns, and (iii) exploit them to improve web archives. To validate our approach, we chose the archive of the French public TV channels France Télévisions as a case study. Our experimental evaluation on real web pages shows that patterns are useful for improving archive quality and for optimizing indexing or storage.
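The abstract names a three-step strategy (extract the temporal evolution of page changes, discover patterns, exploit them) but does not spell out the pattern model itself. The following is therefore only a minimal Python sketch under our own assumptions, not the authors' method: the page is assumed to be captured in hourly snapshots, change importance is approximated as one minus the `difflib.SequenceMatcher` similarity of consecutive versions, a "pattern" is taken to be the mean change importance per hourly slot of a daily cycle, and crawls are placed in the slots with the highest expected importance. The functions `change_importance`, `build_pattern` and `crawl_slots` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical sketch, NOT the pattern model defined in the paper.
# Assumption: a "pattern" is the mean change importance observed in each
# time slot of a repeating period (here, the 24 hours of a day).


def change_importance(old, new):
    """Step (i): score how much a page changed between two snapshots, in [0, 1].

    Assumption: importance = 1 - textual similarity ratio of the two versions.
    """
    return 1.0 - SequenceMatcher(None, old, new).ratio()


def build_pattern(snapshots, period_hours=24, slot_hours=1):
    """Step (ii): average change importance per slot of the period.

    `snapshots` is a time-ordered list of (hour_timestamp, page_text) pairs;
    each change is attributed to the slot of the later snapshot.
    """
    n_slots = period_hours // slot_hours
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (_, old), (t_new, new) in zip(snapshots, snapshots[1:]):
        slot = (t_new % period_hours) // slot_hours
        totals[slot] += change_importance(old, new)
        counts[slot] += 1
    # Slots never observed get an expected importance of 0.
    return [totals[s] / counts[s] if counts[s] else 0.0 for s in range(n_slots)]


def crawl_slots(pattern, budget):
    """Step (iii): choose the `budget` slots with the highest expected importance."""
    ranked = sorted(range(len(pattern)), key=lambda s: pattern[s], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    # Toy history: a page that changes only at 08:00 and 20:00, over three days.
    snapshots = []
    text = "news v0"
    for hour in range(72):
        if hour % 24 in (8, 20):
            text = f"news v{hour}"
        snapshots.append((hour, text))
    print(crawl_slots(build_pattern(snapshots), budget=2))  # prints [8, 20]
```

Averaging per slot over several periods smooths out one-off edits, so a crawl budget is spent where changes recur. This sketch treats every character edit as equally important; a more realistic importance measure would distinguish, for example, a changed headline from a changed advertisement.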