Journal: Computer Science & Information Technology

A Methodology for the Automatic Creation of Massive Continuous Query Datasets from Real Life Corpora



Abstract

In the information filtering (or publish/subscribe) paradigm, clients subscribe to a server with continuous queries that express their information needs, while information sources publish documents to servers. Whenever a document is published, the continuous queries that it satisfies are identified and notifications are sent to the corresponding subscribed clients. Although information filtering has been on the research agenda for about half a century, there is a huge paradox when it comes to benchmarking the performance of such systems: a striking lack of a benchmarking mechanism (in the form of a large-scale standardised test collection of continuous queries and the relevant document publications) specifically created for evaluating filtering tasks. This work aims to fill this gap by proposing a methodology for automatically creating massive continuous query datasets from available document collections. We intend to publicly release all related material (including the software accompanying the proposed methodology) to the research community after publication.
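
The subscribe/publish/notify cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' methodology or software: the class and method names are hypothetical, and it assumes that a continuous query is a conjunction of keywords that matches when every keyword occurs in the published document. The abstract itself does not specify a query model.

```python
from collections import defaultdict


class FilteringServer:
    """Toy publish/subscribe server; all names here are hypothetical."""

    def __init__(self):
        # client id -> list of continuous queries (each a set of required terms)
        self.subscriptions = defaultdict(list)

    def subscribe(self, client_id, query_terms):
        # Assumption: a continuous query is a conjunction of keywords.
        query = frozenset(term.lower() for term in query_terms)
        self.subscriptions[client_id].append(query)

    def publish(self, document_text):
        # Return the (client_id, query) pairs that should be notified.
        doc_terms = set(document_text.lower().split())
        notifications = []
        for client_id, queries in self.subscriptions.items():
            for query in queries:
                if query <= doc_terms:  # every query term occurs in the document
                    notifications.append((client_id, query))
        return notifications


server = FilteringServer()
server.subscribe("alice", ["query", "datasets"])
server.subscribe("bob", ["benchmark"])
notified = server.publish("Massive continuous query datasets from real corpora")
```

In this run only the hypothetical client "alice" is notified, since the published text contains both of her query terms and does not contain "benchmark". A linear scan over all stored queries is used purely for clarity; evaluating filtering systems at scale is exactly where a large continuous query test collection would be needed.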
