A Methodology for the Automatic Creation of Massive Continuous Query Datasets from Real Life Corpora

Christos Tryfonopoulos

Abstract

In the information filtering (or publish/subscribe) paradigm, clients subscribe to a server with continuous queries that express their information needs, while information sources publish documents to servers. Whenever a document is published, the continuous queries that the document satisfies are found and notifications are sent to the appropriate subscribed clients. Although information filtering has been on the research agenda for about half a century, there is a huge paradox when it comes to benchmarking the performance of such systems: there is a striking lack of a benchmarking mechanism (in the form of a large-scale standardised test collection of continuous queries and the relevant document publications) specifically created for evaluating filtering tasks. This work aims to fill this gap by proposing a methodology for automatically creating massive continuous query datasets from available document collections. We intend to publicly release all related material (including the software accompanying the proposed methodology) to the research community after publication.
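The subscribe/publish/notify cycle described in the abstract can be illustrated with a minimal sketch. It is not the paper's software: the class name `FilteringServer`, its methods, and the choice to model a continuous query as a conjunction of keywords are all assumptions made for illustration only.

```python
from collections import defaultdict


class FilteringServer:
    """Toy information filtering server (hypothetical, for illustration only)."""

    def __init__(self):
        # client id -> list of continuous queries, each stored as a frozenset of terms
        self.subscriptions = defaultdict(list)

    def subscribe(self, client, query):
        # Assumption: a continuous query is a conjunction of lowercase keywords.
        self.subscriptions[client].append(frozenset(query.lower().split()))

    def publish(self, document):
        # Return one (client, query) notification for every stored query
        # whose terms all occur in the published document.
        terms = set(document.lower().split())
        return [
            (client, " ".join(sorted(query)))
            for client, queries in self.subscriptions.items()
            for query in queries
            if query <= terms
        ]


server = FilteringServer()
server.subscribe("alice", "continuous queries")
server.subscribe("bob", "spatial join")
notifications = server.publish("a benchmark of continuous queries for filtering")
print(notifications)  # [('alice', 'continuous queries')]
```

A linear scan over all stored queries is used here only to keep the sketch short. Evaluating how faster matching structures behave at scale is what a large test collection of continuous queries would be needed for.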

Bibliographic record

Source: Computer Science & Information Technology, 2018, Issue 7, 11 pages
Author: Christos Tryfonopoulos
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords: continuous queries; dataset construction; information filtering; publish/subscribe; information dissemination; profiles
