Data in Brief

Event-Dataset: Temporal information retrieval and text classification dataset

Abstract

Recently, Temporal Information Retrieval (TIR) has attracted considerable attention in the information retrieval community. TIR exploits the temporal dynamics of the information retrieval process and combines textual relevance with temporal relevance to satisfy a user's temporal information needs (Ur Rehman Khan et al., 2018). An important temporal aspect of a document is its focus time, defined as the time to which the document's content refers (Jatowt et al., 2015; Jatowt et al., 2013; Morbidoni et al., 2018; Khan et al., 2018). To the best of our knowledge, no publicly available standard benchmark dataset exists that can comprehensively evaluate the performance of focus time assessment strategies. To address this gap, we have produced the Event-Dataset, which comprises 35 queries and a set of news articles for each query. Formally, $C = \{Q_s, D_s\}$, where $C$ represents the dataset, $Q_s = \{q_1, q_2, q_3, \ldots, q_{35}\}$ is the query set, and $D_s$ is the collection of news articles. For each query $q_i$ there is a set of news articles $q_i = \{d_r, d_{nr}\}$, where $d_r$ and $d_{nr}$ are the sets of relevant and non-relevant documents, respectively. Each query in the dataset represents a popular event. To label the articles as relevant or non-relevant, we employed a user-study-based evaluation method in which a group of postgraduate students manually annotated the articles into these two categories. We believe this dataset gives information retrieval researchers a benchmark for evaluating focus time assessment methods specifically and information retrieval methods generally.
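The formal structure $C = \{Q_s, D_s\}$ with $q_i = \{d_r, d_{nr}\}$ maps naturally onto a small data model. The sketch below is a minimal illustration under stated assumptions: the abstract does not specify the dataset's file format, identifiers, or an evaluation metric, so the class names (`QueryJudgments`, `EventDataset`), the string article ids, and the choice of precision@k as the example metric are all hypothetical. It shows how the relevant set $d_r$ of a query can serve as ground truth when scoring a retrieval method's ranked output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueryJudgments:
    """Annotated articles for one event query: q_i = {d_r, d_nr} (hypothetical model)."""
    query: str
    relevant: set[str] = field(default_factory=set)      # d_r: articles judged relevant
    non_relevant: set[str] = field(default_factory=set)  # d_nr: articles judged non-relevant


@dataclass
class EventDataset:
    """C = {Q_s, D_s}: event queries plus their judged news articles."""
    # Q_s, keyed by a hypothetical query id such as "q1" ... "q35"
    queries: dict[str, QueryJudgments] = field(default_factory=dict)

    def documents(self) -> set[str]:
        """D_s: every article id judged for any query."""
        docs: set[str] = set()
        for judgments in self.queries.values():
            docs |= judgments.relevant | judgments.non_relevant
        return docs


def precision_at_k(ranking: list[str], judgments: QueryJudgments, k: int) -> float:
    """Fraction of the top-k ranked article ids that annotators marked relevant.

    precision@k is an illustrative metric choice, not one prescribed by the dataset.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in judgments.relevant)
    return hits / k


if __name__ == "__main__":
    # Toy, made-up judgments; not taken from the real dataset.
    ds = EventDataset()
    ds.queries["q1"] = QueryJudgments(
        query="example event query",
        relevant={"d1", "d3"},
        non_relevant={"d2", "d4"},
    )
    print(sorted(ds.documents()))                                   # ['d1', 'd2', 'd3', 'd4']
    print(precision_at_k(["d1", "d2", "d3"], ds.queries["q1"], 3))  # 0.6666666666666666
```

Keeping $d_r$ and $d_{nr}$ as separate sets mirrors the binary annotation described above; any other ranking metric (for example, recall or average precision) could be computed from the same `relevant` set.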