Source: International conference on Asian-Pacific digital libraries

Bridging the Gap - Using External Knowledge Bases for Context-Aware Document Retrieval

Abstract

Today, a vast amount of information is available on the Web as unstructured text indexed by Web search engines. Especially for searches on concepts or context terms, however, a simple keyword-based Web search may compromise retrieval quality, because the query terms may not occur directly in the texts (the vocabulary problem). The state-of-the-art solution is query expansion, which increases recall but often also causes a steep decrease in retrieval precision. This decrease is a severe problem for digital library providers: in libraries it is vital to ensure high-quality retrieval that meets current standards. In this paper we present an approach that supports even context searches (conceptual queries) with high retrieval quality by using Wikipedia to semantically bridge the gap between query terms and textual content. We do not expand queries; instead, we extract the most important terms from each text document in a focused Web collection and enrich them with features gathered from Wikipedia. These enriched terms are then used to compute the relevance of a document with respect to a conceptual query. The evaluation shows significant improvements over query expansion approaches: the overall retrieval quality is increased up to 74.5% in mean average precision.
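
The document-side enrichment idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the TF-IDF term selection, the `WIKI_FEATURES` lookup table (a hand-made stand-in for features that would be gathered from Wikipedia), the overlap-based `score` function, and the sample documents are all assumptions chosen only to show how a conceptual query can match documents that never mention the query terms.

```python
import math
from collections import Counter

# Hypothetical stand-in for features gathered from Wikipedia (for example,
# categories of the article matching a term). Invented for illustration.
WIKI_FEATURES = {
    "aspirin": {"analgesic", "drug"},
    "ibuprofen": {"analgesic", "drug"},
    "guitar": {"instrument", "music"},
}

# Toy collection; none of the documents contains the word "analgesic".
DOCS = [
    "aspirin reduces fever and aspirin relieves pain",
    "guitar strings need tuning before a guitar concert",
    "ibuprofen relieves pain and swelling",
]


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms of each document.

    TF-IDF is one possible notion of "most important terms"; the abstract
    does not say which weighting the paper uses.
    """
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    n_docs = len(docs)
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        weight = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weight, key=lambda t: (-weight[t], t))
        result.append(ranked[:k])
    return result


def enrich(terms):
    """Add the (hypothetical) Wikipedia features of each extracted term."""
    features = set(terms)
    for term in terms:
        features |= WIKI_FEATURES.get(term, set())
    return features


def score(query, features):
    """Fraction of query terms found among a document's enriched features."""
    q = set(tokenize(query))
    return len(q & features) / len(q) if q else 0.0


def rank(query, docs, k=3):
    """Rank documents for a query; the query itself is never expanded."""
    enriched = [enrich(terms) for terms in top_terms(docs, k)]
    scored = [(i, score(query, feats)) for i, feats in enumerate(enriched)]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for doc_id, s in rank("analgesic drug", DOCS):
        print(doc_id, round(s, 2), DOCS[doc_id])
```

In this toy collection the conceptual query "analgesic drug" matches the aspirin and ibuprofen documents only through their enriched features, while a plain keyword match on the raw text would return nothing.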
