Journal: International Journal of Semantic Computing

Fine-Tuning an Algorithm for Semantic Search Using a Similarity Graph

Abstract

Given a set of documents and an input query expressed in natural language, the document search problem is to retrieve the most relevant documents. Unlike most existing systems, which perform document search based on keyword matching, we propose a method that considers the meaning of the words in the queries and documents. As a result, our algorithm can return documents that have no words in common with the input query, as long as the documents are relevant. For example, a document that contains the words "Ford", "Chrysler" and "General Motors" multiple times is surely relevant to the query "car" even if the word "car" never appears in the document. Our information retrieval algorithm is based on a similarity graph that contains the degree of semantic closeness between terms, where a term can be a word or a phrase. Since the algorithm that constructs the similarity graph takes a myriad of parameters as input, in this paper we fine-tune the part of the algorithm that constructs the Wikipedia part of the graph. Specifically, we experimentally fine-tune the algorithm on the Miller and Charles benchmark, which contains 30 pairs of terms and their similarity scores as determined by human users. We then evaluate the performance of the fine-tuned algorithm on the Cranfield benchmark, which contains 1400 documents and 225 natural language queries. The benchmark also contains the relevant documents for every query as determined by human judgment. The results show that the fine-tuned algorithm produces a higher mean average precision (MAP) score than traditional keyword-based search algorithms because our algorithm considers not only the words and phrases in the query and documents, but also their meaning.
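
The abstract reports results as mean average precision (MAP) but does not spell the measure out. The sketch below follows the standard textbook definition, which may differ in detail from the evaluation script the authors used. The function names, query IDs, and document IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list.

    Precision is sampled at every rank where a relevant document appears,
    then divided by the total number of relevant documents, so relevant
    documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    scores = [
        average_precision(rankings.get(query_id, []), relevant)
        for query_id, relevant in judgments.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two queries, each with a ranked result list
# and a set of documents judged relevant by humans.
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5"]}
judgments = {"q1": {"d1", "d7"}, "q2": {"d2"}}

print(round(mean_average_precision(rankings, judgments), 4))  # 0.7917
```

For q1 the relevant documents appear at ranks 2 and 3, giving (1/2 + 2/3) / 2 ≈ 0.5833. For q2 the only relevant document is ranked first, giving 1.0. The mean of the two is about 0.7917. On a benchmark such as Cranfield, the same computation would run over all 225 queries and their human relevance judgments.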