IEEE International Conference on Semantic Computing

Automatic Creation of a Domain Specific Thesaurus Using Siamese Networks

Abstract

Recent trends indicate a shift in search technologies, across applications, from syntactic and lexical matching to semantic methods that aim to understand the intent and contextual meaning of search queries in order to yield more relevant and accurate results. Such methods often rely on semantic ontologies to map query words to concepts and to aid query expansion. However, most applications require a domain-specific language definition to overcome ambiguity and misinterpretation of meaning. General-purpose ontologies often fall short in this respect and fail to yield appropriate results in specific applications. In this paper, we propose a novel method of building a domain-specific thesaurus to aid semantic search: we automatically create a refined general thesaurus, then train a Siamese network in two phases to classify candidate synonyms as relevant or non-relevant to the particular domain. We focus on the application of tag-based gallery image retrieval, and we extract and utilise information from Google's Conceptual Captions dataset to improve our model's performance. To investigate and justify our training method and architecture, we conduct an ablation study and compare the results with those of our model. We further demonstrate, analytically and empirically, the advantage of representing terms in a domain-specific environment through semantic vectors fine-tuned on corpora related to the domain. Although our experiments focus on building a word ontology specific to image retrieval, our method is generic and can be generalised to any field requiring a domain-specific semantic language.
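To make the idea of a Siamese classifier for candidate synonyms concrete, the following is a minimal sketch and not the architecture from the paper. It assumes each term is already represented by a numeric vector, uses a single shared projection as the "twin" encoder, and scores a pair by cosine similarity. The function names, the dimensions, and the 0.5 threshold are all hypothetical, and the weights are random: the two-phase training described in the abstract is not shown.

```python
import math
import random


def make_shared_encoder(in_dim, out_dim, seed=0):
    """Build one encoder; both terms of a pair pass through it (weight sharing)."""
    rng = random.Random(seed)
    # Hypothetical untrained weights; a real model would learn these.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(vec):
        # Linear projection followed by a tanh non-linearity.
        return [math.tanh(sum(w * x for w, x in zip(row, vec)))
                for row in weights]

    return encode


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def siamese_score(encode, term_vec, candidate_vec):
    """Encode both inputs with the same encoder, then compare the encodings."""
    return cosine(encode(term_vec), encode(candidate_vec))


def is_relevant(encode, term_vec, candidate_vec, threshold=0.5):
    """Label a candidate synonym as relevant if its score reaches the threshold."""
    return siamese_score(encode, term_vec, candidate_vec) >= threshold


# Toy usage with made-up 3-dimensional term vectors.
encoder = make_shared_encoder(in_dim=3, out_dim=4)
score = siamese_score(encoder, [0.2, 0.7, 0.1], [0.3, 0.6, 0.2])
```

Because both inputs go through the same encoder, the score is symmetric in its two arguments, which is the property that distinguishes a Siamese setup from two independently parameterised encoders.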
