BMC Bioinformatics

Neural sentence embedding models for semantic similarity estimation in the biomedical domain

Abstract

Neural network-based embedding models are receiving significant attention in the field of natural language processing due to their capability to effectively capture semantic information representing words, sentences or even larger text elements in low-dimensional vector space. While current state-of-the-art models for assessing the semantic similarity of textual statements from biomedical publications depend on the availability of laboriously curated ontologies, unsupervised neural embedding models only require large text corpora as input and do not need manual curation.

In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set. Experimental results showed that, with a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised state-of-the-art approaches in terms of Pearson's r (r = 0.871) on the biomedical benchmark set. In contrast to the promising results for the original benchmark, we found our best models' performance on the smaller contradiction subset to be poor.
In this study, we have highlighted the value of neural network-based models for semantic similarity estimation in the biomedical domain by showing that, when evaluated on a biomedical benchmark set, they can keep up with and even surpass previous state-of-the-art approaches that depend on the availability of laboriously curated ontologies. Capturing contradictions and negations in biomedical sentences, however, emerged as an essential area for further work.
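The evaluation described above (score each sentence pair, then correlate the scores with expert annotations using Pearson's r) can be sketched as follows. This is a minimal, hedged illustration, not the authors' implementation. It assumes sentence vectors have already been inferred by some embedding model (for example, gensim's Doc2Vec, whose dm=1 setting selects the Paragraph Vector Distributed Memory variant) and that pairs are scored by cosine similarity, a common choice assumed here. The abstract does not name the string-based metrics used in the supervised model, so token-level Jaccard overlap stands in as one hypothetical example; the function names and the toy numbers are likewise illustrative assumptions, not BIOSSES data.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(s1: str, s2: str) -> float:
    """Token-level Jaccard overlap; naive lowercase whitespace tokenization (an assumption)."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    if not t1 and not t2:
        return 1.0
    return len(t1 & t2) / len(t1 | t2)


def pair_features(s1: str, s2: str, v1: Sequence[float], v2: Sequence[float]) -> List[float]:
    """Hypothetical feature vector for a supervised regressor:
    one embedding-based score plus one string-based score."""
    return [cosine_similarity(v1, v2), jaccard_similarity(s1, s2)]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted similarities and human-annotated scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors in covariance and standard deviations cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy, made-up vectors and gold scores for illustration only (NOT BIOSSES data).
    vector_pairs = [
        ([1.0, 0.0], [1.0, 0.0]),
        ([1.0, 0.0], [1.0, 1.0]),
        ([1.0, 0.0], [0.0, 1.0]),
    ]
    gold_scores = [4.0, 2.5, 0.0]
    predicted = [cosine_similarity(u, v) for u, v in vector_pairs]
    print("Pearson r:", round(pearson_r(predicted, gold_scores), 3))
```

Because Pearson's r measures only linear agreement, the predicted scores (cosine values in [-1, 1]) do not need to share a scale with the human ratings; a supervised model would instead learn a mapping from features such as those returned by pair_features to the annotated scores.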