Neural sentence embedding models for semantic similarity estimation in the biomedical domain

Kathrin Blagec; Hong Xu; Asan Agibetov; Matthias Samwald

Abstract

Neural network-based embedding models are receiving significant attention in the field of natural language processing due to their capability to effectively capture semantic information representing words, sentences or even larger text elements in low-dimensional vector space. While current state-of-the-art models for assessing the semantic similarity of textual statements from biomedical publications depend on the availability of laboriously curated ontologies, unsupervised neural embedding models only require large text corpora as input and do not need manual curation. In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set. Experimental results showed that, with a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised state-of-the-art approaches in terms of Pearson's r (r = 0.871) on the biomedical benchmark set. In contrast to the promising results for the original benchmark, we found our best models' performance on the smaller contradiction subset to be poor.
In this study, we have highlighted the value of neural network-based models for semantic similarity estimation in the biomedical domain by showing that they can keep up with and even surpass previous state-of-the-art approaches for semantic similarity estimation that depend on the availability of laboriously curated ontologies, when evaluated on a biomedical benchmark set. Capturing contradictions and negations in biomedical sentences, however, emerged as an essential area for further work.
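
The evaluation described above (scoring each sentence pair with a model and correlating those scores with human annotations via Pearson's r) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the abstract does not state which vector similarity measure was used (cosine similarity is assumed here), and the three-dimensional vectors and human scores are made-up toy values, not data from the study or from BIOSSES.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: (embedding of sentence A, embedding of sentence B,
# human similarity score). In practice the embeddings would come from a
# trained sentence embedding model and the scores from expert annotators.
toy_pairs = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3], 4.0),
    ([0.5, 0.5, 0.0], [0.4, 0.1, 0.8], 2.0),
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.1], 0.5),
]

# Model score for each pair: similarity of the two sentence vectors.
model_scores = [cosine_similarity(a, b) for a, b, _ in toy_pairs]
human_scores = [score for _, _, score in toy_pairs]

# Agreement between model and annotators, summarised as Pearson's r.
r = pearson_r(model_scores, human_scores)
print(f"Pearson r on toy data: {r:.3f}")
```

A higher r means the model ranks and spaces sentence pairs more like the human annotators do. A measure of this kind only reflects overall agreement, so a model can score well on it while still handling contradictions and negations poorly.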

Bibliographic record

Source
BMC Bioinformatics, 2019, Issue 1, 10 pages
Authors
Kathrin Blagec; Hong Xu; Asan Agibetov; Matthias Samwald
Original format
PDF
Keywords
Natural language processing; Semantics; Neural embedding models

