Journal: Future Generation Computer Systems

Deep neural network-based recognition of entities in Chinese online medical inquiry texts

Abstract

Correctly identifying entities such as disease names, symptoms, and drugs in Chinese online medical inquiry texts is challenging. On the one hand, traditional natural language processing methods cannot be applied directly to online medical inquiries. Although supervised and unsupervised learning algorithms provide entity recognition strategies for medical inquiries on online platforms, these methods either rely heavily on specific knowledge sources or hand-designed features or, despite strong self-adaptivity, rarely achieve good entity recognition results, and consequently generalize poorly. On the other hand, Chinese online medical inquiry data is large in volume and dominated by unstructured text, medical entities are sparsely distributed, and Chinese characters and words are complex. It is therefore difficult to build a robust model for recognizing entities in Chinese online medical inquiry texts. This paper proposes a new deep neural network, OMINer-CE, to address this problem. First, to form a suitable feature strategy, basic features from other tasks are introduced and extended features, such as the continuous bag of word cluster (CBOWC) feature, are constructed. Second, feature vectors that fuse Chinese characters and words are introduced to preserve all the character information of the original sequences while adding word-based semantic information. Third, a context encoding layer and a label decoding layer are introduced: on top of the recurrent BiLSTM model, a convolutional neural network (CNN) is added to learn more key local features of the Chinese word context, and an attention mechanism captures long-distance dependencies among Chinese words, forming the OMINer model. Finally, the basic and extended feature vectors are integrated into the OMINer model, so that the grammatical and semantic information contained in the labeled texts is obtained from different perspectives. The model thus takes into account the feature strategies of Chinese online medical platforms and achieves strong self-adaptivity through the deep neural network. Experiments show that combining the CE basic and extended feature vectors in the BiLSTM of the OMINer model yields better performance, suggesting that the OMINer-CE model improves entity recognition in Chinese online medical inquiry texts.
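
The abstract does not say how the character and word vectors are fused. As a rough illustration only, the sketch below assumes the simplest option: each character keeps its own vector and is concatenated with the vector of the segmented word that contains it, so no character information is lost while word-level semantics are added. The function names, the B/M/E/S position tags, the toy embeddings, and the example segmentation are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def align_chars_to_words(words: List[str]) -> List[Tuple[str, str, str]]:
    """Return one (character, containing word, position tag) triple per character.

    Position tags (an assumption of this sketch): B = begin, M = middle,
    E = end, S = single-character word.
    """
    triples = []
    for word in words:
        if len(word) == 1:
            triples.append((word, word, "S"))
            continue
        for i, ch in enumerate(word):
            if i == 0:
                tag = "B"
            elif i == len(word) - 1:
                tag = "E"
            else:
                tag = "M"
            triples.append((ch, word, tag))
    return triples


def fuse_char_word_vectors(
    words: List[str],
    char_vecs: Dict[str, List[float]],
    word_vecs: Dict[str, List[float]],
    char_dim: int,
    word_dim: int,
) -> List[List[float]]:
    """Concatenate each character vector with its containing word's vector.

    Characters or words missing from the lookup tables fall back to zero
    vectors, so the output always has one row per character.
    """
    fused = []
    for ch, word, _tag in align_chars_to_words(words):
        c = char_vecs.get(ch, [0.0] * char_dim)
        w = word_vecs.get(word, [0.0] * word_dim)
        fused.append(c + w)  # list concatenation: length char_dim + word_dim
    return fused


# Hypothetical pre-segmented inquiry: "cold" / "take" / "aspirin".
example_words = ["感冒", "吃", "阿司匹林"]
example_rows = fuse_char_word_vectors(
    example_words,
    char_vecs={"感": [1.0, 0.0]},
    word_vecs={"感冒": [0.5, 0.5, 0.5]},
    char_dim=2,
    word_dim=3,
)
print(len(example_rows), "character positions,", len(example_rows[0]), "dimensions each")
```

In a full model, the fused per-character rows would be the input to the context encoding layer (BiLSTM with CNN and attention, per the abstract). Concatenation is only one possible fusion scheme.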