Journal: BMC Medical Informatics and Decision Making

A benchmark dataset and case study for Chinese medical question intent classification


Abstract

To provide satisfying answers, a medical QA system has to understand the intent of users' questions precisely. Training a deep-learning intent classifier in a supervised way requires high-quality datasets. Currently, there is no public dataset for Chinese medical intent classification, and datasets from other fields are not applicable to medical QA systems. To solve this problem, we construct a Chinese medical intent dataset (CMID) from questions on medical QA websites and, as a case study, compare four intent classification models on it.

The questions in CMID are obtained from several medical QA websites. The intent annotation standard was developed by medical experts and comprises four types and 36 subtypes of user intent. Besides the intent label, CMID provides two kinds of additional information: word segmentation and named entities. Intent labels for each question are annotated through crowdsourcing. Word segmentation and named entities are obtained using Jieba and a well-trained Lattice-LSTM model. To make the segmentation more accurate, we loaded a Chinese medical dictionary of 530,000 entries. We also select four popular deep learning-based models and compare their intent classification performance on CMID.

The final CMID contains 12,000 Chinese medical questions and is organized in JSON format. Each question is labeled with intent, word segmentation, and named entity information. Question length and number of entities are also analyzed in detail. Among FastText, TextCNN, TextRNN, and TextGCN, FastText and TextCNN achieve the best results on the four-type and 36-subtype intent classification tasks, respectively.

In this work, we provide a dataset for Chinese medical intent classification that can be used in medical QA and related fields. We performed an intent classification task on CMID and analyzed the content of the dataset.
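
The abstract says CMID is stored as JSON, with each question carrying an intent label, a word segmentation, and named entities, and that question length and entity counts were analyzed. A minimal Python sketch of that kind of analysis follows. The abstract does not give the schema, so every field name (`text`, `intent_type`, `segments`, `entities`), the label values, and the three sample questions are hypothetical placeholders rather than the published CMID format.

```python
import json
from collections import Counter
from statistics import mean

# Hypothetical sample records. Field names and label values are illustrative
# assumptions, not the actual CMID schema or annotation labels.
SAMPLE = """
[
  {"text": "高血压吃什么药", "intent_type": "treatment",
   "segments": ["高血压", "吃", "什么", "药"],
   "entities": ["高血压"]},
  {"text": "糖尿病有哪些症状", "intent_type": "condition",
   "segments": ["糖尿病", "有", "哪些", "症状"],
   "entities": ["糖尿病"]},
  {"text": "感冒可以吃阿莫西林吗", "intent_type": "treatment",
   "segments": ["感冒", "可以", "吃", "阿莫西林", "吗"],
   "entities": ["感冒", "阿莫西林"]}
]
"""


def load_records(raw):
    """Parse a JSON array of question records and sanity-check each one."""
    records = json.loads(raw)
    for rec in records:
        # A segmentation should reassemble into the original question text.
        if "".join(rec["segments"]) != rec["text"]:
            raise ValueError("segments do not match text: " + rec["text"])
    return records


def summarize(records):
    """Compute simple per-dataset statistics over the records."""
    return {
        "n_questions": len(records),
        # Question length is measured in characters here (an assumption).
        "mean_length": mean(len(r["text"]) for r in records),
        "mean_entities": mean(len(r["entities"]) for r in records),
        "label_counts": Counter(r["intent_type"] for r in records),
    }


if __name__ == "__main__":
    stats = summarize(load_records(SAMPLE))
    for key, value in stats.items():
        print(key, value)
```

The same `summarize` function could be pointed at the real dataset once its actual field names are substituted for the placeholders.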
