A benchmark dataset and case study for Chinese medical question intent classification

Nan Chen; Xiangdong Su; Tongyang Liu; Qizhi Hao; Ming Wei


Abstract

To provide satisfying answers, a medical QA system has to understand the intent of users' questions precisely. Training a deep-learning intent classifier in a supervised way requires high-quality datasets. Currently, there is no public dataset for Chinese medical intent classification, and datasets from other fields are not applicable to medical QA systems. To solve this problem, we construct a Chinese medical intent dataset (CMID) from questions posted on medical QA websites. On this basis, we compare four intent classification models on CMID in a case study.

The questions in CMID are obtained from several medical QA websites. The intent annotation standard was developed by medical experts and comprises four types and 36 subtypes of user intent. Besides the intent label, CMID provides two kinds of additional information: word segmentation and named entities. We use crowdsourcing to annotate the intent of each Chinese medical question. Word segmentation is obtained with Jieba and named entities with a well-trained Lattice-LSTM model. To make segmentation more accurate, we loaded a Chinese medical dictionary of 530,000 entries. We also select four popular deep learning-based models and compare their intent classification performance on CMID.

The final CMID contains 12,000 Chinese medical questions and is organized in JSON format. Each question is labeled with its intent, word segmentation, and named entity information. Question length and number of entities are also analyzed in detail. Among FastText, TextCNN, TextRNN, and TextGCN, FastText and TextCNN achieve the best results on the four-type and the 36-subtype intent classification tasks, respectively.

In this work, we provide a dataset for Chinese medical intent classification that can be used in medical QA and related fields. We performed an intent classification task on CMID and analyzed the content of the dataset.
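As an illustration of the JSON organization and the dataset statistics mentioned in the abstract, the following is a minimal sketch only. The field names (`question`, `coarse_intent`, `fine_intent`, `segmentation`, `entities`), the label strings, and the three sample questions are all hypothetical placeholders, not the published CMID schema or data. The sketch shows how records carrying an intent label, a word segmentation, and named entities could be loaded and summarized by question length and entity count.

```python
import json
from collections import Counter
from statistics import mean

# Hypothetical records: field names, labels, and questions are illustrative
# assumptions, NOT the actual CMID schema or contents.
RAW = """
[
  {"question": "感冒了吃什么药好",
   "coarse_intent": "treatment", "fine_intent": "medication",
   "segmentation": ["感冒", "了", "吃", "什么", "药", "好"],
   "entities": [{"text": "感冒", "type": "disease"}]},
  {"question": "高血压有哪些症状",
   "coarse_intent": "condition", "fine_intent": "symptom",
   "segmentation": ["高血压", "有", "哪些", "症状"],
   "entities": [{"text": "高血压", "type": "disease"}]},
  {"question": "阿司匹林和布洛芬能一起吃吗",
   "coarse_intent": "drug", "fine_intent": "interaction",
   "segmentation": ["阿司匹林", "和", "布洛芬", "能", "一起", "吃", "吗"],
   "entities": [{"text": "阿司匹林", "type": "drug"},
                {"text": "布洛芬", "type": "drug"}]}
]
"""


def summarize(records):
    """Compute simple corpus statistics over a list of question records."""
    return {
        "n_questions": len(records),
        # Question length is measured in characters here (an assumption).
        "mean_length": mean(len(r["question"]) for r in records),
        "mean_entities": mean(len(r["entities"]) for r in records),
        "coarse_label_counts": Counter(r["coarse_intent"] for r in records),
    }


records = json.loads(RAW)

# Sanity check: the segmentation tokens should concatenate back to the question.
for r in records:
    assert "".join(r["segmentation"]) == r["question"]

stats = summarize(records)
print(stats)
```

Measuring length in characters rather than in segmented words is a choice made for this sketch; the abstract does not say which unit the authors used.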


Bibliographic details

Source
BMC Medical Informatics and Decision Making, 2020, Issue 3, 7 pages
Authors
Nan Chen; Xiangdong Su; Tongyang Liu; Qizhi Hao; Ming Wei
Original format
PDF
Keywords
Intent classification; Dataset; Word segmentation; Named entity recognition


