Conference: Asia-Pacific Conference on Communications Technology and Computer Science

An Improved Automatic Extraction of Chinese Mathematical Terminology with Iterated Dilated Residual Gated Convolutions

Abstract

Automatic term extraction (ATE) is a primary task in ontology construction, text summarization, and knowledge graph construction. For the automatic extraction of Chinese mathematical terminology, we propose an improved model based on the Robustly Optimized BERT Pre-training Approach (RoBERTa), an Iterated Dilated Residual Gated CNN (IDRG-CNN), and a bidirectional LSTM with a CRF layer (BiLSTM-CRF). To evaluate the model, we annotate a corpus of Chinese terms from probability theory and mathematical statistics. We divide the annotated corpus into a training set, a validation set, and a test set, and ensure that no term appears in more than one of the three sets, so that the evaluation measures the model's generalization ability. Empirical results show that the model extracts terms effectively and generalizes well, with a particular improvement in the recognition of long terms.
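
The abstract does not say how the term-disjoint split was produced. As an illustration only, one way to guarantee that no term is shared between the training, validation, and test sets is to group sentences that share any term and assign whole groups to a subset. The sketch below assumes a hypothetical input format of `(sentence, set_of_terms)` pairs; the function name `term_disjoint_split` and the 80/10/10 ratios are likewise assumptions, not details from the paper.

```python
import random
from collections import defaultdict


def term_disjoint_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (sentence, terms) pairs so that no term occurs in two subsets.

    samples: list of (sentence, set_of_terms) pairs (hypothetical format).
    Returns [train, validation, test], each a list of samples.
    """
    parent = list(range(len(samples)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge sentences that share at least one annotated term.
    first_seen = {}
    for idx, (_, terms) in enumerate(samples):
        for term in terms:
            if term in first_seen:
                parent[find(idx)] = find(first_seen[term])
            else:
                first_seen[term] = idx

    # Collect the connected components (groups of linked sentences).
    groups = defaultdict(list)
    for idx in range(len(samples)):
        groups[find(idx)].append(idx)
    components = list(groups.values())
    random.Random(seed).shuffle(components)

    # Assign each whole component to the subset furthest below its target size.
    targets = [r * len(samples) for r in ratios]
    splits = [[], [], []]
    for comp in components:
        k = max(range(3), key=lambda j: targets[j] - len(splits[j]))
        splits[k].extend(samples[i] for i in comp)
    return splits


if __name__ == "__main__":
    toy = [
        ("s1", {"随机变量", "期望"}),
        ("s2", {"期望", "方差"}),
        ("s3", {"中心极限定理"}),
        ("s4", {"大数定律"}),
        ("s5", {"方差"}),
        ("s6", {"大数定律", "样本"}),
    ]
    train, val, test = term_disjoint_split(toy)
    print(len(train), len(val), len(test))
```

Because whole groups are assigned together, the resulting subset sizes only approximate the requested ratios, and a corpus in which most sentences share common terms may collapse into one large group. In that case a different strategy, such as assigning terms to subsets first and discarding sentences that mix them, may be more practical.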