Conference: CCF International Conference on Natural Language Processing and Chinese Computing

Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora



Abstract

The Chinese language has changed substantially over its long history, so native speakers today have trouble reading sentences written in ancient Chinese. In this paper, we propose an end-to-end neural model that automatically translates between ancient and contemporary Chinese. However, most existing ancient-contemporary Chinese parallel corpora are not aligned at the sentence level, and sentence-aligned corpora are limited, which makes it difficult to train such a model. To build sentence-level parallel training data, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs by exploiting the fact that an aligned sentence pair shares many tokens. Based on the aligned corpus, we propose an end-to-end neural model with a copying mechanism and local attention to translate between ancient and contemporary Chinese. Experiments show that the proposed unsupervised algorithm achieves a 99.4% F1 score on sentence alignment, and that the translation model achieves 26.95 BLEU from ancient to contemporary Chinese and 36.34 BLEU from contemporary to ancient Chinese.
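The abstract gives only the intuition behind the alignment step: aligned ancient and contemporary sentences share many tokens. The following is a minimal sketch of that idea, not the paper's algorithm. It assumes character-level tokens, a Dice-style score based on the longest common subsequence, and a monotonic one-to-one dynamic program that may skip sentences. The names `lcs_length`, `overlap_score`, and `align` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap_score(ancient: str, modern: str) -> float:
    """Assumed scoring: Dice-style ratio of shared characters, in [0, 1]."""
    if not ancient or not modern:
        return 0.0
    return 2 * lcs_length(ancient, modern) / (len(ancient) + len(modern))


def align(ancient: List[str], modern: List[str]) -> List[Tuple[int, int]]:
    """Monotonic one-to-one alignment that maximizes total overlap.

    Sentences on either side may be left unaligned. Returns index pairs
    (ancient_index, modern_index) in document order.
    """
    n, m = len(ancient), len(modern)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + overlap_score(ancient[i - 1], modern[j - 1])
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])

    # Backtrack: prefer skipping when skipping loses nothing, so pairs
    # with zero overlap are never emitted.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    ancient_sents = ["学而时习之", "不亦说乎"]
    modern_sents = ["学习并且时常复习", "今天天气好", "不也是很愉快吗"]
    print(align(ancient_sents, modern_sents))
```

In this toy input the middle contemporary sentence shares no characters with either ancient sentence, so the dynamic program leaves it unaligned. A real corpus would also need one-to-many alignments and a score threshold, neither of which this sketch attempts.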
