Conference: CCF International Conference on Natural Language Processing and Chinese Computing

Automatic Translating Between Ancient Chinese and Contemporary Chinese with Limited Aligned Corpora



Abstract

The Chinese language has evolved considerably over its long history, so native speakers today have trouble reading sentences written in ancient Chinese. In this paper, we propose an end-to-end neural model that automatically translates between ancient and contemporary Chinese. However, existing ancient-contemporary Chinese parallel corpora are not aligned at the sentence level, and sentence-aligned corpora are limited, which makes the model difficult to train. To build sentence-level parallel training data, we propose an unsupervised algorithm that constructs sentence-aligned ancient-contemporary pairs by exploiting the fact that an aligned sentence pair shares many tokens. Based on the aligned corpus, we propose an end-to-end neural model with a copying mechanism and local attention to translate between ancient and contemporary Chinese. Experiments show that the proposed unsupervised algorithm achieves a 99.4% F1 score for sentence alignment, and that the translation model achieves 26.95 BLEU from ancient to contemporary and 36.34 BLEU from contemporary to ancient.
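The alignment idea in the abstract (an aligned sentence pair shares many tokens) can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes characters as tokens, a multiset-overlap score normalized by the shorter sentence, a monotonic one-to-one alignment found by dynamic programming, and a hypothetical `threshold` parameter below which a pair is not accepted. The names `overlap_score` and `align` are illustrative only.

```python
from __future__ import annotations

from collections import Counter


def overlap_score(a: str, b: str) -> float:
    """Shared characters (multiset overlap) divided by the shorter length."""
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / min(len(a), len(b))


def align(ancient: list[str], modern: list[str],
          threshold: float = 0.2) -> list[tuple[int, int]]:
    """Monotonic one-to-one alignment that maximizes total overlap.

    `threshold` is an assumed cutoff: pairs scoring below it are never linked.
    Returns (ancient_index, modern_index) pairs in document order.
    """
    n, m = len(ancient), len(modern)
    # best[i][j]: best total score aligning ancient[:i] with modern[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = overlap_score(ancient[i - 1], modern[j - 1])
            pair = best[i - 1][j - 1] + (s if s >= threshold else 0.0)
            best[i][j] = max(pair, best[i - 1][j], best[i][j - 1])

    # Trace back through the table to recover the linked pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = overlap_score(ancient[i - 1], modern[j - 1])
        if s >= threshold and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1  # ancient sentence i-1 left unaligned
        else:
            j -= 1  # modern sentence j-1 left unaligned
    pairs.reverse()
    return pairs


ancient = ["学而时习之", "不亦说乎"]
modern = ["学习并且时常温习它", "这是注释", "不也是很愉快吗"]
pairs = align(ancient, modern)
```

In this toy input the middle contemporary sentence shares no characters with either ancient sentence, so the sketch leaves it unaligned and links the remaining sentences in order. A real system would likely need many-to-one links and a tuned score, which this sketch does not attempt.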
