Journal: ACM Transactions on Asian Language Information Processing

Ancient-Modern Chinese Translation with a New Large Training Dataset

Abstract

The Chinese language carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to inherit and carry forward the quintessence of the ancients. However, the lack of a large-scale parallel corpus has limited research on ancient-modern Chinese machine translation. In this article, we propose an ancient-modern Chinese clause alignment approach based on the characteristics of these two languages. The method combines lexical and statistical information and achieves an F1-score of 94.2 on our manually annotated test set. We use it to create a new large-scale ancient-modern Chinese parallel corpus that contains 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality ancient-modern Chinese dataset. Furthermore, we analyze and compare the performance of SMT and various NMT models on this dataset and provide a strong baseline for this task.
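
The abstract does not spell out how the lexical and statistical signals are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a character-overlap score as the lexical signal, a length-ratio score as the statistical signal, and a dynamic program over 1-1, 1-2, and 2-1 clause matches. The function names, the expected length ratio of 1.5, and the equal weighting are all hypothetical choices made for illustration.

```python
import math


def lexical_score(src: str, tgt: str) -> float:
    """Fraction of ancient-clause characters that reappear in the modern clause."""
    if not src:
        return 0.0
    tgt_chars = set(tgt)
    return sum(ch in tgt_chars for ch in src) / len(src)


def length_score(src: str, tgt: str, ratio: float = 1.5) -> float:
    """Close to 1 when len(tgt) is near ratio * len(src); the ratio is an assumption."""
    if not src or not tgt:
        return 0.0
    expected = ratio * len(src)
    return math.exp(-abs(len(tgt) - expected) / expected)


def pair_score(src: str, tgt: str, weight: float = 0.5) -> float:
    """Weighted mix of the lexical and statistical signals (weight is an assumption)."""
    return weight * lexical_score(src, tgt) + (1 - weight) * length_score(src, tgt)


def align_clauses(src_clauses, tgt_clauses):
    """Align clauses with a DP over 1-1, 1-2, and 2-1 matches.

    Returns a list of (ancient_text, modern_text) pairs, or raises ValueError
    when the clause counts cannot be covered by the allowed match types.
    """
    n, m = len(src_clauses), len(tgt_clauses)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                src = "".join(src_clauses[i:ni])
                tgt = "".join(tgt_clauses[j:nj])
                # Weight by the number of clauses covered so that merged
                # matches compete fairly with runs of 1-1 matches.
                cand = best[i][j] + (di + dj) * pair_score(src, tgt)
                if cand > best[ni][nj]:
                    best[ni][nj] = cand
                    back[ni][nj] = (i, j)
    if best[n][m] == neg_inf:
        raise ValueError("no alignment with 1-1, 1-2, 2-1 matches covers the input")
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append(("".join(src_clauses[pi:i]), "".join(tgt_clauses[pj:j])))
        i, j = pi, pj
    return pairs[::-1]


if __name__ == "__main__":
    ancient = ["温故而知新", "可以为师矣"]
    modern = ["温习旧的知识", "从而得到新的理解", "就可以做老师了"]
    for src, tgt in align_clauses(ancient, modern):
        print(src, "<->", tgt)
```

Character overlap is a plausible lexical cue because many characters survive from ancient into modern Chinese, while the length ratio reflects that modern renderings tend to be longer. A real system would likely need richer lexical resources and ratios estimated from data.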
