
Rapid development of RBMT systems for related languages


Abstract

A survey of available and accessible language processing tools and materials, mostly corpora, revealed that a reasonably large amount of work has already been done for Slovenian and less for Serbian. The tools available for Slovenian, of reasonable or even good quality, are a part-of-speech tagger (Erjavec et al., 2000), a lemmatizer (Erjavec et al., 2004), and a stemmer; none of these tools exists for Serbian. Both languages have solid monolingual reference corpora (running into hundreds of millions) and a small bilingual corpus, which was used mostly for evaluation purposes. Evaluation was conducted on the functional machine translation system, and the results are shown, presenting coverage measured on the reference corpus together with selected evaluation metrics. Both objective and subjective evaluation methods were used, as only a proper mixture of methods minimizes evaluation bias. Translation quality was evaluated subjectively by a set of native speakers who scored the translations. The automatic objective measures NIST and BLEU (Papineni et al., 2001) were used to ensure wider coverage. The bilingual corpus was used in both automatic evaluations. The conclusions present the strong and weak points of this approach and explore grounds for further work.
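To make the automatic evaluation step more concrete, the following is a minimal sketch of a BLEU-style score for a single candidate translation against a single reference. It is not the evaluation code used in the paper: the function names `ngram_counts` and `sentence_bleu`, the whitespace tokenization, the single-reference setup, and the absence of smoothing are all assumptions made for illustration. BLEU as commonly reported is computed over a whole corpus, often with several references per sentence.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference BLEU-style score (illustrative sketch).

    Assumes whitespace tokenization. Returns 0.0 as soon as any
    n-gram order has no match, because no smoothing is applied.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Modified precision: clip each candidate n-gram count
        # by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

In this sketch a candidate identical to its reference scores 1.0, and a candidate sharing no words with the reference scores 0.0. For real evaluations an established toolkit is preferable to a hand-written scorer, since tokenization and smoothing choices change the reported numbers.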
