Conference: Machine Translation Summit; Workshop on Technologies for MT of Low Resource Languages

A step towards Torwali machine translation: an analysis of morphosyntactic challenges in a low-resource language

Abstract

Torwali is an endangered language spoken in northern Pakistan. It is computationally challenging because of its right-to-left (RTL) Perso-Arabic script, its non-concatenative nature, and its distinct word alterations. This paper discusses the issues and challenges that grammatical structure, lexical divergence, and morphological makeup pose for machine translation of a less-studied language. It also covers the creation of NLP tools such as a part-of-speech (POS) tagger and a morphological analyser built with HFST, which rests on the idea of building the lexicon and morphological rules with finite-state devices. The work underlying this paper will serve as a source of Torwali finite-state morphology and support the language's future computational growth, since electronic dictionaries are usually equipped with a morphological analyser; it will also help in developing language pairs.