Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging


Abstract

Word segmentation plays a pivotal role in improving any Arabic NLP application. Therefore, considerable research effort has been spent on improving its accuracy. Off-the-shelf tools, however, are: (i) complicated to use and (ii) domain/dialect dependent. We explore three language-independent alternatives to morphological segmentation using: (i) data-driven sub-word units, (ii) characters as a unit of learning, and (iii) word embeddings learned using a character CNN (Convolutional Neural Network). On the tasks of Machine Translation and POS tagging, we found these methods to achieve close to, and occasionally surpass, state-of-the-art performance. In our analysis, we show that a neural machine translation system is sensitive to the ratio of source and target tokens, and that a ratio close to 1 or greater gives optimal performance.
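
The source-to-target token ratio mentioned in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`word_tokens`, `char_tokens`, `token_ratio`), the two toy sentence pairs (Arabic written in Buckwalter-style transliteration), and the reading of "ratio" as total source tokens divided by total target tokens.

```python
from typing import Callable, List, Sequence, Tuple

# A segmenter maps one sentence to a list of tokens (hypothetical type alias).
Segmenter = Callable[[str], List[str]]


def word_tokens(sentence: str) -> List[str]:
    """Whitespace segmentation: one token per space-separated word."""
    return sentence.split()


def char_tokens(sentence: str) -> List[str]:
    """Character segmentation: one token per character.

    Spaces are kept as an explicit "_" token so word boundaries survive.
    """
    return ["_" if ch == " " else ch for ch in sentence.strip()]


def token_ratio(
    pairs: Sequence[Tuple[str, str]],
    src_segmenter: Segmenter,
    tgt_segmenter: Segmenter,
) -> float:
    """Total source tokens divided by total target tokens over a corpus."""
    src_total = sum(len(src_segmenter(src)) for src, _ in pairs)
    tgt_total = sum(len(tgt_segmenter(tgt)) for _, tgt in pairs)
    if tgt_total == 0:
        raise ValueError("target side contains no tokens")
    return src_total / tgt_total


# Toy parallel data, invented for this example only.
toy_pairs = [
    ("wsyktbwnhA", "and they will write it"),
    ("ktAb jdyd", "a new book"),
]

# Unsegmented Arabic words against English words: ratio well below 1.
print(token_ratio(toy_pairs, word_tokens, word_tokens))
# Character-level Arabic against English words: ratio above 1.
print(token_ratio(toy_pairs, char_tokens, word_tokens))
```

On this toy data, leaving the morphologically rich source side unsegmented yields far fewer source tokens than target tokens, while splitting it into characters pushes the ratio above 1. A data-driven sub-word segmenter could be passed as `src_segmenter` in the same way to compare granularities.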

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2017 | pp. 601-607 | 7 pages
Authors
Hassan Sajjad; Fahim Dalvi; Nadir Durrani; Ahmed Abdelali; Yonatan Belinkov; Stephan Vogel
Original format: PDF

Similar literature

1. Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage [J]. Mohamed Emad. ACM Transactions on Asian Language Information Processing, 2018, No. 3.
2. The impact of Arabic morphological segmentation on broad-coverage English-to-Arabic statistical machine translation [J]. Hassan Al-Haj, Alon Lavie. Machine Translation, 2012, No. 1-2.
3. Using Target-language Information To Train Part-of-speech Taggers For Machine Translation [J]. Felipe Sanchez-Martinez, Juan Antonio Perez-Ortiz, Mikel L. Forcada. Machine Translation, 2008, No. 1-2.
4. Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging [C]. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, et al. Annual Meeting of the Association for Computational Linguistics, 2017.
5. Translating Metalanguage: An Arabic Translation and Analysis of Munday's Application of the Appraisal Framework to Translation Studies [D]. Mohamed, Sayed A. 2017.
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang. 2019.
7. Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging [O]. Sajjad, Hassan; Dalvi, Fahim; Durrani, Nadir; et al. 2017.
