International Journal of Electrical and Computer Engineering

Atar: Attention-based LSTM for Arabizi transliteration



Abstract

A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is the collection and public release of the first large-scale "Arabizi to Arabic script" parallel corpus focusing on the Jordanian dialect, consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields an impressive accuracy (79%) and BLEU score (88.49).
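The abstract describes Atar as an attention-based encoder-decoder model over character sequences (Arabizi characters in, Arabic-script characters out). Below is a minimal sketch of such a model in PyTorch; the layer sizes, dot-product (Luong-style) attention, and teacher-forced training loop are illustrative assumptions, not the authors' released implementation.

```python
# Minimal character-level attention-based encoder-decoder sketch in PyTorch.
# Hyperparameters, the attention variant, and the training loop are assumptions
# for illustration only; they do not reproduce the Atar implementation.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, src_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(src_vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len) char ids
        outputs, hidden = self.lstm(self.embed(src))
        return outputs, hidden                   # outputs: (batch, src_len, hid)

class DotAttention(nn.Module):
    """Dot-product (Luong-style) attention over encoder states."""
    def forward(self, dec_state, enc_outputs):
        # dec_state: (batch, hid); enc_outputs: (batch, src_len, hid)
        scores = torch.bmm(enc_outputs, dec_state.unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)            # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights

class Decoder(nn.Module):
    def __init__(self, tgt_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.attn = DotAttention()
        self.out = nn.Linear(hid_dim * 2, tgt_vocab)

    def forward(self, tgt_tok, hidden, enc_outputs):
        # tgt_tok: (batch, 1) -- one target character per step
        output, hidden = self.lstm(self.embed(tgt_tok), hidden)
        context, _ = self.attn(output.squeeze(1), enc_outputs)
        logits = self.out(torch.cat([output.squeeze(1), context], dim=1))
        return logits, hidden                    # logits: (batch, tgt_vocab)

def train_step(enc, dec, src, tgt, optimizer, pad_id=0):
    """Teacher-forced step on one batch of (Arabizi, Arabic) character ids."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    enc_outputs, hidden = enc(src)
    loss = 0.0
    for t in range(tgt.size(1) - 1):             # feed gold char, predict next
        logits, hidden = dec(tgt[:, t:t + 1], hidden, enc_outputs)
        loss = loss + criterion(logits, tgt[:, t + 1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item() / (tgt.size(1) - 1)
```

At inference time, decoding would start from a start-of-sequence character and feed each predicted character back into the decoder (greedy or beam search); the reported BLEU score would then be computed between the decoded Arabic-script strings and the reference transliterations.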
