Conference: International Conference on Intelligent Text Processing and Computational Linguistics

Arabic Transliteration of Romanized Tunisian Dialect Text: A Preliminary Investigation

Abstract

In this paper, we describe the process of converting Tunisian Dialect text written in Latin script (also called Arabizi) into Arabic script, following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks and broadcast videos. The language used in social media and SMS messaging is characterized by informal and non-standard vocabulary, such as repeated letters for emphasis, typos and non-standard abbreviations, and by nonlinguistic content such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step, since most of the recently available tools for processing Arabic dialects expect Arabic script input.
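
The kind of noise described above (repeated letters, emoticons) can be illustrated with a minimal Python sketch. It is not the method of the paper, which the abstract does not detail: the function names, the regular expressions and the digit table are assumptions made for illustration. The digit-to-letter pairs are common Arabizi conventions, not the CODA-based mapping the authors use.

```python
import re

# Hypothetical, deliberately tiny table of common Arabizi digit conventions.
# It is an illustrative assumption, not the mapping used in the paper.
DIGIT_TO_ARABIC = {"2": "ء", "3": "ع", "5": "خ", "7": "ح", "9": "ق"}

# Matches a few ASCII emoticons such as :) ;-( :D (far from exhaustive).
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")

# Matches any character repeated three or more times in a row.
REPEATS = re.compile(r"(.)\1{2,}")


def normalize_arabizi(text: str) -> str:
    """Strip emoticons, collapse emphatic repetitions, tidy whitespace."""
    text = EMOTICON.sub(" ", text)
    # Collapsing to a single character is lossy: a legitimately doubled
    # letter followed by emphasis is also reduced to one letter.
    text = REPEATS.sub(r"\1", text)
    return " ".join(text.split())


def map_digits(token: str) -> str:
    """Replace Arabizi digits with Arabic letters; other characters pass through."""
    return "".join(DIGIT_TO_ARABIC.get(ch, ch) for ch in token)


if __name__ == "__main__":
    cleaned = normalize_arabizi("Barchaaaa :) 3asleeema")
    print(cleaned)
    print([map_digits(tok) for tok in cleaned.split()])
```

This sketch only cleans the input and handles digits. A real transliterator would also have to resolve the spelling variation noted above, since one Latin spelling can correspond to several Arabic forms, and it would have to tell Arabizi digits apart from genuine numbers.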