Conference: International Conference on Formal Grammar

A Purely Surface-Oriented Approach to Handling Arabic Morphology


Abstract

In this paper, we introduce a completely lexicalist approach to Arabic morphology. This purely surface-oriented treatment is part of a comprehensive mathematical approach that integrates Arabic syntax and semantics, using overt morphological features in the string-to-meaning translation. The basic motivation of our approach is to combine semantic representations with formal descriptions of morphological units. That is, the lexicon is a collection of signs; each sign σ is a triple σ = ⟨E, C, M⟩, where E is the exponent, C is the combinatorics and M is the meaning of the sign. Here, we are concerned only with the exponents, i.e. the components of a morphosemantic lexicon (for a fragment of Arabic). To remain surface-oriented, we allow for discontinuity in the constituents: constituents are sequences of strings that can only be concatenated or duplicated; no rule can delete, add or modify any string. Arabic morphology is well known for its complexity and richness. Word formation in Arabic poses real challenges because words are derived from roots, which bear the core meaning of their derivatives; the derivatives are formed by inserting vowels and possibly further consonants. The units in the sequences are so-called glued strings rather than plain strings. A glued string is a string with left and right context conditions. Ideally, morphs are combined in a definite and exceptionless linear way, as happens in many cases across languages (e.g. the plural in English). Arabic word formation is more complex: it is not just a sequential concatenation of morphs placed next to each other; rather, the constituents are discontinuous. Vowels and further consonants are inserted between, before and after the root consonants, resulting in what we call a 'fractured glued string', i.e. a sequence of glued strings combined in diverse ways: forward concatenation, backward concatenation, forward wrapping, reduction, forward transfixation and, going beyond multiple context-free grammars (MCFGs), also reduplication.
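
To make the notions of glued string and fractured glued string more concrete, here is a minimal Python sketch. It is not the paper's formalism: the abstract gives no formal definitions, so the encoding of context conditions as predicates and the names `GluedString`, `glue`, `concat`, `transfix` and `reduplicate` are all assumptions made for illustration. The example uses the transliterated root k-t-b with the vowel pattern a-a, and it only concatenates, interleaves or duplicates strings; nothing is deleted or modified.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a context condition is a predicate over the neighbouring text.
Condition = Callable[[str], bool]


def anything(_: str) -> bool:
    """Trivial context condition that accepts every neighbour."""
    return True


@dataclass(frozen=True)
class GluedString:
    """A string with a left and a right context condition (hypothetical encoding)."""
    text: str
    left_ok: Condition = anything   # condition on the material to the left
    right_ok: Condition = anything  # condition on the material to the right


# A fractured glued string is modelled here as a tuple of glued strings.
Fractured = Tuple[GluedString, ...]


def glue(parts: Fractured) -> str:
    """Join the parts left to right, checking every context condition."""
    out = ""
    for i, part in enumerate(parts):
        following = "".join(p.text for p in parts[i + 1:])
        if not part.left_ok(out) or not part.right_ok(following):
            raise ValueError(f"context condition violated at {part.text!r}")
        out += part.text
    return out


def concat(x: Fractured, y: Fractured) -> Fractured:
    """Concatenation: the parts of y follow the parts of x."""
    return x + y


def transfix(root: Fractured, pattern: Fractured) -> Fractured:
    """Interleave root consonants with pattern vowels: c1 v1 c2 v2 ... cn."""
    if len(root) != len(pattern) + 1:
        raise ValueError("pattern must have one part fewer than the root")
    out = []
    for consonant, vowel in zip(root, pattern):
        out.extend([consonant, vowel])
    out.append(root[-1])
    return tuple(out)


def reduplicate(x: Fractured) -> Fractured:
    """Duplicate the whole sequence; no part is deleted or altered."""
    return x + x


# Toy data: root k-t-b and vowel pattern a-a (transliterated).
root = tuple(GluedString(c) for c in "ktb")
pattern = tuple(GluedString(v) for v in "aa")
stem = transfix(root, pattern)

# Hypothetical suffix that must attach to a consonant-final stem.
suffix_a = GluedString("a", left_ok=lambda left: left != "" and left[-1] not in "aiu")

print(glue(stem))                       # katab
print(glue(concat(stem, (suffix_a,))))  # kataba
```

The call `glue(reduplicate(stem))` returns `katabkatab`; this shows only the mechanics of duplication and makes no claim that the result is an Arabic word. Attaching `suffix_a` after a vowel-final sequence raises `ValueError`, which is how this sketch models a violated context condition.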
