Conference: IEEE International Conference on Semantic Computing

Codeswitched Sentence Creation using Dependency Parsing

Abstract

Codeswitching has become one of the most common phenomena among multilingual speakers worldwide, especially in countries like India, which has around 23 official languages and roughly 300 million bilingual speakers. The scarcity of Codeswitched data is a bottleneck for exploring this domain across Natural Language Processing (NLP) tasks. We therefore present a novel algorithm that harnesses the syntactic structure of English grammar to generate grammatically sensible Codeswitched English-Hindi, English-Marathi and English-Kannada data. Apart from largely preserving grammatical soundness, our methodology also guarantees abundant data generation from a very small sample of given data. We use multiple datasets to showcase the capabilities of our algorithm, assess the quality of the generated Codeswitched data using qualitative metrics, and provide baseline results for a couple of NLP tasks.
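The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of dependency-guided Codeswitching, not the authors' method. It assumes a hand-annotated English parse and swaps whole subject or object subtrees for phrases from a small lookup table, producing one variant per subtree. The `Token` class, the `codeswitch_variants` function, the dependency labels, and the romanized Hindi phrase table are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the head token; -1 marks the root
    dep: str   # dependency label (hand-annotated here, not parser output)


def subtree(tokens, root):
    """Return the sorted indices of `root` and all of its descendants."""
    members = {root}
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(tokens):
            if tok.head in members and i not in members:
                members.add(i)
                changed = True
    return sorted(members)


def codeswitch_variants(tokens, phrase_table, labels=("nsubj", "obj")):
    """Build one Codeswitched variant per replaceable subtree.

    A subtree is replaced only if its head carries one of `labels`,
    it covers a contiguous span, and its phrase is in `phrase_table`.
    """
    words = [t.text for t in tokens]
    variants = []
    for i, tok in enumerate(tokens):
        if tok.dep not in labels:
            continue
        span = subtree(tokens, i)
        if span != list(range(span[0], span[-1] + 1)):
            continue  # skip non-contiguous subtrees
        phrase = " ".join(words[j] for j in span).lower()
        translated = phrase_table.get(phrase)
        if translated is None:
            continue
        switched = words[:span[0]] + [translated] + words[span[-1] + 1:]
        variants.append(" ".join(switched))
    return variants


# Hand-annotated parse of "The girl reads a book" (illustrative only).
sentence = [
    Token("The", 1, "det"),
    Token("girl", 2, "nsubj"),
    Token("reads", -1, "root"),
    Token("a", 4, "det"),
    Token("book", 2, "obj"),
]

# Hypothetical English-to-Hindi phrase table (romanized for readability).
phrase_table = {"the girl": "ladki", "a book": "ek kitaab"}

variants = codeswitch_variants(sentence, phrase_table)
for v in variants:
    print(v)
```

Replacing complete subtrees rather than single words keeps each switched segment a whole syntactic unit. That is one plausible reading of "grammatically sensible" generation, and it shows how a single sentence can yield several variants.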