Workshop on Computational Approaches to Code Switching

An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines

Abstract

The extraction of anglicisms (lexical borrowings from English) is relevant both for lexicographic purposes and for downstream NLP tasks. In this paper we present: (1) a corpus of 21,570 newspaper headlines written in European Spanish and annotated with emerging anglicisms, and (2) a conditional random field (CRF) baseline model with handcrafted features for anglicism extraction. We describe the headlines corpus and its annotation tagset and guidelines, and introduce a CRF model that can serve as a baseline for the task of detecting anglicisms. The presented work is a first step towards the creation of an anglicism extractor for Spanish newswire.
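The abstract does not list the handcrafted features of the CRF baseline, so the following is only a minimal sketch under stated assumptions: a BIO encoding with a single hypothetical label `ENG`, and a few illustrative orthographic cues (letter sequences such as `w`, `sh`, or a final `ng` that are uncommon in native Spanish spelling). These cues are assumptions for illustration, not the authors' feature set. The per-token feature dictionaries are the input shape accepted by linear-chain CRF toolkits such as sklearn-crfsuite, and the helper `bio_to_spans` (a hypothetical name) turns predicted tags back into extracted anglicisms.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical orthographic cues: letter sequences that are rare in native
# Spanish spelling but frequent in English words. The CRF learns a weight for
# each cue, so they act as soft evidence, not hard rules.
ENGLISH_LIKE_PATTERNS = [r"w", r"k", r"sh", r"th", r"oo", r"ee", r"ng$"]


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Handcrafted features for token i, one dict per token (CRF toolkit format)."""
    tok = tokens[i]
    low = tok.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": low,
        "prefix3": low[:3],
        "suffix3": low[-3:],
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        # Spanish diacritics are a cue that the word is NOT an English borrowing.
        "has_accent": bool(re.search(r"[áéíóúüñ]", low)),
    }
    for pat in ENGLISH_LIKE_PATTERNS:
        feats[f"pat={pat}"] = bool(re.search(pat, low))
    # Context features: the neighbouring tokens in a window of one.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature sequence for one headline."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (text, label) spans from BIO tags such as B-ENG / I-ENG / O."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and label != tag[2:]):
            # A new span starts (an orphan I- tag is treated as a B- tag).
            if current:
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-"):
            current.append(tok)  # continue the open span
        else:
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans
```

For example, for the illustrative headline tokens `["El", "boom", "del", "streaming", "en", "España"]` tagged `O B-ENG O B-ENG O O`, `bio_to_spans` returns `[("boom", "ENG"), ("streaming", "ENG")]`, while a multiword borrowing tagged `B-ENG I-ENG` (e.g. "fake news") is returned as a single span.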