9th International Conference on Language Resources and Evaluation

The N2 corpus: A semantically annotated collection of Islamist extremist stories

Abstract

We describe the N2 (Narrative Networks) Corpus, a new language resource. The corpus is unique in three important ways. First, every text in the corpus is a story, in contrast to other language resources that may contain stories or story-like texts but are not curated to contain only stories. Second, the unifying theme of the corpus is material relevant to Islamist extremists: each text was either produced by them or is often referenced by them. Third, every text in the corpus has been annotated for 14 layers of syntax and semantics, including referring expressions and co-reference; events, time expressions, and temporal relationships; semantic roles; and word senses. Where no analyzer was available to produce high-quality automatic annotations for a layer, that layer was manually double-annotated by trained annotators and then adjudicated. The corpus comprises 100 texts and 42,480 words. Most of the texts were originally in Arabic, but all are provided in English translation. We explain the motivation for constructing the corpus, the process for selecting the texts, the detailed contents of the corpus itself, the rationale behind the choice of annotation layers, and the annotation procedure.