International Conference on Language Resources and Evaluation

An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora



Abstract

Named Entities (NEs) that occur in natural language text have become especially important with the advent of social media, and they play a critical role in the development of many natural language technologies. In this paper, we systematically analyze the patterns of occurrence and co-occurrence of NEs in standard large English news corpora, providing valuable insight into these corpora and paving the way for technologies that rely critically on handling NEs. We use two distinct approaches: a conventional statistical analysis that measures and reports the occurrence patterns of NEs in terms of frequency, growth, etc., and a complex-network analysis that measures their co-occurrence patterns in terms of connectivity, degree distribution, the small-world phenomenon, etc. Our analysis indicates (i) that NEs form an open set in corpora and grow linearly, (ii) the presence of kernel and peripheral NEs, with the large periphery occurring rarely, and (iii) strong evidence of the small-world phenomenon. Our findings may suggest effective ways to construct NE lexicons that aid the efficient development of several natural language technologies.
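The abstract names two kinds of measurement: how the set of distinct NEs grows over a corpus, and properties of an NE co-occurrence network. The following is a minimal sketch of both on a toy input, not the authors' code. It assumes NEs have already been extracted per document, treats two NEs as co-occurring when they appear in the same document (the paper's actual co-occurrence window is not stated in this abstract), and uses standard Erdős–Rényi approximations as the random-graph baseline for the small-world comparison. All function names and the `toy_docs` data are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy input: each document is reduced to the list of NE strings
# found in it. NE extraction itself is assumed to have happened upstream.
toy_docs = [
    ["Obama", "Merkel", "Berlin"],
    ["Obama", "Washington"],
    ["Merkel", "Berlin", "Paris"],
    ["Macron", "Paris"],
]


def growth_curve(docs):
    """Number of distinct NEs seen after each document (NE vocabulary growth)."""
    seen, curve = set(), []
    for doc in docs:
        seen.update(doc)
        curve.append(len(seen))
    return curve


def cooccurrence_graph(docs):
    """Undirected adjacency sets; assumption: two NEs co-occur if they share a document."""
    adj = {}
    for doc in docs:
        unique = sorted(set(doc))
        for ne in unique:
            adj.setdefault(ne, set())  # keep NEs that have no neighbours as nodes
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of NEs with exactly k distinct neighbours."""
    return Counter(len(nbrs) for nbrs in adj.values())


def avg_clustering(adj):
    """Mean local clustering coefficient; NEs with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def avg_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each NE)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def small_world_summary(adj):
    """Compare C and L with Erdos-Renyi approximations for the same n and mean degree."""
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two NEs")
    mean_k = sum(len(nbrs) for nbrs in adj.values()) / n
    return {
        "C": avg_clustering(adj),
        "C_random": mean_k / (n - 1),  # edge probability p of a random graph
        "L": avg_path_length(adj),
        "L_random": math.log(n) / math.log(mean_k) if mean_k > 1 else math.inf,
    }


if __name__ == "__main__":
    graph = cooccurrence_graph(toy_docs)
    print(growth_curve(toy_docs))
    print(dict(degree_distribution(graph)))
    print(small_world_summary(graph))
```

Plotting `growth_curve` against the number of documents (or tokens) processed is one way to look for the open-set, linear growth reported in (i). On a graph as small as `toy_docs` the small-world comparison is not meaningful: the signature in (iii) is a clustering coefficient `C` well above `C_random` together with an average path length `L` close to `L_random`, which only becomes interpretable at corpus scale.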
