Venue: 10th SIGHUM workshop on language technology for cultural heritage, social sciences, and humanities

Brave New World: Uncovering Topical Dynamics in the ACL Anthology Reference Corpus Using Term Life Cycle Information

Abstract

One of the main interests in the analysis of large document collections is to discover domains of discourse that are still actively developing, growing in interest and relevance, at a given point in time, and to distinguish them from those topics that are in stagnation or decline. The present paper describes a terminologically inspired approach to this kind of task. The inputs to the method are a corpus spanning several decades of research in computational linguistics and a set of single-word terms that frequently occur in that corpus. The diachronic development of these terms is modelled by means of term life cycle information, namely the parameters relative frequency and productivity. In a second step, k-means clustering is used to identify groups of terms with similar development patterns. The paper describes a mathematical approach to modelling term productivity and discusses what kind of information can be obtained from this measure. The results of the clustering experiment are promising and motivate further research.
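
The two-step idea in the abstract (per-period term statistics, then k-means over the resulting curves) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy corpus, the whitespace tokenisation, the unit-sum normalisation, the first-k seeding, and the names `relative_frequencies`, `normalise`, and `kmeans` are all assumptions made for illustration. Productivity is left out because the abstract does not state how it is computed.

```python
from collections import Counter

def relative_frequencies(docs_by_year, terms):
    """Per-year relative frequency of each term: term count / total tokens."""
    years = sorted(docs_by_year)
    series = {t: [] for t in terms}
    for year in years:
        # Assumption: naive lowercase whitespace tokenisation.
        tokens = [tok for doc in docs_by_year[year] for tok in doc.lower().split()]
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty years
        for t in terms:
            series[t].append(counts[t] / total)
    return years, series

def normalise(curve):
    """Scale a curve to unit sum so clustering compares shape, not magnitude."""
    s = sum(curve)
    return [v / s for v in curve] if s else list(curve)

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus: three "years" of two short documents each.
docs_by_year = {
    1990: ["grammar parsing grammar rules", "grammar unification parsing"],
    2000: ["parsing statistical grammar model", "statistical model training"],
    2010: ["neural model training", "neural embedding model neural"],
}
terms = ["grammar", "unification", "neural", "embedding"]

years, series = relative_frequencies(docs_by_year, terms)
points = [normalise(series[t]) for t in terms]
labels = kmeans(points, k=2)

for term, label in zip(terms, labels):
    print(term, label)
```

On this toy input the declining terms and the rising terms end up in separate clusters. A real study would need proper tokenisation, a more careful seeding strategy, and the productivity parameter alongside relative frequency.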