Conference: IEEE International Conference on Big Data

Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach



Abstract

We present ProxiModel, a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data sources. The proposed model differs from other approaches by modeling event correlation both within each individual document and across the corpus. To this end, we introduce the proximity network, a novel space-efficient data structure that supports scalable event mining. The proximity network captures corpus-level co-occurrence statistics for candidate event descriptors and event attributes, as well as the connections between them. We model the proximity network probabilistically as a generative process with sparsity-inducing regularization, which allows us to extract high-quality, interpretable news events efficiently. Experiments on three different news corpora demonstrate that the proposed method is effective and robust at generating high-quality event descriptors and attributes. We briefly describe several applications of the proposed framework, such as news summarization, event tracking, and multi-dimensional analysis of news. Finally, we present a case study on visualizing the events in a Japan Tsunami news corpus and demonstrate ProxiModel’s ability to automatically summarize emerging news events.
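The abstract does not spell out how the proximity network is built, so the following is only a minimal sketch of the general idea of aggregating co-occurrence statistics across a corpus. It assumes that "proximity" means two terms appearing within a fixed token window of each other; the function name `build_proximity_network`, the `window` parameter, and the toy documents are all hypothetical and do not come from the paper. Storing only the pairs that are actually observed keeps the structure sparse.

```python
from collections import Counter


def build_proximity_network(docs, window=5):
    """Aggregate within-window co-occurrence counts over a corpus.

    docs   : list of documents, each a list of tokens (assumed pre-tokenized)
    window : maximum token distance that counts as "proximate" (assumption)

    Returns a Counter mapping an alphabetically sorted (term_a, term_b)
    pair to the number of times the two terms co-occurred within the window.
    """
    edges = Counter()
    for tokens in docs:
        for i, left in enumerate(tokens):
            # Look ahead at most `window` tokens from position i.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


# Toy corpus, invented for illustration only.
docs = [
    ["earthquake", "strikes", "japan", "tsunami", "warning"],
    ["tsunami", "warning", "issued", "japan"],
]
network = build_proximity_network(docs, window=2)
print(network.most_common(3))
```

Edge weights accumulate across documents, so a pair that appears close together in many articles ends up with a larger weight than a pair seen only once. The paper's generative model with sparsity-inducing regularization would operate on statistics of this kind, but that step is not sketched here.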
