Social network data is sparse and noisy and contains a large number of meaningless topics. Traditional bursty topic discovery methods cannot handle the sparsity of short texts in social networks, and they require complicated post-processing. To address these problems, a bursty topic discovery method based on a recurrent neural network (RNN) and a topic model, RTM-SBTD, is proposed. First, a weight prior that combines the RNN with inverse document frequency (IDF) is constructed to learn the relationships between words, and word pairs are constructed to alleviate the sparsity of short texts. Second, a "spike and slab" prior is introduced to decouple the sparsity and smoothness of the bursty topic distribution. Finally, the burstiness of words is used to model common topics and bursty topics separately, so that bursty topics are discovered automatically. Both qualitative and quantitative experiments show that the proposed RTM-SBTD method outperforms several state-of-the-art bursty topic discovery methods on multiple evaluation metrics.
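The abstract does not say how the word pairs or the IDF part of the weight prior are computed, so the following is only a minimal sketch under stated assumptions: word pairs are taken to be unordered pairs of distinct words co-occurring in one short post, and IDF uses a common smoothed form. The RNN component and the topic model itself are omitted, and the names `word_pairs` and `idf` and the toy posts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Unordered pairs of distinct words that co-occur in one short text.

    Assumption: the whole post is treated as a single context window.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def idf(docs):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1.

    Assumption: this smoothing is illustrative, not taken from the paper.
    """
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


# Toy corpus of tokenized short posts (invented for illustration).
docs = [
    ["earthquake", "hits", "city"],
    ["city", "marathon", "today"],
    ["earthquake", "relief", "city"],
]

weights = idf(docs)
pair_counts = Counter(p for doc in docs for p in word_pairs(doc))
```

Pooling pairs over the whole corpus gives co-occurrence statistics that a single short post cannot provide, which is the usual motivation for word-pair models of short text. In this sketch, rarer words such as "marathon" receive a larger IDF weight than a word like "city" that appears in every post.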