
Sentence Clustering using PageRank Topic Model



Abstract

Clusters of review sentences grouped by the viewpoints from which products are evaluated have a variety of applications. Topic models, such as Unigram Mixture (UM), can be used for this task. However, there are two problems. First, topic models depend on randomly initialized parameters, so their results are not consistent across runs. Second, the number of topics must be set in advance. To solve these problems, we introduce the PageRank Topic Model (PRTM), which approximately estimates multinomial distributions over topics and over words in a vocabulary by applying network structure analysis methods to Word Co-occurrence Graphs. In PRTM, an appropriate number of topics is estimated from a Word Co-occurrence Graph using the Newman method. PRTM also achieves consistent results, because the multinomial distributions over words in a vocabulary are estimated using PageRank and the multinomial distribution over topics is estimated by solving a convex quadratic programming problem. Using two review datasets, on hotels and cars, we show that PRTM produces consistent sentence clusters and estimates an appropriate number of topics for extracting the viewpoints from which products are evaluated.
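To make the PageRank step concrete, here is a minimal sketch of why a graph-based estimate is repeatable across runs. It is not the authors' implementation and rests on several assumptions: the function names `cooccurrence_graph` and `pagerank` are hypothetical, co-occurrence is counted per sentence, the damping factor is 0.85, and PageRank is run over the whole graph rather than within each topic. The Newman community detection and the convex quadratic programming step are omitted entirely.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build an undirected weighted graph: words are nodes, and an edge
    weight counts the sentences in which both words appear (assumption)."""
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for u, v in combinations(words, 2):
            graph[u][v] += 1.0
            graph[v][u] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration from a uniform start vector.

    No random initialization is involved, so repeated calls on the same
    graph return the same scores.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node; every node has at least one edge.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[nbr] * w / out_weight[nbr]
                for nbr, w in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    reviews = [
        "the room was clean",
        "the room was quiet",
        "the engine was loud",
    ]
    scores = pagerank(cooccurrence_graph(reviews))
    # The scores sum to 1, so they can be read as a distribution over words.
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```

Because the scores are non-negative and sum to one, they can be read as a multinomial distribution over the vocabulary. Treating them that way is a simplification for illustration, not a statement of how PRTM defines its per-topic word distributions.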
