Journal: Information Processing & Management

Text segmentation: A topic modeling perspective

Abstract

In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic-model-based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the proposed LDA-based approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. However, the proposed approaches, especially the LDA-based method, have high computational requirements. Based on an analysis of the dynamic programming (DP) algorithm typically used for segmentation, we suggest a modification to DP that dramatically speeds up the process with no loss in performance. The proposed modification to the DP algorithm is not specific to topic models; it is applicable to any algorithm that uses DP for the task of text segmentation.
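The abstract does not spell out the DP algorithm or the proposed modification, so the following is only a minimal sketch of the generic, unmodified DP formulation for segmenting a text into a fixed number of contiguous parts. Everything in it is an assumption made for illustration: the function names `unigram_scorer` and `segment`, the choice to fix the number of segments in advance, and the segment score, which is a smoothed unigram log-likelihood standing in for the LDA or MM likelihood used in the paper.

```python
import math
from collections import Counter


def unigram_scorer(vocab_size, alpha=0.1):
    """Return a scoring function for a candidate segment.

    Stand-in scorer (an assumption, not the paper's LDA/MM model):
    log-likelihood of the segment's words under a unigram model
    estimated from the segment itself, with additive smoothing alpha.
    """
    def score(sentences):
        counts = Counter(word for sent in sentences for word in sent)
        denom = sum(counts.values()) + alpha * vocab_size
        return sum(c * math.log((c + alpha) / denom) for c in counts.values())
    return score


def segment(sentences, num_segments, score):
    """Split `sentences` into `num_segments` contiguous parts that
    maximise the sum of per-segment scores. Returns the indices at
    which a new segment starts (excluding index 0)."""
    n = len(sentences)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(sentences)")
    neg_inf = float("-inf")
    # best[end][k]: best total score for the first `end` sentences in k segments
    best = [[neg_inf] * (num_segments + 1) for _ in range(n + 1)]
    back = [[0] * (num_segments + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for end in range(1, n + 1):
        for k in range(1, min(end, num_segments) + 1):
            # the last segment covers sentences[start:end]
            for start in range(k - 1, end):
                if best[start][k - 1] == neg_inf:
                    continue
                cand = best[start][k - 1] + score(sentences[start:end])
                if cand > best[end][k]:
                    best[end][k] = cand
                    back[end][k] = start
    # walk the back-pointers from the end to recover segment starts
    starts = []
    end, k = n, num_segments
    while k > 0:
        start = back[end][k]
        starts.append(start)
        end, k = start, k - 1
    return sorted(starts)[1:]  # drop the leading 0


# Toy input: three "animal" sentences followed by three "finance" sentences.
toy = [
    ["cat", "dog"], ["dog", "cat"], ["cat", "cat", "dog"],
    ["stock", "bond"], ["bond", "stock"], ["stock", "stock", "bond"],
]
boundaries = segment(toy, 2, unigram_scorer(vocab_size=4))
print(boundaries)
```

On this toy input the single boundary falls where the vocabulary changes, before the fourth sentence. The sketch scores every candidate span for every segment count, so it makes on the order of n² × k calls to the scorer. With a scorer that requires topic-model inference per span, this is where the high computational cost mentioned in the abstract would come from.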
