
Krimping texts for better summarization


Abstract

Automated text summarization aims to extract the essential information from an original text and present it in a minimal, often predefined, number of words. In this paper, we introduce a new approach to unsupervised extractive summarization based on the Minimum Description Length (MDL) principle, using the Krimp dataset compression algorithm (Vreeken et al., 2011). Our approach represents a text as a transactional dataset, with sentences as transactions, and then describes it by itemsets that stand for frequent sequences of words. The summary is then compiled from the sentences that best compress (and, as such, best describe) the document. The summarization problem is thereby reduced to maximal coverage, under the assumption that a summary that best describes the original text should cover most of the word sequences describing the document. We solve it with a greedy algorithm and present the evaluation results.
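
The greedy maximal-coverage step can be illustrated with a minimal sketch. This is not the authors' implementation and it does not run Krimp: as a loudly stated simplification, it takes word n-grams that occur in at least `min_support` sentences as a stand-in for the itemsets that Krimp would select under MDL. The function names, the parameters `n`, `min_support`, and `max_words`, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def sentence_ngrams(words, n):
    """Return the set of contiguous word n-grams in one tokenized sentence."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_ngrams(tokenized, n=2, min_support=2):
    """N-grams appearing in at least min_support sentences.

    Assumption: a crude proxy for the frequent word sequences that
    Krimp would keep in its code table; no MDL scoring is done here.
    """
    counts = Counter()
    for words in tokenized:
        counts.update(sentence_ngrams(words, n))
    return {gram for gram, c in counts.items() if c >= min_support}


def greedy_summary(sentences, max_words, n=2, min_support=2):
    """Greedily pick sentences covering the most not-yet-covered patterns."""
    tokenized = [s.lower().split() for s in sentences]
    patterns = frequent_ngrams(tokenized, n, min_support)
    covers = [sentence_ngrams(w, n) & patterns for w in tokenized]

    chosen, covered, used = [], set(), 0
    while True:
        best, best_gain = None, 0
        for idx, words in enumerate(tokenized):
            # Skip sentences already chosen or exceeding the word budget.
            if idx in chosen or used + len(words) > max_words:
                continue
            gain = len(covers[idx] - covered)
            if gain > best_gain:
                best, best_gain = idx, gain
        if best is None:  # no remaining sentence adds coverage
            break
        chosen.append(best)
        covered |= covers[best]
        used += len(tokenized[best])
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


doc = [
    "the cat sat on the mat",
    "the cat sat down",
    "dogs bark loudly",
    "on the mat the cat sat",
]
print(greedy_summary(doc, max_words=6))
```

With a six-word budget the sketch selects only the first sentence, since it covers all four frequent bigrams and ties are broken in favor of the earlier sentence. A faithful version would replace `frequent_ngrams` with Krimp's code table and could weight patterns by their contribution to compression.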
