Journal: Information Processing & Management

EdgeSumm: Graph-based framework for automatic text summarization


Abstract

Searching the Internet for a given topic can be daunting because users cannot read and comprehend all of the resulting texts. Automatic text summarization (ATS) is clearly beneficial in this case because manual summarization is expensive and time-consuming. To enhance ATS for single documents, this paper proposes a novel extractive graph-based framework, "EdgeSumm", that relies on four proposed algorithms. The first algorithm constructs a new text graph model representation from the input document. The second and third algorithms search the constructed text graph for sentences to include in the candidate summary. When the resulting candidate summary still exceeds a user-required limit, the fourth algorithm selects the most important sentences. EdgeSumm combines a set of extractive ATS methods (namely graph-based, statistical-based, semantic-based, and centrality-based methods) to benefit from their advantages and overcome their individual drawbacks. EdgeSumm is general for any document genre (not limited to a specific domain) and is unsupervised, so it does not require any training data. The standard datasets DUC2001 and DUC2002 are used to evaluate EdgeSumm with the widely used automatic evaluation tool Recall-Oriented Understudy for Gisting Evaluation (ROUGE). EdgeSumm achieves the highest ROUGE scores on DUC2001. On DUC2002, the evaluation results show that the proposed framework outperforms state-of-the-art ATS systems, improving on the highest scores in the literature by 1.2% and 4.7% for ROUGE-1 and ROUGE-L, respectively. In addition, EdgeSumm achieves very competitive results for ROUGE-2 and ROUGE-SU4.
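
The abstract does not spell out the four algorithms, so the following is only a generic sketch of the pipeline shape it describes (build a text graph, score sentences on it, then trim the candidate summary to a user limit), not EdgeSumm's actual method. It assumes a simple sentence graph whose edges count shared content words and uses weighted degree as the centrality score; all function names, the stopword list, and the word-count limit are hypothetical choices made for illustration.

```python
import re
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for", "it"}

def split_sentences(text):
    # Naive splitter: break at whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lowercased alphanumeric tokens with stopwords removed.
    return {w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS}

def build_graph(sentences):
    # Nodes are sentence indices; an edge's weight is the number of
    # content words the two sentences share (edges with weight 0 are omitted).
    words = [content_words(s) for s in sentences]
    weights = {}
    for i, j in combinations(range(len(sentences)), 2):
        shared = len(words[i] & words[j])
        if shared:
            weights[(i, j)] = shared
    return weights

def score_sentences(n, weights):
    # Weighted degree centrality: sum of the weights of a node's edges.
    scores = [0] * n
    for (i, j), w in weights.items():
        scores[i] += w
        scores[j] += w
    return scores

def summarize(text, max_words):
    # Rank sentences by score (ties broken by position), then greedily keep
    # those that still fit in the word budget; output follows document order.
    sentences = split_sentences(text)
    scores = score_sentences(len(sentences), build_graph(sentences))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]

if __name__ == "__main__":
    doc = "Graphs model text. Graph methods rank sentences in text. Cats sleep."
    print(summarize(doc, max_words=9))
```

The greedy budget step stands in for the abstract's "select the most important sentences when the candidate summary exceeds the limit"; a real system would likely use a richer graph model and scoring than shared-word counts.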
