Journal: International Journal of Grid and High Performance Computing

Towards High Performance Text Mining: A TextRank-based Method for Automatic Text Summarization


Abstract

As a typical unsupervised learning method, the TextRank algorithm performs well in large-scale text mining, especially for automatic summarization and keyword extraction. However, during automatic summarization TextRank considers only the similarities between sentences and neglects information about text structure and context. To overcome these shortcomings, the authors propose an improved, highly scalable method called iTextRank. When building the TextRank graph, the new method computes sentence similarities and adjusts the weights of nodes using statistical and linguistic features, such as similarity to titles, paragraph structure, special sentences, and sentence position and length. The authors' analysis shows that the time complexity of iTextRank is comparable to that of TextRank. More importantly, two experiments show that iTextRank has a higher accuracy and lower recall rate than TextRank, and that it is as effective as several popular online automatic summarization systems.
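
The abstract does not give the formulas used by iTextRank, so the following is only a minimal sketch of the general idea, not the authors' method. It builds a sentence graph with the word-overlap similarity from the original TextRank formulation and then biases the ranking with two of the features the abstract mentions (title overlap and sentence position). The way the features enter the ranking (as a biased restart vector, in the style of personalized PageRank), the function names, and the parameters `title_bonus` and `position_bonus` are all assumptions made for illustration.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Word-overlap similarity, normalized by the log of each sentence's vocabulary size."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids a zero denominator, since log(1) == 0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def feature_weight(sentence, index, total, title,
                   title_bonus=0.5, position_bonus=0.3):
    """Hypothetical node weight: rewards title overlap and early position.

    The bonus values are illustrative assumptions, not taken from the paper.
    """
    title_words = set(tokenize(title))
    words = set(tokenize(sentence))
    overlap = len(words & title_words) / len(title_words) if title_words else 0.0
    position = 1.0 - index / total  # earlier sentences score higher
    return 1.0 + title_bonus * overlap + position_bonus * position


def rank_sentences(sentences, title, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence graph with a feature-biased restart vector."""
    n = len(sentences)
    if n == 0:
        return []
    sim = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    bias = [feature_weight(s, i, n, title) for i, s in enumerate(sentences)]
    bias_sum = sum(bias)
    bias = [b / bias_sum for b in bias]
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(sim[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) * bias[i] + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, title, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences, title)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Setting both bonuses to zero makes the restart vector uniform, which reduces the sketch to plain TextRank-style ranking. The added features cost one extra pass over the sentences, while building the similarity matrix remains the dominant step; this is at least consistent with the abstract's claim that the two methods have comparable time complexity.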
