Venue: International Conference on Asian Digital Libraries

Chinese Text Summarization Using a Trainable Summarizer and Latent Semantic Analysis

Abstract

This paper proposes two novel approaches for extracting important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It introduces three ideas: 1) employing ranked position to emphasize the significance of sentence position, 2) reshaping word units to estimate keyword importance more accurately, and 3) training a score function with a genetic algorithm to obtain a suitable combination of feature weights. The second approach combines latent semantic analysis with text relationship maps to interpret the conceptual structure of a document. Both approaches are applied to Chinese text summarization. They were evaluated on a corpus of 100 articles about politics from New Taiwan Weekly; at a compression ratio of 30%, the first and second approaches achieved average recalls of 52.0% and 45.6%, respectively.
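
To make the first approach more concrete, the following is a minimal sketch of how a weighted score function might be tuned with a genetic algorithm and evaluated by recall at a fixed compression ratio. The abstract does not specify the features, the genetic operators, or the fitness function, so everything here is an assumption: the three per-sentence features, the use of average recall as fitness, the elitist selection with one-point crossover and Gaussian mutation, and all function names (`select`, `recall`, `train_weights`) are hypothetical, and the toy data is made up purely for illustration.

```python
import random

def select(features, weights, ratio=0.3):
    """Score each sentence as a weighted sum of its features and return
    the indices of the top-scoring sentences for the given compression ratio."""
    k = max(1, round(len(features) * ratio))
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def recall(selected, reference):
    """Fraction of reference-summary sentences that were selected."""
    return len(selected & reference) / len(reference)

def train_weights(docs, n_features, pop_size=20, generations=30, seed=0):
    """Assumed GA setup (not taken from the paper): keep the better half of the
    population, refill it with one-point crossover plus a small mutation.
    Requires n_features >= 2 so that a crossover point exists."""
    rng = random.Random(seed)

    def fitness(w):
        # Assumption: fitness is the average recall over the training documents.
        return sum(recall(select(f, w), ref) for f, ref in docs) / len(docs)

    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)           # mutate one weight, clamp to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy document, invented for illustration: 6 sentences, 3 hypothetical features
# each (e.g. position, keyword, and one other score); sentences 0 and 3 are
# taken to be the reference summary.
toy_features = [
    [1.0, 0.9, 0.2],
    [0.8, 0.1, 0.9],
    [0.6, 0.2, 0.8],
    [0.4, 0.8, 0.1],
    [0.2, 0.3, 0.7],
    [0.0, 0.1, 0.6],
]
toy_reference = {0, 3}

best = train_weights([(toy_features, toy_reference)], n_features=3)
print("weights:", best)
print("recall:", recall(select(toy_features, best), toy_reference))
```

In this sketch the compression ratio fixes how many sentences are extracted, so recall measures how many reference sentences land in that fixed budget; a real system would train on many documents and evaluate on held-out ones.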