Journal: ACM Transactions on Asian Language Information Processing

Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text



Abstract

Keywords are the important words in a text and provide a concise representation of it. With the surge of unlabeled short text on the Internet, automatic keyword extraction has proven useful in other information processing applications. Graph-based approaches are the prevalent unsupervised models for this task. However, most of these methods emphasize the relations between words without considering other importance factors. Furthermore, when measuring the importance of a word in a text, the damping factor is conventionally set to 0.85, following PageRank. To the best of our knowledge, no existing work investigates the impact of the damping factor on the keyword extraction task. In addition, there are few publicly available labeled Chinese short-text datasets for this task. In this article, we investigate the factors that make up a word's importance in a given document and propose an improved graph-based method for keyword extraction from short documents. Moreover, we analyze the impact of these importance factors on performance. We also provide annotated long and short Chinese datasets for this task. The model is evaluated on Chinese and English datasets, and the results show that it improves on previous unsupervised models on short documents. Comparative experiments show that the damping factor is related to text length, a relationship that traditional methods neglect.
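
To make the role of the damping factor concrete, the following is a minimal sketch of a generic TextRank-style ranker, not the method proposed in the article. It assumes the text has already been tokenized (Chinese text would need word segmentation first), links words that co-occur within a small window, and runs a plain PageRank iteration. The function names, the `window` size, and the iteration count are illustrative assumptions; only the default damping value of 0.85 comes from the abstract.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected graph: two distinct words are linked if they
    occur within `window` positions of each other."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Plain PageRank power iteration on an unweighted, undirected graph.
    Assumes every node has at least one neighbor."""
    n = len(graph)
    scores = {w: 1.0 / n for w in graph}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            # Each neighbor v shares its score equally among its own neighbors.
            incoming = sum(scores[v] / len(graph[v]) for v in graph[w])
            new_scores[w] = (1 - damping) / n + damping * incoming
        scores = new_scores
    return scores


def extract_keywords(tokens, top_k=3, damping=0.85, window=2):
    """Return the `top_k` highest-scoring words (hypothetical helper)."""
    graph = build_cooccurrence_graph(tokens, window)
    if not graph:
        return []
    scores = rank_words(graph, damping)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling `extract_keywords(tokens, damping=d)` for several values of `d` on documents of different lengths is one simple way to probe the kind of length dependence the abstract reports. With `damping=0` every word receives the same score, so the graph structure only matters as the damping factor grows.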

Bibliographic details

  • Author affiliations

    Inner Mongolia Univ Coll Comp Sci 235 West Univ Rd Hohhot 010021 Inner Mongolia Peoples R China|Inner Mongolia Agr Univ Coll Comp Sci & Informat Engn 306 Zhao Wuda Rd Hohhot 010018 Inner Mongolia Peoples R China;

    Inner Mongolia Univ Coll Comp Sci 235 West Univ Rd Hohhot 010021 Inner Mongolia Peoples R China;

    Inner Mongolia Agr Univ Coll Comp Sci & Informat Engn 306 Zhao Wuda Rd Hohhot 010018 Inner Mongolia Peoples R China;

  • Original format: PDF
  • Text language: English
  • Keywords

    Short text; keyword extraction; importance rank;

