Inside Importance Factors of Graph-Based Keyword Extraction on Chinese Short Text

Chen Junjie; Hou Hongxu; Gao Jing

Abstract

Keywords are considered to be the important words in a text and can provide a concise representation of it. With the surge of unlabeled short text on the Internet, the automatic keyword extraction task has proven useful in other information processing applications. Graph-based approaches are the prevalent unsupervised models for this task. However, most of these methods emphasize the importance of the relations between words without considering other importance factors. Furthermore, when measuring the importance of a word in a text, the damping factor is set to 0.85, following PageRank. To the best of our knowledge, no existing work investigates the impact of the damping factor on the keyword extraction task. In addition, there are few publicly available labeled Chinese short-text datasets for this task. In this article, we investigate the importance factors of words in a given document and propose an improved graph-based method for keyword extraction from short documents. Moreover, we analyze the impact of these importance factors on performance. We also provide annotated long and short Chinese datasets for this task. The model is evaluated on Chinese and English datasets, and the results show that it improves on previous unsupervised models on short documents. Comparative experiments show that the damping factor is related to text length, a relationship that traditional methods neglect.
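
To make the role of the damping factor concrete, the following is a minimal sketch of a generic PageRank-style word ranking over a co-occurrence graph. It is not the improved method proposed in the article, whose details the abstract does not give. The function names, the co-occurrence window, and the iteration count are all illustrative assumptions, and the input is assumed to be already segmented into words (Chinese text would first need a word segmenter).

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct words that occur within `window` positions.

    Hypothetical helper: the window size is an assumption, not a value
    taken from the article.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def rank_words(graph, damping=0.85, iterations=50):
    """Plain PageRank on an undirected, unweighted word graph.

    `damping` is the factor that is conventionally fixed at 0.85.
    """
    n = len(graph)
    if n == 0:
        return {}
    scores = {word: 1.0 / n for word in graph}
    for _ in range(iterations):
        new_scores = {}
        for word, neighbours in graph.items():
            # Each neighbour shares its score equally among its own links.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in neighbours)
            new_scores[word] = (1.0 - damping) / n + damping * incoming
        scores = new_scores
    return scores


def top_keywords(tokens, k=3, damping=0.85):
    """Return the k highest-scoring words for a tokenized text."""
    scores = rank_words(build_cooccurrence_graph(tokens), damping=damping)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `top_keywords(tokens, damping=d)` for several values of `d` on texts of different lengths is one simple way to explore the abstract's claim that the best damping factor depends on text length. With `damping=0.0` every word receives the same score, so the graph structure is ignored entirely.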

Bibliographic details

Source
ACM Transactions on Asian Language Information Processing | 2020, Issue 5 | pp. 63.1-63.15 | 15 pages
Authors
Chen Junjie; Hou Hongxu; Gao Jing
Author affiliations

Inner Mongolia Univ Coll Comp Sci 235 West Univ Rd Hohhot 010021 Inner Mongolia Peoples R China|Inner Mongolia Agr Univ Coll Comp Sci & Informat Engn 306 Zhao Wuda Rd Hohhot 010018 Inner Mongolia Peoples R China;

Inner Mongolia Univ Coll Comp Sci 235 West Univ Rd Hohhot 010021 Inner Mongolia Peoples R China;

Inner Mongolia Agr Univ Coll Comp Sci & Informat Engn 306 Zhao Wuda Rd Hohhot 010018 Inner Mongolia Peoples R China;

Original format: PDF
Language: English
Keywords
Short text; keyword extraction; importance rank

Similar literature

1. Keyword Extraction From Chinese Text Based On Multidimensional Weighted Features [J]. Yang Jian. Journal of Digital Information Management, 2016, Issue 3.
2. Movie Title Keywords: A Text Mining and Exploratory Factor Analysis of Popular Movies in the United States and China [J]. Xiao Xingyao, Cheng Yihong, Kim Jong-Min. Journal of Risk and Financial Management, 2021, Issue 2.
3. Study on Chinese Webpage Keyword Extraction based on Multiple Index Factors [J]. Author not listed. 国际英语教育研究：英文版 (International English Education Research, English edition), 2013, Issue 012.
4. Improved Term Weighting Factors for Keyword Extraction in Hierarchical Category Structure and Thai Text Classification [C]. Boonthida Chiraratanasopha, Thanaruk Theeramunkong, Salin Boonbrahm. International Symposium on Artificial Intelligence and Natural Language Processing, 2019.
5. Identifying the gist of conversational text: Automatic keyword extraction and summarization [D]. Liu, Fei. 2011.
6. FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data [O]. Noman Tahir, Muhammad Asif, Shahbaz Ahmad. 2021.
7. The Research of Chinese Short-text Classification Based on Domain Keyword Set Extension and HowNet [O]. Xiangdong Li, Fan Gao, Cong Ding. 2016.
