Published in: 2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering

Graph-based Density Peaks Ranking Approach for Extracting KeyPhrases (GDREK)



Abstract

Surprisingly, a Google Scholar search returns more than 1,500,000 recently published articles on keyphrase extraction (KE), 21,000 of them from the current year alone. This large number implies that researchers still need more accurate and better-performing models for KE from text, a subtask of text mining and summarization. This paper presents a novel design for KE. The model, GDREK, combines a graph-based representation, sentence clustering, and ranking based on density peaks to extract keyphrases from single or multiple documents, and it can be used further in extractive text summarization. The principle of GDREK is to represent the text with a graph model and then group and rank the sentences in a mutually dependent manner. In this model, sentence grouping and ranking proceed by discovering the main topics of the text and incrementally finding the central sentences of each topic. In this incremental step, as the sentences are grouped using the Graph-based Growing Self-Organizing Map (G-GSOM), they are ranked using the Density Peaks (DP) concept according to a measure of similarity between sentences. Our similarity measure is based on shared phrases and the cosine function. Sentences are scored under the assumption that the more similar sentences a sentence has, the more important (higher density) and more representative it is. Finally, the most frequent words or phrases in the sentences are selected as the key phrases of the text. Experimental results on two datasets show that the technique extracts most of the key phrases and words, drawn from most sub-topics of the text, with over 75% accuracy.
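
The scoring assumption in the abstract (a sentence with more similar sentences has higher density) can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the graph representation and the G-GSOM clustering entirely, compares sentences by single words rather than shared phrases, and uses only a neighbor-count local density. The function names, the stopword list, and the similarity threshold of 0.2 are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "as", "of", "and", "is", "are", "to", "in", "on", "for", "with"}


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density_scores(sentences, threshold=0.2):
    """Local density of each sentence: how many other sentences are at least
    `threshold` similar to it. The threshold value is a hypothetical choice."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    return [
        sum(1 for j, vj in enumerate(vectors) if j != i and cosine(vi, vj) >= threshold)
        for i, vi in enumerate(vectors)
    ]


def key_terms(sentences, top_sentences=2, top_terms=3, threshold=0.2):
    """Rank sentences by density, then return the most frequent terms
    found in the highest-density sentences."""
    scores = density_scores(sentences, threshold)
    # Sort by descending density; ties keep the original sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    counts = Counter()
    for i in ranked[:top_sentences]:
        counts.update(tokenize(sentences[i]))
    return [term for term, _ in counts.most_common(top_terms)]
```

Calling `key_terms(sentences)` on a list of sentence strings returns the most frequent terms among the densest sentences. A fuller treatment of density peaks would also consider each sentence's distance to the nearest sentence of higher density, which this sketch leaves out.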
