Conference: Palestinian International Conference on Electrical and Computer Engineering

Graph-based Density Peaks Ranking Approach for Extracting KeyPhrases (GDREK)

Abstract

A Google Scholar search returns more than 1,500,000 articles on keyphrase extraction (KE), 21,000 of them published in the current year alone. This large number implies that researchers still need more accurate and better-performing models for KE from text, a subtask of text mining and summarization. This paper presents a novel KE model, GDREK, which combines graph-based representation, sentence clustering, and density-peaks-based ranking to extract keyphrases from single or multiple documents; it can also be used for extractive text summarization. The principle of GDREK is to represent the text with a graph model and then group and rank the sentences in a mutually reinforcing manner. Sentence grouping and ranking proceed incrementally by discovering the main topics of the text and finding the central sentences of each topic. In this incremental step, as the sentences are grouped by the Graph-based Growing Self-Organizing Map (G-GSOM), they are ranked using the Density Peaks (DP) concept according to a measure of similarity between sentences. Our similarity measure is based on shared phrases and the cosine function. Sentences are scored under the assumption that a sentence with more similar sentences is more important (higher density) and more representative. Finally, the most frequent words or phrases in the sentences are selected as the key phrases of the text. Experimental results on two datasets show that the technique extracts most of the key phrases and words, covering most sub-topics of the text, with over 75% accuracy.
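The scoring idea in the abstract (a sentence with more similar sentences has higher density and is more representative) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the G-GSOM grouping step is omitted entirely, similarity is approximated by cosine similarity over shared word counts, and the tokenizer, stopword list, `cutoff` value, and the function names `rank_by_density` and `key_terms` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed toy stopword list; the abstract does not specify any preprocessing.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "as", "for", "on", "with"}

def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_by_density(sentences, cutoff=0.2):
    """Rank sentence indices by local density.

    Density of a sentence = number of other sentences whose similarity to it
    is at least `cutoff` (an assumed threshold). Ties are broken by the total
    similarity to all other sentences.
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and sim[i][j] >= cutoff) for i in range(n)]
    total = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: (rho[i], total[i]), reverse=True)
    return order, rho

def key_terms(sentences, order, top_sentences=2, top_terms=4):
    """Return the most frequent terms in the highest-ranked sentences."""
    counts = Counter()
    for i in order[:top_sentences]:
        counts.update(tokenize(sentences[i]))
    return [term for term, _ in counts.most_common(top_terms)]

sentences = [
    "Graph models represent text as nodes and edges.",
    "Graph models rank sentences in text.",
    "Text graph models group sentences.",
    "Bananas grow in tropical climates.",
]
order, rho = rank_by_density(sentences)
terms = key_terms(sentences, order)
print(order, rho, terms)
```

In this toy input the three graph-related sentences are mutually similar and receive the same density, while the unrelated sentence has none and is ranked last. A fuller density-peaks treatment would also use each sentence's distance to the nearest sentence of higher density, which this sketch leaves out.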
