Graph-based Density Peaks Ranking Approach for Extracting KeyPhrases (GDREK)


Abstract

Surprisingly, a Google Scholar search returns more than 1,500,000 articles on keyphrase extraction (KE), 21,000 of them published in the current year alone. This large number implies that researchers still need more accurate and better-performing models for KE from text, a subtask of text mining and summarization. This paper presents a novel design for KE. The model, GDREK, combines graph-based representation, sentence clustering, and ranking based on density peaks for KE in single or multiple documents, and it can be used further in extractive text summarization. The principle of GDREK is to represent the text with a graph model and then group and rank the sentences in a mutual manner. In this model, sentence grouping and ranking proceed incrementally by discovering the main topics of the text and finding the central sentences of each topic. In this incremental step, as the sentences are grouped by the Graph-based Growing Self-Organizing Map (G-GSOM), they are ranked using the Density Peaks (DP) concept according to a measure of similarity between sentences. Our similarity measure is based on shared phrases and the cosine function. Sentences are scored under the assumption that a sentence with more similar sentences is more important (higher density) and more representative. Finally, the most frequent words or phrases in the top-ranked sentences are selected as the key phrases of the text. Experimental results on two datasets show that our technique extracts most of the key phrases and words, drawn from most sub-topics of the text, and yields over 75% accuracy.
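The density-based scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the G-GSOM grouping step and the shared-phrase component of the similarity measure, and it uses only a cutoff-based local density (a sentence's density is the number of other sentences whose cosine similarity exceeds a threshold). The tokenizer, the stopword list, the `threshold` value, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this example.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "was",
             "in", "on", "to", "for", "as", "by", "it", "with"}

def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def density_scores(sentences, threshold=0.2):
    """Local density: count of other sentences more similar than threshold."""
    bags = [Counter(tokenize(s)) for s in sentences]
    return [
        sum(1 for j, other in enumerate(bags)
            if j != i and cosine(bag, other) > threshold)
        for i, bag in enumerate(bags)
    ]

def key_terms(sentences, top_sentences=2, top_terms=3, threshold=0.2):
    """Most frequent words in the highest-density sentences."""
    scores = density_scores(sentences, threshold)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    counts = Counter()
    for i in ranked[:top_sentences]:
        counts.update(tokenize(sentences[i]))
    return [w for w, _ in counts.most_common(top_terms)]

sentences = [
    "Graph models represent text as nodes and edges.",
    "Graph models rank sentences in text.",
    "Density peaks rank sentences by similarity.",
    "The weather was sunny yesterday.",
]
print(density_scores(sentences))
print(key_terms(sentences))
```

In this toy input, the second sentence shares vocabulary with both the first and the third, so it receives the highest density, while the unrelated weather sentence receives none. A fuller treatment of the Density Peaks concept would also consider each sentence's distance to the nearest sentence of higher density, which this sketch leaves out.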


Bibliographic details

Source: Palestinian International Conference on Electrical and Computer Engineering | 2019 | 231p | 6 pages
Authors: Mahmoud Alfarra; Abdalfattah M. Alfarra; Ahmed Salahedden
Original format: PDF
Chinese Library Classification: Automation technology and equipment

Similar literature


1. SemGraph: Extracting Keyphrases Following a Novel Semantic Graph-Based Approach [J]. Juan Martinez-Romo, Lourdes Araujo, Andres Duque Fernandez. Journal of the American Society for Information Science. 2016, no. 1
2. A Graph-based Approach of Automatic Keyphrase Extraction [J]. Yan Ying, Tan Qingping, Xie Qinzheng. Procedia Computer Science. 2017, no. 1
3. Ranking Sentences for Keyphrase Extraction: A Relational Data Mining Approach [J]. Michelangelo Ceci, Corrado Loglisci, Lucrezia Macchia. Procedia Computer Science. 2014, no. 1
4. Graph-based Density Peaks Ranking Approach for Extracting KeyPhrases (GDREK) [C]. Mahmoud Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden. 2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering. 2019
5. Evaluation techniques and graph-based algorithms for automatic summarization and keyphrase extraction [D]. Hamid, Fahmida. 2016
6. Graph Peak Caller: Calling ChIP-seq peaks on graph-based reference genomes [O]. Ivar Grytten, Knut D. Rand, Alexander J. Nederbragt. 2019
7. TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction [O]. Bougouin Adrien, Boudin Florian, Daille Béatrice. 2013
