Source: International Conference on Computing, Communication and Networking Technologies
Graph Based Keyword Extraction for Similarity Identification among Born-Digital News Contents

Abstract

The growing influence of the internet has led to a huge number of born-digital news articles being published online, and it is becoming increasingly difficult to search through this vast store of documents to prevent duplication. Keywords are the most salient words in any textual document. We introduce a graph-based approach to keyword extraction that uses term co-occurrence in textual news articles and integrates weighted closeness centrality (CC) with the weighted clustering coefficient (WC). We also propose a metric, the co-occurrence index (CI), which is based on the extracted keywords and measures the similarity between any two textual news articles. The proposed method is independent of the bag-of-words model and has shown a significant performance improvement over existing methods.
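To make the pipeline in the abstract more concrete, the following is a minimal Python sketch of graph-based keyword extraction. It rests on several assumptions that the abstract does not confirm: co-occurrence is counted within a sliding window inside each sentence, edge distance is the inverse of the co-occurrence count, the weighted clustering coefficient uses the geometric-mean generalization, and CC and WC are combined by a simple weighted sum controlled by a hypothetical parameter `alpha`. The stop-word list and all function names are illustrative. The abstract does not define the co-occurrence index (CI), so `keyword_overlap` is only a Jaccard stand-in, not the authors' metric.

```python
import heapq
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "for"}

def build_cooccurrence_graph(text, window=2):
    """Undirected graph; edge weight = co-occurrence count within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = [t for t in re.findall(r"[a-z]+", sentence) if t not in STOPWORDS]
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    graph[term][other] += 1
                    graph[other][term] += 1
    return graph

def weighted_closeness(graph, source):
    """Closeness via Dijkstra, assuming edge distance = 1 / co-occurrence weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + 1.0 / w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    total = sum(dist.values())
    reachable = len(dist) - 1
    if total == 0:
        return 0.0
    # Scale by the fraction of reachable nodes so small components score lower.
    return (reachable / total) * (reachable / (len(graph) - 1))

def weighted_clustering(graph, node, max_w):
    """Geometric-mean weighted clustering coefficient (one common generalization)."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    total = 0.0
    for a, b in combinations(nbrs, 2):
        if b in graph[a]:  # the two neighbours close a triangle with `node`
            product = graph[node][a] * graph[node][b] * graph[a][b]
            total += (product / max_w ** 3) ** (1.0 / 3.0)
    return 2.0 * total / (k * (k - 1))

def extract_keywords(text, top_k=5, alpha=0.5, window=2):
    """Rank terms by alpha * CC + (1 - alpha) * WC (the combination is an assumption)."""
    graph = build_cooccurrence_graph(text, window)
    if not graph:
        return []
    max_w = max(w for nbrs in graph.values() for w in nbrs.values())
    scores = {
        term: alpha * weighted_closeness(graph, term)
        + (1 - alpha) * weighted_clustering(graph, term, max_w)
        for term in graph
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

def keyword_overlap(text_a, text_b, top_k=5):
    """Jaccard overlap of extracted keywords; a stand-in, NOT the paper's CI."""
    a, b = set(extract_keywords(text_a, top_k)), set(extract_keywords(text_b, top_k))
    return len(a & b) / len(a | b) if (a or b) else 0.0

if __name__ == "__main__":
    article = "Graph based keyword extraction uses term graphs. Term graphs link words."
    print(extract_keywords(article, top_k=3))
```

Because the scoring works on graph structure rather than raw term frequencies alone, two articles can be compared through their top-ranked terms; how the authors actually weight that comparison in CI would need to be taken from the full paper.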
