...
Journal: BMC Bioinformatics

Clustering cliques for graph-based summarization of the biomedical research literature

Abstract

Background: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method that automatically summarizes graphs of semantic predications extracted from PubMed citations (titles and abstracts).

Results: SemRep was used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified among frequently occurring predications whose arguments were filtered by degree centrality to retain the highly connected ones. The themes in each summary were identified with a hierarchical clustering algorithm based on the arguments that cliques share. The validity of the clusters in the resulting summaries was compared with the Silhouette-generated baseline in terms of cohesion, separation, and overall validity. The theme labels were also compared with a reference standard built from major MeSH headings.

Conclusions: For the 11 topics in the test data set, the overall validity of the clusters in the system summaries was 10 percentage points higher than the baseline (43% versus 33%). Compared with the reference standard derived from MeSH headings, recall, precision, and F-score were 0.64, 0.65, and 0.65, respectively.
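The clique-identification step (frequent predications, arguments filtered by degree centrality, then cliques) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes predications arrive as hypothetical `(subject, predicate, object)` triples, the `min_freq` and `min_centrality` thresholds are invented for illustration (the abstract gives no values), and Bron-Kerbosch with pivoting is used as a standard maximal-clique algorithm rather than a confirmed choice of the paper.

```python
from collections import Counter, defaultdict


def degree_centrality(edges):
    """Normalized degree centrality, degree / (n - 1), for an undirected edge set."""
    degree = Counter()
    for edge in edges:
        for node in edge:
            degree[node] += 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


def filter_predications(preds, min_freq=2, min_centrality=0.3):
    """Keep frequent predications whose two arguments are both highly connected.

    preds: iterable of (subject, predicate, object) triples (hypothetical format).
    min_freq, min_centrality: illustrative thresholds, not values from the paper.
    Returns undirected argument pairs as frozensets.
    """
    counts = Counter(preds)
    # Drop rare predications and self-loops (subject == object).
    frequent = [(s, o) for (s, _, o), c in counts.items() if c >= min_freq and s != o]
    centrality = degree_centrality({frozenset(pair) for pair in frequent})
    return {
        frozenset((s, o))
        for s, o in frequent
        if centrality[s] >= min_centrality and centrality[o] >= min_centrality
    }


def maximal_cliques(edges):
    """Bron-Kerbosch with pivoting over an undirected edge set."""
    adj = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        pivot = max(p | x, key=lambda node: len(adj[node] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


if __name__ == "__main__":
    preds = [
        ("aspirin", "TREATS", "pain"),
        ("aspirin", "TREATS", "pain"),
        ("aspirin", "TREATS", "fever"),
        ("aspirin", "TREATS", "fever"),
        ("pain", "COEXISTS_WITH", "fever"),
        ("pain", "COEXISTS_WITH", "fever"),
        ("aspirin", "INTERACTS_WITH", "warfarin"),  # seen once: filtered out
    ]
    print(maximal_cliques(filter_predications(preds)))  # one clique: aspirin, pain, fever
```

Normalizing degree by `n - 1` keeps the centrality threshold comparable across PubMed searches that return graphs of very different sizes.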
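Themes are formed by hierarchically clustering cliques on the arguments they share, and cluster quality is judged by cohesion, separation, and overall (Silhouette) validity. The sketch below is one plausible reading, with several assumptions the abstract does not confirm: Jaccard distance over clique arguments, average-linkage agglomerative clustering stopped at a hypothetical `max_distance` cutoff, and cohesion and separation taken as the mean intra-cluster and nearest-other-cluster distances. It does not reproduce the paper's Silhouette-generated baseline, whose construction the abstract does not describe.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the arguments (concepts) of two cliques."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, cliques):
    """Mean pairwise distance between two clusters of clique indices."""
    total = sum(jaccard_distance(cliques[i], cliques[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_cliques(cliques, max_distance=0.8):
    """Agglomerative clustering: repeatedly merge the closest pair of clusters
    until every remaining pair is farther apart than max_distance (hypothetical cutoff)."""
    clusters = [[i] for i in range(len(cliques))]
    while len(clusters) > 1:
        (x, y), d = min(
            (((i, j), average_linkage(clusters[i], clusters[j], cliques))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_distance:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # combinations yields x < y, so index x is unaffected
    return clusters


def silhouette_report(cliques, clusters):
    """Return (mean cohesion a, mean separation b, mean silhouette (b - a) / max(a, b))."""
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    label = {i: k for k, members in enumerate(clusters) for i in members}
    a_vals, b_vals, s_vals = [], [], []
    for i, clique in enumerate(cliques):
        own = [j for j in clusters[label[i]] if j != i]
        a = sum(jaccard_distance(clique, cliques[j]) for j in own) / len(own) if own else 0.0
        b = min(
            sum(jaccard_distance(clique, cliques[j]) for j in members) / len(members)
            for k, members in enumerate(clusters)
            if k != label[i]
        )
        # By convention a clique alone in its cluster gets silhouette 0.
        s = 0.0 if not own or max(a, b) == 0 else (b - a) / max(a, b)
        a_vals.append(a)
        b_vals.append(b)
        s_vals.append(s)
    n = len(cliques)
    return sum(a_vals) / n, sum(b_vals) / n, sum(s_vals) / n


if __name__ == "__main__":
    cliques = [
        frozenset({"aspirin", "pain", "fever"}),
        frozenset({"aspirin", "pain", "inflammation"}),
        frozenset({"insulin", "glucose", "diabetes"}),
        frozenset({"insulin", "glucose", "obesity"}),
    ]
    clusters = cluster_cliques(cliques)
    print(clusters)                              # [[0, 1], [2, 3]]
    print(silhouette_report(cliques, clusters))  # (0.5, 1.0, 0.5)
```

In this toy input the two analgesic cliques share two of four arguments (distance 0.5) and share nothing with the diabetes cliques (distance 1.0), so the algorithm recovers two themes with a silhouette of 0.5.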

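The comparison of theme labels with major MeSH headings is reported as recall, precision, and F-score. A set-based sketch of those metrics, assuming exact string matching between labels and headings (a hypothetical simplification; the abstract does not say how matches were counted) and the balanced F-score:

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall and balanced F-score (harmonic mean of P and R)."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)  # labels that exactly match a reference heading
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented labels for illustration only, not data from the paper.
    theme_labels = {"Aspirin", "Pain", "Fever"}
    mesh_reference = {"Aspirin", "Pain", "Inflammation", "Analgesics"}
    print(precision_recall_f1(theme_labels, mesh_reference))  # P = 2/3, R = 1/2, F = 4/7
```

With 2 of 3 labels matching and 2 of 4 headings covered, F = 2 · (2/3) · (1/2) / (2/3 + 1/2) = 4/7 ≈ 0.57.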