
HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks


Abstract

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein-protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.

Bibliographic record

  • Source
    Nucleic Acids Research | 2018, Issue 6 | 11 pages
  • Author affiliations

    Lawrence Berkeley Natl Lab Computat Res Div 1 Cyclotron Rd Berkeley CA 94720 USA;

    US DOE Joint Genome Inst Lawrence Berkeley Natl Lab 2800 Mitchell Dr Walnut Creek CA 94598 USA;

    Ctr Res & Technol Hellas Biol Computat & Proc Lab Chem Proc & Energy Resources Inst Thessaloniki 57001 Greece;

    US DOE Joint Genome Inst Lawrence Berkeley Natl Lab 2800 Mitchell Dr Walnut Creek CA 94598 USA;

    Lawrence Berkeley Natl Lab Computat Res Div 1 Cyclotron Rd Berkeley CA 94720 USA;

  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification: Biochemistry;