cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining

Abstract

Background: Frequent subgraph mining is a significant problem in many practical domains. Solutions to it are particularly useful for large-scale libraries of drug molecules or biological structures, where they help find drugs or core biological structures rapidly and predict the toxicity of unknown compounds. The main challenge is efficiency, as (i) testing for graph isomorphism is computationally intensive, and (ii) the graph collection to be mined and the mining results can be very large. Existing solutions often require days to derive mining results from biological networks, even with a relatively low support threshold. In addition, the complete mining results typically cannot be stored in the memory of a single node.

Results: In this paper, we implement cmFSM, a parallel acceleration tool for the classical frequent subgraph mining algorithm. The core idea is to parallelize the extension tasks so as to reduce computation time. We also employ a multi-node strategy to overcome memory constraints. The parallel optimization of cmFSM is carried out on three levels: fine-grained OpenMP parallelization on a single node, multi-node multi-process parallel acceleration, and CPU-MIC collaborative parallel optimization.

Conclusions: Evaluation results show that cmFSM clearly outperforms existing state-of-the-art miners even with only a few parallel computing resources. This means that cmFSM provides a practical solution to frequent subgraph mining problems with a huge number of mining results. Specifically, our solution is up to one order of magnitude faster than the best CPU-based approach on a single node and shows promising scalability for massive mining tasks in the multi-node scenario.

Source code: https://github.com/ysycloucVcmFSM.
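To make the notion of a support threshold concrete, the following is a minimal sketch of the simplest case of frequent subgraph mining: finding single-edge patterns that occur in at least min_support graphs of a collection. It is not the cmFSM implementation. The graph representation, the function names edge_patterns and frequent_edges, and the use of a Python thread pool as a stand-in for the OpenMP-style parallelism mentioned in the abstract are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def edge_patterns(graph):
    """Return the distinct labeled-edge patterns of one graph.

    A graph is a (node_labels, edges) pair (hypothetical representation):
    node_labels maps node id -> label, edges is a list of (u, v, edge_label).
    """
    node_labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Sort the endpoint labels so an undirected edge has one canonical form.
        a, b = sorted((node_labels[u], node_labels[v]))
        patterns.add((a, edge_label, b))
    return patterns


def frequent_edges(graphs, min_support, workers=4):
    """Count in how many graphs each edge pattern occurs; keep the frequent ones."""
    # Each graph is processed independently, so the per-graph work can be
    # distributed over workers (threads here, purely as an illustration).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_graph = list(pool.map(edge_patterns, graphs))

    support = Counter()
    for patterns in per_graph:
        # A pattern counts at most once per graph, hence the set above.
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Toy collection of three small molecule-like graphs (made-up data).
graphs = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "N", 1: "C", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
]
result = frequent_edges(graphs, min_support=2)
print(result)
```

In a full miner, each frequent pattern would then be extended by one edge at a time and re-counted, which is where the costly isomorphism tests and the large result sets described in the abstract arise.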


Bibliographic details

Source: Asia-Pacific Bioinformatics Conference | 2018 | 406 p. | 13 pages in total
Original format: PDF
Chinese Library Classification: Q811.4-532
Keywords: Frequent subgraph mining; Bioinformatics; Memory constraints; Isomorphism; Many Integrated Core (MIC)


Similar literature

1. cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining [J]. Shunyun Yang, Runxin Guo, Rui Liu. BMC Bioinformatics. 2018, no. 4
2. RASMA: a reverse search algorithm for mining maximal frequent subgraphs [J]. Saeed Salem, Mohammed Alokshiya, Mohammad Al Hasan. BioData Mining. 2021, no. 1
3. TIPTAP: Approximate Mining of Frequent k-Subgraph Patterns in Evolving Graphs [J]. Nasir Muhammad Anis Uddin, Aslay Cigdem, Morales Gianmarco De Francisci. ACM Transactions on Knowledge Discovery from Data. 2021, no. 3
4. cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining [C]. Asia-Pacific Bioinformatics Conference. 2018
5. Development and application of ligand-based and structure-based computational drug discovery tools based on frequent subgraph mining of chemical structures [D]. Khashan, Raed Saeed. 2007
6. cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining [O]. Shunyun Yang, Runxin Guo, Rui Liu. 2018
7. The Gaston Tool for Frequent Subgraph Mining [O]. Nijssen Siegfried, Kok Joost N. 2005
8. GREW: A Scalable Frequent Subgraph Discovery Algorithm [R]. Kuramochi, M., Karypis, G. 2004
