Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences

Han Liangxiu; Ong Hwee Yong

Abstract

Performance is an open issue in data-intensive applications (e.g. data mining tasks). Parallel and distributed computing systems (e.g. multicore computing, grid computing, cloud computing, etc.), along with hybrid programming models (e.g. MapReduce, MPI, etc.), are seen as a sought-after solution for accelerating data-intensive applications. One of the main challenges is how to exploit these advanced technologies effectively to facilitate fundamental scientific discoveries such as those in the Biomedical Sciences. This paper explores how MapReduce and Cloud computing can accelerate the performance of data-intensive applications through a real data mining use case in the Biomedical Sciences. We first adapted the data mining task to the MapReduce model and then deployed it onto the Cloud. We then built an analytic model based on the MapReduce computations to evaluate the efficiency and performance of the prototype. The results, from both the experiments and the evaluation model, show that performance and scalability can be enhanced through these advanced technologies.

Bibliographic details

Source
Cluster computing | 2015, Issue 1 | 16 pages
Authors
Han Liangxiu; Ong Hwee Yong
Original format: PDF
Language: English
Chinese Library Classification: Molecular biology
Keywords
Data-intensive computing; Parallel processing; MapReduce; Cloud computing; Data mining application in biomedical science;

Similar literature

1. Parallel data intensive applications using MapReduce: a data mining case study in biomedical sciences [J]. Han Liangxiu, Ong Hwee Yong. Cluster computing. 2015, Issue 1
2. High Performance Computation of Big Data: Performance Optimization Approach towards a Parallel Frequent Item Set Mining Algorithm for Transaction Data based on Hadoop MapReduce Framework [J]. Guru Prasad M S, Nagesh H R, Swathi Prabhu. International Journal of Intelligent Systems and Applications. 2017, Issue 1
3. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications [J]. Yiyan Zhang, Yi Xin, Qin Li. BioMedical Engineering OnLine. 2017, Issue 1
4. Accelerating Biomedical Data-Intensive Applications Using MapReduce [C]. Han Liangxiu, Ong Hwee Yong. The 13th ACM/IEEE International Conference on Grid Computing. 2012
5. Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments [D]. Gunarathne, Thilina. 2014
6. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications [O]. Yiyan Zhang, Yi Xin, Qin Li. 2017
7. A paralleled big data algorithm with mapreduce framework for mining twitter data [O]. Bing L, Chan KCC. 2015
