Journal: Knowledge-Based Systems

Learning distributed discrete Bayesian Network Classifiers under MapReduce with Apache Spark



Abstract

The challenge of scalability has always been a focus of Machine Learning research, where improved algorithms and new techniques are constantly proposed to deal with more complex problems. With the advent of Big Data, this challenge has intensified, as new large-scale datasets overwhelm the majority of available techniques. The community has turned to Cloud Computing and distributed programming paradigms as the most immediate solution, and among these Apache Spark has proven to be the most promising framework. In this paper we focus on the problem of supervised classification, exploring the family of so-called Bayesian Network Classifiers by studying their adaptability to the MapReduce and Apache Spark frameworks. We analyse a range of algorithms and propose distributed versions of them. Our approach is based on a general framework for learning these probabilistic models from large-scale and high-dimensional data, the latter being a problem with less support in the literature. We also present an extensive experimental evaluation of our proposal over a wide set of problems and different elastic configurations of a computing cluster, showing the full extent of the scalability properties of our framework. Additional material and the software to reproduce our experiments can be found on the supplementary website http://simd.albacete.org/supplements/distributed_bncs.html. (C) 2016 Elsevier B.V. All rights reserved.
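Parameter estimation for discrete Bayesian network classifiers reduces to counting value co-occurrences in the data, which is why these models map naturally onto MapReduce: each partition builds a local count table (map) and the partial tables are summed (reduce). The following is a minimal sketch of that idea for the simplest member of the family, Naive Bayes, with plain Python lists standing in for Spark partitions. The toy dataset, the function names, and the Laplace smoothing parameter `alpha` are assumptions for illustration; this is not the authors' implementation or their framework's API.

```python
import math
from collections import Counter
from functools import reduce

# Hypothetical toy data: each record is (tuple of discrete features, class label).
# The outer list stands in for the partitions of a distributed dataset.
partitions = [
    [(("sunny", "hot"), "no"), (("rain", "mild"), "yes")],
    [(("sunny", "mild"), "yes"), (("rain", "hot"), "no")],
]

def map_partition(records):
    """Map step: count class and (feature index, value, class) occurrences locally."""
    counts = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for i, value in enumerate(features):
            counts[("feat", i, value, label)] += 1
    return counts

def reduce_counts(a, b):
    """Reduce step: merge two partial count tables by summing them."""
    return a + b

def predict(features, counts, alpha=1.0):
    """Return the class maximising the Laplace-smoothed Naive Bayes log-posterior."""
    classes = sorted(k[1] for k in counts if k[0] == "class")
    total = sum(counts[("class", c)] for c in classes)
    # Number of distinct values observed for each feature index (for smoothing).
    n_values = Counter(i for (i, _v) in {(k[1], k[2]) for k in counts if k[0] == "feat"})
    best, best_score = None, float("-inf")
    for c in classes:
        n_c = counts[("class", c)]
        score = math.log(n_c / total)
        for i, v in enumerate(features):
            # Counter returns 0 for unseen keys, so unseen values are smoothed by alpha.
            score += math.log((counts[("feat", i, v, c)] + alpha) / (n_c + alpha * n_values[i]))
        if score > best_score:
            best, best_score = c, score
    return best

# Combine the per-partition count tables into the global sufficient statistics.
counts = reduce(reduce_counts, map(map_partition, partitions), Counter())
print(predict(("sunny", "hot"), counts))
```

On a Spark RDD the same two functions could plausibly be plugged in as `rdd.mapPartitions(lambda it: [map_partition(it)]).reduce(reduce_counts)`, since only the small count tables, not the raw records, travel over the network. Structure-learning variants of the family would need additional statistics gathered in the same counting pass.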
