ClowdFlows: Online workflows for distributed big data mining


Abstract

The paper presents a platform for distributed computing, developed using the latest software technologies and computing paradigms to enable big data mining. The platform, called ClowdFlows, is implemented as a cloud-based web application with a graphical user interface which supports the construction and execution of data mining workflows, including web services used as workflow components. As a web application, the ClowdFlows platform poses no software requirements and can be used from any modern browser, including mobile devices. The constructed workflows can be declared either as private or public, which enables sharing the developed solutions, data and results on the web and in scientific publications. The server-side software of ClowdFlows can be multiplied and distributed to any number of computing nodes. From a developer's perspective the platform is easy to extend and supports distributed development with packages. The paper focuses on big data processing in the batch and real-time processing mode. Big data analytics is provided through several algorithms, including novel ensemble techniques, implemented using the map-reduce paradigm and a special stream mining module for continuous parallel workflow execution. The batch mode and real-time processing mode are demonstrated with practical use cases. Performance analysis shows the benefit of using all available data for learning in distributed mode compared to using only subsets of data in non-distributed mode. The ability of ClowdFlows to handle big data sets and its nearly perfect linear speedup is demonstrated.
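The abstract mentions ensemble techniques implemented with the map-reduce paradigm. As a rough illustration of that general pattern only, the sketch below trains one toy model per data chunk (map), merges the partial ensembles (reduce), and predicts by majority vote. It is not the ClowdFlows API or the paper's algorithm: every name here (`split_into_chunks`, `map_train`, `reduce_models`, `build_ensemble`, `predict`) and the threshold classifier itself are assumptions made for this example, and the chunks are processed in a single process rather than on separate nodes.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(rows, n_chunks):
    """Partition rows round-robin, standing in for data spread over nodes."""
    return [rows[i::n_chunks] for i in range(n_chunks)]


def map_train(chunk):
    """Map step: fit a toy 1-D threshold classifier on one chunk.

    Each row is (x, label) with label 0 or 1. The chunk is assumed to
    contain both labels, and label 1 is assumed to have the larger x values.
    """
    pos = [x for x, label in chunk if label == 1]
    neg = [x for x, label in chunk if label == 0]
    # Place the decision threshold halfway between the two class means.
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return [threshold]  # a one-model partial ensemble


def reduce_models(left, right):
    """Reduce step: merge two partial ensembles by concatenation."""
    return left + right


def build_ensemble(rows, n_chunks):
    """Run the map step on every chunk, then fold the results together."""
    chunks = split_into_chunks(rows, n_chunks)
    return reduce(reduce_models, map(map_train, chunks))


def predict(ensemble, x):
    """Majority vote over all per-chunk models."""
    votes = Counter(int(x > threshold) for threshold in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic data: label is 1 when x is at least 50.
    data = [(float(i), int(i >= 50)) for i in range(100)]
    ensemble = build_ensemble(data, n_chunks=4)
    print(len(ensemble), predict(ensemble, 10.0), predict(ensemble, 90.0))
```

Because each chunk is trained independently, the map step could in principle run on separate workers, which is the property that makes this style of ensemble a natural fit for distributed batch processing.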

Bibliographic details

  • Source
    Future Generation Computer Systems | 2017, No. 3 | pp. 38-58 | 21 pages
  • Author affiliations

    Jožef Stefan Institute, Ljubljana, Slovenia, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia;

    Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia;

    Jožef Stefan Institute, Ljubljana, Slovenia;

    Jožef Stefan Institute, Ljubljana, Slovenia, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, University of Nova Gorica, Nova Gorica, Slovenia;

    Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia;

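  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords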

    Data mining platform; Cloud computing; Scientific workflows; Batch processing; Map-reduce; Big data;

