Journal: Future Generation Computer Systems

A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one



Abstract

In recent years, e-Science applications have evolved from large-scale simulations executed in a single cluster to more complex workflows in which these simulations are combined with High-Performance Data Analytics (HPDA). To implement these workflows, developers currently use different patterns, mainly task-based and dataflow. However, since these patterns are usually managed by separate frameworks, implementing these applications requires combining them, which considerably increases the effort of learning, deploying, and integrating applications in the different frameworks. This paper aims to reduce this effort by proposing a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows from now on) using a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end. During the evaluation, we introduce four use cases to illustrate the new capabilities of Hybrid Workflows, measuring the performance benefits when processing data continuously as it is generated, when removing synchronisation points, when processing external real-time data, and when combining task-based workflows and dataflows at different levels. Users who identify these patterns in their workflows may use the presented use cases (and their performance improvements) as a reference to update their code and benefit from the capabilities of Hybrid Workflows. Furthermore, we analyse the scalability in terms of the number of writers and readers and measure the task analysis, task scheduling, and task execution times when using objects or streams.
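The idea of a task that consumes data continuously while another task is still producing it can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: it does not use COMPSs or the paper's Distributed Stream Library, and the names `ObjectStream`, `simulate`, `analyse`, and `run_hybrid` are hypothetical stand-ins invented for this sketch. A thread pool plays the role of the task-based runtime, and an in-process queue plays the role of the stream.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Sentinel marking the end of a stream (a convention of this sketch only).
_CLOSED = object()


class ObjectStream:
    """Toy in-process stream; a hypothetical stand-in, not the paper's library."""

    def __init__(self):
        self._q = queue.Queue()

    def publish(self, item):
        self._q.put(item)

    def close(self):
        self._q.put(_CLOSED)

    def __iter__(self):
        # Yield items as they arrive until the writer closes the stream.
        while True:
            item = self._q.get()
            if item is _CLOSED:
                return
            yield item


def simulate(stream, steps):
    """Producer task: publishes one value per simulation step."""
    for step in range(steps):
        stream.publish(step * step)
    stream.close()
    return steps


def analyse(stream):
    """Consumer task: processes each element as soon as it is available."""
    total = 0
    for value in stream:
        total += value
    return total


def run_hybrid(steps):
    """Submit both tasks at once; the stream connects them."""
    stream = ObjectStream()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Neither task waits for the other to finish before starting.
        produced = pool.submit(simulate, stream, steps)
        analysed = pool.submit(analyse, stream)
        return produced.result(), analysed.result()


if __name__ == "__main__":
    print(run_hybrid(5))
```

In a purely task-based version, `analyse` could only start once `simulate` had returned its complete output. Passing a stream instead removes that synchronisation point, which is one of the patterns the abstract describes. The sketch supports a single reader only, since the closing sentinel is consumed once.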
