A Case Study of Accelerating Apache Spark with FPGA

Abstract

Apache Spark is an efficient distributed computing framework for big data processing. It supports in-memory computation on RDDs (Resilient Distributed Datasets) and provides reusability, fault tolerance, and real-time stream processing. However, tasks in the Spark framework are executed only on CPUs, and the CPU's limited degree of parallelism and poor power efficiency may restrict the performance and scalability of the cluster. To improve performance and reduce power dissipation in the data center, heterogeneous accelerators such as FPGAs, GPUs, and MICs (Many Integrated Core processors) can be used, since they deliver more efficient performance than general-purpose processors in big data processing. In this work, we propose a framework that integrates FPGA accelerators into a Spark cluster. We use the FPGA to accelerate Spark tasks developed in Python, so that the main computing load is performed on the FPGA instead of the CPU. We illustrate the performance of the FPGA-based Spark framework with a case study of 2D-FFT algorithm acceleration. The results show that the FPGA-based Spark implementation achieves a 1.79x speedup over the CPU implementation.
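The abstract uses 2D-FFT as its case-study workload but does not describe how the transform is decomposed. As a minimal sketch (plain Python, not the authors' FPGA kernel or their Spark integration code), the standard row-column decomposition below shows why a 2D-FFT suits data-parallel offload: it is a batch of independent 1D FFTs over rows, a transpose, and a second batch over columns. One plausible Spark mapping, which is an assumption and not stated in the abstract, would hold matrix rows as RDD records so that each map task hands its batch of row FFTs to the accelerator. The names `fft`, `fft2d`, and `dft2d_naive` are hypothetical, and a recursive radix-2 Cooley-Tukey FFT stands in for whatever 1D kernel the FPGA actually runs.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^{-2*pi*i*k/n} applied to the odd-index half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d(matrix):
    """2D FFT by row-column decomposition."""
    # Pass 1: an independent 1D FFT on every row. In the assumed Spark
    # mapping, this is the data-parallel step a map task could offload.
    rows = list(map(fft, matrix))
    # Pass 2: transpose, FFT each former column, then transpose back.
    cols = list(map(fft, transpose(rows)))
    return transpose(cols)


def dft2d_naive(matrix):
    """Direct O((M*N)^2) 2D DFT, used only to check fft2d."""
    m, n = len(matrix), len(matrix[0])
    return [
        [
            sum(
                matrix[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


if __name__ == "__main__":
    sample = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 2, 2, 2], [4, 3, 2, 1]]
    fast, ref = fft2d(sample), dft2d_naive(sample)
    err = max(abs(fast[u][v] - ref[u][v]) for u in range(4) for v in range(4))
    print("max abs error vs. direct DFT:", err)
```

The two passes are separated by a transpose because each pass only needs whole rows; in a distributed setting that transpose corresponds to a data shuffle between stages, which is one reason transfer and communication costs can limit the end-to-end speedup of an accelerated 2D-FFT.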

Bibliographic details

Source: 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications / 12th IEEE International Conference on Big Data Science and Engineering, 2018, pp. 855-860 (6 pages)
Conference location: New York, US
Authors: Junjie Hou; Yongxin Zhu; Linghe Kong; Zhe Wang; Sen Du; Shijin Song; Tian Huang
Original format: PDF
Language: English
Keywords: Field programmable gate arrays; Sparks; Python; Acceleration; Libraries; Big Data

Similar documents

1. Accelerating Apache Spark with FPGAs [J]. Ehsan Ghasemi, Paul Chow. Concurrency and Computation: Practice and Experience, 2019, no. 2.

2. Optimizing and accelerating space-time Ripley's K function based on Apache Spark for distributed spatiotemporal point pattern analysis [J]. Future Generation Computer Systems, 2020 (Apr.).

3. Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics [J]. Lunga Dalton, Gerrand Jonathan, Yang Lexie. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020.

4. A Case Study of Accelerating Apache Spark with FPGA [C]. Junjie Hou, Yongxin Zhu, Linghe Kong. IEEE International Conference on Big Data Science and Engineering, 2018.

5. A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark's GraphX [D]. Langewisch, Ryan P., 2015.

6. SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning [O]. V. Vineetha, C. L. Biji, Achuthsankar S. Nair.

7. Toward FPGA-Based Semantic Caching for Accelerating Data Analysis with Spark and HDFS [O]. Marouan Maghzaoui, Laurent d’Orazio, Julien Lallet, 2019.
