C2CU : A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm

Abstract

A sequential algorithm is oblivious if the memory address it accesses at each time step does not depend on the input data. Many important tasks, including matrix computation, signal processing, sorting, dynamic programming, and encryption/decryption, can be performed by oblivious sequential algorithms. Bulk execution of a sequential algorithm means executing it for many independent inputs, either in turn or in parallel. The main contribution of this paper is a tool that generates a CUDA C program for the bulk execution of an oblivious sequential algorithm. More specifically, the tool automatically converts a C program describing an oblivious sequential algorithm into a CUDA C program that performs the bulk execution of that C program. The generated programs can be executed on CUDA-enabled GPUs. We have implemented CUDA C programs for the bulk execution of the bitonic sorting algorithm, the Floyd-Warshall algorithm, and Montgomery modulo multiplication. Running on a GeForce GTX Titan, our bulk-execution implementations can be 199 times faster for bitonic sort, 54 times faster for the Floyd-Warshall algorithm, and 78 times faster for Montgomery modulo multiplication than implementations on a single Intel Xeon CPU.
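The two definitions in the abstract (oblivious algorithm, bulk execution) can be illustrated with a small sketch. This is not the C2CU tool or its output: it is plain Python standing in for the C input and the generated CUDA C program, and the names `bitonic_schedule`, `sort_one`, and `sort_bulk` are hypothetical. It assumes the number of keys is a power of two.

```python
# Illustrative sketch only; all function names are hypothetical and this is
# not code produced by C2CU.

def bitonic_schedule(n):
    """Compare-exchange steps (i, j, ascending) of bitonic sort for n keys.

    The schedule depends only on n, never on the key values. That property
    is what makes the algorithm oblivious.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    steps = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    steps.append((i, partner, (i & k) == 0))
            j //= 2
        k *= 2
    return steps


def sort_one(keys):
    """Sequential execution of the algorithm for a single input."""
    a = list(keys)
    for i, j, ascending in bitonic_schedule(len(a)):
        lo, hi = min(a[i], a[j]), max(a[i], a[j])
        a[i], a[j] = (lo, hi) if ascending else (hi, lo)
    return a


def sort_bulk(inputs):
    """Bulk execution: every input follows the same address sequence.

    The inner loop stands in for the GPU threads. At each step, all of them
    touch the same pair of addresses (i, j), each in its own input.
    """
    arrays = [list(keys) for keys in inputs]
    n = len(arrays[0])
    assert all(len(a) == n for a in arrays), "inputs must have equal length"
    for i, j, ascending in bitonic_schedule(n):
        for a in arrays:  # one iteration per independent input
            lo, hi = min(a[i], a[j]), max(a[i], a[j])
            a[i], a[j] = (lo, hi) if ascending else (hi, lo)
    return arrays
```

Because the address pairs are fixed in advance, the loop over inputs can be handed to independent threads without any data-dependent branching on addresses, which is the situation the abstract describes for GPU execution. For example, `sort_bulk([[3, 1, 2, 0], [0, 2, 1, 3]])` sorts both inputs using one shared schedule.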

Bibliographic details

Source
ICA3PP 2014 | 2014 | 14 pages
Authors
Daisuke Takafuji; Koji Nakano; Yasuaki Ito
Original format: PDF
Chinese Library Classification: TP302.1-532
Keywords
GPGPU; CUDA; bulk execution; oblivious algorithms; Floyd-Warshall algorithm; Montgomery modulo multiplication

Similar literature

1. C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm [J] . Daisuke Takafuji, Koji Nakano, Yasuaki Ito. 電子情報通信学会技術研究報告. コンピュ-タシステム. Computer Systems . 2014, No. 302
2. C2CU: a CUDA C program generator for bulk execution of a sequential algorithm [J] . Daisuke Takafuji, Koji Nakano, Yasuaki Ito. Concurrency and Computation . 2017, No. 17
3. C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm [C] . Daisuke Takafuji, Koji Nakano, Yasuaki Ito. International conference on algorithms and architectures for parallel processing . 2014
4. A look at the application and effectiveness of CUDA programming applied to precedence-constrained TSP using a genetic algorithm meta-heuristic. [D] . Wagner, Ross. 2013
5. Programming and execution of sequential movements in Parkinson's disease. [O] . R D Rafal, A W Inhoff, J H Friedman. 1987
6. Bulk execution of Euclidean algorithms on the CUDA-enabled GPU [O] . Toru Fujita, Koji Nakano, Yasuaki Ito. 2016
