ParaLite: A Parallel Database System for Data-Intensive Workflows

Chen Ting; Taura Kenjiro

Abstract

To better support data-intensive workflows, which are typically built from various independently developed executables, this paper proposes two extensions to parallel database systems: User-Defined eXecutables (UDX) and collective queries. UDX simplifies the description of workflows by seamlessly integrating external executables into SQL statements, without requiring any effort to write programs that conform to the strict specifications of a database. A collective query is an SQL query whose results are distributed to multiple clients and then processed by them in parallel using arbitrary UDX. It parallelizes executables efficiently through data transfer optimization algorithms that distribute query results to multiple clients while taking both communication cost and computational load into account. We implement these concepts in ParaLite, a parallel database system based on the popular lightweight database SQLite. Our experiments show that ParaLite performs several times faster than Hive on typical SQL tasks and achieves a 10x speedup over a commercial DBMS when running executables. In addition, this paper studies a real-world text processing workflow and builds it on top of ParaLite, Hadoop, Hive, and plain files. Our experience indicates that ParaLite outperforms the other systems in both productivity and performance for this workflow.
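The abstract states that collective queries distribute query results to clients while weighing communication cost against computational load, but it does not spell out the algorithm. The sketch below is only a hypothetical illustration of that trade-off, not ParaLite's actual method: a greedy largest-first heuristic that sends each result chunk to the client with the smallest estimated finish time. The names `Chunk`, `Client`, `estimated_finish`, and `assign_chunks`, and the cost model (transfer time = size / bandwidth unless the chunk is already on the client's node; compute time = size / client throughput), are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    size_mb: float    # size of this slice of the query result
    source_node: str  # node currently holding the slice


@dataclass(frozen=True)
class Client:
    client_id: str
    node: str
    mb_per_sec: float  # assumed throughput of the client's executable (UDX)


def estimated_finish(chunk, client, finish, bandwidth_mb_per_sec):
    """Work already queued on the client + transfer time + compute time."""
    # Hypothetical cost model: transfer is free when the chunk is already local.
    transfer = 0.0 if client.node == chunk.source_node else chunk.size_mb / bandwidth_mb_per_sec
    compute = chunk.size_mb / client.mb_per_sec
    return finish[client.client_id] + transfer + compute


def assign_chunks(chunks, clients, bandwidth_mb_per_sec):
    """Hypothetical greedy heuristic: take the largest chunk first and give it
    to the client whose estimated finish time would be smallest."""
    finish = {c.client_id: 0.0 for c in clients}
    assignment = {}
    for chunk in sorted(chunks, key=lambda ch: ch.size_mb, reverse=True):
        best = min(clients, key=lambda cl: estimated_finish(chunk, cl, finish, bandwidth_mb_per_sec))
        # Compute the new finish time before updating the client's load.
        finish[best.client_id] = estimated_finish(chunk, best, finish, bandwidth_mb_per_sec)
        assignment[chunk.chunk_id] = best.client_id
    return assignment, finish


chunks = [Chunk("a", 100, "n1"), Chunk("b", 60, "n2"), Chunk("c", 40, "n1")]
clients = [Client("x", "n1", 10), Client("y", "n2", 10)]
assignment, finish = assign_chunks(chunks, clients, bandwidth_mb_per_sec=20)
print(assignment)  # {'a': 'x', 'b': 'y', 'c': 'y'}
print(finish)      # {'x': 10.0, 'y': 12.0}
```

In this toy run, chunk `c` is stored on client `x`'s node but is still sent to `y`, because `x` is already busy with `a`; paying a small transfer cost yields an earlier overall finish. This illustrates the kind of locality-versus-load trade-off the abstract describes.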

Bibliographic details

Source: IEICE Transactions on Information and Systems, 2014, No. 5, pp. 1211-1224 (14 pages)
Authors: Chen Ting; Taura Kenjiro
Affiliation: University of Tokyo, Tokyo 1130033, Japan

展开▼
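Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Chinese Library Classification: Communications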
Keywords: data-intensive workflow; parallel database system; user-defined executable; collective query