Journal: IEICE Transactions on Information and Systems

ParaLite: A Parallel Database System for Data-Intensive Workflows



Abstract

To better support data-intensive workflows, which are typically built from various independently developed executables, this paper proposes two extensions to parallel database systems: User-Defined eXecutables (UDX) and collective queries. UDX simplifies the description of workflows by allowing external executables to be integrated seamlessly into SQL statements, with no need to write programs conforming to the strict specifications of databases. A collective query is an SQL query whose results are distributed to multiple clients and then processed by them in parallel, using arbitrary UDX. It parallelizes executables efficiently through data transfer optimization algorithms that distribute query results to multiple clients, taking both communication cost and computational load into account. We implement these concepts in a system called ParaLite, a parallel database system based on the popular lightweight database SQLite. Our experiments show that ParaLite performs several times better than Hive on typical SQL tasks and achieves a 10x speedup over a commercial DBMS for executables. In addition, this paper studies a real-world text-processing workflow and builds it on top of ParaLite, Hadoop, Hive, and general files. Our experience indicates that ParaLite outperforms the other systems in both productivity and performance for this workflow.
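
The idea of a collective query can be illustrated with a minimal single-machine sketch. It is not ParaLite's interface or its optimization algorithm: the names `EXECUTABLE`, `assign_rows`, `run_client`, `collective_query`, and `demo` are hypothetical, the "clients" are local threads, and the balancing rule is a simple greedy heuristic that uses text length as a crude stand-in for computational load and ignores communication cost. It only shows the general shape: run an SQL query on SQLite, split the result among several clients, and let each client pipe its share through an unmodified external executable.

```python
import sqlite3
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an independently developed executable:
# it reads text on stdin and writes the upper-cased text to stdout.
EXECUTABLE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def assign_rows(rows, num_clients):
    """Greedy balancing: give each row to the currently least-loaded client.

    Load is approximated by the length of the row's text (an assumption
    made for this sketch only).
    """
    buckets = [[] for _ in range(num_clients)]
    loads = [0] * num_clients
    # Place the largest rows first so the greedy choice balances better.
    for row in sorted(rows, key=lambda r: len(r[0]), reverse=True):
        target = loads.index(min(loads))
        buckets[target].append(row)
        loads[target] += len(row[0])
    return buckets


def run_client(bucket):
    """Pipe one client's share of the rows through the external executable."""
    if not bucket:
        return []
    payload = "\n".join(text for (text,) in bucket)
    done = subprocess.run(
        EXECUTABLE, input=payload, capture_output=True, text=True, check=True
    )
    return done.stdout.splitlines()


def collective_query(conn, sql, num_clients=2):
    """Run the query once, then process its result on several clients in parallel."""
    rows = conn.execute(sql).fetchall()
    buckets = assign_rows(rows, num_clients)
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        parts = list(pool.map(run_client, buckets))
    return [line for part in parts for line in part]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (body TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?)",
        [("alpha beta",), ("gamma",), ("delta epsilon zeta",)],
    )
    return sorted(collective_query(conn, "SELECT body FROM docs"))
```

Calling `demo()` returns the processed rows in sorted order. In a real deployment the clients would be separate machines, which is where the communication cost mentioned in the abstract becomes a factor in how rows are assigned.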
