Conference: International Conference on Intelligent Computing and Control Systems
Improved Performance of Hive Using Index-Based Operation on Big Data

Abstract

Indexing enables fast search over big data. An index is a data structure that allows records to be retrieved efficiently from database files based on the attributes on which the index has been built. With the help of indexing, queries usually perform much better. Hive is batch-oriented data warehouse software that supports query processing and data analysis tasks. Hive is popular because it supports a large number of the SQL operations found in relational database management systems. To improve the performance of database systems, the join operation has been the focus of several query optimization techniques. The aim of this work is therefore twofold: first, we implement an index-based join technique and integrate it into Hive; second, we evaluate performance after performing the join operation. We run relevant test queries on datasets generated using the industry-standard benchmark TPC-H. Our results indicate a significant performance gain for highly selective queries.
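
The general idea of an index-based join with a selective filter can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use any Hive API; the function names (build_index, index_join) and the toy rows, loosely modelled on the TPC-H orders and lineitem tables, are assumptions made purely for illustration. The sketch builds an in-memory index on the join key of one table and probes it only for rows of the other table that pass the filter.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each value of `key` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[key]].append(pos)
    return index


def index_join(outer_rows, inner_rows, inner_index, key, predicate):
    """Join outer rows that pass `predicate` with matching inner rows.

    The inner table is never scanned; matches are found by index lookup.
    """
    result = []
    for outer in outer_rows:
        if not predicate(outer):
            continue  # a selective filter skips most outer rows
        for pos in inner_index.get(outer[key], []):
            result.append({**outer, **inner_rows[pos]})
    return result


# Hypothetical toy data (values are made up, not taken from TPC-H).
orders = [
    {"orderkey": 1, "orderpriority": "1-URGENT"},
    {"orderkey": 2, "orderpriority": "5-LOW"},
    {"orderkey": 3, "orderpriority": "5-LOW"},
]
lineitem = [
    {"orderkey": 1, "quantity": 17},
    {"orderkey": 2, "quantity": 36},
    {"orderkey": 1, "quantity": 8},
    {"orderkey": 3, "quantity": 28},
]

lineitem_index = build_index(lineitem, "orderkey")
joined = index_join(
    orders,
    lineitem,
    lineitem_index,
    "orderkey",
    lambda o: o["orderpriority"] == "1-URGENT",
)
```

In this sketch the work done per query scales with the number of rows that survive the filter rather than with the size of the indexed table, which is one plausible reason an index-based join would help most on highly selective queries.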
