Improved Performance of Hive Using Index-Based Operation on Big Data

Abstract

Indexing provides fast searching over big data. It is a data structure technique for efficiently retrieving records from database files based on the attributes on which the index has been built. With indexing, queries usually perform much better. Hive is batch-oriented data warehouse software that supports query processing and data analysis tasks. Hive is popular because it supports much of the SQL functionality found in relational database management systems. Joins have been the focus of several query optimization techniques aimed at improving database performance. Accordingly, this work has two aims: first, to implement an index-based join technique and integrate it into Hive; second, to evaluate its performance by running join operations. We run relevant test queries on datasets generated with the industry-standard TPC-H benchmark. Our results indicate a significant performance gain for highly selective queries.
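The abstract describes the index-based join only in outline. The following is a minimal, hypothetical Python sketch of the general idea, not the authors' Hive implementation: it builds an in-memory index (join-key value to row positions) on the inner table and compares it with a baseline that reads every inner row. The column names follow the TPC-H `orders` and `lineitem` tables, but the rows are small synthetic data, and `build_index`, `scan_hash_join`, and `index_join` are names invented for this illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

def build_index(rows: List[Row], key: str) -> Dict[object, List[int]]:
    """Map each distinct value of `key` to the positions of the rows holding it."""
    index: Dict[object, List[int]] = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[key]].append(pos)
    return dict(index)

def scan_hash_join(outer: List[Row], inner: List[Row], outer_key: str,
                   inner_key: str, pred: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Baseline: filter the outer side, then read *every* inner row once."""
    wanted: Dict[object, List[Row]] = defaultdict(list)
    for o in outer:
        if pred(o):
            wanted[o[outer_key]].append(o)
    out: List[Row] = []
    inner_reads = 0
    for i in inner:
        inner_reads += 1  # full scan: each inner row is read regardless of matches
        for o in wanted.get(i[inner_key], []):
            out.append({**o, **i})
    return out, inner_reads

def index_join(outer: List[Row], inner: List[Row], inner_index: Dict[object, List[int]],
               outer_key: str, pred: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Index-based join: filter the outer side, then fetch only matching inner rows."""
    out: List[Row] = []
    inner_reads = 0
    for o in outer:
        if not pred(o):
            continue
        for pos in inner_index.get(o[outer_key], []):
            inner_reads += 1  # only rows located through the index are read
            out.append({**o, **inner[pos]})
    return out, inner_reads

# Small synthetic data using TPC-H column names (hypothetical, not real TPC-H output).
orders: List[Row] = [
    {"o_orderkey": k, "o_orderpriority": "1-URGENT" if k % 50 == 0 else "5-LOW"}
    for k in range(1, 1001)
]
lineitem: List[Row] = [
    {"l_orderkey": k, "l_linenumber": n, "l_quantity": 10 * n}
    for k in range(1, 1001) for n in range(1, 5)
]

if __name__ == "__main__":
    idx = build_index(lineitem, "l_orderkey")
    urgent = lambda o: o["o_orderpriority"] == "1-URGENT"  # keeps 20 of 1,000 orders
    rows_a, reads_a = scan_hash_join(orders, lineitem, "o_orderkey", "l_orderkey", urgent)
    rows_b, reads_b = index_join(orders, lineitem, idx, "o_orderkey", urgent)
    print(len(rows_a), reads_a)  # 80 4000
    print(len(rows_b), reads_b)  # 80 80
```

With this selective filter, both joins return the same 80 rows, but the index join reads 80 inner rows instead of 4,000, which is the intuition behind a gain on highly selective queries. If the filter kept every order, the index join would read all 4,000 inner rows anyway, and in a real system its scattered lookups would typically cost more than one sequential scan.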

机译：索引提供了对大数据的快速搜索。它是一种数据结构技术，可以根据已在其中建立索引的某些属性从数据库文件中有效地检索记录。借助索引，查询通常可以带来更好的性能。 Hive是面向批处理的数据仓库软件。在hive的帮助下，我们可以执行查询处理和数据分析任务。 Hive之所以受欢迎，是因为它支持关系数据库管理系统中的大量SQL操作。为了提高数据库系统的性能，联接已成为几种查询优化技术的重点。结果，这项工作的目的有两个方面：首先，我们执行基于索引的连接技术并将其集成到Hive中;其次，在执行连接操作后，评估性能。我们对使用行业标准基准TPC-H生成的数据集进行相关的测试查询。我们的结果表明，与高度选择性的查询相比，性能显着提高。

Bibliographic Information

Source: International Conference on Intelligent Computing and Control Systems | 2018 | pp. 1974-1978 | 5 pages
Authors: Akshay Kumar Suman; Manasi Gyanchandani
Original format: PDF
Keywords: Time factors; Indexing; Big Data; Conferences; Control systems; Task analysis

机译：时间因素;索引;大数据;会议;控制系统;任务分析;

Similar Documents

外文文献
中文文献
专利

1. Discovery of medical Big Data analytics: Improving the prediction of traumatic brain injury survival rates by data mining Patient Informatics Processing Software Hybrid Hadoop Hive [J]. James A. Rodger. Informatics in Medicine Unlocked, 2015, Issue 1.
2. Improved join operations using ORC in HIVE [J]. Ruchika Kumar, Naresh Kumar. CSI Transactions on ICT, 2016, Issue 2-4.
3. A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE) [J]. Amirhossein Shamsaddini, Daniel J. Crichton, Krista Smith, et al. Database, 2014, Issue 1.
4. Improved Performance of Hive Using Index-Based Operation on Big Data [C]. Akshay Kumar Suman, Manasi Gyanchandani. International Conference on Intelligent Computing and Control Systems, 2018.
5. Improving database performances in a changing environment with uncertain and dynamic information demand: An intelligent database system approach [D]. Chen, Andrew Nai-Kuang. 1999.
6. A framework for organizing cancer-related variations from existing databases publications and NGS data using a High-performance Integrated Virtual Environment (HIVE) [O]. Tsung-Jung Wu, Amirhossein Shamsaddini, Yang Pan, et al. 2014.
7. Index-based Join Operations in Hive [O]. Mofidpoor Mahsa. 2013.