The join operation is used in most applications and has long been a topic of research interest. Because massive amounts of data are generated daily, execution time and the complexity of parallelization have become major bottlenecks for join operations. Among the various join types, the left outer join is the most common, yet little work has been done to optimize it. A frequent case is a left outer join between a small table and a large table, and executing this operation efficiently can have a major impact on overall program performance. In this paper, we present an optimal algorithm that performs the left outer join on small-large tables in parallel. We also discuss the challenges of parallel joins and explain in detail how to implement the algorithm. We perform several experiments in a cloud computing environment using the Spark framework. The results show that the proposed algorithm is scalable and performs better than existing algorithms.