
An Improved Memory Cache Management Study Based on Spark

         

Abstract

Spark is a fast, unified analysis engine for big data and machine learning, in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each RDD can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node, and when an RDD must be stored in a cache whose space is insufficient, the system evicts old data partitions in a least recently used (LRU) fashion to release space. However, Spark has no mechanism designed specifically for caching RDDs, and LRU takes neither the dependencies among RDDs nor the needs of future stages into consideration. In this paper, we propose an optimization approach for RDD caching and LRU based on the features of partitions, which consists of three parts: a prediction mechanism for persistence, a weight model built with the entropy method, and an update mechanism for weights and memory based on RDD partition features. Finally, experiments on the Spark platform show that our strategy can effectively reduce execution time and improve memory usage.
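The weight model mentioned in the abstract relies on the entropy method to score how much each partition feature contributes to the caching decision. As an illustration only, the sketch below computes entropy-based feature weights for a hypothetical feature matrix; the feature names (partition size, compute cost, reference count) are assumed examples and not necessarily the paper's actual feature set.

```scala
// Minimal sketch of the entropy weight method, assuming one feature
// vector per RDD partition. Not the authors' exact model.
object EntropyWeight {
  // rows: one feature vector per partition; columns: features
  def weights(rows: Array[Array[Double]]): Array[Double] = {
    val m = rows.length            // number of partitions
    val n = rows.head.length       // number of features

    // 1. Normalize each feature column so its values sum to 1.
    val colSums = Array.tabulate(n)(j => rows.map(_(j)).sum)
    val p = rows.map(r => r.zipWithIndex.map { case (v, j) => v / colSums(j) })

    // 2. Entropy of each feature column (0 * ln 0 treated as 0).
    val k = 1.0 / math.log(m)
    val e = Array.tabulate(n) { j =>
      -k * p.map(r => if (r(j) > 0) r(j) * math.log(r(j)) else 0.0).sum
    }

    // 3. Weight: larger for features with lower entropy,
    //    i.e., features that better discriminate between partitions.
    val d = e.map(1.0 - _)
    d.map(_ / d.sum)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical per-partition features: [size in MB, compute cost, use count]
    val features = Array(
      Array(128.0, 4.2, 3.0),
      Array( 64.0, 1.1, 1.0),
      Array(256.0, 9.5, 5.0)
    )
    println(weights(features).mkString(", "))
  }
}
```

Under this kind of scheme, features whose values vary more across partitions have lower entropy and therefore receive higher weight, so they dominate the combined partition score that would replace a pure recency ordering when choosing which partitions to evict.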
