
An Improved Memory Cache Management Study Based on Spark

         

Abstract

Spark is a fast, unified analysis engine for big data and machine learning, in which memory is a crucial resource. Resilient Distributed Datasets (RDDs) are parallel data structures that allow users to explicitly persist intermediate results in memory or on disk, and each RDD can be divided into several partitions. During task execution, Spark automatically monitors cache usage on each node, and when an RDD must be stored in a cache whose space is insufficient, the system evicts old data partitions in a least recently used (LRU) fashion to release space. However, Spark has no mechanism designed specifically for caching RDDs, and LRU takes neither the dependencies among RDDs nor the needs of future stages into consideration. In this paper, we propose an optimization approach for RDD caching and LRU based on the features of partitions, which consists of three parts: a prediction mechanism for persistence, a weight model built with the entropy method, and an update mechanism for weights and memory based on RDD partition features. Finally, experiments on the Spark platform show that our strategy can effectively reduce execution time and improve memory usage.
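The weight model mentioned in the abstract relies on the entropy method to score how much each partition feature contributes to the caching decision. As an illustration only, the sketch below computes entropy-based feature weights for a hypothetical feature matrix; the feature names (partition size, compute cost, reference count) are assumed examples and not necessarily the paper's actual feature set.

```scala
// Minimal sketch of the entropy weight method, assuming one feature
// vector per RDD partition. Not the authors' exact model.
object EntropyWeight {
  // rows: one feature vector per partition; columns: features
  def weights(rows: Array[Array[Double]]): Array[Double] = {
    val m = rows.length            // number of partitions
    val n = rows.head.length       // number of features

    // 1. Normalize each feature column so its values sum to 1.
    val colSums = Array.tabulate(n)(j => rows.map(_(j)).sum)
    val p = rows.map(r => r.zipWithIndex.map { case (v, j) => v / colSums(j) })

    // 2. Entropy of each feature column (0 * ln 0 treated as 0).
    val k = 1.0 / math.log(m)
    val e = Array.tabulate(n) { j =>
      -k * p.map(r => if (r(j) > 0) r(j) * math.log(r(j)) else 0.0).sum
    }

    // 3. Weight: larger for features with lower entropy,
    //    i.e., features that better discriminate between partitions.
    val d = e.map(1.0 - _)
    d.map(_ / d.sum)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical per-partition features: [size in MB, compute cost, use count]
    val features = Array(
      Array(128.0, 4.2, 3.0),
      Array( 64.0, 1.1, 1.0),
      Array(256.0, 9.5, 5.0)
    )
    println(weights(features).mkString(", "))
  }
}
```

Under this kind of scheme, features whose values vary more across partitions have lower entropy and therefore receive higher weight, so they dominate the combined partition score that would replace a pure recency ordering when choosing which partitions to evict.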
