...
International Journal of High Performance Computing and Networking

Implementation of a deduplication cache mechanism using content-defined chunking

Abstract

Many application programs in data-intensive science read and write large files. These large files consume significant memory because their data is loaded into the page cache. Because memory is a critical resource in data-intensive computing, reducing the memory footprint of file data is essential. In this paper, we propose a cache deduplication mechanism with content-defined chunking (CDC) for the Gfarm distributed file system. CDC divides a file into variable-size blocks (chunks) based on the file's contents. The client stores the chunks in its local file system as cache files and reuses them during subsequent file accesses. Deduplicating chunks reduces the amount of data transmitted between clients and servers as well as storage and memory requirements. The experimental results demonstrate that the proposed mechanism significantly improves the performance of file-read operations and that introducing parallelism reduces the overhead of file-write operations.
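To make the mechanism concrete, below is a minimal sketch, not the paper's implementation, of content-defined chunking using a Gear-style rolling hash, paired with a content-addressed chunk cache keyed by SHA-256 in the local file system. All parameters and names (MIN_SIZE, MAX_SIZE, MASK, GEAR, ChunkCache, the /tmp/chunk_cache path) are illustrative assumptions.

import hashlib
import os
import random

# Illustrative parameters (assumptions, not from the paper): chunks are at
# least MIN_SIZE and at most MAX_SIZE bytes; a 13-bit boundary mask gives an
# expected chunk size of roughly MIN_SIZE + 8 KiB.
MIN_SIZE = 2 * 1024
MAX_SIZE = 64 * 1024
MASK = (1 << 13) - 1

# Gear table: one fixed pseudo-random 64-bit value per possible byte value.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]

def cdc_chunks(data: bytes):
    """Yield variable-size chunks whose boundaries depend only on content,
    so an edit early in a file shifts boundaries only near the edit."""
    start, n = 0, len(data)
    while start < n:
        h = 0
        end = min(start + MAX_SIZE, n)
        cut = end                       # fall back to MAX_SIZE (or EOF)
        for i in range(start, end):
            h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
            if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
                cut = i + 1             # content-defined boundary found
                break
        yield data[start:cut]
        start = cut

class ChunkCache:
    """Content-addressed local store: each chunk is written once under the
    name of its SHA-256 digest, so identical chunks are deduplicated."""
    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):    # skip chunks already cached
            with open(path, "wb") as f:
                f.write(chunk)
        return digest

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()

if __name__ == "__main__":
    cache = ChunkCache("/tmp/chunk_cache")             # hypothetical cache dir
    data = os.urandom(256 * 1024)
    recipe = [cache.put(c) for c in cdc_chunks(data)]  # file -> digest list
    assert b"".join(cache.get(d) for d in recipe) == data

Under this scheme a file is represented by its ordered list of chunk digests; on a subsequent access the client need only fetch the digests missing from its local cache, which is what reduces both the data transmitted between clients and servers and the local storage and memory footprint.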
