Scaling HDFS to More Than 1 Million Operations Per Second with HopsFS

Abstract

HopsFS is an open-source, next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that replaces the main scalability bottleneck in HDFS, the single-node, in-memory metadata service, with a distributed system that has no shared state and is built on a NewSQL database. By removing this metadata bottleneck, HopsFS supports significantly larger clusters, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters. In this paper, we detail the techniques and optimizations that enable HopsFS to surpass 1 million file system operations per second, at least 16 times the throughput of HDFS. In particular, we discuss how we exploit recent high-performance features of NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions. We show how these features, together with more traditional techniques such as batching and write-ahead caches, form a series of incremental optimizations that collectively enable a revolution in distributed hierarchical file system performance.
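The abstract names application-defined partitioning and partition-pruned index scans without showing how the two interact. The following minimal Python sketch illustrates the idea under stated assumptions: it assumes, for illustration, that inode rows are partitioned by their parent directory's inode ID, and it uses a simple modulo in place of the database's hash function. The class `PartitionedInodeTable`, its methods, and the partition count are hypothetical and do not correspond to the HopsFS or NDB APIs.

```python
from dataclasses import dataclass

NUM_PARTITIONS = 4  # hypothetical number of database partitions (shards)


@dataclass(frozen=True)
class Inode:
    inode_id: int
    parent_id: int
    name: str


class PartitionedInodeTable:
    """Toy model of an inode table partitioned by parent inode ID (assumption)."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        # One dict per partition: (parent_id, name) -> Inode
        self.partitions = [dict() for _ in range(num_partitions)]
        self.partitions_touched = 0  # partitions read by the most recent listing

    def partition_for(self, parent_id: int) -> int:
        # Application-defined partitioning (sketch): the partition key is the
        # parent ID, so all children of one directory share a partition.
        # A plain modulo stands in for the database's hash function.
        return parent_id % self.num_partitions

    def insert(self, inode: Inode) -> None:
        part = self.partitions[self.partition_for(inode.parent_id)]
        part[(inode.parent_id, inode.name)] = inode

    def list_dir_pruned(self, parent_id: int) -> list[Inode]:
        # Partition-pruned scan: the partition key is known, so only the one
        # partition that can hold matching rows is read.
        part = self.partitions[self.partition_for(parent_id)]
        self.partitions_touched = 1
        children = [i for (p, _), i in part.items() if p == parent_id]
        return sorted(children, key=lambda i: i.name)

    def list_dir_full_scan(self, parent_id: int) -> list[Inode]:
        # Without pruning, every partition must be scanned for matching rows.
        self.partitions_touched = self.num_partitions
        children = [
            i
            for part in self.partitions
            for (p, _), i in part.items()
            if p == parent_id
        ]
        return sorted(children, key=lambda i: i.name)


if __name__ == "__main__":
    table = PartitionedInodeTable()
    for inode_id, name in [(10, "a"), (11, "b"), (12, "c")]:
        table.insert(Inode(inode_id, 1, name))  # children of directory 1
    table.insert(Inode(20, 10, "x"))  # child of directory 10
    print([i.name for i in table.list_dir_pruned(1)], table.partitions_touched)
    print([i.name for i in table.list_dir_full_scan(1)], table.partitions_touched)
```

With this layout, the children of one directory are co-located, so a directory listing touches a single partition no matter how many partitions exist, while the unpruned scan grows with the partition count. One plausible trade-off of such a key is that a very large or heavily accessed directory concentrates its load on a single partition.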

Bibliographic Record

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017, pp. 683-688 (6 pages)
Authors
Mahmoud Ismail; Salman Niazi; Mikael Ronström; Seif Haridi; Jim Dowling
Original Format: PDF
Keywords
Metadata; Distributed databases; Indexes; Systems operation; File systems; Throughput

Similar Documents

1. Elastic HDFS: interconnected distributed architecture for availability-scalability enhancement of large-scale cloud storages [J]. Maghsoudloo M., Khoshavi N. Journal of Supercomputing, 2020, no. 1.

2. HDFS file operation fingerprints for forensic investigations [J]. Khader Mariam, Hadi Ali, Al-Naymat Ghazi. Digital Investigation, 2018, issue MARa.

3. HDFS Write Operation Using Fully Connected Digraph DataNode Network Topology [J]. B. Purnachandra Rao, N. Nagamalleswara Rao. International Journal of Applied Engineering Research, 2017, issue 16aPta4.

4. Scaling HDFS to more than 1 million operations per second with HopsFS [C]. Mahmoud Ismail, Salman Niazi, Mikael Ronström. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017.

5. Ecotourism in Bocas del Toro, Panama: The perceived effects of macro-scale laws and programs on the socio-economic and environmental development of micro-scale ecotourism operations [D]. Bedi, Carissa. 2011.

6. h5vc: scalable nucleotide tallies with HDF5 [O]. Paul Theodor Pyl, Julian Gehring, Bernd Fischer.

7. HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases [O]. Salman Niazi, Mahmoud Ismail, Seif Haridi. 2018.
