Lark: Bringing Network Awareness to High Throughput Computing

Abstract

High throughput computing (HTC) systems are widely adopted in scientific discovery and engineering research. They schedule submitted batch jobs onto cluster resources. Current systems focus mostly on managing computing resources such as CPU and memory; however, they lack flexible, fine-grained mechanisms for managing network resources. This gap has become increasingly urgent because batch systems may now span dozens of sites around the globe, as in the Open Science Grid. The Lark project was motivated by this need to re-examine how the HTC layer interacts with the network layer. In this paper, we present the system architecture of Lark and its implementation as a plugin for HTCondor, a popular HTC software project. Lark achieves lightweight, per-job network virtualization for HTCondor by using Linux containers and virtual Ethernet devices, which give each batch job a unique network address in a private network namespace. We extended HTCondor's description language, ClassAds, so that users can specify networking requirements in the job submission script. HTCondor can then perform matchmaking to ensure that both user-specified network requirements and resource-specific policies are satisfied. We also extended the job agent, condor_starter, so that it can manage and configure the job's network environment. With this building block at its core, we implement bandwidth management at both the host and network levels using software-defined networking (SDN). Our experiments and evaluations show that Lark can effectively manage network resources within the cluster with low overhead. It gives users better predictability of job execution and gives administrators more flexibility in network resource consumption policies.
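
As a rough illustration of the two-sided matchmaking described in the abstract, the following Python sketch models a job ad and machine ads as plain dictionaries, each paired with a Requirements predicate that is evaluated against the other side. It is a toy model under stated assumptions, not HTCondor's ClassAd language, its Python bindings, or Lark's code: ToyAd, symmetric_match, candidate_machines, and the attributes RequestBandwidthMbps, HasPerJobNetNamespace, AvailableBandwidthMbps, and MaxPerJobBandwidthMbps are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Ad = Dict[str, Any]
# A Requirements predicate is called as requirements(my_attrs, target_attrs).
Predicate = Callable[[Ad, Ad], bool]


def _always(my: Ad, target: Ad) -> bool:
    return True


@dataclass
class ToyAd:
    """Hypothetical stand-in for a ClassAd: attributes plus a Requirements predicate."""
    name: str
    attrs: Ad
    requirements: Predicate = _always


def symmetric_match(job: ToyAd, machine: ToyAd) -> bool:
    # Two-sided check: the job must accept the machine AND the machine's
    # policy must accept the job.
    return job.requirements(job.attrs, machine.attrs) and machine.requirements(
        machine.attrs, job.attrs
    )


def candidate_machines(job: ToyAd, machines: List[ToyAd]) -> List[str]:
    return [m.name for m in machines if symmetric_match(job, m)]


# --- Hypothetical attribute names; NOT Lark's real ClassAd schema ---

def job_wants_isolated_bandwidth(my: Ad, target: Ad) -> bool:
    # Job side: require a per-job network namespace and enough spare bandwidth.
    return bool(target.get("HasPerJobNetNamespace")) and (
        target.get("AvailableBandwidthMbps", 0) >= my["RequestBandwidthMbps"]
    )


def site_bandwidth_policy(my: Ad, target: Ad) -> bool:
    # Resource side: cap the bandwidth any single job may request.
    return target.get("RequestBandwidthMbps", 0) <= my["MaxPerJobBandwidthMbps"]


machines = [
    ToyAd("slot1@node-a", {"HasPerJobNetNamespace": True,
                           "AvailableBandwidthMbps": 1000,
                           "MaxPerJobBandwidthMbps": 800}, site_bandwidth_policy),
    ToyAd("slot1@node-b", {"HasPerJobNetNamespace": False,
                           "AvailableBandwidthMbps": 10000,
                           "MaxPerJobBandwidthMbps": 10000}, site_bandwidth_policy),
]

small_job = ToyAd("job.1", {"RequestBandwidthMbps": 500}, job_wants_isolated_bandwidth)
big_job = ToyAd("job.2", {"RequestBandwidthMbps": 900}, job_wants_isolated_bandwidth)

print(candidate_machines(small_job, machines))  # -> ['slot1@node-a']
print(candidate_machines(big_job, machines))    # -> []
```

In this sketch the 500 Mbps job matches only the slot that offers a per-job network namespace, while the 900 Mbps job is rejected by that slot's hypothetical per-job cap. Keeping the cap in the machine's predicate mirrors the abstract's point that matchmaking checks resource-specific policies as well as user requirements.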

Bibliographic Record

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015 | pp. 382-391 | 10 pages
Authors
Zhe Zhang; Brian Bockelman; Dale W. Carder; Todd Tannenbaum
Original format: PDF
Keywords
HTCondor; bandwidth management; high throughput computing; network-aware scheduling; software-defined networking

Similar Documents

1. Lark: An effective approach for software-defined networking in throughput computing clusters [J]. Zhe Zhang, Brian Bockelman, Dale W. Carder. Future Generation Computer Systems, 2017, issue JULa.

2. Competition-based failure-aware scheduling for High-Throughput Computing systems on peer-to-peer networks [J]. Carlos Perez-Miguel, Alexander Mendiburu, Jose Miguel-Alonso. Cluster Computing, 2015, no. 3.

3. Throughput-Aware and Interference-Aware RRM Techniques for OFDMA-Based Networks [J]. Furaih Alshaalan, Saleh Alshebeili, Abdulkareem Adinoyi. Arabian Journal for Science and Engineering, 2013, no. 11.

4. Lark: Bringing Network Awareness to High Throughput Computing [C]. Zhe Zhang, Brian Bockelman, Dale W. Carder. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015.

5. QoS-aware fine-grained power management in networked computing systems [D]. Jiayu Gong. 2011.

6. Emergency Communications Based on Throughput-Aware D2D Multicasting in 5G Public Safety Networks [O]. Mengjun Yin, Wenjing Li, Lei Feng. 2020.

7. Bringing Energy Aware Routing closer to Reality with SDN Hybrid Networks [O]. Nicolas Huin, Myriana Rifai, Frédéric Giroire. 2017.
