Conference paper: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Lark: Bringing Network Awareness to High Throughput Computing



Abstract

High throughput computing (HTC) systems are widely adopted in scientific discovery and engineering research. They schedule submitted batch jobs onto cluster resources. Current systems focus mostly on managing computing resources such as CPU and memory; however, they lack flexible, fine-grained mechanisms for managing network resources. This need has become increasingly urgent because batch systems may now be distributed across dozens of sites around the globe, as in the Open Science Grid. The Lark project was motivated by this need to re-examine how the HTC layer interacts with the network layer. In this paper, we present the system architecture of Lark and its implementation as a plugin for HTCondor, a popular HTC software project. Lark achieves lightweight, per-job network virtualization for HTCondor using Linux containers and virtual Ethernet (veth) devices, which gives each batch job a unique network address in a private network namespace. We extended HTCondor's description language, ClassAds, so that users can specify networking requirements in the job submission script, and HTCondor can perform matchmaking to ensure that both the user-specified network requirements and the resource-specific policies are satisfied. We also extended the job agent, condor_starter, so that it can manage and configure the job's network environment. With this building block at the core, we implement bandwidth management at both the host and network levels using software-defined networking (SDN). Our experiments and evaluations show that Lark can effectively manage network resources within the cluster with low overhead. It gives users better predictability of job execution and gives administrators more flexibility in setting network resource consumption policies.
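The per-job network virtualization described above (a private network namespace plus a veth pair per job) can be illustrated with standard iproute2 commands. The following is a minimal Python sketch, not Lark's actual implementation: it only builds the command list for one job. The namespace and interface naming scheme (`job<id>`, `vh<id>`, `vj<id>`), the 10.10.0.0/16 address pool, the one-/30-per-job addressing, and the helper names `job_network_plan` and `apply_plan` are all hypothetical choices made for illustration. Attaching the host end to a bridge, NAT, or an SDN switch is omitted.

```python
import ipaddress
import shlex
import subprocess


def job_network_plan(job_id: int, subnet: str = "10.10.0.0/16") -> list[str]:
    """Build iproute2 commands that give one job its own network namespace.

    Hypothetical conventions (not taken from Lark): namespace "job<id>",
    host-side veth "vh<id>", job-side veth "vj<id>", one IPv4 /30 per job.
    """
    net = ipaddress.IPv4Network(subnet)
    # Each job gets a 4-address /30 block: .0 network, .1 host end,
    # .2 job end, .3 broadcast.
    offset = 4 * job_id
    if job_id < 0 or offset + 3 >= net.num_addresses:
        raise ValueError(f"job_id {job_id} does not fit in {subnet}")
    host_ip = net.network_address + offset + 1
    job_ip = net.network_address + offset + 2

    ns, host_if, job_if = f"job{job_id}", f"vh{job_id}", f"vj{job_id}"
    return [
        # Private network namespace that will hold only this job's interfaces.
        f"ip netns add {ns}",
        # veth pair: packets sent into one end come out of the other.
        f"ip link add {host_if} type veth peer name {job_if}",
        # Move the job-side end into the namespace; the host end stays outside.
        f"ip link set {job_if} netns {ns}",
        f"ip addr add {host_ip}/30 dev {host_if}",
        f"ip link set {host_if} up",
        # Configure the job's unique address from inside the namespace.
        f"ip netns exec {ns} ip addr add {job_ip}/30 dev {job_if}",
        f"ip netns exec {ns} ip link set {job_if} up",
        f"ip netns exec {ns} ip link set lo up",
        # Send all job traffic through the host end (forwarding/NAT not shown).
        f"ip netns exec {ns} ip route add default via {host_ip}",
    ]


def apply_plan(commands: list[str]) -> None:
    """Run the plan (requires root); stops at the first failing command."""
    for cmd in commands:
        subprocess.run(shlex.split(cmd), check=True)
```

For example, `job_network_plan(3)` places interface `vj3` with address 10.10.0.14 inside namespace `job3`, with 10.10.0.13 on the host-side end `vh3`; `apply_plan` would then execute the commands as root. Giving each job its own point-to-point link means host-side policy, such as bandwidth shaping, could be attached to that job's host-side interface without affecting other jobs.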
