Proctor: Detecting and Investigating Interference in Shared Datacenters

Abstract

Cloud-scale datacenter management systems use virtualization to provide performance isolation while maximizing utilization of the underlying hardware infrastructure. However, virtualization does not provide complete performance isolation, because Virtual Machines (VMs) still compete for non-reservable shared resources (such as caches, network, and I/O bandwidth). This contention degrades application performance and is highly challenging to address in datacenter environments housing tens of thousands of VMs. Addressing this problem in production datacenters requires a non-intrusive, scalable solution that 1) detects performance intrusion and 2) investigates both the intrusive VMs causing the interference and the resource(s) for which those VMs are competing. To this end, this paper introduces Proctor, a real-time, lightweight, and scalable analytics fabric that detects performance-intrusive VMs and identifies the root causes of the interference from among the arbitrary VMs running in shared datacenters, across four key hardware resources: network, I/O, cache, and CPU. Proctor is based on a robust statistical approach that requires no special profiling phases, in stark contrast to a wide body of prior work that assumes application-level information is acquired before execution. By detecting performance degradation and identifying the root-cause VMs and their metrics, Proctor can be used to dramatically improve the performance of applications executing in large-scale datacenters. Our experiments show that when Proctor is deployed in a datacenter housing a mix of I/O-, network-, compute-, and cache-sensitive applications, it effectively pinpoints performance-intrusive VMs. Further, when Proctor is applied with migration, application-level Quality of Service improves by an average of 2.2× compared to systems that are unable to detect, identify, and pinpoint performance intrusion and its root causes.

Bibliographic details

Source
International Symposium on Performance Analysis of Systems and Software | 2018 | pp. 76-86 | 11 pages
Authors
Ram Srivatsa Kannan; Animesh Jain; Michael A. Laurenzano; Lingjia Tang; Jason Mars
Original format
PDF
Keywords
Quality of service; Degradation; Task analysis; Interference; Measurement; Runtime; Hardware

