
The 7U Evaluation Method: Evaluating Software Systems via Runtime Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models


Abstract

Renewed interest in developing computing systems that meet additional non-functional requirements such as reliability, high availability and ease-of-management/self-management (serviceability) has fueled research into developing systems that exhibit enhanced reliability, availability and serviceability (RAS) capabilities. This research focus on enhancing the RAS capabilities of computing systems impacts not only the legacy/existing systems we have today, but also has implications for the design and development of next generation (self-managing/self-*) systems, which are expected to meet these non-functional requirements with minimal human intervention. To reason about the RAS capabilities of the systems of today or the self-* systems of tomorrow, there are three evaluation-related challenges to address. First, developing (or identifying) practical fault-injection tools that can be used to study the failure behavior of computing systems and exercise any (remediation) mechanisms the system has available for mitigating or resolving problems. Second, identifying techniques that can be used to quantify RAS deficiencies in computing systems and reason about the efficacy of individual or combined RAS-enhancing mechanisms (at design-time or after system deployment). Third, developing an evaluation methodology that can be used to objectively compare systems based on the (expected or actual) benefits of RAS-enhancing mechanisms. This thesis addresses these three challenges by introducing the 7U Evaluation Methodology, a complementary approach to traditional performance-centric evaluations that identifies criteria for comparing and analyzing existing (or yet-to-be-added) RAS-enhancing mechanisms, is able to evaluate and reason about combinations of mechanisms, exposes under-performing mechanisms and highlights the lack of mechanisms in a rigorous, objective and quantitative manner. The development of the 7U Evaluation Methodology is based on the following three hypotheses.
First, that runtime adaptation provides a platform for implementing efficient and flexible fault-injection tools capable of in-situ and in-vivo interactions with computing systems. Second, that mathematical models such as Markov chains, Markov reward networks and Control theory models can successfully be used to create simple, reusable templates for describing specific failure scenarios and scoring the system's responses, i.e., studying the failure-behavior of systems, and the various facets of its remediation mechanisms and their impact on system operation. Third, that combining practical fault-injection tools with mathematical modeling techniques based on Markov Chains, Markov Reward Networks and Control Theory can be used to develop a benchmarking methodology for evaluating and comparing the reliability, availability and serviceability (RAS) characteristics of computing systems. This thesis demonstrates how the 7U Evaluation Method can be used to evaluate the RAS capabilities of real-world computing systems and in so doing makes three contributions. First, a suite of runtime fault-injection tools (Kheiron tools) able to work in a variety of execution environments is developed. Second, analytical tools that can be used to construct mathematical models (RAS models) to evaluate and quantify RAS capabilities using appropriate metrics are discussed. Finally, the results and insights gained from conducting fault-injection experiments on real-world systems and modeling the system responses (or lack thereof) using RAS models are presented. In conducting 7U Evaluations of real-world systems, this thesis highlights the similarities and differences between traditional performance-oriented evaluations and RAS-oriented evaluations and outlines a general framework for conducting RAS evaluations.
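
As a rough illustration of the kind of Markov-chain reasoning the abstract refers to, the sketch below models a system as a two-state continuous-time Markov chain (Up and Down) and scores it with a simple reward rate per state. This is not taken from the thesis: the function names, the two-state structure, the example rates and the reward values are all assumptions chosen for illustration, and the thesis's own RAS models may be structured quite differently.

```python
# Illustrative two-state availability model (Up <-> Down).
# All names and numbers here are hypothetical, not from the thesis.

def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Long-run probability of being in the Up state.

    For a two-state continuous-time Markov chain with failure rate
    lambda (Up -> Down) and repair rate mu (Down -> Up), the balance
    equation pi_up * lambda = pi_down * mu gives pi_up = mu / (lambda + mu).
    """
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("failure_rate must be >= 0 and repair_rate must be > 0")
    return repair_rate / (failure_rate + repair_rate)


def expected_reward_rate(availability: float,
                         reward_up: float = 1.0,
                         reward_down: float = 0.0) -> float:
    """Steady-state expected reward: each state's reward rate weighted
    by the long-run probability of occupying that state."""
    return availability * reward_up + (1.0 - availability) * reward_down


if __name__ == "__main__":
    mttf_hours = 1000.0  # assumed mean time to failure
    mttr_hours = 1.0     # assumed mean time to repair
    a = steady_state_availability(1.0 / mttf_hours, 1.0 / mttr_hours)
    downtime_hours_per_year = (1.0 - a) * 24 * 365
    print(f"availability: {a:.6f}")
    print(f"expected downtime per year (hours): {downtime_hours_per_year:.2f}")
    print(f"expected reward rate: {expected_reward_rate(a):.6f}")
```

With reward 1 for Up and 0 for Down, the expected reward rate equals availability; assigning other reward values (for example, throughput while degraded) is one way such a model could score a remediation mechanism's effect.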

Bibliographic details

  • Author

    Griffith Rean

  • Affiliation
  • Year 2008
  • Total pages
  • Original format PDF
  • Language English
  • CLC classification
