Journal: IEEE Transactions on Dependable and Secure Computing

PLR: A Software Approach to Transient Fault Tolerance for Multicore Architectures

Abstract

Transient faults are emerging as a critical concern in the reliability of general-purpose microprocessors. As architectural trends point toward multicore designs, there is substantial interest in adapting such parallel hardware resources for transient fault tolerance. This paper presents process-level redundancy (PLR), a software technique for transient fault tolerance, which leverages multiple cores for low overhead. PLR creates a set of redundant processes per application process and systematically compares the processes to guarantee correct execution. Redundancy at the process level allows the operating system to freely schedule the processes across all available hardware resources. PLR uses a software-centric approach to transient fault tolerance, which shifts the focus from ensuring correct hardware execution to ensuring correct software execution. As a result, many benign faults that do not propagate to affect program correctness can be safely ignored. A real prototype is presented that is designed to be transparent to the application and can run on general-purpose single-threaded programs without modifications to the program, operating system, or underlying hardware. The system is evaluated for fault coverage and performance on a four-way SMP machine and provides improved performance over existing software transient fault tolerance techniques with a 16.9 percent overhead for fault detection on a set of optimized SPEC2000 binaries.
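The abstract describes the core mechanism only at a high level: each application process gets redundant copies whose behavior is compared to catch transient faults. The Python sketch below illustrates that idea under simplifying assumptions and is not the authors' prototype. It forks the replicas with `os.fork` (POSIX only), compares only each replica's final return value rather than whatever comparison points the real system uses, and decides by strict majority, so two replicas can only detect a mismatch while three can outvote one faulty copy. The names `plr_execute`, `_fork_replica`, `faulty`, and `corrupt` are hypothetical; `corrupt` is a fault-injection hook added purely for demonstration.

```python
import os
import pickle
from typing import Any, Callable, List, Optional, Tuple


def _fork_replica(fn: Callable[[Any], Any], arg: Any,
                  corrupt: Optional[Callable[[Any], Any]]) -> Tuple[int, int]:
    """Fork one redundant replica that computes fn(arg) and pickles the result into a pipe."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process: one redundant copy of the computation
        os.close(r)
        try:
            try:
                result = fn(arg)
                if corrupt is not None:  # hypothetical fault-injection hook (demo only)
                    result = corrupt(result)
            except Exception as exc:  # a crashing replica reports its error as its output
                result = ("replica-error", repr(exc))
            with os.fdopen(w, "wb") as out:
                pickle.dump(result, out)
        finally:
            # _exit skips interpreter cleanup, so the child never returns into the
            # parent's code path and never re-flushes the parent's stdio buffers.
            os._exit(0)
    os.close(w)  # parent keeps only the read end, so EOF arrives when the child exits
    return pid, r


def plr_execute(fn: Callable[[Any], Any], arg: Any, replicas: int = 3,
                faulty: Optional[int] = None,
                corrupt: Optional[Callable[[Any], Any]] = None) -> Any:
    """Run `replicas` redundant copies of fn(arg) and vote on their outputs.

    With 2 replicas a mismatch can only be detected; with 3 or more a strict
    majority lets a single faulty copy be outvoted.
    """
    # Fork every replica first so the OS can schedule them on different cores.
    children = [_fork_replica(fn, arg, corrupt if i == faulty else None)
                for i in range(replicas)]
    outputs: List[Any] = []
    for pid, r in children:
        with os.fdopen(r, "rb") as inp:
            outputs.append(pickle.load(inp))
        os.waitpid(pid, 0)  # reap the finished child
    # Strict-majority vote over the replicas' outputs.
    for candidate in outputs:
        votes = sum(1 for o in outputs if o == candidate)
        if 2 * votes > len(outputs):
            return candidate
    raise RuntimeError(f"transient fault detected, no majority: {outputs!r}")


if __name__ == "__main__":
    def square(x: int) -> int:
        return x * x

    print(plr_execute(square, 7))  # fault-free run, prints 49
    # Replica 1 flips the low bit of its output (48); the other two outvote it.
    print(plr_execute(square, 7, faulty=1, corrupt=lambda v: v ^ 1))  # prints 49
```

Because only outputs are compared, a simulated fault that never reaches the returned value passes the vote unnoticed, which mirrors the abstract's point that benign faults can be safely ignored. Running the copies as ordinary processes also leaves the operating system free to place them on any available core.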