Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops

Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters

Abstract

Recent shifts in high-performance computing have increased the use of clusters built around cheap commodity processors. A typical cluster consists of individual nodes, each containing one or several processors, connected by a high-bandwidth, low-latency interconnect. There are many benefits to using clusters for computation, but also some drawbacks, including a tendency to exhibit a low Mean Time To Failure (MTTF) due to the sheer number of components involved. A number of fault-tolerance techniques have recently been proposed and developed to mitigate the inherent unreliability of clusters. These techniques, however, fail to address the detection of non-obvious faults, particularly Byzantine faults. At present, effectively detecting Byzantine faults is an open problem. We describe the operation of ByzwATCh, a module that detects Byzantine hardware errors at run time as part of the Charm++ parallel programming framework.