The advent of open architectures and initiatives in massively parallel supercomputing, combined with the maturation of distributed processing methods and algorithms, has enabled the implementation of responsive software-based fault tolerance. Expanding capabilities of distributed Ada runtime environments further stimulate the incorporation of hardware fault tolerance into critical, realtime embedded systems. Through the integration of proven Ada program component distribution and virtually synchronous communication protocols, we have established a benchmark fault-tolerant system, which layers transparently between an Ada application and the runtime environment. Such transparency allows rapid reconfiguration of distribution and fault tolerance characteristics without change to the source code, thus enhancing portability, scalability, and reuse.
The Ada Fault Tolerance project has implemented software technologies which penetrate the envelope of an Ada program to detect, diagnose, and recover from hardware faults. These realtime facilities interact with the Rational distributed application development and runtime environment systems to service replicated Ada software tasks (i.e., threads of control). The deployed system proves that all replicated threads, including those of independently distributed components, can achieve timely consensus during periodic fault detection cycles through transparently embedded voting protocols. Our implementation uses a hybrid redundancy computation strategy and relies on a communication layer which provides virtual synchrony via a causal multicast protocol.
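As a rough illustration of the voting and hybrid redundancy ideas mentioned above, the following Python sketch shows one periodic fault detection cycle over a group of replicas. It is not the project's Ada implementation: the names `majority_vote`, `HybridRedundancyGroup`, and `detection_cycle` are hypothetical, and the sketch assumes that replica outputs can be compared for exact equality and that a strict majority of active replicas is fault-free.

```python
from collections import Counter


def majority_vote(outputs):
    """Vote over a dict mapping replica id -> output value.

    Returns (value, dissenters). value is None when no strict
    majority exists, in which case every replica is listed.
    """
    counts = Counter(outputs.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, sorted(outputs)
    dissenters = sorted(r for r, v in outputs.items() if v != value)
    return value, dissenters


class HybridRedundancyGroup:
    """Hypothetical model: a voted core of active replicas plus standby spares."""

    def __init__(self, active, spares):
        self.active = list(active)
        self.spares = list(spares)

    def detection_cycle(self, outputs):
        """Vote over the active replicas and swap out any that disagree."""
        value, dissenters = majority_vote({r: outputs[r] for r in self.active})
        if value is None:
            return None  # no consensus; recovery policy is out of scope here
        for bad in dissenters:
            self.active.remove(bad)
            if self.spares:
                # Promote the next spare to replace the faulty replica.
                self.active.append(self.spares.pop(0))
        return value


group = HybridRedundancyGroup(active=["a", "b", "c"], spares=["s1"])
result = group.detection_cycle({"a": 42, "b": 42, "c": 7})
```

In this sketch the disagreeing replica `c` is retired and the spare `s1` is promoted, so the group keeps three voting members for the next cycle. A real system would also need the communication layer to deliver the replica outputs to every voter in a consistent order, which is the role the abstract assigns to the causal multicast protocol.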
Microelectronics and Computer Technology Corporation (MCC), 3500 West Balcones Center Drive, Austin, Texas;
Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas;
Computing Devices International, 8800 Queen Avenue South, Bloomington, Minnesota;