System and method for detecting process and network failures in a distributed system having multiple independent networks
Abstract
The present invention provides a system and method for detecting process failures and network failures in a distributed system. The distributed system includes at least two processes, each executing on a host and operable to transmit messages (i.e., heartbeats) to each other over a plurality of networks in the distributed system. A process in the system is operable to execute a network failure algorithm for detecting the failure of a network in the system. The network failure algorithm includes calculating the difference between the period of time to receive a heartbeat from a process on a first network and the period of time to receive a heartbeat from the same process on a second network. If the difference exceeds a network failure threshold, the second network is suspected of failing. A process in the system is also operable to execute a process failure algorithm. The process failure algorithm includes detecting receipt of a heartbeat from a process on any one of the plurality of networks in the system within a network failure time limit. If no heartbeat is received on any of the networks within that limit, the process is suspected of failing.
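The two checks described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `HeartbeatMonitor`, its method names, and the parameter names are hypothetical, and the sketch assumes that "period of time to receive a heartbeat" means the time elapsed since the last heartbeat arrived on a given network. The time limit used for the process check is called `process_time_limit` here purely for illustration.

```python
class HeartbeatMonitor:
    """Hypothetical monitor for heartbeats from ONE peer process over several networks."""

    def __init__(self, networks, network_failure_threshold, process_time_limit,
                 start_time=0.0):
        self.networks = list(networks)
        self.network_failure_threshold = network_failure_threshold
        self.process_time_limit = process_time_limit
        # Assumption: every network counts as "just heard from" at start-up.
        self.last_seen = {n: start_time for n in self.networks}

    def record_heartbeat(self, network, now):
        """Note that a heartbeat from the peer arrived on `network` at time `now`."""
        self.last_seen[network] = now

    def suspected_networks(self, now):
        """Network check: compare each network's wait with the shortest wait.

        A network is suspected when its elapsed time since the last heartbeat
        exceeds that of the most recently heard network by more than the
        network failure threshold.
        """
        elapsed = {n: now - t for n, t in self.last_seen.items()}
        shortest = min(elapsed.values())
        return [n for n in self.networks
                if elapsed[n] - shortest > self.network_failure_threshold]

    def process_suspected(self, now):
        """Process check: suspect the peer only if NO network delivered a
        heartbeat within the time limit."""
        return all(now - t > self.process_time_limit
                   for t in self.last_seen.values())


# Usage: net_b goes quiet after t=1 while net_a keeps delivering until t=5.
monitor = HeartbeatMonitor(["net_a", "net_b"],
                           network_failure_threshold=3.0,
                           process_time_limit=10.0)
monitor.record_heartbeat("net_b", 1.0)
for t in (1.0, 2.0, 3.0, 4.0, 5.0):
    monitor.record_heartbeat("net_a", t)

print(monitor.suspected_networks(now=5.0))   # net_b lags net_a by 4 s
print(monitor.process_suspected(now=5.0))    # net_a is still fresh
print(monitor.process_suspected(now=20.0))   # nothing heard on any network for > 10 s
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real deployment would more likely read a monotonic clock such as `time.monotonic()`. Keeping the two checks separate mirrors the abstract's distinction: silence on one network alone points at the network, while silence on every network points at the process.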