Method and Apparatus for Analyzing Error Conditions in a Massively Parallel Computer System by Identifying Anomalous Nodes Within a Communicator Set
Abstract
An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system and identifies nodes that exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes that do not themselves belong to the group. A node that is not itself in the group but has a large number of neighbors in the group is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to the number of adjoining members of the group.
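The grouping and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made for illustration only: the nodes are taken to sit on a torus network addressed by coordinate tuples, the largest set of nodes with identical tracebacks is taken as the reference group, and the names `torus_neighbors` and `rank_suspects` are hypothetical.

```python
from collections import defaultdict


def torus_neighbors(coord, dims):
    """Yield the immediate neighbors of coord on a torus of shape dims.

    The torus topology is an assumption; the abstract only says
    "immediate neighbors".
    """
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            neighbor = list(coord)
            neighbor[axis] = (neighbor[axis] + step) % size
            yield tuple(neighbor)


def rank_suspects(tracebacks, dims):
    """Rank non-group nodes by how many of their neighbors are in the group.

    tracebacks maps a node coordinate to its call-return stack traceback,
    given as a sequence of function names.
    """
    # Group nodes whose tracebacks are identical.
    groups = defaultdict(set)
    for node, trace in tracebacks.items():
        groups[tuple(trace)].add(node)

    # Assumption: the largest group is the one the analysis is run against.
    group = max(groups.values(), key=len)

    # Count, for each node outside the group, its distinct neighbors inside it.
    counts = {}
    for node in tracebacks:
        if node in group:
            continue
        adjoining = {n for n in torus_neighbors(node, dims) if n in group}
        if adjoining:
            counts[node] = len(adjoining)

    # Most adjoining group members first; ties broken by coordinate.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Under these assumptions, a node that is surrounded by group members (for example, many nodes waiting in a collective operation on one node that never arrived) rises to the top of the returned list, which matches the abstract's notion of a likely locality of error.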