Method and Apparatus for Analyzing Error Conditions in a Massively Parallel Computer System by Identifying Anomalous Nodes Within a Communicator Set

Abstract

An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system and identifies nodes that exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes that do not themselves belong to the group. A node that is not itself in the group but has a large number of neighbors in the group is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to the number of adjoining members of the group.
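The grouping and ranking steps described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 3x3 mesh topology, the helper names (`mesh_neighbors`, `group_by_traceback`, `rank_outsiders`), the sample traceback frames, and the choice to analyze only the largest group.

```python
from collections import Counter, defaultdict


def mesh_neighbors(width, height):
    """Hypothetical topology: a 2D mesh without wraparound links."""
    result = {}
    for x in range(width):
        for y in range(height):
            candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            result[(x, y)] = [
                (i, j) for i, j in candidates
                if 0 <= i < width and 0 <= j < height
            ]
    return result


def group_by_traceback(tracebacks):
    """Group nodes whose call-return stack tracebacks are identical."""
    groups = defaultdict(set)
    for node, frames in tracebacks.items():
        groups[tuple(frames)].add(node)
    return groups


def rank_outsiders(group, neighbors):
    """Count, for each node outside the group, how many of its neighbors
    are group members, and sort by that count in descending order."""
    counts = Counter()
    for member in group:
        for other in neighbors[member]:
            if other not in group:
                counts[other] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative data: most nodes wait in a barrier, two nodes do not.
neighbors = mesh_neighbors(3, 3)
tracebacks = {node: ("main", "MPI_Barrier") for node in neighbors}
tracebacks[(1, 1)] = ("main", "compute")
tracebacks[(0, 0)] = ("main", "io_wait")

groups = group_by_traceback(tracebacks)
largest = max(groups.values(), key=len)  # assumption: inspect the largest group
ranking = rank_outsiders(largest, neighbors)
print(ranking)  # [((1, 1), 4), ((0, 0), 2)]
```

In this toy data the center node (1, 1) is surrounded by four group members and therefore ranks first as the likeliest locality of error, ahead of the corner node (0, 0), which adjoins only two.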

Bibliographic details

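  • Publication number: US2008022261A1

  • Publication date: 2008-01-24

  • Original format: PDF

  • Applicant/assignee: THOMAS MICHAEL GOODING

  • Application number: US20060425773

  • Inventor: THOMAS MICHAEL GOODING

  • Filing date: 2006-06-22

  • Classification: G06F9/44

  • Country: US

  • Database entry time: 2022-08-21 20:13:11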
