SYSTEMS AND METHODS FOR FAULT TOLERANCE RECOVER DURING TRAINING OF A MODEL OF A CLASSIFIER USING A DISTRIBUTED SYSTEM

Abstract

A distributed system for training a classifier is provided. The system comprises machine learning (ML) workers and a parameter server (PS). The PS is configured for parallel processing: it provides the model to each of the ML workers, receives model updates from each of the ML workers, and iteratively updates the model using each model update. The PS contains gradient datasets, each associated with a respective ML worker, that store a model-update identification (delta-M-ID) indicative of the computed model update together with the respective model update; a global dataset that stores the delta-M-ID, an identification of the ML worker (ML-worker-ID) that computed the model update, and a model version marking the new model in the PS that is computed by merging the model update with the previous model in the PS; and a model download dataset that stores the ML-worker-ID and the model version of each transmitted model.
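The following minimal Python sketch illustrates the bookkeeping in the three PS datasets described above. It is an illustration under stated assumptions and not the patented implementation. The model is assumed to be a flat list of floats. "Merging" is assumed to be element-wise addition of the update to the previous model. Delta-M-IDs are assumed to be sequential integers. All class and method names (ParameterServer, GlobalRecord, pull_model, push_update, last_download, rebuild_from) are hypothetical. The sketch is single-threaded and ignores the PS's parallel processing. The rebuild_from helper shows one way such records could support recovery, by replaying logged updates onto a checkpointed model; the abstract does not say that this is the claimed recovery procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GlobalRecord:
    """One row of the global dataset (field names are hypothetical)."""
    delta_m_id: int      # identifies the merged model update
    ml_worker_id: str    # ML worker that computed the update
    model_version: int   # version of the model produced by the merge


@dataclass
class ParameterServer:
    model: List[float]
    model_version: int = 0
    # Gradient datasets: per ML worker, delta-M-ID -> stored model update.
    gradient_datasets: Dict[str, Dict[int, List[float]]] = field(default_factory=dict)
    # Global dataset: one record per merge, kept in merge order.
    global_dataset: List[GlobalRecord] = field(default_factory=list)
    # Model download dataset: (ML-worker-ID, model version) per transmitted model.
    download_dataset: List[Tuple[str, int]] = field(default_factory=list)
    next_delta_m_id: int = 0  # assumption: sequential integer IDs

    def pull_model(self, worker_id: str) -> Tuple[int, List[float]]:
        """Transmit a copy of the current model and log the download."""
        self.download_dataset.append((worker_id, self.model_version))
        return self.model_version, list(self.model)

    def push_update(self, worker_id: str, update: List[float]) -> int:
        """Store a worker's update, merge it, and log the new model version."""
        delta_m_id = self.next_delta_m_id
        self.next_delta_m_id += 1
        self.gradient_datasets.setdefault(worker_id, {})[delta_m_id] = list(update)
        # Assumed merge rule: add the update element-wise to the previous model.
        self.model = [w + d for w, d in zip(self.model, update)]
        self.model_version += 1
        self.global_dataset.append(
            GlobalRecord(delta_m_id, worker_id, self.model_version))
        return delta_m_id

    def last_download(self, worker_id: str) -> Optional[int]:
        """Latest model version sent to a worker, or None if it never pulled one."""
        versions = [v for w, v in self.download_dataset if w == worker_id]
        return max(versions) if versions else None

    def rebuild_from(self, base_model: List[float], base_version: int) -> List[float]:
        """Hypothetical recovery: replay logged updates merged after base_version."""
        model = list(base_model)
        for rec in self.global_dataset:
            if rec.model_version > base_version:
                update = self.gradient_datasets[rec.ml_worker_id][rec.delta_m_id]
                model = [w + d for w, d in zip(model, update)]
        return model
```

Usage: after a worker calls pull_model and later push_update, the PS can answer "which version did this worker last see?" from the download dataset. It can also answer "which updates produced versions after a checkpoint?" from the global dataset, and fetch those updates from the per-worker gradient datasets. Keeping updates in per-worker datasets and the merge order in a separate global log keeps replay deterministic. This is the design choice the sketch assumes.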