
SYSTEMS AND METHODS FOR FAULT TOLERANCE RECOVER DURING TRAINING OF A MODEL OF A CLASSIFIER USING A DISTRIBUTED SYSTEM



Abstract

A distributed system for training a classifier is provided. The system comprises machine learning (ML) workers and a parameter server (PS). The PS is configured for parallel processing to provide the model to each of the ML workers, receive model updates from each of the ML workers, and iteratively update the model using each model update. The PS contains: gradient datasets, each associated with a respective ML worker, that store a model-update identification (delta-M-ID) indicative of the computed model update together with the respective model update; a global dataset that stores the delta-M-ID, an identification of the ML worker (ML-worker-ID) that computed the model update, and a model version that marks the new model in the PS computed by merging the model update with the previous model in the PS; and a model download dataset that stores the ML-worker-ID and the model version of each transmitted model.
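The three kinds of datasets named in the abstract can be sketched as plain in-memory structures. This is only an illustrative sketch, not the patented implementation: the class name `ParameterServer`, the methods `download_model` and `push_update`, the helper `replay`, the additive merge rule, and the use of integer counters for the delta-M-ID and the model version are all assumptions made here for illustration. The abstract does not specify the merge rule or the recovery procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ParameterServer:
    """Hypothetical in-memory PS holding the datasets named in the abstract."""
    model: List[float]
    version: int = 0
    # Gradient datasets: one per ML worker, mapping delta-M-ID -> model update.
    gradient_datasets: Dict[str, Dict[int, List[float]]] = field(default_factory=dict)
    # Global dataset: (delta-M-ID, ML-worker-ID, model version created by the merge).
    global_dataset: List[Tuple[int, str, int]] = field(default_factory=list)
    # Model download dataset: (ML-worker-ID, model version that was transmitted).
    model_download_dataset: List[Tuple[str, int]] = field(default_factory=list)
    next_delta_id: int = 0

    def download_model(self, worker_id: str) -> Tuple[List[float], int]:
        """Send the current model to a worker and record which version it got."""
        self.model_download_dataset.append((worker_id, self.version))
        return list(self.model), self.version

    def push_update(self, worker_id: str, delta: List[float]) -> int:
        """Store a worker's update, merge it, and log the new model version."""
        delta_id = self.next_delta_id
        self.next_delta_id += 1
        self.gradient_datasets.setdefault(worker_id, {})[delta_id] = list(delta)
        # Assumption: merging is element-wise addition of the update.
        self.model = [w + d for w, d in zip(self.model, delta)]
        self.version += 1
        self.global_dataset.append((delta_id, worker_id, self.version))
        return self.version


def replay(initial_model: List[float], ps: ParameterServer) -> List[float]:
    """Hypothetical recovery helper: rebuild the model by re-applying the
    logged updates in the order recorded in the global dataset."""
    model = list(initial_model)
    for delta_id, worker_id, _version in ps.global_dataset:
        delta = ps.gradient_datasets[worker_id][delta_id]
        model = [w + d for w, d in zip(model, delta)]
    return model


ps = ParameterServer(model=[0.0, 0.0])
ps.download_model("worker-1")
ps.push_update("worker-1", [1.0, 2.0])
ps.push_update("worker-2", [0.5, 0.5])
recovered = replay([0.0, 0.0], ps)
```

Under these assumptions, the global dataset fixes the order in which updates were merged, the gradient datasets hold the updates themselves, and the download dataset records which model version each worker last received, so a lost model could in principle be reconstructed from a known starting point.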


Bibliographic data

Publication number: US2019220758A1

Patent type:
Publication date: 2019-07-18

Original format: PDF
Applicant/Assignee: HUAWEI TECHNOLOGIES CO. LTD.

Application number: US201916363639
Inventors: ROMAN TALYANSKY; ZACH MELAMED; NATAN PETERFREUND; ZUGUANG WU

Filing date: 2019-03-25
Classification: G06N5/04; G06N20/20; G06K9/62; G06F17/18
Country: US
Indexed on: 2022-08-21 12:11:02
