TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN FARSIGHTED BEHAVIORS BY PREDICTING IN LATENT SPACE

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network used to select an action to be performed by an agent interacting with an environment. In one aspect, a method includes: receiving a latent representation characterizing a current state of the environment; generating a trajectory of latent representations that starts with the received latent representation; for each latent representation in the trajectory: determining a predicted reward; and processing the state latent representation using a value neural network to generate a predicted state value; determining a corresponding target state value for each latent representation in the trajectory; determining, based on the target state values, an update to the current values of the policy neural network parameters; and determining an update to the current values of the value neural network parameters.
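The abstract does not say how the target state values are computed from the predicted rewards and predicted state values. One common choice, assumed here purely for illustration, is a TD(λ)-style return that blends the predicted rewards with bootstrapped value predictions and works backward along the imagined trajectory. The sketch below is hypothetical: the functions `lambda_return_targets` and `value_loss` and the parameters `discount` and `lam` are illustrative names and do not come from the patent.

```python
from typing import List


def lambda_return_targets(rewards: List[float], values: List[float],
                          discount: float = 0.99, lam: float = 0.95) -> List[float]:
    """Hypothetical TD(lambda)-style targets for an imagined latent trajectory.

    rewards[t]: predicted reward for latent state t, for t = 0..H-1.
    values[t]:  value-network prediction for latent state t, for t = 0..H;
                values[H] bootstraps the return beyond the imagination horizon.
    """
    horizon = len(rewards)
    if len(values) != horizon + 1:
        raise ValueError("values must contain exactly one more entry than rewards")
    targets = [0.0] * horizon
    next_target = values[horizon]  # bootstrap from the value of the final latent state
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap value with the longer return from step t + 1.
        next_target = rewards[t] + discount * (
            (1.0 - lam) * values[t + 1] + lam * next_target)
        targets[t] = next_target
    return targets


def value_loss(predicted: List[float], targets: List[float]) -> float:
    """Mean squared error regressing value predictions toward fixed targets."""
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same length")
    return sum(0.5 * (v - g) ** 2 for v, g in zip(predicted, targets)) / len(targets)
```

For example, `lambda_return_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], discount=1.0, lam=1.0)` returns `[3.0, 2.0, 1.0]`. With `lam=1.0` each target is the sum of the remaining predicted rewards, and with `lam=0.0` each target is a one-step reward plus the next state's predicted value. In a full training step, one possible design updates the policy parameters to increase these targets (for instance, by differentiating through the learned latent dynamics) and regresses the value network toward them with the targets held fixed. That gradient machinery is omitted here.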

Bibliographic data

Publication number: US2021158162A1
Publication date: 2021-05-27
Original format: PDF
Applicant/assignee: GOOGLE LLC
Application number: US202017103827
Inventors: DANIJAR HAFNER; MOHAMMAD NOROUZI; TIMOTHY PAUL LILLICRAP
Filing date: 2020-11-24
Classification: G06N3/08; G06K9/62; G06F30/27
Country: US
Database entry time: 2022-08-24 18:55:08
