TRAINING REINFORCEMENT LEARNING AGENTS TO LEARN FARSIGHTED BEHAVIORS BY PREDICTING IN LATENT SPACE
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection policy neural network that is used to select actions to be performed by an agent interacting with an environment. In one aspect, a method includes: receiving a latent representation characterizing a current state of the environment; generating a trajectory of latent representations that starts with the received latent representation; for each latent representation in the trajectory, determining a predicted reward and processing the latent representation using a value neural network to generate a predicted state value; determining a corresponding target state value for each latent representation in the trajectory; determining, based on the target state values, an update to the current values of the policy neural network parameters; and determining an update to the current values of the value neural network parameters.
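The abstract does not say how the trajectory is generated, how the target state values are computed, or how the parameter updates are formed. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the trajectory is imagined by rolling a learned latent dynamics model forward under the policy, and the targets are λ-returns of the kind used by the Dreamer agent, which blend predicted rewards with bootstrapped predicted state values. Every name here (`lambda_returns`, `imagine`, `train_step`, the toy additive dynamics and reward, the scalar policy gain `k`, the linear value weights `w`) is hypothetical. The policy gradient is estimated by finite differences purely to keep the example dependency-free; an actual implementation would more plausibly backpropagate through the learned model.

```python
from typing import Callable, List, Sequence

Latent = List[float]


def lambda_returns(rewards: Sequence[float], values: Sequence[float],
                   discount: float = 0.99, lam: float = 0.95) -> List[float]:
    """Target state value for every latent in a trajectory (assumed λ-return).

    rewards[t] and values[t] are the predicted reward and predicted state value
    of the t-th latent. The final latent's target is its predicted value (a
    bootstrap for everything beyond the horizon), so its reward is not used.
    """
    if not values or len(rewards) != len(values):
        raise ValueError("need equal-length, non-empty rewards and values")
    targets = [0.0] * len(values)
    targets[-1] = values[-1]
    for t in range(len(values) - 2, -1, -1):
        # Blend the one-step bootstrap with the longer-horizon target.
        blended = (1.0 - lam) * values[t + 1] + lam * targets[t + 1]
        targets[t] = rewards[t] + discount * blended
    return targets


def imagine(z0: Latent, policy: Callable[[Latent], Latent],
            dynamics: Callable[[Latent, Latent], Latent], horizon: int) -> List[Latent]:
    """Roll the (hypothetical) learned model forward from z0 in latent space."""
    trajectory = [z0]
    for _ in range(horizon):
        z = trajectory[-1]
        trajectory.append(dynamics(z, policy(z)))
    return trajectory


# Hypothetical toy model: 2-D latent, additive dynamics, reward for staying near 0.
def dynamics(z: Latent, a: Latent) -> Latent:
    return [zi + ai for zi, ai in zip(z, a)]


def reward(z: Latent) -> float:
    return -sum(zi * zi for zi in z)


def value(w: Latent, z: Latent) -> float:
    return sum(wi * zi for wi, zi in zip(w, z))  # linear stand-in for the value network


def objective(k: float, w: Latent, z0: Latent, horizon: int) -> float:
    """Mean target state value along a trajectory imagined with policy a = k * z."""
    traj = imagine(z0, lambda z: [k * zi for zi in z], dynamics, horizon)
    targets = lambda_returns([reward(z) for z in traj], [value(w, z) for z in traj])
    return sum(targets) / len(targets)


def train_step(k: float, w: Latent, z0: Latent, horizon: int = 5,
               lr_policy: float = 0.01, lr_value: float = 0.01, eps: float = 1e-4):
    traj = imagine(z0, lambda z: [k * zi for zi in z], dynamics, horizon)
    targets = lambda_returns([reward(z) for z in traj], [value(w, z) for z in traj])
    # Value update: one gradient step on 0.5 * mean((v(z) - target)^2),
    # with the targets held fixed (no gradient flows into them).
    grad_w = [0.0] * len(w)
    for z, g in zip(traj, targets):
        err = value(w, z) - g
        for i, zi in enumerate(z):
            grad_w[i] += err * zi / len(traj)
    new_w = [wi - lr_value * gi for wi, gi in zip(w, grad_w)]
    # Policy update: gradient ascent on the mean target, estimated here by
    # central finite differences using the current value parameters.
    grad_k = (objective(k + eps, w, z0, horizon)
              - objective(k - eps, w, z0, horizon)) / (2 * eps)
    return k + lr_policy * grad_k, new_w, targets
```

Starting from `k = 0.0` and `w = [0.0, 0.0]`, a single call to `train_step` nudges `k` below zero, because shrinking the latent toward the origin raises every imagined reward, and it moves `w` so that the value predictions sit closer to the fixed targets. Because each target beyond the first step is bootstrapped from predicted state values, the policy is credited for rewards past the imagination horizon, which is the sense in which the title calls the learned behaviors "farsighted".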