電子情報通信学会技術研究報告. 情報論的学習理論と機械学習 (IEICE Technical Report, IBISML)

Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation



Abstract

The goal of reinforcement learning (RL) is to let an agent acquire the optimal control policy in an unknown environment so that the expected future rewards are maximized. The model-free RL approach directly learns the policy from data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model can be accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method that combines a recently proposed model-free policy search method called policy gradients with parameter-based exploration (PGPE) with a state-of-the-art transition model estimator called least-squares conditional density estimation (LSCDE). Through experiments, we demonstrate the usefulness of the proposed method.
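The following minimal Python/NumPy sketch illustrates the two-stage idea the abstract describes, under toy assumptions that are not taken from the paper: a linear-Gaussian least-squares regressor stands in for LSCDE as the transition model, the 1-D environment and all names (toy_env_step, model_step) and hyper-parameters are illustrative, and the PGPE update is the standard REINFORCE-style gradient on a Gaussian hyper-distribution over policy parameters.

```python
# Minimal model-based PGPE sketch (toy assumptions; see lead-in above).
import numpy as np

rng = np.random.default_rng(0)

def toy_env_step(s, a):
    """True (unknown) dynamics: drift toward 0, plus noise."""
    return 0.9 * s + 0.5 * a + 0.1 * rng.standard_normal()

def reward(s):
    return -s**2  # keep the state near the origin

# --- 1. Collect transitions with random actions ---------------------------
S, A, S_next = [], [], []
s = rng.standard_normal()
for _ in range(500):
    a = rng.uniform(-1, 1)
    s2 = toy_env_step(s, a)
    S.append(s); A.append(a); S_next.append(s2)
    s = s2
X = np.column_stack([S, A, np.ones(len(S))])  # inputs (s, a, bias)
y = np.array(S_next)

# --- 2. Fit a transition model p(s'|s,a) ----------------------------------
# Least squares for the conditional mean, residual std for the noise;
# LSCDE would instead fit the full conditional density.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_model = np.std(y - X @ w)

def model_step(s, a):
    return w[0] * s + w[1] * a + w[2] + sigma_model * rng.standard_normal()

# --- 3. PGPE on rollouts simulated from the learned model -----------------
def rollout_return(theta, T=20):
    s, R = rng.standard_normal(), 0.0
    for _ in range(T):
        a = theta * s                # deterministic linear policy
        s = model_step(s, a)
        R += reward(s)
    return R

eta, sigma = 0.0, 1.0                # Gaussian hyper-distribution over theta
for it in range(200):
    thetas = eta + sigma * rng.standard_normal(10)
    Rs = np.array([rollout_return(th) for th in thetas])
    b = Rs.mean()                    # baseline to reduce variance
    # REINFORCE-style gradient w.r.t. the hyper-distribution parameters
    grad_eta = np.mean((Rs - b) * (thetas - eta) / sigma**2)
    grad_sigma = np.mean((Rs - b) * ((thetas - eta)**2 - sigma**2) / sigma**3)
    eta += 0.01 * grad_eta
    sigma = max(0.05, sigma + 0.01 * grad_sigma)

print(f"learned policy gain: {eta:.2f}")
```

Note the design choice PGPE makes: the policy itself is deterministic and exploration happens by perturbing its parameters once per rollout, rather than perturbing actions at every step, which is what keeps the hyper-parameter gradient above low-variance; the model-based twist here is simply that those rollouts come from the estimated transition model instead of the real environment.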
