Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

Syogo Mori; Voot Tangkaratt; Tingting Zhao; Jun Morimoto; Masashi Sugiyama

Abstract

The goal of reinforcement learning (RL) is to let an agent acquire the optimal control policy in an unknown environment so that the expected future rewards are maximized. The model-free RL approach learns the policy directly from data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. The model-based RL approach, on the other hand, first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method that combines a recently proposed model-free policy search method, policy gradients with parameter-based exploration (PGPE), with a state-of-the-art transition model estimator, least-squares conditional density estimation (LSCDE). Through experiments, we demonstrate the usefulness of the proposed method.
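
The two-stage idea in the abstract (estimate a transition model from a few samples, then run policy search on the model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 1-D toy system, the reward, the hyperparameters, and all function names are hypothetical. In particular, a plain linear least-squares fit with Gaussian noise stands in for least-squares conditional density estimation, which is a different, kernel-based estimator. The policy search step follows the basic form of policy gradients with parameter-based exploration: policy parameters are drawn from a Gaussian, and the mean and standard deviation of that Gaussian are updated by gradient ascent.

```python
import math
import random


def collect_data(rng, n=200):
    """Sample transitions (s, a, s') from a toy system s' = s + a + noise."""
    data = []
    for _ in range(n):
        s = rng.uniform(-2.0, 2.0)
        a = rng.uniform(-2.0, 2.0)
        data.append((s, a, s + a + 0.1 * rng.gauss(0.0, 1.0)))
    return data


def fit_linear_model(data):
    """Least-squares fit of s' ~ A*s + B*a (stand-in for LSCDE).

    Solves the 2x2 normal equations and returns (A, B, residual std).
    """
    sss = sum(s * s for s, a, y in data)
    ssa = sum(s * a for s, a, y in data)
    saa = sum(a * a for s, a, y in data)
    ssy = sum(s * y for s, a, y in data)
    say = sum(a * y for s, a, y in data)
    det = sss * saa - ssa * ssa
    A = (ssy * saa - say * ssa) / det
    B = (say * sss - ssy * ssa) / det
    rss = sum((y - A * s - B * a) ** 2 for s, a, y in data)
    return A, B, math.sqrt(rss / (len(data) - 2))


def rollout_return(theta, model, rng, horizon=5, s0=1.0):
    """Return of the linear policy a = theta * s, simulated in the learned model."""
    A, B, sigma = model
    s, total = s0, 0.0
    for _ in range(horizon):
        a = theta * s
        s = A * s + B * a + sigma * rng.gauss(0.0, 1.0)
        total += math.exp(-s * s)  # bounded reward, largest when s is near 0
    return total


def pgpe(model, rng, iters=150, batch=50, lr=0.05):
    """Gradient ascent on the mean eta and std tau of a Gaussian over theta."""
    eta, tau = 0.0, 0.5
    for _ in range(iters):
        thetas = [rng.gauss(eta, tau) for _ in range(batch)]
        returns = [rollout_return(th, model, rng) for th in thetas]
        base = sum(returns) / batch  # mean-return baseline to reduce variance
        g_eta = sum((r - base) * (th - eta)
                    for r, th in zip(returns, thetas)) / (batch * tau ** 2)
        g_tau = sum((r - base) * ((th - eta) ** 2 - tau ** 2)
                    for r, th in zip(returns, thetas)) / (batch * tau ** 3)
        eta += lr * g_eta
        tau = max(0.05, tau + lr * g_tau)  # keep the std strictly positive
    return eta, tau


rng = random.Random(0)
model = fit_linear_model(collect_data(rng))  # stage 1: learn the dynamics
eta, tau = pgpe(model, rng)                  # stage 2: policy search on the model
print("model (A, B, sigma):", model)
print("policy mean and std:", eta, tau)
```

In this toy system the next state is roughly (1 + theta) * s plus noise, so the learned policy mean should move toward -1, the value that drives the state to zero. All rollouts used for the policy update come from the fitted model, so the only real-system samples consumed are the 200 transitions collected up front.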


Bibliographic details

Source: 電子情報通信学会技術研究報告. 情報論的学習理論と機械学習, 2012, No. 454, 8 pages
Authors: Syogo Mori; Voot Tangkaratt; Tingting Zhao; Jun Morimoto; Masashi Sugiyama
Original format: PDF
Language: English
CLC classification: Mechanics (fundamental theory of mechanical design); Information theory
Keywords: Model-based reinforcement learning; Policy search; Policy gradients with parameter-based exploration; Least-squares conditional density estimation; Robot control


Similar documents

1. Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation [J]. Voot Tangkaratt, Syogo Mori, Tingting Zhao, Neural Networks: The Official Journal of the International Neural Network Society. 2014
2. Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation [J]. Syogo Mori, Voot Tangkaratt, Tingting Zhao, 電子情報通信学会技術研究報告. 2013, No. 454
3. Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation [J]. Syogo Mori, Voot Tangkaratt, Tingting Zhao, 電子情報通信学会技術研究報告. 情報論的学習理論と機械学習. 2012, No. 454
4. A Policy Gradient with Parameter-Based Exploration Approach for Zone-Heating [C]. Kevin Van Vaerenbergh, Yann-Michael De Hauwere, Bruno Depraetere, IEEE Symposium Series on Computational Intelligence. 2015
5. Two-Stage Conditional Density Estimation Based on Bernstein Polynomials [D]. Lyu, Guanjie. 2020
6. Prediction model-based kernel density estimation when group membership is subject to missing [O]. Hua He, Wenjuan Wang, Wan Tang
7. Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation [O]. Syogo Mori, Voot Tangkaratt, Tingting Zhao, 2016
8. Least-Squares Conditional Estimation of the Location Parameter of Weibull Populations [R]. Herman, W. J. 1968
