
Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

In: Recent Advances in Reinforcement Learning


Abstract

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a finite set of Markov Decision Processes (MDPs) under the discounted return criterion. We propose an on-line algorithm for learning in such parameterized models, the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistake bound criterion. The algorithm relies on Wald's sequential probability ratio test to eliminate unlikely parameters, and uses an optimistic policy for effective exploration. We establish that, with high probability, the total mistake bound for the algorithm is linear (up to a logarithmic term) in the size |Θ| of the parameter space, independently of the cardinality of the state and action spaces. We further demonstrate that much better dependence on |Θ| is possible, depending on the specific information structure of the problem.
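
To make the elimination mechanism concrete, below is a minimal Python sketch of likelihood-based parameter elimination. It assumes each candidate parameter θ indexes a fully specified transition model and uses a simple log-likelihood-ratio threshold as a stand-in for Wald's sequential test; the class name DiscreteParameterElimination, the threshold value, and the toy two-state example are illustrative assumptions and do not reproduce the paper's exact PEL procedure.

```python
import math
import random


class DiscreteParameterElimination:
    """Illustrative elimination of unlikely candidate MDP parameters (a sketch,
    not the paper's exact PEL algorithm)."""

    def __init__(self, models, threshold=math.log(100.0)):
        # models: dict mapping theta -> nested dict P[s][a] = list of
        # next-state probabilities (one fully known transition model per theta).
        self.models = dict(models)
        self.log_lik = {theta: 0.0 for theta in models}  # cumulative log-likelihoods
        self.threshold = threshold  # SPRT-style elimination margin (assumed value)

    def surviving(self):
        return set(self.log_lik)

    def update(self, s, a, s_next):
        """Fold one observed transition (s, a, s') into every surviving theta."""
        for theta in list(self.log_lik):
            p = self.models[theta][s][a][s_next]
            # A parameter assigning probability 0 to an observed event is ruled out.
            self.log_lik[theta] += math.log(p) if p > 0 else float("-inf")
        best = max(self.log_lik.values())
        # Eliminate any theta whose likelihood ratio against the current leader
        # has fallen below the threshold.
        for theta in list(self.log_lik):
            if best - self.log_lik[theta] > self.threshold:
                del self.log_lik[theta]
                del self.models[theta]


if __name__ == "__main__":
    # Two candidate models on states {0, 1} with a single action 0; they differ
    # only in how likely the system is to stay in state 0.
    models = {
        "theta_A": {0: {0: [0.9, 0.1]}, 1: {0: [0.5, 0.5]}},
        "theta_B": {0: {0: [0.5, 0.5]}, 1: {0: [0.5, 0.5]}},
    }
    pel = DiscreteParameterElimination(models)
    random.seed(0)
    s = 0
    for _ in range(200):
        # Data is generated by theta_A, so theta_B should eventually be eliminated.
        p_stay = models["theta_A"][s][0][0]
        s_next = 0 if random.random() < p_stay else 1
        pel.update(s, 0, s_next)
        s = s_next
    print("surviving parameters:", pel.surviving())
```

In the full PEL algorithm the agent would additionally act according to an optimistic policy over the surviving parameters (for example, following the policy of the surviving θ that promises the highest discounted return), which supplies the exploration needed to generate informative transitions; the sketch above isolates only the elimination step.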
