
Policy advice, non-convex and distributed optimization in reinforcement learning


Abstract

Transfer learning is a machine learning approach that reuses knowledge from previous training to speed up the learning process. Policy advice is a form of transfer learning in which a student agent learns faster by receiving advice from a teacher agent: the agent that provides advice (actions) is called the teacher, and the agent that receives it is the student. However, this and other current reinforcement learning transfer methods have received little theoretical analysis. This dissertation formally defines a setting in which multiple teacher agents can provide advice to a student and introduces an algorithm that leverages both autonomous exploration and the teachers' advice. Regret bounds are provided, and negative transfer is formally defined and studied.

On the other hand, policy search is a class of reinforcement learning algorithms for finding optimal policies for control problems with limited feedback. These methods have been applied successfully to high-dimensional problems such as robotics control. Despite this success, current methods can produce unsafe policy parameters that damage hardware. Motivated by such constraints, Bhatnagar et al. and others proposed projection-based methods for safe policies [8]. These methods, however, can only handle convex policy constraints. In this dissertation, we contribute the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, Mann iteration and a two-step iteration, to solve the resulting problem and prove convergence in the non-convex stochastic setting.

Lastly, lifelong reinforcement learning is a framework, similar to transfer learning, that allows agents to learn multiple consecutive tasks sequentially online. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this dissertation, we remedy these drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm that assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.
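The safe policy search contribution casts constrained policy search as a non-convex variational inequality solved by Mann-type iterations. As a rough illustration only, the Python sketch below shows the classical Mann averaging scheme applied to a projected operator step; the operator F, the projection onto the "safe" parameter set, the step size gamma, and the averaging weights are illustrative assumptions and not the dissertation's actual algorithm.

    import numpy as np

    def mann_iteration_vi(F, project, x0, gamma=0.1, num_iters=1000):
        """Classical Mann iteration toward a solution of VI(F, C).

        Seeks x* in C with <F(x*), x - x*> >= 0 for all x in C by
        averaging toward the projected-step map
            T(x) = project(x - gamma * F(x)).
        F may return a noisy (stochastic) estimate of the operator.
        """
        x = np.asarray(x0, dtype=float)
        for k in range(1, num_iters + 1):
            alpha = 1.0 / (k + 1)                 # diminishing averaging weight
            t_x = project(x - gamma * F(x))       # one projected operator step
            x = (1.0 - alpha) * x + alpha * t_x   # Mann averaging update
        return x

    # Toy usage: a noisy linear operator and projection onto a unit ball
    # standing in for a "safe" region of policy parameters.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = np.array([[3.0, 1.0], [1.0, 2.0]])

        def F(x):
            return A @ x + 0.01 * rng.standard_normal(2)  # stochastic estimate

        def project(x):
            n = np.linalg.norm(x)
            return x if n <= 1.0 else x / n

        x_star = mann_iteration_vi(F, project, x0=np.ones(2))
        print("approximate VI solution:", x_star)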

Bibliographic record

  • Author: Zhan, Yusen
  • Author affiliation: Washington State University
  • Degree-granting institution: Washington State University
  • Subjects: Computer science; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2016
  • Pagination: 147 p.
  • Total pages: 147
  • Original format: PDF
  • Language of text: eng
  • CLC classification:
  • Keywords:
