
Policy advice, non-convex and distributed optimization in reinforcement learning


Abstract

Transfer learning is a machine learning approach that reuses knowledge from previous training to speed up the learning process. Policy advice is a form of transfer learning in which a student agent learns faster by receiving advice from a teacher agent: the agent that provides advice (actions) is called the teacher, and the agent that receives it is the student. However, this and other current reinforcement learning transfer methods have received little theoretical analysis. This dissertation formally defines a setting in which multiple teacher agents can provide advice to a student and introduces an algorithm that leverages both autonomous exploration and the teachers' advice. Regret bounds are provided, and negative transfer is formally defined and studied.

On the other hand, policy search is a class of reinforcement learning algorithms for finding optimal policies for control problems with limited feedback. These methods have been applied successfully to high-dimensional problems such as robotics control. Despite this success, current methods can produce unsafe policy parameters that damage hardware. Motivated by such constraints, Bhatnagar et al. and others proposed projection-based methods for safe policies [8]. These methods, however, can only handle convex policy constraints. In this dissertation, we contribute the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, Mann iteration and a two-step iteration, to solve the resulting problem and prove convergence in the non-convex stochastic setting.

Lastly, lifelong reinforcement learning is a framework, similar to transfer learning, that allows agents to learn multiple consecutive tasks sequentially online. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this dissertation, we remedy these drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm that assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.
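The safe policy search contribution casts constrained policy search as a non-convex variational inequality solved by Mann-type iterations. As a rough illustration only, the Python sketch below shows the classical Mann averaging scheme applied to a projected operator step; the operator F, the projection onto the "safe" parameter set, the step size gamma, and the averaging weights are illustrative assumptions and not the dissertation's actual algorithm.

    import numpy as np

    def mann_iteration_vi(F, project, x0, gamma=0.1, num_iters=1000):
        """Classical Mann iteration toward a solution of VI(F, C).

        Seeks x* in C with <F(x*), x - x*> >= 0 for all x in C by
        averaging toward the projected-step map
            T(x) = project(x - gamma * F(x)).
        F may return a noisy (stochastic) estimate of the operator.
        """
        x = np.asarray(x0, dtype=float)
        for k in range(1, num_iters + 1):
            alpha = 1.0 / (k + 1)                 # diminishing averaging weight
            t_x = project(x - gamma * F(x))       # one projected operator step
            x = (1.0 - alpha) * x + alpha * t_x   # Mann averaging update
        return x

    # Toy usage: a noisy linear operator and projection onto a unit ball
    # standing in for a "safe" region of policy parameters.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = np.array([[3.0, 1.0], [1.0, 2.0]])

        def F(x):
            return A @ x + 0.01 * rng.standard_normal(2)  # stochastic estimate

        def project(x):
            n = np.linalg.norm(x)
            return x if n <= 1.0 else x / n

        x_star = mann_iteration_vi(F, project, x0=np.ones(2))
        print("approximate VI solution:", x_star)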

Bibliographic record

  • Author: Zhan, Yusen
  • Author affiliation: Washington State University
  • Degree-granting institution: Washington State University
  • Subjects: Computer science; Artificial intelligence
  • Degree: Ph.D.
  • Year: 2016
  • Pagination: 147 p.
  • Total pages: 147
  • Original format: PDF
  • Language of text: eng
  • CLC classification:
  • Keywords:
