Conference: Annual Conference on Neural Information Processing Systems

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Abstract

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the total regret over a sequence of tasks by transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for estimating the possible tasks and derive regret bounds for it.
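To make the objective concrete, here is a minimal sketch of total regret accumulated over a sequence of bandit tasks, each drawn from a finite set of models. It is not the algorithm introduced in the paper: it uses standard UCB1 with no transfer between tasks, so it only illustrates the baseline quantity that sequential transfer aims to reduce. The Bernoulli rewards, the `MODELS` values, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def ucb1_regret(means, horizon, rng):
    """Run UCB1 on one Bernoulli bandit task; return cumulative pseudo-regret."""
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            # Pick the arm with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # gap between best arm and chosen arm
    return regret


def total_regret(models, n_tasks, horizon, seed=0):
    """Sum regret over tasks, each sampled uniformly from a finite model set.

    No knowledge is carried from one task to the next (no-transfer baseline).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        means = rng.choice(models)  # hypothetical task-generating process
        total += ucb1_regret(means, horizon, rng)
    return total


# Hypothetical finite set of models: each entry lists the arm means of one task.
MODELS = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]

if __name__ == "__main__":
    print(total_regret(MODELS, n_tasks=5, horizon=200))
```

Because every task here restarts from scratch, the total grows roughly linearly in the number of tasks. A transfer method would try to reuse what earlier tasks revealed about the model set so that later tasks incur less regret.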
