International Conference on Autonomous Agents and Multiagent Systems

An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward


Abstract

This paper studies a variation of the stochastic multi-armed bandit (MAB) problem in which the agent has prior knowledge of the Near-optimal Mean Reward (NoMR). We show that the cumulative regret of this bandit variation has a lower bound of Ω(1/Δ), where Δ is the gap between the optimal and the second-best mean rewards. We propose an algorithm called NoMR-BANDIT for this variation and show that its cumulative regret has a uniform upper bound of O(1/Δ), which matches the lower bound. We therefore conclude that NoMR-BANDIT is optimal in the order of its regret bound.
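The abstract does not include the NoMR-BANDIT procedure itself, but the setting it describes can be illustrated with a small sketch. The Python code below is a hypothetical elimination-style strategy, not the paper's algorithm: it assumes the known NoMR value lies above every suboptimal mean and at most the optimal mean, and uses it as a threshold for discarding arms whose optimistic estimates fall below it. The function name nomr_threshold_bandit and the confidence parameter delta_conf are illustrative choices.

import math
import random

def nomr_threshold_bandit(arms, nomr, horizon, delta_conf=0.01):
    """Illustrative elimination-style strategy using a known NoMR value.

    `arms` is a list of callables, each returning a stochastic reward in [0, 1].
    `nomr` is assumed to lie above every suboptimal mean and at most the
    optimal mean, so it can serve as an elimination threshold.
    This is a sketch of the setting only, not the paper's NoMR-BANDIT.
    """
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    active = list(range(len(arms)))
    total_reward = 0.0
    for t in range(horizon):
        # Cycle over the arms that are still plausibly optimal.
        arm = active[t % len(active)]
        reward = arms[arm]()
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
        # Hoeffding-style confidence radius around the empirical mean.
        radius = math.sqrt(math.log(2.0 * horizon / delta_conf) / (2 * counts[arm]))
        mean = sums[arm] / counts[arm]
        # If even the optimistic estimate falls below the known NoMR,
        # this arm cannot be the optimal one: eliminate it.
        if mean + radius < nomr and len(active) > 1:
            active.remove(arm)
    return total_reward, active

# Usage: two Bernoulli arms with means 0.7 (optimal) and 0.5,
# and a known NoMR of 0.6 lying strictly between them.
arms = [lambda: float(random.random() < 0.7),
        lambda: float(random.random() < 0.5)]
reward, survivors = nomr_threshold_bandit(arms, nomr=0.6, horizon=10_000)

In this sketch the suboptimal arm is dropped once its confidence interval falls entirely below the known NoMR, so exploration of it stops after a number of pulls that scales with 1/Δ², which is the intuition behind gap-dependent regret bounds such as those stated in the abstract.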
