International Conference on Autonomous Agents and Multiagent Systems

An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward


Abstract

This paper studies a variation of the stochastic multi-armed bandit (MAB) problem in which the agent has prior knowledge of the Near-optimal Mean Reward (NoMR). We show that the cumulative regret of this bandit variation has a lower bound of Ω(1/Δ), where Δ is the gap between the optimal and the second-best mean rewards. We propose an algorithm called NoMR-BANDIT for this variation and show that its cumulative regret has a uniform upper bound of O(1/Δ), which matches the lower bound. We therefore conclude that NoMR-BANDIT is optimal in the order of its regret bound.
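The abstract does not include the NoMR-BANDIT procedure itself, but the setting it describes can be illustrated with a small sketch. The Python code below is a hypothetical elimination-style strategy, not the paper's algorithm: it assumes the known NoMR value lies above every suboptimal mean and at most the optimal mean, and uses it as a threshold for discarding arms whose optimistic estimates fall below it. The function name nomr_threshold_bandit and the confidence parameter delta_conf are illustrative choices.

import math
import random

def nomr_threshold_bandit(arms, nomr, horizon, delta_conf=0.01):
    """Illustrative elimination-style strategy using a known NoMR value.

    `arms` is a list of callables, each returning a stochastic reward in [0, 1].
    `nomr` is assumed to lie above every suboptimal mean and at most the
    optimal mean, so it can serve as an elimination threshold.
    This is a sketch of the setting only, not the paper's NoMR-BANDIT.
    """
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    active = list(range(len(arms)))
    total_reward = 0.0
    for t in range(horizon):
        # Cycle over the arms that are still plausibly optimal.
        arm = active[t % len(active)]
        reward = arms[arm]()
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
        # Hoeffding-style confidence radius around the empirical mean.
        radius = math.sqrt(math.log(2.0 * horizon / delta_conf) / (2 * counts[arm]))
        mean = sums[arm] / counts[arm]
        # If even the optimistic estimate falls below the known NoMR,
        # this arm cannot be the optimal one: eliminate it.
        if mean + radius < nomr and len(active) > 1:
            active.remove(arm)
    return total_reward, active

# Usage: two Bernoulli arms with means 0.7 (optimal) and 0.5,
# and a known NoMR of 0.6 lying strictly between them.
arms = [lambda: float(random.random() < 0.7),
        lambda: float(random.random() < 0.5)]
reward, survivors = nomr_threshold_bandit(arms, nomr=0.6, horizon=10_000)

In this sketch the suboptimal arm is dropped once its confidence interval falls entirely below the known NoMR, so exploration of it stops after a number of pulls that scales with 1/Δ², which is the intuition behind gap-dependent regret bounds such as those stated in the abstract.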
