...

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Abstract

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this monograph, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
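
The monograph analyzes regret in both regimes. As a rough, self-contained illustration of the stochastic (i.i.d.) case only, the sketch below runs the classical UCB1 index policy on Bernoulli arms and reports the pseudo-regret (horizon times the best mean minus the sum of the means of the pulled arms). It is not code from the monograph; the arm means, horizon, and the function name ucb1 are illustrative choices.

    import math
    import random

    def ucb1(means, horizon, seed=0):
        """Minimal UCB1 sketch on a stochastic (i.i.d. Bernoulli) bandit.

        means   -- true mean payoff of each arm (unknown to the algorithm)
        horizon -- number of rounds n
        Returns the pseudo-regret: n * max(means) minus the sum of the
        true means of the arms actually pulled.
        """
        rng = random.Random(seed)
        k = len(means)
        counts = [0] * k          # times each arm was pulled
        totals = [0.0] * k        # cumulative observed reward per arm
        pulled_means = 0.0        # sum of true means of the pulled arms

        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1       # pull each arm once to initialize
            else:
                # index = empirical mean + exploration bonus
                arm = max(range(k),
                          key=lambda i: totals[i] / counts[i]
                                        + math.sqrt(2 * math.log(t) / counts[i]))
            reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli payoff
            counts[arm] += 1
            totals[arm] += reward
            pulled_means += means[arm]

        return horizon * max(means) - pulled_means

    if __name__ == "__main__":
        # Example with made-up arm means; for i.i.d. payoffs the regret of UCB1
        # grows only logarithmically in the horizon.
        print(ucb1([0.3, 0.5, 0.7], horizon=10000))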

Record details

  • Source
    Foundations and trends in machine learning | 2012, Issue 1 | 122 pages
  • Author affiliations

    Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA;

    Dipartimento di Informatica, Università degli Studi di Milano, Milano 20135, Italy;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

