...

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Abstract

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this monograph, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
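
The monograph analyzes regret in both regimes. As a rough, self-contained illustration of the stochastic (i.i.d.) case only, the sketch below runs the classical UCB1 index policy on Bernoulli arms and reports the pseudo-regret (horizon times the best mean minus the sum of the means of the pulled arms). It is not code from the monograph; the arm means, horizon, and the function name ucb1 are illustrative choices.

    import math
    import random

    def ucb1(means, horizon, seed=0):
        """Minimal UCB1 sketch on a stochastic (i.i.d. Bernoulli) bandit.

        means   -- true mean payoff of each arm (unknown to the algorithm)
        horizon -- number of rounds n
        Returns the pseudo-regret: n * max(means) minus the sum of the
        true means of the arms actually pulled.
        """
        rng = random.Random(seed)
        k = len(means)
        counts = [0] * k          # times each arm was pulled
        totals = [0.0] * k        # cumulative observed reward per arm
        pulled_means = 0.0        # sum of true means of the pulled arms

        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1       # pull each arm once to initialize
            else:
                # index = empirical mean + exploration bonus
                arm = max(range(k),
                          key=lambda i: totals[i] / counts[i]
                                        + math.sqrt(2 * math.log(t) / counts[i]))
            reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli payoff
            counts[arm] += 1
            totals[arm] += reward
            pulled_means += means[arm]

        return horizon * max(means) - pulled_means

    if __name__ == "__main__":
        # Example with made-up arm means; for i.i.d. payoffs the regret of UCB1
        # grows only logarithmically in the horizon.
        print(ucb1([0.3, 0.5, 0.7], horizon=10000))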

Record details

  • Source
    Foundations and trends in machine learning | 2012, Issue 1 | 122 pages
  • Author affiliations

    Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA;

    Dipartimento di Informatica, Università degli Studi di Milano, Milano 20135, Italy;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

