The stochastic multi-armed bandit is a well-studied abstraction of decision making in the face of uncertainty. We consider the setting in which the number of bandit arms is much larger than the number of possible pulls, and can even be infinite. With the aim of minimising regret with respect to an optimal arm, existing methods for this setting either assume some structure over the set of arms (Kleinberg et al., 2008; Ray Chowdhury and Gopalan, 2017) or some property of the reward distribution (Wang et al., 2008). Invariably, the validity of such assumptions (and therefore the performance of the corresponding methods) depends on instance-specific parameters, which might not be known beforehand. We propose a conceptually simple, parameter-free, and practically effective alternative. Specifically, we introduce a notion of regret with respect to the top quantile of a probability distribution over the expected rewards of randomly drawn arms. Our main contribution is an algorithm that achieves sublinear "quantile regret", both (1) when a quantile is specified, and (2) when the quantile can be any (unknown) positive value. The algorithm needs no side information about the arms or about the structure of their reward distributions: it relies on random sampling to reach arms in the top quantile. Experiments show that our algorithm outperforms several previous methods (in terms of conventional regret) when the latter are not tuned well, and often even when they are.
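The idea of reaching a top-quantile arm by random sampling can be illustrated with a minimal sketch. The code below is not the paper's algorithm; it only demonstrates the underlying principle under assumed conditions: arm means are drawn i.i.d. from Uniform(0, 1), so the top-ρ quantile of the mean distribution has threshold 1 − ρ. Sampling K = ⌈log(T)/ρ⌉ random arms makes it very likely that at least one sampled arm lies in the top quantile, after which a standard finite-armed strategy (here, UCB1) is run over the sample. All names and parameter choices are illustrative assumptions.

```python
import math
import random


def quantile_regret_sketch(rho=0.1, T=10000, seed=0):
    """Illustrative sketch (not the paper's algorithm): sample enough
    random arms that, with high probability, at least one lies in the
    top rho-quantile of the arm-mean distribution, then run UCB1 over
    the sampled arms. Returns the empirical quantile regret."""
    rng = random.Random(seed)
    # Assumption: arm means are Uniform(0, 1), so the top-rho quantile
    # threshold is 1 - rho. With K random arms, the chance that none is
    # in the top quantile is (1 - rho)^K; K = ceil(log(T) / rho) drives
    # this below 1/T.
    K = max(1, math.ceil(math.log(T) / rho))
    means = [rng.random() for _ in range(K)]
    counts = [0] * K
    sums = [0.0] * K
    total_reward = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # pull each sampled arm once to initialise
        else:
            # UCB1 index: empirical mean plus exploration bonus
            a = max(range(K), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        total_reward += r
    # Quantile regret: shortfall of cumulative reward relative to
    # always earning the quantile threshold 1 - rho.
    return (1.0 - rho) * T - total_reward
```

Because the benchmark is the quantile threshold rather than the (possibly unreachable) optimal arm, no structural assumption on the arm set is needed; the only lever is how many arms to sample, which the sketch ties to ρ and the horizon T.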