We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose an efficient algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. We prove a finite-time expected regret upper bound of order O(√(K ln(K) T)) for this algorithm and a general lower bound of order Ω(√(KT)). Finally, we provide experimental results using real data from information retrieval applications.
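To make the setting concrete, the following is a minimal illustrative sketch of an Exp3-style exponential-weight learner adapted to relative (dueling) feedback. It is an assumption-laden toy, not the paper's exact REX3 update: the `duel` callback, the mixing parameter `gamma`, and the importance-weighted update are all illustrative choices.

```python
import math
import random

def rex3_sketch(K, T, duel, gamma=0.1):
    """Exp3-style sketch for utility-based dueling bandits.

    `duel(a, b)` is assumed to return a relative reward in [-1, 1],
    positive when arm `a` beats arm `b`. This is an illustrative
    sketch of the exponential-weight idea, NOT the paper's exact
    REX3 algorithm.
    """
    w = [1.0] * K
    for _ in range(T):
        total = sum(w)
        # Mix the weight distribution with uniform exploration.
        p = [(1 - gamma) * wi / total + gamma / K for wi in w]
        # Draw both arms of the duel independently from p.
        a = random.choices(range(K), weights=p)[0]
        b = random.choices(range(K), weights=p)[0]
        r = duel(a, b)  # relative feedback in [-1, 1]
        # Importance-weighted exponential updates: the winner's
        # weight grows, the loser's shrinks.
        w[a] *= math.exp(gamma * (r / 2) / (K * p[a]))
        w[b] *= math.exp(gamma * (-r / 2) / (K * p[b]))
    total = sum(w)
    return [wi / total for wi in w]
```

Under persistent relative feedback favoring one arm, the returned weight distribution concentrates on that arm, which is the qualitative behavior the O(√(K ln(K) T)) regret bound formalizes.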