Nonparametric Stochastic Contextual Bandits

Abstract

We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\tilde{O}(T^{(1+D)/(2+D)})$, where D is the context dimension, for a modified UCB algorithm that is simple to implement. We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing approaches for both simulated tasks and MNIST image classification.
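The abstract does not spell out the modified UCB algorithm, so the following is only an illustrative sketch of what a nonparametric UCB rule for contextual bandits could look like, not the authors' method. It assumes a k-nearest-neighbor estimate of each arm's mean reward plus a simple `alpha / sqrt(m)` exploration bonus; the names `knn_ucb_choose` and `simulate`, the parameters `k` and `alpha`, and the toy reward functions are all hypothetical.

```python
import numpy as np


def knn_ucb_choose(x, contexts, rewards, k=5, alpha=1.0):
    """Pick an arm for context x (hypothetical k-NN UCB rule).

    contexts[a] holds the past context vectors observed when arm a was pulled,
    rewards[a] holds the matching rewards.
    """
    x = np.asarray(x, dtype=float)
    scores = []
    for ctx_a, rew_a in zip(contexts, rewards):
        if len(rew_a) == 0:
            # An arm that has never been pulled gets an infinite score.
            scores.append(np.inf)
            continue
        ctx = np.asarray(ctx_a, dtype=float)
        rew = np.asarray(rew_a, dtype=float)
        dists = np.linalg.norm(ctx - x, axis=1)
        m = min(k, len(rew))
        nearest = np.argsort(dists)[:m]
        # Local mean of the m nearest rewards plus an assumed exploration bonus.
        scores.append(rew[nearest].mean() + alpha / np.sqrt(m))
    return int(np.argmax(scores))


def simulate(T=500, seed=0):
    """Toy two-arm problem in one dimension (assumed reward functions)."""
    rng = np.random.default_rng(seed)
    contexts = [[], []]
    rewards = [[], []]
    regret = 0.0
    for _ in range(T):
        x = rng.uniform(0.0, 1.0, size=1)
        means = np.array([x[0], 1.0 - x[0]])  # expected reward of each arm
        a = knn_ucb_choose(x, contexts, rewards)
        r = means[a] + rng.normal(0.0, 0.1)   # noisy realization
        contexts[a].append(x)
        rewards[a].append(r)
        regret += means.max() - means[a]
    return regret


if __name__ == "__main__":
    print("cumulative regret:", simulate())
```

The bonus here stops shrinking once an arm has `k` samples, which a real analysis would not accept; letting `k` grow with the number of pulls is one common way to trade off bias and variance.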

Bibliographic details

Source
AAAI Conference on Artificial Intelligence; Innovative Applications of Artificial Intelligence Conference; Symposium on Educational Advances in Artificial Intelligence | 2018 | pp. 2668-3603 | 7 pages
Authors
Melody Y. Guan; Heinrich Jiang
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards [J]. Arya Sakshi, Yang Yuhong. Statistics & Probability Letters. 2020, No. 1

2. Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits [J]. Aurelien Bibaut, Antoine Chambaz, Mark Laan. JMLR: Workshop and Conference Proceedings. 2020, No. 2010

3. Contextual Bandits with Stochastic Experts [J]. Rajat Sen, Karthikeyan Shanmugam, Sanjay Shakkottai. JMLR: Workshop and Conference Proceedings. 2018, No. 2009

4. Nonparametric Stochastic Contextual Bandits [C]. Melody Y. Guan, Heinrich Jiang. AAAI Conference on Artificial Intelligence; Innovative Applications of Artificial Intelligence Conference; Symposium on Educational Advances in Artificial Intelligence. 2018

5. Using Contextual Bandits to Improve Traffic Performance in Edge Network [D]. Al Zadjali, Aziza Najeeb. 2021

6. Action Centered Contextual Bandits [O]. Kristjan Greenewald, Ambuj Tewari, Predrag Klasnja, -1

7. A stochastic multi-armed bandit approach to nonparametric H∞-norm estimation [O]. Matias I. Muller, Patricio E. Valenzuela, Alexandre Proutiere, 2017
