
Nonparametric Stochastic Contextual Bandits


Abstract

We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of $\tilde{O}(T^{(1+D)/(2+D)})$, where D is the context dimension, for a modified UCB algorithm that is simple to implement. We then give global intrinsic dimension dependent and ambient dimension independent regret bounds. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing approaches for both simulated tasks and MNIST image classification.
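
As a quick reading of the stated rate (an arithmetic illustration only, not taken from the paper's analysis), the exponent (1+D)/(2+D) can be evaluated for a few context dimensions D. The specific values of D below are chosen for illustration, and the tilde in the bound is assumed to hide logarithmic factors, as is conventional:

```latex
% Regret exponent \alpha(D) = (1+D)/(2+D) from the abstract's bound
% \tilde{O}(T^{\alpha(D)}); the values of D are illustrative choices.
\begin{align*}
  \alpha(D) &= \frac{1+D}{2+D} = 1 - \frac{1}{2+D}, \\
  \alpha(1) &= \tfrac{2}{3}, \qquad
  \alpha(2) = \tfrac{3}{4}, \qquad
  \alpha(8) = \tfrac{9}{10}, \\
  \lim_{D \to \infty} \alpha(D) &= 1 .
\end{align*}
```

The exponent stays strictly below 1 for every finite D, so the regret is sublinear in T, but it approaches 1 as D grows. This is consistent with the abstract's interest in bounds that depend on the intrinsic dimension of the contexts rather than the ambient dimension.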
