Venue: Annual Conference on Neural Information Processing Systems

Online Learning with Switching Costs and Other Adaptive Adversaries

Abstract

We study the power of different types of adaptive (non-oblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(T^{1/2})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded-memory adversary can force $\widetilde{\Theta}(T^{2/3})$ regret even in the full-information case, proving that switching costs are easier to control than bounded-memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
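
To make the switching-cost setting in the abstract concrete, the following is a minimal sketch, not the paper's algorithm or analysis. It assumes a fixed loss table chosen in advance, a cost of 1 for every change of action, and comparison against the best single action in hindsight. The function name `switching_cost_regret` and the toy loss table are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def switching_cost_regret(
    losses: Sequence[Sequence[float]], actions: Sequence[int]
) -> float:
    """Regret against the best fixed action when each switch costs 1.

    Assumptions (illustrative only): losses[t][k] is the loss of action k
    in round t, and the player pays an extra 1 whenever the action changes.
    """
    if len(losses) != len(actions):
        raise ValueError("need exactly one action per round")
    if not losses:
        return 0.0
    # Player's cumulative loss on the actions actually played.
    played = sum(losses[t][a] for t, a in enumerate(actions))
    # Number of rounds in which the action differs from the previous one.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)
    # A fixed action never switches, so it pays only its cumulative loss.
    n_actions = len(losses[0])
    best_fixed = min(sum(row[k] for row in losses) for k in range(n_actions))
    return played + switches - best_fixed


# Toy loss table (hypothetical): the better action alternates every round.
losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
chasing = switching_cost_regret(losses, [0, 1, 0, 1])  # switches every round
staying = switching_cost_regret(losses, [0, 0, 0, 0])  # never switches
print(chasing, staying)
```

On this toy table, the player who chases the per-round best action pays nothing in losses but three switches, ending with higher regret than the player who never moves. This is the tension between tracking losses and avoiding switches that the stated rates quantify.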
