AAAI Conference on Artificial Intelligence

Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing



Abstract

In this paper we consider the problem of evaluating one digital marketing policy (or, more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary - an assumption that is typically false, particularly for digital marketing applications. This means that existing off-policy policy evaluation methods are reactive to nonstationarity, in that they slowly correct for changes after they occur. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which results in predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, and show that it results in a drastic reduction of mean squared error when evaluating policies using a real digital marketing data set.
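The synthesis the abstract describes can be illustrated with a minimal sketch: compute a per-episode importance-sampling estimate of the evaluation policy's value from behavior-policy data, treat those estimates as a time series indexed by episode, and forecast the next value rather than averaging past ones. The function names, the list-of-lists policy representation, and the use of a simple linear trend (in place of the richer time-series models the paper pairs with OPE) are all illustrative assumptions, not the authors' implementation.

```python
def per_episode_is_estimate(actions, rewards, pi_e, pi_b):
    """Ordinary importance-sampling estimate of the evaluation
    policy's return from one episode run under the behavior policy.

    actions: action index taken at each step
    rewards: reward observed at each step
    pi_e, pi_b: per-step action probabilities, indexed [t][a]
    (hypothetical data layout, chosen for this sketch)
    """
    ratio = 1.0
    for t, a in enumerate(actions):
        ratio *= pi_e[t][a] / pi_b[t][a]  # likelihood ratio of the trajectory
    return ratio * sum(rewards)

def forecast_next_value(estimates):
    """Predictive step: fit a least-squares linear trend to the series
    of per-episode estimates and extrapolate one episode ahead.
    A stand-in for the time-series prediction methods in the paper."""
    n = len(estimates)
    mean_x = (n - 1) / 2.0
    mean_y = sum(estimates) / n
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in enumerate(estimates))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope * n + intercept  # value predicted for episode n
```

Under nonstationarity, a reactive estimator would average the per-episode estimates and lag behind any drift; the forecasting step instead anticipates where the policy's value is heading.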

