Published in: Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

Abstract

Defining action spaces for conversational agents and optimizing their decision-making process with reinforcement learning is an enduring challenge. Common practice has been to use either handcrafted dialog acts or the output vocabulary (e.g., in neural encoder-decoders) as the action space. Both have their own limitations. This paper proposes a novel latent action framework that treats the action space of an end-to-end dialog agent as latent variables and develops unsupervised methods to induce this action space from the data. Comprehensive experiments examine both continuous and discrete action types and two different optimization methods based on stochastic variational inference. Results show that the proposed latent actions achieve superior empirical performance over previous word-level policy gradient methods on both DealOrNoDeal and MultiWoz dialogs. Our detailed analysis also provides insights into various latent variable approaches for policy learning and can serve as a foundation for developing better latent actions in future research.
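To make the idea of a latent action space concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single discrete latent action drawn from a categorical policy and applies a REINFORCE-style policy gradient to that latent choice rather than to individual output words. The reward function `toy_reward`, the number of actions, the learning rate, and the baseline update are all hypothetical choices made only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(probs, rng):
    """Draw one index from a categorical distribution."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def toy_reward(z):
    """Hypothetical reward: pretend latent action 2 yields a successful dialog."""
    return 1.0 if z == 2 else 0.0


def train_latent_policy(num_actions=5, steps=2000, lr=0.1, seed=0):
    """REINFORCE on the logits of a categorical latent-action policy."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = 0.0  # running average of the reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        z = sample_categorical(probs, rng)  # the sampled latent action
        reward = toy_reward(z)
        advantage = reward - baseline
        # Gradient of log p(z) w.r.t. logits[k] is 1[k == z] - probs[k].
        for k in range(num_actions):
            grad = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    final_probs = train_latent_policy()
    print([round(p, 3) for p in final_probs])
```

In this toy setting the policy's probability mass should shift toward the rewarded latent action. In a full dialog agent, a decoder would additionally turn the sampled latent action into a word sequence; that component is omitted here.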