Venue: Conference on Empirical Methods in Natural Language Processing

Playing 20 Question Game with Policy-Based Reinforcement Learning

Abstract

The 20 Questions (Q20) game is a well-known game that encourages deductive reasoning and creativity. In the game, the answerer first thinks of an object, such as a famous person or a kind of animal. The questioner then tries to guess the object by asking 20 questions. In a Q20 game system, the user acts as the answerer and the system acts as the questioner, which needs a good question-selection strategy to identify the correct object and win the game. However, the optimal question-selection policy is hard to derive because of the complexity and volatility of the game environment. In this paper, we propose a novel policy-based Reinforcement Learning (RL) method that enables the questioner agent to learn the optimal question-selection policy through continuous interactions with users. To facilitate training, we also propose using a reward network to estimate a more informative reward. Compared to previous methods, our RL method is robust to noisy answers and does not rely on a Knowledge Base of objects. Experimental results show that our RL method clearly outperforms an entropy-based engineering system and has competitive performance in a noise-free simulation environment.
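
The abstract does not describe the policy architecture or the reward network, so the following is only a generic illustration of what "policy-based" question selection can look like, not the authors' method. It is a minimal REINFORCE-style sketch under strong simplifying assumptions: the policy is a single softmax over one logit per question (it ignores the answers received so far), already-asked questions are masked out, and the return for each step is supplied by the caller. All function and parameter names (`masked_softmax`, `sample_question`, `reinforce_update`, `lr`) are hypothetical.

```python
import math
import random


def masked_softmax(logits, asked):
    """Softmax over question logits; already-asked questions get probability 0."""
    available = [i for i in range(len(logits)) if i not in asked]
    top = max(logits[i] for i in available)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - top) for i in available}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def sample_question(probs, rng):
    """Sample a question index from the policy distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce_update(logits, episode, lr=0.1):
    """One REINFORCE step on the logits.

    episode: list of (asked_before, chosen_question, return_from_step).
    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i),
    restricted to the questions that were still available.
    """
    updated = list(logits)
    for asked, action, ret in episode:
        probs = masked_softmax(logits, asked)
        for i in range(len(logits)):
            if i in asked:
                continue  # masked questions receive no gradient
            grad = (1.0 if i == action else 0.0) - probs[i]
            updated[i] += lr * ret * grad
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]
    probs = masked_softmax(logits, asked={1})
    question = sample_question(probs, rng)
    # Pretend the game was won after asking this question (return = 1.0).
    logits = reinforce_update(logits, [({1}, question, 1.0)])
    print(question, logits)
```

In this toy version a positive return raises the probability of the questions that were asked and a negative return lowers it. A policy closer to the setting in the abstract would condition on the dialogue history, and the per-step return could come from a learned reward model rather than only the final win/loss signal.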
