Journal of Robotics and Mechatronics

TD learning with neural networks


Abstract

Temporal difference (TD) learning, proposed by Sutton in the late 1980s, is an interesting approach to prediction in which predictions already obtained are used to make future predictions. Applying this learning method to neural networks can improve their prediction performance, once certain problems are solved. The major problems are as follows: 1) In Sutton's original paper, the prediction P_t at time t is assumed to be a scalar, which raises the question of what the rule for updating the weight vector of the neural network should be when the network has multiple outputs. 2) How are the individual components of the gradient vector ∇_w P_t with respect to the weight vector w derived? This paper proposes how to handle these problems when TD learning is used in a neural network, focusing on the TD(0) algorithm, which is often used in TD learning. It proposes a rule for updating the weight vector of a two-output neural network under problem 1) above and explains the rule's validity. It then proposes a method for computing every component of ∇_w P_t.
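For orientation, the sketch below (not taken from the paper, whose proposed rule is not given in the abstract) applies Sutton's standard scalar TD(0) update, w <- w + α(P_{t+1} - P_t)∇_w P_t, independently to each output of a small two-output network; the network architecture, parameter names, and learning rate are illustrative assumptions.

```python
import numpy as np

# Minimal TD(0) sketch for a two-output network. This is NOT the paper's
# proposed rule (the abstract does not state it); it simply applies the
# standard scalar update to each output separately, with gradients of each
# output P_t[k] computed by plain backpropagation.

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 4, 8, 2          # two-output network, as in problem 1)
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
alpha = 0.05                              # learning rate (illustrative)

def forward(x):
    """Return the prediction vector P_t and the hidden activation."""
    h = np.tanh(W1 @ x)                   # hidden layer
    P = W2 @ h                            # linear outputs P_t (length 2)
    return P, h

def grads(x, h, k):
    """Gradients of the k-th output P_t[k] w.r.t. W1 and W2."""
    gW2 = np.zeros_like(W2)
    gW2[k] = h                            # dP_k/dW2 is nonzero only in row k
    gW1 = np.outer(W2[k] * (1.0 - h**2), x)   # chain rule through tanh
    return gW1, gW2

def td0_step(x_t, x_t1):
    """One TD(0) step: move each output's prediction toward the next one."""
    global W1, W2
    P_t, h_t = forward(x_t)
    P_t1, _ = forward(x_t1)
    for k in range(n_out):
        delta = P_t1[k] - P_t[k]          # TD error for output k
        gW1, gW2 = grads(x_t, h_t, k)
        W1 += alpha * delta * gW1
        W2 += alpha * delta * gW2

# Usage: one update on two successive observations.
x_t = rng.normal(size=n_in)
x_t1 = rng.normal(size=n_in)
td0_step(x_t, x_t1)
```

Treating each output as an independent scalar prediction is only one naive resolution of problem 1); the update rule actually proposed in the paper, and its derivation of the components of ∇_w P_t, may differ.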
