Engineering Applications of Artificial Intelligence

An automated signalized junction controller that learns strategies by temporal difference reinforcement learning


Abstract

This paper shows how temporal difference learning can be used to build a signalized junction controller that learns its own strategies through experience. Simulation tests detailed here show that the learned strategies can have high performance. This work builds upon previous work in which a neural network based junction controller that can learn strategies from a human expert was developed (Box and Waterson, 2012). In the simulations presented, vehicles are assumed to be broadcasting their position over WiFi, giving the junction controller rich information. The vehicles' position data are pre-processed to describe a simplified state. The state-space is classified into regions associated with junction control decisions using a neural network. This classification is the strategy and is parametrized by the weights of the neural network. The weights can be learned either through supervised learning with a human trainer or through reinforcement learning by temporal difference (TD). Tests on a model of an isolated T junction show average delays of 14.12 s and 14.36 s for the human-trained and TD-trained networks respectively. Tests on a model of a pair of closely spaced junctions show 17.44 s and 20.82 s respectively. Both methods of training produced strategies that were approximately equivalent in their equitable treatment of vehicles, defined here as the variance over the journey time distributions.
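The control loop the abstract describes, a neural network that scores junction control decisions from a simplified vehicle-derived state, with weights adjusted by a temporal difference error, can be sketched in a few lines. The sketch below is illustrative only: the JunctionSim toy queueing model, the state encoding (normalized queue lengths per approach), the reward (negative total queue), the network size, and the Q-learning flavour of TD are all assumptions for the demo, not the paper's actual formulation.

```python
# Minimal TD (Q-learning) sketch of a junction controller that learns a
# control strategy from experience. All model details here are assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_APPROACHES = 4           # assumed: one sensed queue per approach arm
N_PHASES = 2               # assumed: two mutually exclusive green phases
ALPHA, GAMMA, EPSILON = 0.01, 0.95, 0.1

# One-hidden-layer network mapping the simplified state to a value per phase.
W1 = rng.normal(0.0, 0.1, (16, N_APPROACHES))
W2 = rng.normal(0.0, 0.1, (N_PHASES, 16))

def q_values(state):
    """Forward pass: tanh hidden layer, linear output."""
    h = np.tanh(W1 @ state)
    return W2 @ h, h

def td_update(state, action, reward, next_state):
    """TD(0) update of the weights for the phase that was chosen."""
    global W1, W2
    q, h = q_values(state)
    q_next, _ = q_values(next_state)
    td_error = reward + GAMMA * q_next.max() - q[action]
    grad_h = W2[action] * (1.0 - h ** 2)          # backprop through tanh
    W2[action] += ALPHA * td_error * h
    W1 += ALPHA * td_error * np.outer(grad_h, state)

class JunctionSim:
    """Toy queueing model of an isolated junction; a stand-in for the
    WiFi-position simulation used in the paper."""
    def __init__(self):
        self.queues = np.zeros(N_APPROACHES)

    def step(self, phase):
        self.queues += rng.poisson(0.3, N_APPROACHES)     # random arrivals
        green = [0, 1] if phase == 0 else [2, 3]          # served approaches
        self.queues[green] = np.maximum(self.queues[green] - 1.0, 0.0)
        reward = -self.queues.sum()                       # delay proxy
        return self.queues / 10.0, reward                 # normalized state

sim = JunctionSim()
state = sim.queues / 10.0
for t in range(5000):
    q, _ = q_values(state)
    # Epsilon-greedy: mostly follow the learned strategy, sometimes explore.
    action = int(rng.integers(N_PHASES)) if rng.random() < EPSILON else int(q.argmax())
    next_state, reward = sim.step(action)
    td_update(state, action, reward, next_state)
    state = next_state

print("mean queue length after training:", sim.queues.mean())
```

The epsilon-greedy exploration and the negative-queue reward are standard choices that make the TD update well defined; the paper's supervised alternative would instead fit the same weights to phase decisions recorded from a human expert.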
