This paper proposes a learning-based behavior generation approach for automated vehicles that is adapted sequentially. Instead of hand-engineering behavioral policies for a variety of individual traffic situations, our approach concentrates on a general problem description that is adjusted by a learning algorithm, which successively derives safe actions. Recent approaches apply Reinforcement Learning techniques to this problem, modeling it as a Markov Decision Process (MDP). Our approach benefits from a trajectory planning module that uses optimal control to generate realistic trajectories. Further, the trajectory planning module is exploited for exploration when solving the adaptation of the action selection problem. As an exemplary traffic situation, the task of action selection for merging into a roundabout is examined. The contributions of this paper are the use of an underlying optimization-based trajectory generation module and the evaluation of the convergence of the adapted behavior, including on real-world data.