In this paper, Expected Success Probability (ESP) is defined, and a reinforcement learning method, Stable Profit Sharing with Expected Failure Probability (SPSwithEFP), is proposed. In SPSwithEFP, Expected Failure Probability (EFP) is used in the roulette wheel selection method, and ESP is used in the update equation for the weight of a rule. EFP allows risky actions to be discarded, and ESP narrows the distribution of the learned results. The effectiveness of the method is shown through simulation experiments in a maze environment with pitfalls.
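The abstract does not give the definitions of EFP or ESP, so the following is only a rough illustration under stated assumptions, not the SPSwithEFP algorithm itself. It shows ordinary roulette wheel selection over rule weights, in which an action is skipped when a hypothetical per-action failure-probability estimate (`efp`) exceeds a hypothetical cutoff (`threshold`). This is one plausible way "discarding risky actions" could look. The function name `roulette_select` and both parameters are invented for this sketch.

```python
import random


def roulette_select(weights, efp, threshold=0.5, rng=random):
    """Roulette wheel selection that skips 'risky' actions (illustrative sketch).

    weights:   dict mapping action -> non-negative rule weight
    efp:       dict mapping action -> estimated failure probability in [0, 1]
               (hypothetical stand-in for the paper's EFP)
    threshold: hypothetical cutoff; actions with efp above it are discarded
    rng:       anything with a uniform(a, b) method, e.g. random.Random(seed)
    """
    # Keep only actions whose estimated failure probability is acceptable.
    candidates = {a: w for a, w in weights.items() if efp.get(a, 0.0) <= threshold}
    if not candidates:
        # Assumption: if every action looks risky, fall back to all actions.
        candidates = dict(weights)

    # Spin the wheel: each action's slice is proportional to its weight.
    total = sum(candidates.values())
    r = rng.uniform(0.0, total)
    acc = 0.0
    for action, w in candidates.items():
        acc += w
        if r <= acc:
            return action
    return action  # guard against floating-point rounding at the last slice
```

For example, with `weights = {"up": 1.0, "down": 2.0, "left": 3.0}` and `efp = {"up": 0.0, "down": 0.9, "left": 0.1}`, the call `roulette_select(weights, efp, 0.5, random.Random(0))` returns only `"up"` or `"left"`, because `"down"` exceeds the cutoff despite having a large weight. The fallback to all actions is a design choice of this sketch, made so the agent can still act when no safe action is known.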