A bounded actor-critic reinforcement learning algorithm applied to airline revenue management

Lawhead Ryan J.; Gosavi Abhijit

Abstract

Reinforcement Learning (RL) is an artificial intelligence technique used to solve Markov and semi-Markov decision processes. Actor critics form a major class of RL algorithms that suffer from a critical deficiency, which is that the values of the so-called actor in these algorithms can become very large, causing computer overflow. In practice, therefore, one has to artificially constrain these values, via a projection, and at times further use temperature-reduction tuning parameters in the popular Boltzmann action-selection schemes to make the algorithm deliver acceptable results. This artificial bounding and temperature reduction, however, do not allow for full exploration of the state space, which often leads to sub-optimal solutions on large-scale problems. We propose a new actor critic algorithm in which (i) the actor's values remain bounded without any projection and (ii) no temperature-reduction tuning parameter is needed. The algorithm also represents a significant improvement over a recent version in the literature, where, although the values remain bounded, they usually become very large in magnitude, necessitating the use of a temperature-reduction parameter. Our new algorithm is tested on an important problem in an area of management science known as airline revenue management, where the state space is very large. The algorithm delivers encouraging computational behavior, outperforming a well-known industrial heuristic called EMSR-b on industrial data.
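The overflow problem described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm; it only shows, under assumed values, how a naive Boltzmann (softmax) action-selection rule fails when actor values grow large, and what an artificial projection does to the resulting probabilities. The function names `boltzmann_probs` and `project`, the `temperature` parameter, the bound of 50, and all numeric values are hypothetical choices made for illustration.

```python
import math


def boltzmann_probs(actor_values, temperature=1.0):
    """Naive Boltzmann action probabilities: exp(v / T) normalized to sum to 1."""
    weights = [math.exp(v / temperature) for v in actor_values]
    total = sum(weights)
    return [w / total for w in weights]


def project(actor_values, bound):
    """Clip each actor value into [-bound, bound] (an artificial projection)."""
    return [max(-bound, min(bound, v)) for v in actor_values]


# Moderate actor values (assumed): the naive formula works.
probs = boltzmann_probs([1.0, 2.0, 0.5])

# Very large actor values (assumed): math.exp raises OverflowError.
large_values = [1000.0, 5.0, -3.0]
try:
    boltzmann_probs(large_values)
    overflowed = False
except OverflowError:
    overflowed = True

# Projecting the values avoids the overflow, but the bound is arbitrary.
projected_probs = boltzmann_probs(project(large_values, bound=50.0))
```

In this sketch the projected values still place essentially all of the probability on the first action, so the other actions are almost never tried. That is consistent with the abstract's remark that artificial bounding does not allow full exploration.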


Bibliographic details

Source
Engineering Applications of Artificial Intelligence | 2019, No. 6 | pp. 252-262 | 11 pages
Authors
Lawhead Ryan J.; Gosavi Abhijit
Author affiliations

Missouri Univ Sci & Technol, Dept Engn Management & Syst Engn, Rolla, MO 65409 USA;

Missouri Univ Sci & Technol, Dept Engn Management & Syst Engn, Rolla, MO 65409 USA;

Original format: PDF
Language: English
Keywords
Reinforcement learning; Actor critics; Airline revenue management;


Similar literature

1. A bounded actor-critic reinforcement learning algorithm applied to airline revenue management [J]. Lawhead Ryan J., Gosavi Abhijit. Engineering Applications of Artificial Intelligence. 2019, No. 6
2. Reinforcement learning applied to airline revenue management [J]. Nicolas Bondoux, Ann Quan Nguyen, Thomas Fiig, et al. Journal of Revenue and Pricing Management. 2020, No. 5
3. Learning to Control the Three-Link Musculoskeletal Arm Using Actor-Critic Reinforcement Learning Algorithm During Reaching Movement [J]. Ehsan Tahami, Amir Homayoun Jafari, Ali Fallah. Biomedical Engineering: Applications, Basis and Communications. 2014, No. 5
4. Towards the Next Generation Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking [C]. Syed Shihab, Caleb Logemann, Deepak-George Thomas, et al. Annual AGIFORS symposium. 2019
5. A Bounded Actor-Critic Algorithm for Reinforcement Learning [D]. Lawhead, Ryan Jacob. 2017
6. Believer-Skeptic Meets Actor-Critic: Rethinking the Role of Basal Ganglia Pathways during Decision-Making and Reinforcement Learning [O]. Kyle Dunovan, Timothy Verstynen. 2016
7. A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning [O]. Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, et al. 2020
