MarcoPolo: A Reinforcement Learning System considering tradeoff exploration and exploitation under Marcovian Environments

Abstract

Reinforcement learning is a kind of machine learning. It aims to adapt an agent to a given environment, using rewards as a clue. We consider that ideal reinforcement learning systems should get some rewards even at an early learning phase and get more rewards as exploration of the environment proceeds. In this paper, we propose a unified learning system, MarcoPolo, that takes account of both getting rewards by Profit Sharing or Policy Iteration and exploring the environment by the k-Certainty Exploration Method. MarcoPolo can realize any tradeoff between exploitation and exploration throughout the whole learning process. By applying MarcoPolo to numerical examples, its effectiveness is shown.
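The tradeoff described in the abstract can be illustrated with a minimal sketch. This is not the MarcoPolo algorithm, and it does not reproduce Profit Sharing, Policy Iteration, or the k-Certainty Exploration Method as defined in the paper. The function `choose_action`, its parameters, and the rule that an action counts as explored after `k` tries are all assumptions made for illustration only.

```python
import random

def choose_action(values, counts, k, explore_prob, rng=random):
    """Pick an action index for a single state (illustrative sketch).

    values[a]    : hypothetical reward estimate for action a
    counts[a]    : number of times action a has been tried so far
    k            : tries after which an action is treated as explored (assumption)
    explore_prob : probability of exploring instead of exploiting
    """
    # Actions that have been tried fewer than k times are candidates for exploration.
    under_explored = [a for a, c in enumerate(counts) if c < k]
    if under_explored and rng.random() < explore_prob:
        # Explore: pick the least-tried action among those below k tries.
        return min(under_explored, key=lambda a: counts[a])
    # Exploit: pick the action with the highest estimated reward.
    return max(range(len(values)), key=lambda a: values[a])
```

Setting `explore_prob` to 0 gives pure exploitation and setting it to 1 explores until every action has been tried `k` times. Intermediate values give a mix of the two, which is the kind of adjustable tradeoff the abstract refers to.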

Bibliographic record

Source
International conference on soft computing | 1996 | 4 pages
Authors
Kazuteru Miyazaki; Masayuki Yamamura; Shigenobu Kobayashi
Original format: PDF
Chinese Library Classification: Automation systems theory
Keywords
reinforcement learning; exploitation and exploration tradeoff; Markov decision processes; profit sharing; policy iteration


