Keywords
Reinforcement learning, Macro, Computer science, Rectification, Artificial intelligence, Intrinsic value (animal ethics), Decomposition, Intrinsic motivation, Reinforcement, Value (mathematics), Machine learning, Engineering, Psychology, Electrical engineering, Philosophy, Environmental ethics, Voltage, Biology, Social psychology, Programming language, Structural engineering, Ecology
Authors
Zhihao Liu, Zhiwei Xu, Guoliang Fan
Identifiers
DOI:10.1109/icassp49357.2023.10095374
Abstract
Hierarchical reinforcement learning (HRL) is a promising approach to solving long-horizon decision problems and complex tasks, as a high-level policy can guide the training of a low-level policy through macro actions and intrinsic rewards. However, current HRL algorithms disregard the degree to which macro actions influence decision-making, which should determine how much intrinsic reward is given to the low-level policy. If a macro action contributes little to decision-making, it may be reasonable to give the low-level policy correspondingly less intrinsic reward. In this paper, we propose a value decomposition based hierarchical multi-agent reinforcement learning method with intrinsic reward rectification, which estimates the effectiveness of macro actions and corrects the intrinsic rewards accordingly. We show that our proposed method significantly outperforms state-of-the-art value decomposition approaches on the StarCraft Multi-Agent Challenge platform.
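The core idea in the abstract — scaling a low-level policy's intrinsic reward by how much the chosen macro action actually matters — can be illustrated with a minimal sketch. All names and the specific rectification rule below (a softmax weight over high-level Q-values) are illustrative assumptions, not the paper's actual formulation:

```python
# Hedged sketch of intrinsic reward rectification in hierarchical RL.
# The importance measure (softmax over high-level Q-values) is an
# assumption chosen for illustration; the paper's method may differ.
import numpy as np

def macro_action_importance(q_high, macro_action):
    """Heuristic importance of a macro action: the softmax weight of
    its high-level Q-value among all available macro actions."""
    q = np.asarray(q_high, dtype=float)
    exp_q = np.exp(q - q.max())  # subtract max for numerical stability
    return float(exp_q[macro_action] / exp_q.sum())

def rectified_intrinsic_reward(r_intrinsic, q_high, macro_action):
    """Scale the intrinsic reward by the macro action's importance, so
    a less influential macro action yields less intrinsic reward."""
    weight = macro_action_importance(q_high, macro_action)
    return weight * r_intrinsic

# Example: three macro actions; the second has the highest Q-value,
# so its intrinsic reward is discounted the least.
q_high = [1.0, 2.0, 0.5]
r = rectified_intrinsic_reward(1.0, q_high, macro_action=1)
```

Under this sketch, the importance weights across macro actions sum to one, so the intrinsic reward is always attenuated rather than amplified; other choices (e.g., an advantage-based measure) would behave differently.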