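Reinforcement learning
Artificial intelligence
Computer science
Rebar
Control (management)
Machine learning
Learning classifier system
Pareto principle
Mathematical optimization
Engineering
Mathematics
Structural engineering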
Authors
Man-Je Kim,Hyunsoo Park,Chang Wook Ahn
Source
Journal: Electronics
[MDPI AG]
Date: 2022-03-28
Volume (issue): 11 (7): 1069-1069
Citations: 1
Identifier
DOI:10.3390/electronics11071069
Abstract
Control intelligence is a typical field where there is a trade-off between target objectives, and researchers in this field have longed for artificial intelligence that achieves the target objectives. Multi-objective deep reinforcement learning was sufficient to satisfy this need. In particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, multi-objective reinforcement learning has difficulty finding diverse Pareto-optimal solutions because of the greedy nature of reinforcement learning. We propose a policy assimilation method to solve this problem. The method was applied to MO-V-MPO, a preference-based multi-objective reinforcement learning method, to increase diversity. Its performance was verified through experiments in a continuous control environment.