Keywords: Adversarial system, Reinforcement learning, Computer science, Generalization, Artificial intelligence, Reinforcement, Machine learning, Engineering, Mathematics, Structural engineering, Mathematical analysis
Identifier
DOI:10.1145/3651671.3651762
Abstract
Improving the generalization ability of agents in offline reinforcement learning (RL) has received much attention in recent years. Existing adversarial RL approaches use adversarial training for policy improvement, thus enhancing the generalization ability of RL agents. However, adversarial training severely hinders the performance improvement of agents in offline RL settings. This is because adversarial training is a pessimistic learning paradigm, in which adversarial attack patterns aim to improve the agents' generalization ability in worst-case scenarios. Such a paradigm struggles to improve policy performance under the unstable training process of offline RL, which in turn makes it difficult to enhance generalization. To tackle this problem, we propose a novel offline adversarial RL approach, namely Soft Adversarial Offline Reinforcement Learning (SAORL), which learns soft adversarial examples by reducing the attack strength of adversarial examples in offline RL. Specifically, SAORL imposes a Wasserstein-based constraint on traditional adversarial examples, formulating a worse-case (rather than worst-case) optimization problem to learn the soft adversarial examples. We conduct extensive experiments on D4RL to evaluate our approach; the results demonstrate that SAORL can improve agents' performance and zero-shot generalization ability.
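The abstract's key idea is to dampen the usual worst-case adversarial attack so that perturbed states remain close to the offline data. The sketch below is an illustrative assumption only, not the paper's method: it approximates the Wasserstein-based constraint with a simple shrunken L2 perturbation budget (a hypothetical `softness` coefficient), applied to a toy Q-network on state observations.

```python
# Minimal sketch of a "softened" adversarial perturbation on state observations.
# The Wasserstein-based constraint from the abstract is approximated here by a
# reduced L2-norm budget; the network, shapes, and the softness coefficient are
# hypothetical choices for illustration, not taken from the SAORL paper.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Toy Q-network mapping a state to Q-values for each discrete action."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def soft_adversarial_state(q_net, state, epsilon=0.1, softness=0.5):
    """Perturb `state` in an adversarial direction (lowering the greedy
    Q-value), then scale the step by `softness` < 1 to reduce attack strength.
    The ball of radius `epsilon * softness` stands in for the paper's
    Wasserstein-style constraint (an assumption for illustration)."""
    state = state.clone().detach().requires_grad_(True)
    q_values = q_net(state)
    # Attack the value of the greedy action (one common attack target).
    loss = q_values.max(dim=-1).values.sum()
    grad, = torch.autograd.grad(loss, state)
    delta = -grad                                   # direction that lowers the greedy value
    delta = delta / (delta.norm(dim=-1, keepdim=True) + 1e-8)
    delta = epsilon * softness * delta              # shrunken ("soft") budget
    return (state + delta).detach()


if __name__ == "__main__":
    q_net = QNetwork(state_dim=8, num_actions=4)
    states = torch.randn(32, 8)                     # a batch of offline states
    soft_states = soft_adversarial_state(q_net, states)
    print(soft_states.shape)                        # torch.Size([32, 8])
```

In this reading, setting `softness` to 1 would recover a standard full-strength attack, while smaller values keep perturbed states nearer the offline data distribution, which is the intuition behind the "soft" adversarial examples described above.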