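Reinforcement learning
Model predictive control
Control theory (sociology)
Computer science
Position (finance)
Collision
Vehicle dynamics
Path (computing)
Mathematical optimization
Engineering
Control (management)
Artificial intelligence
Mathematics
Aerospace engineering
Economics
Programming language
Computer security
Finance
Authors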
Wenqi Cai,Arash Bahari Kordabad,Hossein Nejatbakhsh Esfahani,Anastasios M. Lekkas,Sébastien Gros
Identifier
DOI:10.1109/cdc45484.2021.9683750
Abstract
In this work, we propose a Model Predictive Control (MPC)-based Reinforcement Learning (RL) method for Autonomous Surface Vehicles (ASVs). The objective is to find an optimal policy that optimizes the closed-loop performance of a simplified freight mission, including collision-free path following, autonomous docking, and a skillful transition between them. We use a parametrized MPC scheme to approximate the optimal policy; the scheme accounts for path-following and docking costs and for constraints on the states (position, velocity) and inputs (thruster force, angle). The Least Squares Temporal Difference (LSTD)-based Deterministic Policy Gradient (DPG) method is then applied to update the policy parameters. Our simulation results demonstrate that the proposed MPC-LSTD-based DPG method improves the closed-loop performance during learning for the ASV freight mission problem.
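The parameter update described in the abstract can be illustrated with a minimal sketch of an LSTD-based DPG loop. Everything below is an assumption made for illustration and is not taken from the paper: the scalar linear system, the quadratic stage cost, the feature vector, the step size, and all function names are hypothetical, and a one-parameter linear feedback law stands in for the parametrized MPC policy. In an MPC-based scheme the policy gradient would come from the sensitivities of the optimization problem rather than from a closed-form expression.

```python
import numpy as np

# Hypothetical toy problem (NOT the ASV model from the paper):
# scalar dynamics s' = A*s + B*a with stage cost Q_COST*s^2 + R_COST*a^2.
A, B = 0.9, 0.5
Q_COST, R_COST = 1.0, 0.1
GAMMA = 0.95  # discount factor


def policy(theta, s):
    """Stand-in for the parametrized MPC policy: a linear feedback law."""
    return -theta * s


def policy_grad(theta, s):
    """Gradient of the policy with respect to theta."""
    return -s


def stage_cost(s, a):
    return Q_COST * s**2 + R_COST * a**2


def features(s):
    """Value-function features: V(s) is approximated as features(s) @ v."""
    return np.array([s**2, 1.0])


def collect(theta, rng, episodes=50, horizon=10, sigma=0.1):
    """Roll out the policy with Gaussian exploration; store transitions."""
    data = []
    for _ in range(episodes):
        s = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            e = sigma * rng.standard_normal()  # exploration term
            a = policy(theta, s) + e
            s_next = A * s + B * a
            data.append((s, e, stage_cost(s, a), s_next))
            s = s_next
    return data


def lstd_dpg_gradient(theta, data):
    """Estimate the policy gradient with an LSTD critic."""
    # Step 1: LSTD for the value function weights v.
    A_v = np.zeros((2, 2))
    b_v = np.zeros(2)
    for s, _, cost, s_next in data:
        phi, phi_next = features(s), features(s_next)
        A_v += np.outer(phi, phi - GAMMA * phi_next)
        b_v += phi * cost
    v = np.linalg.lstsq(A_v, b_v, rcond=None)[0]

    # Step 2: least-squares fit of the compatible advantage weight w,
    # using psi = policy_grad * exploration as the regressor and the
    # temporal-difference error as the target.
    num = den = grad_sq = 0.0
    for s, e, cost, s_next in data:
        g = policy_grad(theta, s)
        psi = g * e
        delta = cost + GAMMA * features(s_next) @ v - features(s) @ v
        num += psi * delta
        den += psi * psi
        grad_sq += g * g
    w = num / den

    # Step 3: DPG estimate, the sample mean of policy_grad^2 times w.
    return grad_sq / len(data) * w


def train(theta0=0.2, iterations=40, step=1.0, seed=0):
    """Gradient descent on theta (the cost is minimized)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(iterations):
        data = collect(theta, rng)
        theta -= step * lstd_dpg_gradient(theta, data)
    return theta


def closed_loop_cost(theta, horizon=200):
    """Discounted cost of noise-free rollouts from a few initial states."""
    total = 0.0
    for s0 in (-1.0, -0.5, 0.5, 1.0):
        s = s0
        for t in range(horizon):
            a = policy(theta, s)
            total += GAMMA**t * stage_cost(s, a)
            s = A * s + B * a
    return total
```

Calling `train()` and comparing `closed_loop_cost` before and after gives a rough check that the update reduces the closed-loop cost on this toy problem. The sketch omits the constraints, the path-following and docking costs, and the vehicle dynamics that the abstract mentions.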