Dropout (neural networks)
Reinforcement learning
Computer science
Bellman equation
Function (biology)
Convergence (economics)
Tracking (education)
State (computer science)
Optimal control
Process (computing)
Artificial neural network
Control (management)
Control theory (sociology)
Mathematical optimization
Mathematics
Artificial intelligence
Machine learning
Algorithm
Economics
Psychology
Pedagogy
Evolutionary biology
Biology
Economic growth
Operating system
Authors
Xueying Jiang, Min Huang, Huiyuan Shi, Yanfeng Zhang
Identifier
DOI:10.1016/j.isatra.2023.11.011
Abstract
In this paper, a new off-policy two-dimensional (2D) reinforcement learning approach is proposed to solve the optimal tracking control (OTC) problem for batch processes subject to network-induced dropout and disturbances. A dropout 2D augmented Smith predictor is first devised to estimate the current extended state from past data along both the time and batch directions. The dropout 2D value function and Q-function are then defined, and their relationship is analyzed to characterize optimal performance. On this basis, the dropout 2D Bellman equation is derived from the Q-function. To solve the dropout 2D OTC problem of batch processes, two algorithms are presented: an off-line 2D policy iteration algorithm and an off-policy 2D Q-learning algorithm. The latter uses only the input and the estimated state, without requiring knowledge of the underlying system model. The unbiasedness of the solutions and the convergence of the algorithms are analyzed separately. The effectiveness of the proposed methods is finally validated through a simulated filling-process case study.
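To make the core idea concrete, the following is a minimal sketch of classical off-policy Q-learning policy iteration for a plain discrete-time linear-quadratic regulation problem. It is not the paper's algorithm: the 2D (time plus batch) dynamics, the tracking objective, the dropout model, and the Smith predictor are all omitted, and every matrix and parameter below is invented for the demo. It only illustrates the property the abstract emphasizes: the Q-function is learned from input and state data alone, with the system matrices hidden from the learner.

```python
import numpy as np

# Hypothetical plant, unknown to the learner; used only to generate data.
np.random.seed(0)
A = np.array([[0.95, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Q_cost, R_cost = np.eye(2), np.eye(1)
n, m = 2, 1

def collect_data(K, num_steps=200):
    """Roll out a behavior policy u = -Kx + exploration noise (off-policy data)."""
    xs, us, xnexts = [], [], []
    x = np.random.randn(n)
    for _ in range(num_steps):
        u = -K @ x + 0.5 * np.random.randn(m)
        xn = A @ x + B @ u
        xs.append(x); us.append(u); xnexts.append(xn)
        x = xn if np.linalg.norm(xn) < 1e3 else np.random.randn(n)
    return xs, us, xnexts

def quad_features(z):
    """Upper-triangular quadratic basis so that z^T H z is linear in H's entries."""
    zz = np.outer(z, z)
    iu = np.triu_indices(len(z))
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return scale * zz[iu]

K = np.zeros((m, n))  # initial admissible gain (the open loop is stable here)
for it in range(20):
    xs, us, xnexts = collect_data(K)
    Phi, y = [], []
    for x, u, xn in zip(xs, us, xnexts):
        # Q-function Bellman equation: Q(x,u) - Q(x', -Kx') = x'Qx + u'Ru,
        # evaluated for the target policy u' = -Kx' on off-policy data.
        z, zn = np.concatenate([x, u]), np.concatenate([xn, -K @ xn])
        Phi.append(quad_features(z) - quad_features(zn))
        y.append(x @ Q_cost @ x + u @ R_cost @ u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = h
    H = H + np.triu(H, 1).T              # symmetrize the learned kernel
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # policy improvement step

# Model-based optimal gain via Riccati iteration, for comparison only.
P = np.eye(n)
for _ in range(500):
    G = np.linalg.solve(R_cost + B.T @ P @ B, B.T @ P @ A)
    P = Q_cost + A.T @ P @ (A - B @ G)
print("learned K:     ", np.round(K, 3))
print("model-based K*:", np.round(G, 3))
```

The same evaluate-then-improve structure underlies the paper's off-policy 2D Q-learning, with the plain state replaced by the dropout-compensated extended state estimated along both the time and batch directions.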