Keywords
Reinforcement learning, Markov decision process, Computer science, Electric power systems, Maximization, Scalability, Mathematical optimization, Markov process, Artificial intelligence, Power (physics), Statistics, Physics, Mathematics, Quantum mechanics, Databases
Authors
Ahmed Rabee Sayed, Xian Zhang, Guibin Wang, Jing Qiu, Cheng Wang
Identifier
DOI:10.1109/tpwrs.2023.3320172
Abstract
Increasing interdependencies between power and gas systems, together with the integration of large-scale intermittent renewable energy, raise the complexity of energy management problems. This article proposes a model-free safe deep reinforcement learning (DRL) approach that finds the optimal energy flow (OEF) quickly and guarantees its feasibility in real-time operation with high computational efficiency. The OEF optimization problem is formulated as a constrained Markov decision process with a limited number of states and control actions, and a robust integrated environment is developed. Because state-of-the-art DRL algorithms lack safety guarantees, this article develops a soft-constraint enforcement method that adaptively steers the control policy toward safety without resorting to conservative control actions. The overall procedure, the constrained soft actor-critic (C-SAC) algorithm, is off-policy, entropy-maximization-based, sample-efficient, and scalable, with low hyper-parameter sensitivity. The C-SAC algorithm outperforms existing safe learning-based methods and OEF solution approaches, producing fast OEF decisions with near-zero constraint violation. These results indicate the practicability of the proposed approach for real-time energy system operation and its potential extension to other applications.
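The abstract describes a constrained MDP solved with a constrained soft actor-critic. As a rough illustration of the general idea, the sketch below implements a standard Lagrangian-relaxation variant of SAC in PyTorch, in which a learned multiplier scales a cost critic to push the expected constraint cost below a limit. All network sizes, names, and the fixed entropy temperature are illustrative assumptions; the paper's own adaptive soft-constraint enforcement scheme is not reproduced here.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and limit (assumptions, not from the paper).
state_dim, action_dim = 8, 3
cost_limit = 0.0            # near-zero tolerated constraint violation
alpha = 0.2                 # fixed entropy temperature (SAC auto-tuning omitted)

# Policy outputs a Gaussian mean and log-std; critics score state-action pairs.
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, 2 * action_dim))
q_reward = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                         nn.Linear(64, 1))
q_cost = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

log_lam = torch.zeros(1, requires_grad=True)   # constraint multiplier (log-space)
actor_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
lam_opt = torch.optim.Adam([log_lam], lr=3e-4)

def sample_action(states):
    """Reparameterized, tanh-squashed Gaussian action and its log-probability
    (the tanh log-det correction is omitted for brevity)."""
    mean, log_std = policy(states).chunk(2, dim=-1)
    dist = torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())
    raw = dist.rsample()
    log_prob = dist.log_prob(raw).sum(-1, keepdim=True)
    return torch.tanh(raw), log_prob

def actor_step(states):
    actions, log_prob = sample_action(states)
    sa = torch.cat([states, actions], dim=-1)
    lam = log_lam.exp().detach()
    # Maximize entropy-regularized reward value; penalize expected cost.
    actor_loss = (alpha * log_prob - q_reward(sa) + lam * q_cost(sa)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Gradient descent on -lam * (cost - limit): the multiplier grows while
    # the expected cost exceeds the limit and shrinks once it is satisfied.
    gap = q_cost(sa).detach().mean() - cost_limit
    lam_loss = -(log_lam.exp() * gap)
    lam_opt.zero_grad(); lam_loss.backward(); lam_opt.step()

actor_step(torch.randn(32, state_dim))   # one update on a dummy batch
```

Critic training, target networks, and replay buffering are omitted; the snippet shows only how a constrained policy update can trade off the entropy-regularized reward value against a multiplier-weighted cost value.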