
Keywords

microgrid; safe reinforcement learning; energy management; asynchronous advantage actor-critic algorithm

Abstract

Energy management of microgrids faces the dual challenges of poor adaptability to dynamic environments and insufficient safety during training. Traditional model-based energy optimization methods rely heavily on accurate microgrid parameters, making it difficult to cope with dynamic changes in the system. To address these issues, a safe reinforcement learning method based on a constrained Markov game is proposed. First, multi-agent safety boundary constraints covering wind turbines, energy storage, and adjustable loads are constructed to confine policy exploration within the preset operating domain; second, an asynchronous safety verification thread is designed to correct the gradient update direction of the policy network in real time; finally, the proposed method is evaluated through a case-study simulation. The results show that, while guaranteeing system safety, the proposed method increases daily profit by 120 yuan compared with other methods, obtains the highest reward value, reduces wind curtailment, and improves the energy storage utilization rate. By decoupling the spatiotemporal correlation between safety constraints and policy optimization, the method provides a scalable safe reinforcement learning paradigm for distributed energy systems.
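The two safety mechanisms described in the abstract, confining exploration to a preset operating domain and correcting the policy-gradient direction when a constraint is violated, can be sketched in a minimal, illustrative form. This is not the paper's actual algorithm; the function names, the action-clipping rule, and the half-space gradient projection are assumptions chosen to illustrate the general idea:

```python
import numpy as np

def clip_to_operating_domain(action, lower, upper):
    """Keep an exploratory action (e.g. storage charge/discharge power or an
    adjustable-load setpoint) inside the preset operating domain [lower, upper]."""
    return np.minimum(np.maximum(action, lower), upper)

def project_safe_gradient(g_reward, g_cost, cost_violated):
    """Sketch of a safety-verification step that corrects the update direction:
    when the constraint cost is violated, remove from the reward gradient the
    component that points along the cost gradient, so the corrected update does
    not further increase the constraint cost."""
    if not cost_violated:
        return g_reward
    denom = np.dot(g_cost, g_cost)
    if denom == 0.0:
        return g_reward
    # Project only when the reward gradient actually points toward higher cost.
    coef = max(0.0, np.dot(g_reward, g_cost) / denom)
    return g_reward - coef * g_cost
```

For example, if the reward gradient and the cost gradient are parallel while a constraint is violated, the corrected update is zero; if they are orthogonal, the reward gradient passes through unchanged. In an asynchronous actor-critic setting, such a check could run in a separate verification thread that inspects each worker's gradient before it is applied to the shared policy network.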

DOI

10.19781/j.issn.1673-9140.2026.02.023

First Page

259

Last Page

270
