Abstract
Reinforcement Learning (RL) has advanced energy-efficient control of building Heating, Ventilation and Air Conditioning (HVAC) systems. Constructing a suitable RL environment for buildings is a crucial challenge. Compared with widely used simulation-based environments, data-driven approaches offer higher training efficiency but face convergence difficulties arising from the many influencing factors, which has limited their application to date.
To explore data-driven construction of RL environments for building HVAC systems, this study proposes two strategies for controlling room temperature setpoints in a residential building. XGBoost and Long Short-Term Memory (LSTM) networks are trained to predict energy consumption and room temperature. One strategy predicts these parameters for on-off states, while the other predicts them for power-on states only. The XGBoost models are integrated into an OpenAI Gym environment. The first strategy achieves an R² of 0.8634 and a Root Mean Squared Error (RMSE) of 0.2423 for energy consumption prediction; the R² values of its room air temperature models are approximately 0.99 and their RMSEs are below 0.31. The second strategy achieves an R² of 0.9181 and an RMSE of 0.1042 for energy consumption prediction, with similar performance for room temperature prediction. Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) agents are trained separately in these environments. The results show that the first strategy fails to support correct training of the RL models, whereas the second strategy yields a usable DDPG model for controlling the building HVAC system but not a usable DQN model; we analyze the reasons behind these observations. Compared with the original room temperature setpoint method, the DDPG-based HVAC control logic achieves 10.06% energy savings while maintaining comfort.
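
To illustrate the environment-construction step, the sketch below shows one way pretrained data-driven surrogate models (such as the XGBoost energy and temperature predictors described above) could be wrapped in an OpenAI Gym interface for setpoint control. This is a minimal, hypothetical example, not the study's implementation: the class name HVACSetpointEnv, the observation layout, the setpoint bounds, and the reward weights are all illustrative assumptions.

    import numpy as np
    import gym
    from gym import spaces


    class HVACSetpointEnv(gym.Env):
        """Room-temperature-setpoint control driven by pretrained surrogate models (illustrative sketch)."""

        def __init__(self, energy_model, temp_model, horizon=24):
            super().__init__()
            self.energy_model = energy_model  # e.g., a fitted xgboost.XGBRegressor for per-step HVAC energy use
            self.temp_model = temp_model      # e.g., a fitted regressor for next-step room air temperature
            self.horizon = horizon            # assumed hourly steps over one day
            # Continuous setpoint action (suits DDPG); a discrete setpoint grid would suit DQN.
            self.action_space = spaces.Box(low=22.0, high=28.0, shape=(1,), dtype=np.float32)
            # Assumed observation: [outdoor temperature, room temperature, hour of day]
            self.observation_space = spaces.Box(
                low=np.array([-20.0, 10.0, 0.0], dtype=np.float32),
                high=np.array([45.0, 40.0, 23.0], dtype=np.float32),
            )

        def reset(self):
            self.t = 0
            self.state = np.array([30.0, 28.0, 0.0], dtype=np.float32)
            return self.state

        def step(self, action):
            setpoint = float(np.clip(np.asarray(action, dtype=np.float32).flatten()[0], 22.0, 28.0))
            features = np.append(self.state, setpoint).reshape(1, -1)
            energy = float(self.energy_model.predict(features)[0])    # predicted energy use this step
            next_temp = float(self.temp_model.predict(features)[0])   # predicted room air temperature
            # Assumed reward: minimize energy use, penalize leaving a 25-27 degC comfort band.
            comfort_penalty = max(0.0, abs(next_temp - 26.0) - 1.0)
            reward = -energy - 10.0 * comfort_penalty
            self.t += 1
            # Outdoor temperature is held constant here purely to keep the sketch short.
            self.state = np.array([self.state[0], next_temp, float(self.t % 24)], dtype=np.float32)
            done = self.t >= self.horizon
            return self.state, reward, done, {}

Under these assumptions, a DDPG or DQN agent would interact with HVACSetpointEnv exactly as with any other Gym environment; because each step is a model prediction rather than a physics simulation, episodes roll out quickly, which is the training-efficiency advantage of data-driven environments noted above.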