Feeding time series to RL environment

Mostafa on 7 Sep 2022
Edited: Venu on 22 Nov 2023
Hello,
I'm attempting to create an RL environment for battery charge/discharge management that accounts for photovoltaic system output (PV), the on-site load profile (LOAD), electricity prices (P), and the battery state of charge (SOC). In each state, the battery state of charge is the only parameter the agent's actions can change; the other three parameters are deterministic time series that I'm trying to feed into the environment. Let's say each episode has 24 states (one per hour of the day), and I need to provide the corresponding PV(t), LOAD(t), and P(t) for each state, while SOC(t) is updated according to the agent's action.
While training the agent, how can I keep track of a running state number at the start of each state and use it as an index to extract the related PV(t), LOAD(t), and P(t) values from the data sets for my calculations? I'm building my model in MATLAB.
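For reference, each series is stored as an hourly vector along these lines (the file names are just placeholders):
pvData    = readmatrix('pv_profile.csv');     % 24x1 PV output per hour
loadData  = readmatrix('load_profile.csv');   % 24x1 on-site load per hour
priceData = readmatrix('price_profile.csv');  % 24x1 electricity price per hour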
Any help would be much appreciated,
Best regards,
Mosi

Answers (1)

Venu on 22 Nov 2023
Edited: Venu on 22 Nov 2023
Hi @Mostafa,
I understand that you want to extract the running state number to use as an index for the related PV(t), LOAD(t), and P(t) from the datasets for your battery charge/discharge management RL setup. You can follow the steps mentioned below:
1. Reset Function: When resetting the environment to its initial state, you can use the current running state number (initially set to 1) as an index to extract the corresponding PV, LOAD, and P values from your datasets. This information forms the initial state for the agent to start its interaction with the environment.
2. Step Function: As the agent progresses through each step, you update the current state number. With each step, the environment extracts the next state's information from the datasets based on the updated state number. This extracted information includes the PV, LOAD, and P values corresponding to the current time step, along with the current state of charge (SOC) of the battery.
By following these steps, you ensure that the environment provides the agent with the necessary information for decision-making at each time step, allowing the agent to interact with the environment based on the extracted state information.
The following example code might be useful for your understanding:
function initialState = reset(this)
    % Reset the environment to its initial state
    this.CurrentState = 1;
    this.SOC = 0.5; % Reset SOC to its initial value
    initialState = [this.PVData(this.CurrentState), this.LoadData(this.CurrentState), ...
                    this.PriceData(this.CurrentState), this.SOC];
end

function [nextState, reward, isDone, loggedSignals] = step(this, action)
    % Simulate one step in the environment
    % Update SOC based on the agent's action
    this.SOC = this.SOC + action;
    this.SOC = max(this.MinSOC, min(this.MaxSOC, this.SOC)); % Clip SOC within limits
    % Calculate the reward using the data of the hour the action was applied to
    % (modify calculateReward as per your reward function)
    reward = calculateReward(this.PVData(this.CurrentState), this.LoadData(this.CurrentState), ...
                             this.PriceData(this.CurrentState), this.SOC);
    % Advance the running state number
    this.CurrentState = this.CurrentState + 1;
    % The episode is done after the last hour of the day
    isDone = this.CurrentState > this.NumStates;
    % Prepare the next state; clamp the index so the terminal step does not
    % read past the end of the data sets
    idx = min(this.CurrentState, this.NumStates);
    nextState = [this.PVData(idx), this.LoadData(idx), this.PriceData(idx), this.SOC];
    loggedSignals = []; % Include additional information here if needed
end
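For context, these two methods are meant to sit inside a custom environment class derived from rl.env.MATLABEnvironment, which stores the data sets and the running state number as properties. Below is a minimal sketch of such a class; the class name BatteryEnv, the action limits, and the calculateReward placeholder are assumptions made to keep the example self-contained, so adapt them to your system:
classdef BatteryEnv < rl.env.MATLABEnvironment
    properties
        PVData        % 24x1 PV output time series
        LoadData      % 24x1 on-site load profile
        PriceData    % 24x1 electricity price time series
        SOC = 0.5    % battery state of charge
        CurrentState = 1   % running state number (hour of the day)
        NumStates = 24     % states per episode
        MinSOC = 0.1       % assumed SOC limits
        MaxSOC = 0.9
    end
    methods
        function this = BatteryEnv(pvData, loadData, priceData)
            % Observation: [PV, LOAD, P, SOC]; action: charge/discharge step
            obsInfo = rlNumericSpec([1 4]);
            actInfo = rlNumericSpec([1 1], 'LowerLimit', -0.2, 'UpperLimit', 0.2);
            this = this@rl.env.MATLABEnvironment(obsInfo, actInfo);
            this.PVData = pvData;
            this.LoadData = loadData;
            this.PriceData = priceData;
        end
        % ... place the reset and step methods shown above here ...
    end
end

function r = calculateReward(pv, loadDemand, price, soc)
    % Hypothetical placeholder reward: negative cost of net grid import.
    % Replace with your actual tariff, export, and battery-degradation model;
    % soc is available here for SOC-dependent penalty terms.
    gridImport = max(loadDemand - pv, 0);
    r = -price * gridImport;
end
You can then construct the environment with env = BatteryEnv(pvData, loadData, priceData); and sanity-check the step/reset logic with validateEnvironment(env).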
Please refer to the MathWorks documentation on creating custom MATLAB environments from a class template for more information.
Hope this helps!

Release: R2021b
