Creating discrete observation space for Reinforcement Learning

6 visualizzazioni (ultimi 30 giorni)
I am trying to solve a reinforcement learning problem, where I need to take two arrays of 1xn dimension as observation and each element can have discrete values [1 1000].
How can I implement my requirement using the rlFiniteSetSpec from the Reinforcement Learning toolbox?

Risposte (1)

Aditya
Aditya il 21 Feb 2024
In reinforcement learning with MATLAB's Reinforcement Learning Toolbox, the `rlFiniteSetSpec` object is used to define a finite set of discrete observations or actions. However, this object is designed to handle a single array of discrete values, not multiple arrays.
If you have two arrays of 1xn dimension, and each element can take on any discrete value in the range [1, 1000], you might run into scalability issues since the observation space becomes extremely large. The total number of possible observations would be \(1000^n \times 1000^n\), which is impractical to handle for any non-trivial value of `n`.
To use `rlFiniteSetSpec` for your problem, you would need to define each unique combination of values in your two arrays as a separate observation. This is not feasible due to the combinatorial explosion of possibilities.
Instead, you should consider the following approaches:
1. Binning/Discretization: Reduce the resolution of your observation space by grouping ranges of values into bins. For example, instead of having each element range from 1 to 1000, you might define 10 bins representing ranges of values. This would drastically reduce the size of your observation space, but at the cost of losing some granularity.
2. Feature Engineering: Transform your raw observations into a set of features that meaningfully represent the state of your environment. This could involve extracting statistics or other properties from your arrays that capture the essential information for decision-making.
3. Use a Continuous Observation Space: If discretizing your observation space is not suitable, you might want to consider using a continuous observation space instead, which is represented by `rlNumericSpec` in MATLAB. Reinforcement learning algorithms that can handle continuous spaces, such as DDPG (Deep Deterministic Policy Gradient) or PPO (Proximal Policy Optimization), may be used in this case.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by