generatePolicyFunction

Generate MATLAB function that evaluates policy of an agent or policy object

Description

This function generates a MATLAB® policy evaluation function, which you can use, for example, to generate code for deployment with MATLAB® Coder™.

This function also creates a data file which stores policy information. The evaluation function loads this data file to properly initialize itself the first time it is called.

For more information on policies and value functions, see Create Policies and Value Functions.

generatePolicyFunction(agent) creates a function that evaluates the learned policy of the specified agent using the default function name, policy name, and data file name.

generatePolicyFunction(policy) creates a function that evaluates the learned policy of the specified policy object using the default function name, policy name, and data file name.

generatePolicyFunction(___,Name=Value) specifies the function name, policy name, and data file name using one or more name-value pair arguments.

Examples

This example shows how to create a policy evaluation function for a PG agent.

First, create and train a reinforcement learning agent. For this example, load the PG agent trained in Train PG Agent to Balance Discrete Cart-Pole System.

load("MATLABCartpolePG.mat","agent")

Then, create a policy evaluation function for this agent using default names.

generatePolicyFunction(agent);

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor.

View the generated function.

type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 20-Jul-2024 15:15:36

persistent policy;
if isempty(policy)
	policy = coder.loadRLPolicy("agentData.mat");
end
% evaluate the policy
action1 = getAction(policy,observation1);

Evaluate the policy for a random observation.

evaluatePolicy(rand(agent.ObservationInfo.Dimension))
ans = 
10

You can now generate code for this policy function using MATLAB® Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.
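As a rough sketch of that next step, the commands below generate a static library from evaluatePolicy.m. This assumes that MATLAB Coder is installed, that evaluatePolicy.m and agentData.mat are in the current folder, and that agent is still in the workspace. Depending on your release and target, code generation for a deep neural network actor may also require a deep learning configuration on the coder configuration object; that step is not shown here.

```matlab
% Sketch only: build a static library from the generated policy function.
% The observation size is taken from the agent, as in the evaluation call above.
obsDim = agent.ObservationInfo.Dimension;

% Create a configuration object for a static library target.
cfg = coder.config('lib');

% Generate code. The -args cell array tells MATLAB Coder the size and class
% of the single input, observation1.
codegen -config cfg evaluatePolicy -args {ones(obsDim)} -report
```

By default, the generated files are placed in the codegen/lib/evaluatePolicy folder under the current folder.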

This example shows how to create a policy evaluation function for a policy object. You can create and train a policy object in a custom training loop or extract a trained policy object from a trained agent.

For this example, load the PG agent trained in Train PG Agent to Balance Discrete Cart-Pole System, and extract its greedy (deterministic) policy using getGreedyPolicy. Alternatively, you can extract a stochastic policy, which is useful for exploration, using getExplorationPolicy.

load("MATLABCartpolePG.mat","agent")
policy = getGreedyPolicy(agent)
policy = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 1
             Normalization: "none"
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1

Then, create a policy evaluation function for this policy using default names.

generatePolicyFunction(policy);

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor.

View the generated function.

type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 20-Jul-2024 15:15:40

persistent policy;
if isempty(policy)
	policy = coder.loadRLPolicy("agentData.mat");
end
% evaluate the policy
action1 = getAction(policy,observation1);

Evaluate the policy for a random observation.

evaluatePolicy(rand(policy.ObservationInfo.Dimension))
ans = 
10

You can now generate code for this policy function using MATLAB® Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.
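The example above mentions getExplorationPolicy as an alternative to getGreedyPolicy. A minimal sketch of that variant follows. The function name evaluateExplorationPolicy and the file name explorationPolicyData.mat are hypothetical, chosen only so that the files generated earlier are not reused. It assumes that the exploration policy of this PG agent is a supported policy type.

```matlab
% Sketch only: generate an evaluation function for the stochastic policy.
load("MATLABCartpolePG.mat","agent")
explPolicy = getExplorationPolicy(agent);

% Hypothetical names, used to keep these files separate from the
% evaluatePolicy.m and agentData.mat files created above.
generatePolicyFunction(explPolicy, ...
    FunctionName="evaluateExplorationPolicy", ...
    MATFileName="explorationPolicyData.mat");

% Evaluate for a random observation. Because the policy is stochastic,
% repeated calls with the same observation can return different actions.
action = evaluateExplorationPolicy(rand(explPolicy.ObservationInfo.Dimension))
```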

This example shows how to create a policy evaluation function for a Q-learning agent.

For this example, load the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World.

load("basicGWQAgent.mat","qAgent")

Create a policy evaluation function for this agent and specify the name of the agent data file.

generatePolicyFunction(qAgent,"MATFileName","policyFile.mat")

This command creates the evaluatePolicy.m file, which contains the policy function, and the policyFile.mat file, which contains the trained Q-table value function.

View the generated function.

type evaluatePolicy.m
function action1 = evaluatePolicy(observation1)
%#codegen

% Reinforcement Learning Toolbox
% Generated on: 20-Jul-2024 15:15:39

persistent policy;
if isempty(policy)
	policy = coder.loadRLPolicy("policyFile.mat");
end
% evaluate the policy
action1 = getAction(policy,observation1);

Evaluate the policy for a random observation.

evaluatePolicy(randi(25))
ans = 
4

You can now generate code for this policy function using MATLAB® Coder™. For more information, see Deploy Trained Reinforcement Learning Policies.
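Because the observation here is a single state index, you can also tabulate the action that the generated function returns for every state. The sketch below assumes 25 states, matching the randi(25) call above, and assumes that evaluatePolicy.m and policyFile.mat are in the current folder.

```matlab
% Sketch only: query the generated policy function for each state index.
numStates = 25;                                   % assumed from randi(25) above
actions = arrayfun(@evaluatePolicy, 1:numStates); % one action per state

% Show state/action pairs side by side.
disp([(1:numStates)' actions(:)])
```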

Input Arguments

Trained reinforcement learning agent, specified as a supported agent object. To train your agent, use the train function.

For agents with a stochastic actor (PG, PPO, SAC, TRPO, AC), the action returned by the generated policy function depends on the value of the UseExplorationPolicy property of the agent. By default, UseExplorationPolicy is false and the generated action is deterministic. If UseExplorationPolicy is true, the generated action is stochastic.
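For instance, the following sketch generates a policy function that returns stochastic actions. It assumes that agent is an agent with a stochastic actor, such as the PG agent from the examples. The function and file names are hypothetical.

```matlab
% Sketch only: make the generated policy function return stochastic actions.
agent.UseExplorationPolicy = true;

% Hypothetical names, chosen to avoid clashing with previously generated files.
generatePolicyFunction(agent, ...
    FunctionName="evaluateStochasticPolicy", ...
    MATFileName="stochasticAgentData.mat");
```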

Reinforcement learning policy, specified as a supported policy object.

Note

rlAdditiveNoisePolicy and rlEpsilonGreedyPolicy policy objects are not supported.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: FunctionName="computeAction"

Name of the generated function (FunctionName), specified as a string or character vector.

Name of the policy object within the generated function (PolicyName), specified as a string or character vector.

Name of the generated data file (MATFileName), specified as a string or character vector. If a file with the specified name already exists in the current MATLAB folder, then an appropriate digit is added to the name so that no existing file is overwritten.

The generated data file contains four structures that store data needed to fully characterize the policy. The evaluation function loads this data file to properly initialize itself the first time it is called.
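The following sketch sets all three names in one call. FunctionName and MATFileName appear in the examples on this page. PolicyName is assumed here to be the argument that sets the policy name. The values computeAction, cartPolePolicy, and cartPoleData.mat are illustrative, and agent is assumed to be a trained agent in the workspace.

```matlab
% Sketch only: customize the function, policy variable, and data file names.
generatePolicyFunction(agent, ...
    FunctionName="computeAction", ...   % generated file is computeAction.m
    PolicyName="cartPolePolicy", ...    % policy variable inside the function
    MATFileName="cartPoleData.mat");    % data file loaded on the first call

% Inspect the generated function and list the variables in the data file.
type computeAction.m
whos -file cartPoleData.mat
```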

Version History

Introduced in R2019a
