Using the Levenberg–Marquardt optimizer in deep learning
xingxingcui
il 5 Set 2025
Since R2024b, a Levenberg–Marquardt solver (TrainingOptionsLM) was introduced. The built‑in function trainnet now accepts training options via the trainingOptions function (https://www.mathworks.com/help/deeplearning/ref/trainingoptions.html#bu59f0q-2) and supports the LM algorithm. I have been curious how to use it in deep learning, and the official documentation has not provided a concrete usage example so far. Below I give a simple example to illustrate how to use this LM algorithm to optimize a small number of learnable parameters.
For example, consider the nonlinear function:
y_hat = @(a,t) a(1)*(t/100) + a(2)*(t/100).^2 + a(3)*(t/100).^3 + a(4)*(t/100).^4;
It represents a curve. Given 100 matching points (t → y_hat), we want to use least squares to estimate the four parameters a1–a4.
t = (1:100)';
y_hat = @(a,t)a(1)*(t/100) + a(2)*(t/100).^2 + a(3)*(t/100).^3 + a(4)*(t/100).^4;
x_true = [ 20 ; 10 ; 1 ; 50 ];
y_true = y_hat(x_true,t);
plot(t,y_true,'o-')
- Using the traditional lsqcurvefit-wrapped "Levenberg–Marquardt" algorithm:
x_guess = [ 5 ; 2 ; 0.2 ; -10 ];
options = optimoptions("lsqcurvefit",Algorithm="levenberg-marquardt",MaxFunctionEvaluations=800);
[x,resnorm,residual,exitflag] = lsqcurvefit(y_hat,x_guess,t,y_true,-50*ones(4,1),60*ones(4,1),options);
x,resnorm,exitflag
- Using the deep-learning-wrapped "Levenberg–Marquardt" algorithm:
options = trainingOptions("lm", ...
InitialDampingFactor=0.002, ...
MaxDampingFactor=1e9, ...
DampingIncreaseFactor=12, ...
DampingDecreaseFactor=0.2,...
GradientTolerance=1e-6, ...
StepTolerance=1e-6,...
Plots="training-progress");
numFeatures = 1;
layers = [featureInputLayer(numFeatures,'Name','input')
fitCurveLayer(Name='fitCurve')];
net = dlnetwork(layers);
XData = dlarray(t);
YData = dlarray(y_true);
netTrained = trainnet(XData,YData,net,"mse",options);
netTrained.Layers(2)
classdef fitCurveLayer < nnet.layer.Layer ...
& nnet.layer.Acceleratable
% Example custom SReLU layer.
properties (Learnable)
% Layer learnable parameters
a1
a2
a3
a4
end
methods
function layer = fitCurveLayer(args)
arguments
args.Name = "lm_fit";
end
% Set layer name.
layer.Name = args.Name;
% Set layer description.
layer.Description = "fit curve layer";
end
function layer = initialize(layer,~)
% layer = initialize(layer,layout) initializes the layer
% learnable parameters using the specified input layout.
if isempty(layer.a1)
layer.a1 = rand();
end
if isempty(layer.a2)
layer.a2 = rand();
end
if isempty(layer.a3)
layer.a3 = rand();
end
if isempty(layer.a4)
layer.a4 = rand();
end
end
function Y = predict(layer, X)
% Y = predict(layer, X) forwards the input data X through the
% layer and outputs the result Y.
% Y = layer.a1.*exp(-X./layer.a2) + layer.a3.*X.*exp(-X./layer.a4);
Y = layer.a1*(X/100) + layer.a2*(X/100).^2 + layer.a3*(X/100).^3 + layer.a4*(X/100).^4;
end
end
end
The network is very simple — only the fitCurveLayer defines the learnable parameters a1–a4. I observed that the output values are very close to those from lsqcurvefit.
