
Hello,

Since there is no hyperparameter tuning function for neural networks, I wanted to try the bayesopt function. I tried to adapt the example here: https://de.mathworks.com/help/stats/bayesian-optimization-case-study.html, but it does not work. Is it possible to tune the number of hidden neurons this way? This is the code that fails:

[m,n] = size(Daten) ;

P = 0.7 ;

Training = Daten(1:round(P*m),:) ;

Testing = Daten(round(P*m)+1:end,:);

XTrain=Training(:,1:n-1);

YTrain=Training(:,n);

XTest=Testing(:,1:n-1);

YTest=Testing(:,n);

c = cvpartition(YTrain,'KFold',10);

hiddenLayerSize=optimizableVariable('hiddenLayerSize',[0,20]);

minfn = @(z)kfoldLoss(fitnet(XTrain,YTrain,'CVPartition',c,...
    'hiddenLayerSize',z.hiddenLayerSize));

results = bayesopt(minfn,hiddenLayerSize,'IsObjectiveDeterministic',true,...
    'AcquisitionFunctionName','expected-improvement-plus');

Don Mathis
on 17 Nov 2018

If you want a more complete workflow that also optimizes the learning rate, and tests the final model on your test set, you could try this:

% Make some data

Daten = rand(100, 3);

Daten(:,3) = Daten(:,1) + Daten(:,2) + .1*randn(100, 1); % Minimum asymptotic error is .1

[m,n] = size(Daten) ;

% Split into train and test

P = 0.7 ;

Training = Daten(1:round(P*m),:) ;

Testing = Daten(round(P*m)+1:end,:);

XTrain = Training(:,1:n-1);

YTrain = Training(:,n);

XTest = Testing(:,1:n-1);

YTest = Testing(:,n);

% Define a train/validation split to use inside the objective function

cv = cvpartition(numel(YTrain), 'Holdout', 1/3);

% Define hyperparameters to optimize

vars = [optimizableVariable('hiddenLayerSize', [1,20], 'Type', 'integer');

optimizableVariable('lr', [1e-3 1], 'Transform', 'log')];

% Optimize

minfn = @(T)kfoldLoss(XTrain', YTrain', cv, T.hiddenLayerSize, T.lr);

results = bayesopt(minfn, vars,'IsObjectiveDeterministic', false,...
    'AcquisitionFunctionName', 'expected-improvement-plus');

T = bestPoint(results)

% Train final model on full training set using the best hyperparameters

net = feedforwardnet(T.hiddenLayerSize, 'traingd');

net.trainParam.lr = T.lr;

net = train(net, XTrain', YTrain');

% Evaluate on test set and compute final rmse

ypred = net(XTest');

finalrmse = sqrt(mean((ypred - YTest').^2))

function rmse = kfoldLoss(x, y, cv, numHid, lr)

% Train net.

net = feedforwardnet(numHid, 'traingd');

net.trainParam.lr = lr;

net = train(net, x(:,cv.training), y(:,cv.training));

% Evaluate on validation set and compute rmse

ypred = net(x(:, cv.test));

rmse = sqrt(mean((ypred - y(cv.test)).^2));

end

Don Mathis
on 26 Nov 2018


Sean de Wolski
on 6 Nov 2018

Edited: Sean de Wolski
on 6 Nov 2018

This is nowhere near as easy as it should be. First, the shallow neural net infrastructure is old and expects variables in rows and observations in columns. This needs to be accounted for, and you'll see it below with a ton of .' transposes. Second, you'll need to write a wrapper around fitnet, because it doesn't take all of its options as name-value pairs the way the modern fit* functions in the Statistics and Machine Learning Toolbox do. Third, the training is non-deterministic unless you seed the rng yourself.
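To make the first and third points concrete, here is a minimal sketch. The data and variable names are made up for illustration. It only shows the orientation the shallow-network functions expect and that reseeding the global stream reproduces the same random draws. As far as I know, weight initialization and the default random data division draw from that same stream, which is why seeding before train should make a run repeatable.

```matlab
% Hypothetical data: 100 observations, 2 predictors and 1 response,
% stored one observation per row (the Statistics toolbox convention)
rng(0);                        % seed the global random stream
Daten = rand(100, 3);

% Shallow-network functions want one observation per column, so transpose
X = Daten(:, 1:2).';           % 2-by-100 inputs
Y = Daten(:, 3).';             % 1-by-100 targets

% Reseeding with the same value reproduces the same random draws
rng(1); a = rand(1, 3);
rng(1); b = rand(1, 3);
sameDraws = isequal(a, b);     % logical flag, checked below
disp(size(X)); disp(size(Y)); disp(sameDraws);
```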

I don't understand the math behind using k-fold cross-validation with a neural net. Hence, I'll use holdout below, which trains the network on one part of the data and evaluates it on an independent held-out set.

Daten = rand(100, 3);

[m,n] = size(Daten) ;

P = 0.7 ;

Training = Daten(1:round(P*m),:) ;

Testing = Daten(round(P*m)+1:end,:);

XTrain=Training(:,1:n-1).'; % Note transposes

YTrain=Training(:,n).';

XTest=Testing(:,1:n-1).';

YTest=Testing(:,n).';

c = cvpartition(numel(YTrain),'Holdout', 0.25);

hiddenLayerSize=optimizableVariable('hiddenLayerSize',[1,20], 'Type', 'integer');

minfn = @(z)wrapFitNet(XTrain,YTrain, 'CVPartition', c, ...
    'hiddenLayerSize',z.hiddenLayerSize);

results = bayesopt(minfn,hiddenLayerSize,'IsObjectiveDeterministic',false,...
    'AcquisitionFunctionName','expected-improvement-plus');

Wrapper function

function cvrmse = wrapFitNet(x, y, varargin)

% Handle variable inputs

ip = inputParser;

ip.addParameter('hiddenLayerSize', 20);

ip.addParameter('CVPartition', cvpartition(numel(y),'Holdout', 0.10));

parse(ip, varargin{:});

cv = ip.Results.CVPartition;

hiddensz = ip.Results.hiddenLayerSize;

% Train net. You would adjust other hyperparameters here.

net = fitnet(hiddensz);

nets = train(net, x(:, cv.training.'), y(:, cv.training.'));

% Evaluate on test set and compute rmse

ypred = nets(x(:, cv.test.'));

cvrmse = sqrt(sum(ypred-y(cv.test.').^2)/numel(y(cv.test)));

end

Finally, if the only thing you want to optimize is the hidden layer size, it may be easiest to just run a loop over 1:20 and try them all. Bayesian optimization really helps when you have many different parameters (trainFcn, etc.).
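The brute-force loop could look roughly like the sketch below. It assumes the Deep Learning Toolbox is installed, uses made-up data, and the names sizes, valRmse and bestSize are only illustrative. It uses the same holdout idea as the wrapper above.

```matlab
rng(0);                                        % make the sketch repeatable
X = rand(2, 100);                              % one observation per column
Y = X(1,:) + X(2,:) + 0.1*randn(1, 100);       % hypothetical target

cv = cvpartition(size(X, 2), 'Holdout', 0.25); % train/validation split
trainIdx = training(cv).';
valIdx = test(cv).';

sizes = 1:20;
valRmse = zeros(size(sizes));
for k = sizes
    net = fitnet(k);                           % k hidden neurons
    net.trainParam.showWindow = false;         % suppress the training GUI
    net = train(net, X(:, trainIdx), Y(:, trainIdx));
    ypred = net(X(:, valIdx));
    valRmse(k) = sqrt(mean((ypred - Y(valIdx)).^2));
end

% Pick the size with the lowest validation error
[bestRmse, bestIdx] = min(valRmse);
bestSize = sizes(bestIdx);
```

Each net starts from random weights, so the values in valRmse are noisy. Small differences between neighbouring sizes probably don't mean much.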

Dimitri
on 10 Nov 2018

Don Mathis
on 17 Nov 2018

There's a mistake in the rmse formula. Try this:

cvrmse = sqrt(mean((ypred-y(cv.test)).^2));
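To see why the original expression goes wrong, here is a small check with made-up numbers. Because .^ binds tighter than binary minus, the wrapper's formula squares only the targets instead of the residuals. With these values the sum comes out negative, so the square root is complex.

```matlab
% Hypothetical predictions and targets, chosen only for illustration
ypred = [1.0 2.0 3.0 4.0];
ytrue = [1.5 2.5 2.5 3.0];

% Original expression: only ytrue is squared, so this is not an error measure
buggy = sqrt(sum(ypred - ytrue.^2) / numel(ytrue));

% Corrected expression: square the residuals, average them, take the root
residual = ypred - ytrue;
fixed = sqrt(mean(residual.^2));

disp(isreal(buggy));   % the buggy value is complex for these inputs
disp(fixed);
```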
