Train a deep learning model (ResNet-50 network) on a remote HPC cluster

I am trying to run code that uses a pre-trained ResNet-50 network on a remote HPC cluster by submitting batch GPU jobs. I get the following error at this line:
net = resnet50
Error using resnet50
resnet50 requires the Deep Learning Toolbox Model for ResNet-50 Network support
package for the pretrained weights. To install this support package, use the
Add-On Explorer. To obtain the untrained layers, use
resnet50('Weights','none'), which does not require the support package.
It seems the Deep Learning Toolbox Model for ResNet-50 Network add-on is not installed on the cluster. How can I install it there?
Thanks

Accepted Answer

David Willingham on 14 Oct 2022
Just to confirm, you're sending batch jobs to an HPC cluster that has MATLAB Parallel Server installed?
If so, one option to try would be:
  1. Save resnet50 as a MAT file.
  2. Attach the MAT file when submitting the job.
  3. Add a command that loads the MAT file to the function you're submitting.
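The three steps above could look roughly like the following. This is only a sketch under assumptions: the cluster profile name `myHPCProfile` and the function name `trainOnCluster` are hypothetical placeholders, and the support package is assumed to be installed on the local machine where `resnet50` is called.

```matlab
% --- Local machine (support package installed here) ---
basenet = resnet50;                    % pretrained network, loaded locally
save('basenet.mat', 'basenet');        % step 1: store it in a MAT file

% 'myHPCProfile' is a hypothetical cluster profile name; use your own.
c = parcluster('myHPCProfile');

% step 2: attach the MAT file so it is copied to the workers with the job.
% trainOnCluster is a hypothetical function of your own (see below).
job = batch(c, @trainOnCluster, 1, {}, ...
    'AttachedFiles', {'basenet.mat'});

wait(job);                             % block until the job has finished
out = fetchOutputs(job);               % cell array with the function outputs
```

The submitted function, saved in its own file `trainOnCluster.m`, then loads the network instead of calling `resnet50`:

```matlab
function inputSize = trainOnCluster()
% step 3: load the network from the attached MAT file
s = load('basenet.mat', 'basenet');
net = s.basenet;
% Placeholder for the real training code; here we only return the input size.
inputSize = net.Layers(1).InputSize;
end
```

Because the weights travel inside the MAT file, the workers never need to call `resnet50`, which is the call that raised the error.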
  1 Comment
EK_47 on 14 Oct 2022
Brilliant! Thank you for your answer. It solved my problem.
Yes, the HPC cluster has MATLAB Parallel Server installed.
In your point 1, you said "save resnet50 as a MAT file". I was not sure what you meant by "save resnet50". What I did was call it in MATLAB on my local machine:
basenet = resnet50;
then saved it as
save('basenet.mat','basenet');
and then transferred this MAT file to the remote cluster and loaded it there.
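For the manual-transfer variant, the loading side on the cluster might be wrapped in a small helper like the one below. This is a sketch, not part of the original answer: the function name `loadBaseNet` is hypothetical, and it assumes the file was saved with the variable name `basenet` as shown above.

```matlab
function net = loadBaseNet(matFile)
% Load the variable 'basenet' from a MAT file copied to the cluster.
% Fails early with a clear message if the file or the variable is missing.
if ~isfile(matFile)
    error('loadBaseNet:missingFile', 'MAT file not found: %s', matFile);
end
vars = whos('-file', matFile);         % list variables without loading them
if ~ismember('basenet', {vars.name})
    error('loadBaseNet:missingVariable', ...
        'Variable "basenet" not found in %s', matFile);
end
s = load(matFile, 'basenet');          % load only the variable we need
net = s.basenet;
end
```

Inside the submitted function you would then call something like `net = loadBaseNet('basenet.mat');` with the path adjusted to wherever the file was copied on the cluster.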
Thanks


Release

R2022a
