How to diagnose "Out of Memory during deserialization" when running on high core count

21 views (last 30 days)
Hello MATLAB Community! I'm trying to track down a parallel computing problem. I'm running the built-in genetic algorithm on a high-performance cluster, and I've found that once I assign a certain number of cores, the code stops working.
The parallel pool starts correctly (I get "Connected to 48 workers" or "Connected to 280 workers"), but with 280 workers I first get a couple of warnings like this:
Warning: A worker aborted during execution of the parfor loop. The parfor loop
will now run again on the remaining workers.
> In parallel_function (line 599)
In fcnvectorizer (line 16)
In gaminlppenaltyfcn
In gapenalty
In makeState (line 64)
In galincon (line 17)
In gapenalty
In gaminlp
In ga (line 366)
and then it finally crashes with this:
Error using fcnvectorizer (line 16)
Out of Memory during deserialization
On my local machine the same code runs fine and needs about 2 GB of RAM per worker. I have assigned 4.5 GB per core on the cluster, so I don't think it's an actual memory shortage. However, all the solutions I've found online for this error point to memory issues.
Any input is greatly appreciated.
Cheers.
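
In case it helps with diagnosis, here is how I'm checking how much memory each worker actually sees from inside the pool. This is a minimal sketch: the memory function only exists on Windows, so on a Linux cluster it falls back to parsing /proc/meminfo (the MemAvailable field assumes a reasonably recent Linux kernel):

% Diagnostic sketch: report available memory on every worker in the pool.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(48);                          % adjust to the cluster size
end
spmd
    if ispc
        m = memory;                              % 'memory' is Windows-only
        freeGB = m.MemAvailableAllArrays / 2^30;
    else
        txt = fileread('/proc/meminfo');         % Linux fallback
        tok = regexp(txt, 'MemAvailable:\s*(\d+) kB', 'tokens', 'once');
        freeGB = str2double(tok{1}) / 2^20;      % kB -> GB
    end
    fprintf('Worker %d: %.1f GB available\n', labindex, freeGB);
end

If the numbers reported here come out far below the 4.5 GB per core the scheduler promises, the limit is being shared between workers rather than applied per worker.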
  2 Comments
Zhenhao Gong on 19 Nov 2023
Hello, I've run into the same problem, and MATLAB also gave me the "Out of Memory during deser...." message. By my calculations my code should use at most 20 GB of memory, and the cluster allows a maximum of 168 GB of RAM, yet in practice it actually consumes 160 GB.
I suspect some matrix operations inside the parfor loop are behaving incorrectly.
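
For reference, one common trigger for this error: any large variable captured by the fitness function is re-serialized to the workers on every parfor invocation, i.e. on every generation when ga runs in parallel. The usual mitigation is parallel.pool.Constant, which transfers the data once per worker. A minimal sketch; bigData and the toy objective below are hypothetical stand-ins, not the original code:

% Sketch: transfer large data to each worker once via parallel.pool.Constant
% instead of re-serializing it with every parallel evaluation.
bigData = rand(4000);                            % hypothetical large dataset
C = parallel.pool.Constant(bigData);             % one copy per worker

% The fitness function reads the worker-local copy through C.Value.
fitness = @(x) sum(x.^2) + mean(C.Value(:));     % toy objective
opts = optimoptions('ga', 'UseParallel', true);
nvars = 10;
x = ga(fitness, nvars, [], [], [], [], ...
       -ones(1, nvars), ones(1, nvars), [], opts);

With many workers sharing a node, the per-node footprint of these deserialized copies adds up, which is one thing worth ruling out here.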


Answers (0)

Release

R2017b
