Matlab R2024b parallel pool not working above 32 cores
Hi everyone,
I have a P8 ThinkStation with an AMD Ryzen Threadripper 7985WX, running Windows 11 with MATLAB R2024b installed. The processor has 64 physical cores and 128 logical ones. When I try to validate the local cluster profile for parallel processing with more than 32 workers, the validation fails at the "SPMD job test" stage and returns the following error:
Error Report: Job errored or did not reach the state 'finished'. MATLAB worker shut down unexpectedly with status -4 during task execution.
The exit status is not always the same: it varies among -1, -2, and -4.
Any suggestions for fixing this issue? I did not have this problem with R2023b.
Best regards,
Filippo
Answers (2)
sidik
on 7 Nov 2024
Hello @Filippo Ambrosino,
try the following steps:
Step 1: Reduce the Number of Cores Used
- Open MATLAB.
- Go to Home > Parallel > Manage Cluster Profiles.
- In the Cluster Profile Manager window, select the local profile from the list of cluster profiles (it is named Processes in R2022b and later).
- Click on Edit at the bottom right.
- In the NumWorkers section, set the number to 32 (or a lower number if you want to test gradually).
- Click Done to save the changes.
- Close the Cluster Profile Manager window.
Step 2: Test the Cluster Profile
- Go back to Parallel and click on Validate.
- Let MATLAB validate the cluster profile. If the test still fails, decrease NumWorkers (to 16 or 8) and validate again to see whether a lower number of workers resolves the issue.
Step 3: Create a Custom Cluster Profile (if needed)
- If validation continues to fail, go back to Manage Cluster Profiles and click on New Profile.
- Name the new profile (e.g., CustomProfile).
- In NumWorkers, try a reasonable number (such as 16 or 24).
- Save by clicking Done.
- Set this new profile as the active profile by checking the box next to its name.
- Validate the profile by clicking on Validate.
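The steps above can also be tried from the command line. This is only a minimal sketch, assuming the local profile is named 'Processes' (it was called 'local' before R2022b) and using the worker count of 32 suggested above; it is not a replacement for the full validation in the Cluster Profile Manager.

```matlab
% Sketch: cap the worker count of the local profile and run a small spmd check.
% Assumption: the local profile is named 'Processes' on this installation.
c = parcluster('Processes');
c.NumWorkers = 32;            % value suggested above; try 16 or 8 if this fails
saveProfile(c);               % persist the change to the profile

delete(gcp('nocreate'));      % close any pool that is already open
pool = parpool(c, c.NumWorkers);

% Each worker reports its index; a worker crash would surface here as an error.
spmd
    fprintf('Worker %d of %d is running\n', spmdIndex, spmdSize);
end

delete(pool);                 % shut the pool down again
```

If the pool starts and every worker reports in, raising NumWorkers step by step may show at which worker count the failure begins.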
If all the above steps fail, I suggest opening a ticket with MathWorks technical support.
Don't hesitate to reply if you're still stuck.
Filippo Ambrosino
on 7 Nov 2024
8 Comments
sidik
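on 7 Nov 2024
Hello @Filippo Ambrosino,
have you tried setting a system environment variable such as OMP_NUM_THREADS=64?
Open a command prompt (with administrator privileges) and run set OMP_NUM_THREADS=64. A variable set this way applies only to that prompt and to programs started from it, so launch MATLAB from the same prompt.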
Best regards,
Sidik
Alison Eele
on 7 Nov 2024
I would recommend opening a support ticket with a complete copy of your "Processes" profile cluster validation report failure for the 64 workers. They will be able to take a deeper look and help you use the number of workers you need.
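To gather more detail for such a ticket, one option is to run a small communicating job at the failing size and read its debug log. The following is only a sketch under assumptions: the profile name 'Processes' and the worker count of 64 are taken from this thread, and the log may contain no more than the crash message.

```matlab
% Sketch: run a small spmd-type job on 64 workers and capture its debug log.
% Assumption: the local profile is named 'Processes' on this installation.
c = parcluster('Processes');
c.NumWorkers = 64;                     % in-memory change only, not saved to the profile

job = createCommunicatingJob(c, 'Type', 'spmd');
job.NumWorkersRange = [64 64];         % request exactly 64 workers
createTask(job, @spmdIndex, 1, {});    % each worker returns its own index

submit(job);
wait(job);                             % returns once the job finishes or fails

disp(job.State);                       % 'finished' or 'failed'
logText = getDebugLog(c, job);         % text written by the job to stdout/stderr
disp(logText);

delete(job);                           % remove the job from the cluster's storage
```

The printed state and log text can be attached to the support request together with the validation report.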
Filippo Ambrosino
on 7 Nov 2024
Filippo Ambrosino
on 7 Nov 2024
sidik
on 7 Nov 2024
@Filippo Ambrosino Yes, it is best to contact technical support.
Filippo Ambrosino
on 7 Nov 2024
sidik
on 7 Nov 2024
@Filippo Ambrosino You're welcome.
Same here:
A simulation with more than 60 workers crashes with R2024b on several machines.
The same simulation runs fine with R2024a using 700 cores/MATLAB workers.
I have no idea why R2024b crashes; it also crashes when running the SPMD validation test.
The job log contains only a "Matlab crashed on worker XXX" message and no other useful information.
Raffael