Distributed computing - job still 'running' when actually finished. Why?

I have been using MATLAB distributed computing for about two years now (R2014b). I have noticed multiple issues and am wondering what the cause could be. I am not sure if they are related or not.
  1. The MATLAB GUI that lists jobs and their status hangs while trying to update the status and freezes MATLAB. I stopped using it because of that, but it seems like there should be a fix.
  2. [Behavior until this last month] When I used the MATLAB terminal to check on the status of jobs, they would still be listed as 'running' even though the job had finished on the server side. I could ssh into our server and see that my job was no longer in the queue, but MATLAB still thought the job was running. When I looked in the job folder on the server, the output file Task1.out.mat in the appropriate job folder was full of my outputs, but the Job#.out.mat file outside the folder was empty.
  3. [Behavior this last month] Now when I use the MATLAB terminal to check on the status of jobs, MATLAB sometimes hangs indefinitely or tells me that the job is still running. If I ssh into the server, I can see that the tasks are still listed as running in the queue, but in some cases the Task1.out.mat file has already been written and contains my outputs even though the process is still running. I now have to check the size of that output file to determine whether my job has finished and then cancel the jobs manually; they do not stop on their own.
  4. [Behavior this last month] If I use the diary() command on a job that I can tell has finished (because all the outputs are in Task1.out.mat), the diary is incomplete. For example, one job I submitted had a for i = 1:10000 loop that printed a line every 100 iterations. The diary stops at 6500, but when I pull the output from the Task1.out.mat file, the array the loop was generating is full.
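
The manual file-size check from item 3 can be scripted. The following is a minimal Python sketch, not a MATLAB or Parallel Computing Toolbox API: it assumes the job storage layout described in item 2 (a JobN folder containing TaskM.out.mat files, with a JobN.out.mat file next to that folder) and only inspects file sizes. A non-empty task file is a heuristic, since a MAT-file that is still being written may also be non-empty. The function names are hypothetical, and cancelling the job still has to be done separately.

```python
import sys
from pathlib import Path


def task_outputs_present(job_dir):
    """Return the names of Task*.out.mat files in job_dir that exist and are non-empty."""
    return sorted(
        p.name
        for p in Path(job_dir).glob("Task*.out.mat")
        if p.is_file() and p.stat().st_size > 0
    )


def looks_finished(job_dir, expected_tasks=1):
    """Heuristic: True if at least expected_tasks task output files are non-empty."""
    return len(task_outputs_present(job_dir)) >= expected_tasks


def state_mismatch(storage_dir, job_id, expected_tasks=1):
    """True if the task outputs look complete but the job-level JobN.out.mat
    file is missing or empty (the mismatch described in item 2).

    Assumed layout: storage_dir/JobN/TaskM.out.mat and storage_dir/JobN.out.mat.
    """
    storage = Path(storage_dir)
    job_dir = storage / f"Job{job_id}"
    job_file = storage / f"Job{job_id}.out.mat"
    job_file_empty = (not job_file.exists()) or job_file.stat().st_size == 0
    return looks_finished(job_dir, expected_tasks) and job_file_empty


if __name__ == "__main__":
    # Usage (hypothetical path): python check_job.py /path/to/JobStorageLocation 42
    if len(sys.argv) == 3:
        print(state_mismatch(sys.argv[1], int(sys.argv[2])))
```

Because the script never talks to the scheduler, it can only flag jobs worth a closer look; it cannot tell whether the worker process is still alive.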

Answers (1)

Sean de Wolski on 19 Jan 2018
For 1): It has gotten a lot better and more stable in more recent releases.
  2 Comments
EQ on 19 Jan 2018
Good to know; maybe I will try persuading the powers that be that it's time to upgrade. Any thoughts on 2-4?
