Why do I receive an error when attempting to start a worker for the MATLAB Parallel Server 1.0.1 (R14SP2)?

When starting a worker with the MATLAB Parallel Server 1.0.1 (R14SP2), I receive the following error:
ERROR: worker exited unexpectedly while starting.
The cause of this problem is:
======================================================
This could be due to a licensing problem or due to a MATLAB crash during startup. Please check the worker log files for more detailed information.
======================================================
or
ERROR: timeout creating child process
In the worker log files, the licensing error looks like the following:
ERROR: License Manager Error -5.
Cannot find a license for MATLAB.
Make sure your license file is correct.
No such feature exists
Feature: MATLAB_DMLWorker

Accepted Answer

MathWorks Support Team on 17 Feb 2021
Edited: MathWorks Support Team on 17 Feb 2021
This enhancement has been incorporated in Release 2007a (R2007a). For previous product releases, read below for any possible workarounds:
Note: These errors have changed as of MATLAB Distributed Computing Engine 2.0 (R14SP3+). See the Related Solutions listed below for more information.
These errors usually occur when one of the following is true:
1. The worker failed to check out a license from the license server
2. You are using ssh to connect to a worker node
3. MATLAB Parallel Server was never installed on the worker node
4. The MDCE service was never started
Read below for the possible causes of, and solutions to, each of these problems:
When you start a worker on any worker node, it sends a request to the network license manager to check out a license. If the worker is unable to find the license manager, or the license manager does not have the correct license for the worker, the engine will fail to start the worker and will generate errors.
1. Check the worker log file located in /var/log/mdce or C:\TEMP\MDCE\log
- The log file will usually display any license manager errors or MATLAB startup error messages. These will need to be resolved before the worker will start. Look at the solutions related to the license manager error given in the worker log.
2. Check the mdce-service.log file located in /var/log/mdce or C:\TEMP\MDCE\log.
- In the event the worker log was not created, the service log should display some information as to the unsuccessful start of the worker.
3. Check the lmlog (license manager log) on the network license manager. This log has information about license usage. Look for error messages related to the Distributed Computing Engine.
4. Check the connectivity between the worker node and the license manager. Make sure that the worker can ping the license manager.
5. Start a worker on the head node to verify that the license server is up and running and that your license file is correct.
6. After resolving issues with the license file, follow the steps below:
- Stop MDCE Service
- Delete all the log files from /var/log/mdce or C:\TEMP\MDCE\log
- Start MDCE Service
- Start the Job Manager
- Start the worker
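The log checks in steps 1 and 2 can be sketched as a small shell helper for UNIX worker nodes. This is only an illustration: scan_mdce_logs is a hypothetical name, not a product command, and the search patterns are taken from the license error quoted in the question, so other startup failures will need other patterns.

```shell
#!/bin/sh
# Sketch: print license-related lines found in the MDCE log files.
# scan_mdce_logs is a hypothetical helper, not part of the product.
scan_mdce_logs() {
    # Default to the UNIX log location named in the steps above.
    log_dir="${1:-/var/log/mdce}"
    if [ ! -d "$log_dir" ]; then
        echo "log directory not found: $log_dir" >&2
        return 2
    fi
    # -r: search all files under the directory, -n: show line numbers,
    # -i: ignore case, -E: extended regular expression.
    # grep returns 1 when nothing matches.
    grep -rniE 'License Manager Error|Cannot find a license|No such feature' "$log_dir"
}
```

Calling scan_mdce_logs with no argument searches /var/log/mdce; pass a different directory as the first argument if your logs live elsewhere. An empty result only means that none of these particular messages were logged, not that licensing is healthy.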
If you are using an ssh session to start a worker on a UNIX/Linux machine, read below:
To resolve this issue, log in to the machine and then use the "su" command to switch to the user account that starts the mdce service.
This also prevents the service from shutting down when you exit the ssh session.
1. Log in to the machine:
ssh hostname
or
ssh user@hostname
2. "su" into the account to start mdce (usually root):
su root
3. Start the mdce service and the worker or job manager from this login session.
You should now be able to exit this session if needed, and the service will continue to run successfully.
If you are using an ssh session to start a worker on a Mac, read below:
There is a bug in the MATLAB Parallel Server 1.0.1 (R14SP2) when trying to start workers on a Macintosh via a remote ssh session. To work around this issue, try using an rsh session to run your remote startup script.
If you have installed only the Distributed Computing Toolbox and are attempting to start a worker, you will receive this error because the MATLAB Parallel Server is not installed. This must be installed first before calling these commands.
For more information about the Distributed Computing Toolbox and the MATLAB Parallel Server, please visit the Documentation page at the following link:
The worker will not start unless the MDCE service is already running. If the service has not been started, follow the instructions below to start MDCE and the job manager on the head node, and the MDCE service on the worker node.
- To check the status of MDCE:
Windows - Task Manager (mdced.exe) or Administrative Tools > Services
UNIX - $MATLAB/toolbox/distcomp/bin/mdce status
- To start the mdce service if it is not running:
Windows - From the MATLABROOT/toolbox/distcomp/bin/ directory, type: mdce start
UNIX - $MATLAB/toolbox/distcomp/bin/mdce start
- To start a job manager:
Windows - $MATLAB\toolbox\distcomp\bin\win32\startjobmanager -name MyJobManager
UNIX - $MATLAB/toolbox/distcomp/bin/startjobmanager.sh -name MyJobManager
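On UNIX, the status and start commands above can be wrapped in a small helper that first checks whether the mdce script exists, which also catches the case where MATLAB Parallel Server was never installed on the node. This is a rough sketch: run_mdce is a hypothetical name, and it assumes the toolbox/distcomp/bin layout shown in the commands above.

```shell
#!/bin/sh
# Sketch: run an mdce action (for example status or start) from a MATLAB root.
# run_mdce is a hypothetical wrapper, not part of the product.
run_mdce() {
    matlab_root="$1"   # MATLAB installation directory ($MATLAB above)
    action="$2"        # action passed through to the mdce script
    mdce_bin="$matlab_root/toolbox/distcomp/bin/mdce"
    # A missing script usually means the server product is not installed here.
    if [ ! -x "$mdce_bin" ]; then
        echo "mdce not found at $mdce_bin; is MATLAB Parallel Server installed on this node?" >&2
        return 127
    fi
    "$mdce_bin" "$action"
}
```

For example, run_mdce "$MATLAB" status followed by run_mdce "$MATLAB" start if the service is not running. The helper passes the script's output and exit status through unchanged and makes no assumption about what they mean.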
Note that if you have tried all of the above and are still not able to start the worker, then try to start MATLAB on the worker node to see if there is any problem with MATLAB startup (this will help us identify the cause of the problem). Remember, you need to have a MATLAB license to start MATLAB on a worker node.
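A quick way to run this MATLAB startup test on a UNIX worker node is sketched below. check_matlab_startup is a hypothetical helper name, and the sketch assumes that a nonzero exit status from the matlab launcher signals a failed startup, which may not hold for every failure mode, so also read whatever MATLAB prints.

```shell
#!/bin/sh
# Sketch: start MATLAB without a display and exit immediately.
# check_matlab_startup is a hypothetical helper, not part of the product.
check_matlab_startup() {
    matlab_bin="$1/bin/matlab"   # $1 is the MATLAB installation directory
    if [ ! -x "$matlab_bin" ]; then
        echo "matlab launcher not found at $matlab_bin" >&2
        return 127
    fi
    # -nodisplay: run without the desktop; -r "exit": quit right after startup.
    if "$matlab_bin" -nodisplay -r "exit"; then
        echo "MATLAB started and exited cleanly"
    else
        status=$?
        # Assumption: a nonzero status means startup failed.
        echo "MATLAB startup failed (exit status $status)" >&2
        return 1
    fi
}
```

Run it as check_matlab_startup "$MATLAB" on the worker node. Any license or crash messages MATLAB prints during startup appear in the terminal and are usually more informative than the exit status alone.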
NOTE: Starting in R2019a the following name changes occurred:
  • MATLAB Distributed Computing Server was renamed to MATLAB Parallel Server
  • mdce_def was renamed to mjs_def
  • mdce binary was renamed to mjs
