MATLAB and Hadoop integration

Asked by U S N Raju on 11 Jan 2019
Latest activity Commented on by U S N Raju on 17 Jan 2019
We set up a cluster of 3 machines with 4 workers each, 12 workers in total.
After we scheduled a job in MATLAB on that cluster, it got stuck in the 'Starting Parallel Pool' phase. We are attaching some screenshots for your reference.


1 Answer

Answer by Kojiro Saito on 14 Jan 2019
 Accepted Answer

Without your complete code (apart1.m), it is difficult to investigate why the error occurs, but my guess is that mapreducer has not been set to use parallel.cluster.Hadoop.
As the documentation explains, at least the following three lines (setenv, parallel.cluster.Hadoop, and mapreducer) are needed to run MapReduce on Hadoop.
setenv('HADOOP_HOME', '/path/to/hadoop/install')
% This will run mapreduce on Hadoop
cluster = parallel.cluster.Hadoop;
% If you want to change properties of parallel.cluster.Hadoop,
% see the parallel.cluster.Hadoop documentation.
% For example, if the installation path of MATLAB Distributed Computing Server on the Hadoop cluster
% differs from that of MATLAB Desktop on the Hadoop node, you need to change the ClusterMatlabRoot property.
% cluster.ClusterMatlabRoot = '/path/to/MDCS/install';
mr = mapreducer(cluster);
Once mapreducer is set this way, mapreduce will run on Hadoop.
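
To show how the setup connects to an actual mapreduce call, here is a minimal sketch. It assumes a CSV dataset on HDFS; the host, port, paths, and the function names countMapper and countReducer are hypothetical placeholders, not taken from apart1.m. The output folder must not already exist.

```matlab
% Sketch only: every path, host name, and function name below is a placeholder.
setenv('HADOOP_HOME', '/path/to/hadoop/install');
cluster = parallel.cluster.Hadoop;
mr = mapreducer(cluster);

% Input and output both live on HDFS (hypothetical locations).
ds = datastore('hdfs://host:port/path/to/input/*.csv');
outputFolder = 'hdfs://host:port/path/to/output';

% Passing mr explicitly makes this call use the Hadoop execution environment.
result = mapreduce(ds, @countMapper, @countReducer, mr, ...
    'OutputFolder', outputFolder);

% Hypothetical mapper: emits the number of rows in each chunk of data.
function countMapper(data, ~, intermKVStore)
    add(intermKVStore, 'rows', height(data));
end

% Hypothetical reducer: sums the per-chunk row counts for each key.
function countReducer(key, valueIter, outKVStore)
    total = 0;
    while hasnext(valueIter)
        total = total + getnext(valueIter);
    end
    add(outKVStore, key, total);
end
```

The mapper and reducer appear here as local functions at the end of a script for brevity. When running on a cluster, it may be safer to save each one in its own file on the MATLAB path.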

  1 Comment

We want to read the data from HDFS and process it with MDCS.
The workers are detected (3 workers), but we get the error "IRI scheme for path: 'hdfs://master:9000/job0_2' is unsupported."
Here is our code:
ds = imageDatastore({'hdfs://master:9000/Corel_1000'});
system('hdfs dfs -rm -r hdfs://master:9000/job0_2'); % delete the previous output directory
output_folder = 'hdfs://master:9000/job0_2';
ds = mapreduce(ds,@identityMapper1, @identityReducer1,mr,'OutputFolder',output_folder);
This is the output:
Starting parallel pool (parpool) using the 'new_job' profile ...
connected to 3 workers.
19/01/17 17:53:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
rm: `hdfs://master:9000/job0_2': No such file or directory
Error using mapreduce (line 124)
IRI scheme for path: 'hdfs://master:9000/job0_2' is unsupported.
Error in apart1_2 (line 9)
ds = mapreduce(ds,@identityMapper1, @identityReducer1,mr,'OutputFolder',output_folder);
Cannot write to preference file "matlab.prf" in "/home/cse/.matlab/R2018a".
Check file permissions.

