Parfor nested loop Table definition

Hi all,
I am fairly new to MATLAB. I am trying to parallelize a very heavy nested for loop. I cannot reproduce it all here since it is too long, but sharing the critical parts may be useful. In particular, I am stuck at the "Valid indices for table..." error when using tables within a parfor loop. As far as I understand, I need to define the tables within the parfor loop, but I don't know whether simply defining an empty table would solve the issue. The critical parts of the loop look as follows. If you need the entire loop, please do not hesitate to ask.
Can you please help me solve this?
Thank you,
Federico
A = A_tab.Variables;
portion_missing=0.3;
SIMULAZIONE_INIZIALE = 1;
N_SIMULAZIONI = 5;
count=1;
count3=1;
sheet2=1;
sheet1=1;
string='Indici';
string1='RMSE_initial_known';
string2='RMSE_final_known';
string3='RMSE_initial_validation';
string4='RMSE_final_validation';
string5='RMSE_final_corrected_validation';
string6='RMSE_initial_test';
string7='RMSE_final_test';
string8='RMSE_final_corrected_test';
true='true_values_test';
pred='predictions_test';
parfor SIMULAZIONE=1:N_SIMULAZIONI
[...]
CompletedMatrix{k}=CompletedMat;
CompletedMat_corrected=CompletedMat;
CompletedMatrix_corrected{k}=CompletedMat_corrected; %first error
[...]
str1 =sprintf('%s_%d',string,SIMULAZIONE);
str2 =sprintf('%s_%d',str1,RIGA_SELEZIONATA);
Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2}); %second error (def of Table)
[...]
count2=1; %redefined inside the parfor loop
%other equal errors appear here when defining Table_prova1-Table_prova8
Table_prova1{count2}=array2table(RMSE_initial_known(:), 'VariableNames', {str12});
Table_prova2{count2}=array2table(RMSE_final_known(:), 'VariableNames', {str22});
Table_prova3{count2}=array2table(RMSE_initial_validation(:), 'VariableNames', {str23});
Table_prova4{count2}=array2table(RMSE_final_validation(:), 'VariableNames', {str24});
Table_prova5{count2}=array2table(RMSE_final_corrected_validation(:), 'VariableNames', {str25});
Table_prova6{count2}=array2table(RMSE_initial_test(:), 'VariableNames', {str26});
Table_prova7{count2}=array2table(RMSE_final_test(:), 'VariableNames', {str27});
Table_prova8{count2}=array2table(RMSE_final_corrected_test(:), 'VariableNames', {str28});
end
6 Comments
Jeff Miller on 7 Feb 2021
Edited: Jeff Miller on 7 Feb 2021
To see how to write 8 columns at once, look at this example in the MATLAB 'writetable' documentation. First, use 'array2table' to get the 8 columns of data into a table. Then you can write that data to an xlsx file with one writetable command. Specify a rectangular block of 8 adjacent Excel columns (say, A-H) and the appropriate rows (say, 1-50) with this handy notation: 'Range','A1:H50'.
Sorry, I have never tried to write xlsx files with parfor and I have no idea what the problem is with that.
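A minimal sketch of the single-call write described above. The random data and the temporary output file are placeholders chosen for illustration; only the column names are taken from the question. Here 'Range','A1' gives just the top-left cell, so the header row and all 50 data rows are written from there.

```matlab
% Placeholder data: 50 rows, one column per RMSE series (assumed shape).
rmse = rand(50, 8);
names = {'RMSE_initial_known', 'RMSE_final_known', ...
         'RMSE_initial_validation', 'RMSE_final_validation', ...
         'RMSE_final_corrected_validation', 'RMSE_initial_test', ...
         'RMSE_final_test', 'RMSE_final_corrected_test'};

% Put the 8 columns into one table.
T = array2table(rmse, 'VariableNames', names);

% Hypothetical output location; replace with your own file name.
outFile = [tempname '.xlsx'];

% One call writes the header and all 8 columns, starting at cell A1.
writetable(T, outFile, 'Sheet', 1, 'Range', 'A1');
```

Doing this once after the loop, rather than once per column, also keeps the Excel writing out of the parallel section.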
federico nutarelli on 7 Feb 2021
Perfect. Thank you very much for the help!


Accepted Answer

Paul Hoffrichter on 7 Feb 2021
>> can I at least parallelise the writing of the 4 xlsx tables to speed up the code
I do not see how to do that easily without synchronization. Don't you care about the order in which the tables are written? Take a look at this question and answer. Each worker writes to its own file, or do as Jeff Miller says and just update memory and write out after the loop.
It would be helpful if you (1) show the exact error message and (2) post the smallest script that illustrates your problem (along with brief files, if you think that is absolutely necessary), one that we can actually run without error in a for loop but not in a parfor loop.
Parfor errors are difficult to debug without access to some runnable for-loop code (with an indication of the line where you change to parfor). Please refer to this helpful information to see whether you can spot your problem:
In the first link is the "Solve Variable Classification Issues in parfor-Loops" section.
For example, Broadcast Variables: variables defined before the loop whose value is required inside the loop, but never assigned inside the loop. This means that the parfor distributor has to broadcast these variables to every worker process.
Without the code that illustrates the problem I can only take a guess as to what may be causing that error. (And there may be more than one error.)
I found two variables that I was not able to classify per the table in the link.
count is defined outside the parfor loop but cannot be classified as broadcast, since it is assigned inside the loop.
Table{count}: any worker can access the same Table{count} element, since the count ranges are not mutually exclusive across the workers.
count=1;
.
.
parfor
.
.
for counter=1:L
.
Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2});
.
count=count+1;
end % END for counter=1:L
end
end % parfor
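One possible way around both classification problems, sketched under assumptions: the shared counter is dropped, each iteration fills a temporary cell of its own, and the output cell is indexed only by the parfor loop variable so that it can be classified as sliced. The inner loop length L, the placeholder data, and the names localTables and allTables are hypothetical and not from the original code.

```matlab
N_SIMULAZIONI = 5;
L = 3;                                   % hypothetical inner loop length
Table = cell(N_SIMULAZIONI, 1);          % sliced output, one slot per iteration

parfor SIMULAZIONE = 1:N_SIMULAZIONI
    localTables = cell(1, L);            % temporary variable, private to this iteration
    for counter = 1:L
        INDICI_RIGHE_MISSING = randperm(10, 3);   % placeholder data
        str2 = sprintf('Indici_%d_%d', SIMULAZIONE, counter);
        localTables{counter} = array2table(INDICI_RIGHE_MISSING(:), ...
            'VariableNames', {str2});
    end
    Table{SIMULAZIONE} = localTables;    % single assignment indexed by the loop variable
end

% Flatten to a 1-by-(N_SIMULAZIONI*L) cell of tables, in iteration order.
allTables = [Table{:}];
```

Because no iteration reads or writes another iteration's slot, the order in which the workers finish does not affect the contents of allTables.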
2 Comments
federico nutarelli on 7 Feb 2021
Edited: federico nutarelli on 7 Feb 2021
@Paul Hoffrichter thank you very much for the answer. I think I will change strategy and explain the problem with runnable code. You are right about the sync issue, which I run into when running the toy code proposed below.
The idea now is to apply the parallelisation just to the table-writing part. Since I do not know how to do that in general with xlsx files, I am trying with a simple matrix to be written to a txt file. The idea of the following code is to open 6 files numbered 1 to 6 in read-write mode and write matrix P to them column-wise or row-wise. I managed to do that by converting each row (or column) of the matrix to a string and formatting it accordingly. A for loop then appends the resulting files 1 to 6 into a single large txt file. However, as you suggested, with the cores running in parallel I was unable to sync them, so the matrix's rows (or columns) are written in apparently random order (i.e. I guess based on the order in which each core finishes).
cd '/Users/federiconutarelli/Desktop/MatrixCompletion/BACI/simulazioni_matlab'; %please change this to your own path
%% opening 6 txt files in parallel
c = parallel.pool.Constant(@() fopen(tempname(pwd),'wt'),@fclose);
spmd
F=(fopen(c.Value));
end
P=magic(19);
S=size(P,1);
s = repmat('%5d ', [1, S]);
% here P is written row by row
parfor idx = 1:S
fprintf(c.Value,'%s,','%s,', idx, mat2str(P(idx,:))); % fprintf writes to the txt files held in c.Value ('Iteration: %d\n',idx)
end
clear c; % Closes the temporary files.
%%% here are created the 6 txt files called 1,2,3,4,5,6
for i = 1: length(F)
a= char(F(1,i));
NAMES(i,:) = a(1, length(pwd)+2:length(a));
movefile(NAMES(i,:),sprintf('%d.txt',i));
end
%%%%%%%%%%%%%%%%% this code appends all in a unique txt file
fileout='OneFatFile.txt';
fout=fopen(fileout,'w');
for cntfiles=1:length(A)
fin=fopen(sprintf('%d.txt', cntfiles));
while ~feof(fin)
fprintf(fout,'%s \n',fgetl(fin));
end
end
fclose(fin);
fclose(fout);
%%%%%%%%%%%%%%%%%you can then delete the unnecesary files by the following loop
fclose('all');
for i = 1:length(A)
delete(sprintf('%d.txt',i))
end
Now, what I would like to achieve is similar code that generates each of the xlsx tables displayed at the end of the detailed code above. So, compared with my first comment, the idea changes: I do not want to apply parfor to the entire for loop (i.e. parfor SIMULAZIONE=1:5 rather than for SIMULAZIONE=1:5) but just to the final part (i.e. forming the RMSE.xlsx table and the other ones). I hope this is clear. If not, please do not hesitate to ask.
Paul Hoffrichter on 8 Feb 2021
Hope this is close to what you need.
clearvars; clc
%% setup dummy data
numCPUcores = 88;
numRows = 3;
P=[ magic(numCPUcores) magic(numCPUcores)];
sfmt = repmat(' %5d', [1, 2*numCPUcores]);
sfmt = ['%04d-%d test: ' sfmt];
%% opening txt files in parallel for writing
% one txt file is created per index: tp1.txt, tp2.txt, ..., up to numCPUcores files
parfor fn = 1:numCPUcores
fd_out = fopen( ['tp' num2str(fn) '.txt'], 'wt+' );
for row = 1:numRows
fprintf(fd_out, '%s', sprintf( [sfmt '\n'], fn, row, P(fn,:) )); % just duplicating P 3x
end
cleanup = onCleanup(@() fclose(fd_out));
% % % % fclose(fd_out);
end
%% %%%%%%%%%%%%%%% this code appends all into a sectionalized memory cell
% using cell because the application might have varying length items.
TableContainer = cell(numCPUcores, 1);
parfor fn=1:numCPUcores
fd_in=fopen(sprintf('tp%d.txt', fn));
rownum = 1;
while ~feof(fd_in)
strLine = fgetl(fd_in);
TableContainer{fn} = [TableContainer{fn} sprintf('\n') char(strLine)]; % sprintf yields a real newline; a bare '\n' literal would be written out as backslash-n
rownum = rownum + 1;
end
fclose(fd_in);
end
%% write the table to a file
tabArray = cell2mat(TableContainer);
fd_out = fopen('OneFatFile.txt', 'wt');
cleanup = onCleanup(@() fclose(fd_out));
fprintf(fd_out, '%s', tabArray');
fclose(fd_out);
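As a possible variation that skips the intermediate files, here is a sketch assuming the formatted text fits in memory: each row is formatted inside parfor into a cell indexed by the loop variable, and a single sequential write follows. The matrix P = magic(19) comes from the question; rowText and outFile are hypothetical names.

```matlab
P = magic(19);
S = size(P, 1);
rowText = cell(S, 1);                 % sliced output: one slot per row of P

parfor idx = 1:S
    % Format row idx on whichever worker picks it up.
    rowText{idx} = sprintf('%d,%s', idx, mat2str(P(idx, :)));
end

% Hypothetical output location; replace with your own file name.
outFile = [tempname '.txt'];
fid = fopen(outFile, 'wt');
fprintf(fid, '%s\n', rowText{:});     % one line per row, written in index order
fclose(fid);
```

The row order in the file follows idx rather than the order in which the workers finish, because the only file access happens after the parallel loop.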
