How can I efficiently save and access large arrays generated in nested loops?

I need to run nested for-loops over the variables J1 and J2. The range for J1 is 1 to 41, and the range for J2 is 1 to 9. Inside these loops, I evaluate 16 functions, each of which returns an array of complex numbers with a size of 500 by 502.
I used the method below to save the data, and it produced an 11 GB file, which seems very large. Is this normal? What is an efficient way to save this data at the end of the calculation?
What I want to do with this data afterward:
I will need to access the 16 arrays, A1 to A16, within the same J1 and J2 loop to perform other operations. Therefore, I want to store the data in a way that allows easy access to these 16 arrays within the loops.
My method to store data:
all_data = cell(41,9);
for J1 = 1:41
    for J2 = 1:9
        % evaluate 16 functions to get 16 arrays (A1 to A16) of size 500 x 502:
        all_data{J1,J2} = struct("A1", A1,...
            "A2", A2,...
            "A3", A3,...
            "A4", A4,...
            "A5", A5,...
            "A6", A6,...
            "A7", A7,...
            "A8", A8,...
            "A9", A9,...
            "A10", A10,...
            "A11", A11,...
            "A12", A12,...
            "A13", A13,...
            "A14", A14,...
            "A15", A15,...
            "A16", A16);
    end
end
save('Saved_Data.mat','-v7.3');

Accepted Answer

Matt J
Matt J on 22 Aug 2024
Edited: Matt J on 22 Aug 2024
I used the following given method to save the data, and it produced an 11 GB file, which seems very large.
The memory consumption is about right if you are using double floats,
numGB=prod([500,502,16, 41,9])*8/2^30
numGB = 11.0410
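As a rough cross-check of the estimate above, the same arithmetic can be repeated for other element types. The sketch below assumes the standard MATLAB storage sizes (8 bytes for a real double, 16 for a complex double, 8 for a complex single); the variable names are illustrative only. Since the question says the arrays are complex, the in-memory footprint as complex doubles would be about twice the figure above, and a -v7.3 file can come out smaller than the in-memory size because that format compresses data by default.

```matlab
% Total number of elements: 500 x 502 arrays, 16 per (J1,J2), 41 x 9 combinations
nElem = prod([500, 502, 16, 41, 9]);
GiB = 2^30;

% Assumed bytes per element for each storage type
gbRealDouble    = nElem * 8  / GiB;   % matches the 11.0410 figure above
gbComplexDouble = nElem * 16 / GiB;   % real + imaginary parts, 8 bytes each
gbComplexSingle = nElem * 8  / GiB;   % real + imaginary parts, 4 bytes each

fprintf("real double:    %.4f GB\n", gbRealDouble);
fprintf("complex double: %.4f GB\n", gbComplexDouble);
fprintf("complex single: %.4f GB\n", gbComplexSingle);
```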
In terms of RAM access, it would probably be faster to organize it as a multidimensional array, as below, and as single floats if you don't need double precision.
all_data = zeros(500,502,16,41,9,"single"); % preallocate once; every slice is overwritten below
for J2 = 1:9
    for J1 = 1:41
        for J3 = 1:16 % evaluate the 16 functions func{J3}
            all_data(:,:,J3,J1,J2) = func{J3}(___);
        end
    end
end
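One way to get back the 16 arrays for a given (J1,J2) later, without loading the whole file, might be to save only the array and index into it through matfile. This is a minimal sketch, not part of the answer above: it uses small stand-in sizes so it runs quickly, a made-up fill value in place of the real functions, and a hypothetical file name in the temporary folder.

```matlab
% Small stand-in sizes; the thread uses 500 x 502 x 16 x 41 x 9
nr = 5; nc = 6; nF = 16; nJ1 = 4; nJ2 = 3;

% Preallocate as complex single so complex results do not trigger a conversion
all_data = complex(zeros(nr, nc, nF, nJ1, nJ2, "single"));
for J2 = 1:nJ2
    for J1 = 1:nJ1
        for J3 = 1:nF
            % Hypothetical placeholder for the output of the J3-th function
            all_data(:,:,J3,J1,J2) = single(J3 + 1i*(J1 + J2)) * ones(nr, nc, "single");
        end
    end
end

% Save only the variable of interest rather than the whole workspace
fname = fullfile(tempdir, 'Saved_Data_sketch.mat');   % hypothetical file name
save(fname, 'all_data', '-v7.3');

% Later: read just the block for one (J1,J2) pair from disk
m = matfile(fname);
block = m.all_data(:, :, :, 2, 3);   % nr x nc x nF block for J1 = 2, J2 = 3
A5 = block(:, :, 5);                 % the fifth array for that pair
```

Partial reads through matfile are only efficient for files saved in the -v7.3 format, which is why that flag is kept here.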
3 Comments
Luqman Saleem
Luqman Saleem on 22 Aug 2024
Edited: Luqman Saleem on 22 Aug 2024
Alright, I tried saving the data in 41*9=369 folders with 16 csv files each. The total size of all the files combined is again 12 GB.
Matt J
Matt J on 22 Aug 2024
Edited: Matt J on 22 Aug 2024
Yes, I don't think you can hope for much compression on disk, unless perhaps the data is sparse or consists of integers?


More Answers (1)

Walter Roberson
Walter Roberson on 22 Aug 2024
all_data{J1,J2} = struct("A1", A1,...
You are creating a separate struct for each {J1,J2}, complete with all of the struct overhead. It would be more efficient if you use
all_data(J1,J2) = struct("A1", A1,...
so as to create a struct array. Struct arrays have lower overhead compared to creating a separate struct for each case.
You will need to initialize all_data differently. I suggest:
clear all_data
for J1 = 41:-1:1
for J2 = 9:-1:1
% evaluate 16 functions to get 16 arrays (A1 to A16) of size 500 x 502:
all_data(J1,J2) = struct("A1", A1,...
Counting backwards like this has the side effect of initializing the struct array to its largest size on the first assignment and then filling in the pieces. This approach avoids growing the struct array dynamically.
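The backward-counting idea can be sketched in a complete, runnable form as follows. The sizes are shrunk, only two fields are used instead of sixteen, and the field values are made-up placeholders; the point is only that the first assignment, at the largest indices, already gives the struct array its final size.

```matlab
clear all_data
nJ1 = 4; nJ2 = 3;            % stand-ins for 41 and 9
for J1 = nJ1:-1:1
    for J2 = nJ2:-1:1
        % Hypothetical placeholders for the arrays returned by the functions
        A1 = (J1 + 1i*J2) * ones(5, 6);
        A2 = (J1 - 1i*J2) * ones(5, 6);
        % The first pass (J1 = nJ1, J2 = nJ2) creates the full nJ1 x nJ2 struct array
        all_data(J1,J2) = struct("A1", A1, "A2", A2);
    end
end

% Access the arrays for one (J1,J2) pair by field name
B = all_data(2,3).A1;
```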

Release

R2024a
