Speed up loading struct from file.

Mitchell Tillman on 27 Aug 2021
Commented: Walter Roberson on 28 Aug 2021
Hi,
I am looking for a way to speed up saving and loading about 8 GB of data. Currently it is all contained in one structure. The structure has a format similar to the code below; there is also some metadata at each level of the struct that is not shown here.
for subNum = 1:10                      % 10 subjects
    for trialNum = 1:50                % 50 trials per subject
        for dataStreamNum = 1:50       % 50 data streams per trial
            dataMatrix = rand(3, 3000);  % each data stream is 3x3000
            % Store the stream in matrix form at the deepest level of the struct
            structName.Subject(subNum).Trial(trialNum).Data(dataStreamNum).Matrix = dataMatrix;
        end
    end
end
I looked into matfile so that I could load just part of the structure, but found that matfile does not allow indexing into specific fields of a struct. This post made me start thinking about splitting each trial into its own .mat file (in this example there would be 500 .mat files, each holding a smaller struct). So I have two questions:
1. Is there an alternative to matfile that would let me load just one trial at a time from an 8 GB struct, such as:
structName.Subject(4).Trial(15);
2. If there is no such alternative, and I call load() on 500 .mat files one at a time (8 GB of data in total), would that be noticeably slower or faster than calling load() once on a single 8 GB .mat file?
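To make the one-file-per-trial idea concrete, here is a minimal sketch. The output folder, the file-name pattern (S###_T###.mat) and the tiny array sizes are assumptions chosen for illustration, not part of the real data set, and the timing it prints says nothing about how 500 real files would compare with one 8 GB file.

```matlab
% Sketch only: folder name, file-name pattern and sizes are assumptions.
outDir = fullfile(tempdir, 'trialFiles');   % hypothetical output folder
if ~exist(outDir, 'dir')
    mkdir(outDir);
end

% Small stand-in for the real struct (2 subjects x 3 trials x 4 streams).
for subNum = 1:2
    for trialNum = 1:3
        for dataStreamNum = 1:4
            structName.Subject(subNum).Trial(trialNum).Data(dataStreamNum).Matrix = rand(3, 30);
        end
    end
end

% Write each trial to its own file, e.g. S002_T003.mat.
for subNum = 1:numel(structName.Subject)
    for trialNum = 1:numel(structName.Subject(subNum).Trial)
        trial = structName.Subject(subNum).Trial(trialNum);
        fileName = fullfile(outDir, sprintf('S%03d_T%03d.mat', subNum, trialNum));
        save(fileName, 'trial');
    end
end

% Load a single trial and time the call.
tic;
loaded = load(fullfile(outDir, 'S002_T003.mat'));
elapsed = toc;
oneTrial = loaded.trial;
fprintf('Loaded %d streams in %.4f s\n', numel(oneTrial.Data), elapsed);
```

Wrapping the same tic/toc pair around a loop over all files, and around a single load() of the combined file, would be one way to answer the second question on the actual hardware.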
Thank you!
1 Comment
Walter Roberson on 28 Aug 2021
For files over 2 GB, saving as a .mat file requires the -v7.3 flag, which causes the data to be written in HDF5 format. HDF5 is not all that efficient for arrays of struct; it more or less requires that each array member be stored as a sub-dataset, with the struct array itself stored internally as an array of references to those sub-datasets.
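One way to look at this layout is to save a small struct array with -v7.3 and open the result as a plain HDF5 file. This is only a sketch: the file name and the three-element struct array are made up, and the exact groups listed may differ between releases.

```matlab
% Sketch only: file name and the small struct array are made up for illustration.
s = struct('Matrix', {rand(3, 5), rand(3, 7), rand(3, 9)});   % 1x3 struct array
fileName = fullfile(tempdir, 'structDemo_v73.mat');           % hypothetical file
save(fileName, 's', '-v7.3');                                 % HDF5-based MAT-file

info = h5info(fileName);        % inspect the MAT-file as plain HDF5
disp({info.Groups.Name});       % list the top-level groups in the file
```

Calling h5disp(fileName) on the same file prints the full hierarchy, which shows how many separate datasets a nested struct turns into.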
Because of this, you might want to experiment to see what you can do with NetCDF 3, which has large-file support in version 3.6 and later. But beware that NetCDF 4 is HDF5 underneath...
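As a rough sketch of what such an experiment could look like, the snippet below writes one data stream as its own NetCDF variable and reads back only part of it. The file name, the variable name S04_T15_D01 and the dimension names are hypothetical; giving every stream its own length dimension is just one possible way to cope with streams of different lengths.

```matlab
% Sketch only: file, variable and dimension names are hypothetical.
ncFile = fullfile(tempdir, 'streamsDemo.nc');
if exist(ncFile, 'file')
    delete(ncFile);             % start clean; nccreate fails if the variable exists
end

stream = rand(3, 3000);
nccreate(ncFile, 'S04_T15_D01', ...
    'Dimensions', {'xyz', 3, 'n_S04_T15_D01', 3000}, ...
    'Format', '64bit');         % NetCDF 3 with 64-bit offsets (large-file support)
ncwrite(ncFile, 'S04_T15_D01', stream);

% Read back only columns 101..200 of that one variable.
part = ncread(ncFile, 'S04_T15_D01', [1 101], [3 100]);
```

Metadata would have to be stored separately, for example as attributes via ncwriteatt, since NetCDF 3 has no direct equivalent of a nested struct.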


Answers (1)

Chunru on 28 Aug 2021
It seems that you have very regular data. Instead of using a struct, you can simply use an N-D numeric array, which is faster and more efficient. Using matfile, you can then easily read a small portion of the data.
% for subNum = 1:10                      % 10 subjects
%     for trialNum = 1:50                % 50 trials per subject
%         for dataStreamNum = 1:50       % 50 data streams per trial
%             dataMatrix = rand(3, 3000);  % each data stream is 3x3000
%             structName.Subject(subNum).Trial(trialNum).Data(dataStreamNum).Matrix = dataMatrix;
%         end
%     end
% end
Data = zeros(3, 3000, 50, 50, 10);  % 3 x 3000 samples, 50 streams, 50 trials, 10 subjects (about 1.8 GB)
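Reading one stream back through matfile might then look like the sketch below. The sizes are scaled down so it runs quickly, and the file name is hypothetical. Note that matfile requires an index for every dimension of the variable.

```matlab
% Sketch only: sizes are scaled down and the file name is hypothetical.
Data = rand(3, 300, 5, 4, 2);       % 3 x 300 samples, 5 streams, 4 trials, 2 subjects
fileName = fullfile(tempdir, 'ndDemo.mat');
save(fileName, 'Data', '-v7.3');    % partial loading is efficient with -v7.3 files

m = matfile(fileName);              % creates a handle; the array is not loaded
sz = size(m, 'Data');               % query the size without reading the data
oneStream = m.Data(:, :, 3, 2, 1);  % stream 3 of trial 2 of subject 1
```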
1 Comment
Mitchell Tillman on 28 Aug 2021
Thanks for the suggestion, but I actually do need the structure format. This was just a rough outline of my data format. The lengths of the data streams actually vary significantly, as do the number of data streams per trial and the number of trials per subject. Because of that, and because I have lots of associated metadata too, I need the structure format.
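For irregular data like this, one option that keeps the struct format is to store each trial as its own variable inside a single MAT-file and then ask load() for just that variable. The sketch below is an illustration under assumptions, not something proposed in the thread: the variable naming scheme (S###_T###), the Meta field and the file name are all hypothetical, and whether this is actually faster for 8 GB of data would need to be measured.

```matlab
% Sketch only: variable naming scheme, Meta field, file name and sizes are assumptions.
vars = struct();
for subNum = 1:2
    for trialNum = 1:3
        trial = struct('Meta', sprintf('subject %d, trial %d', subNum, trialNum));
        nStreams = trialNum + 1;                     % stream count varies per trial
        for k = 1:nStreams
            trial.Data(k).Matrix = rand(3, 10 * k);  % stream length varies too
        end
        vars.(sprintf('S%03d_T%03d', subNum, trialNum)) = trial;
    end
end

fileName = fullfile(tempdir, 'trialsAsVariables.mat');
save(fileName, '-struct', 'vars', '-v7.3');   % each field becomes its own variable

% Load just one trial by naming its variable.
loaded = load(fileName, 'S002_T003');
oneTrial = loaded.S002_T003;
```

Each trial keeps its metadata and its variable-length streams, and whos('-file', fileName) lists the available trials without loading any of them.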

Release: R2020b
