Azzera filtri
Azzera filtri

load specified column in matfile too slow

2 visualizzazioni (ultimi 30 giorni)
Yu Li
Yu Li il 3 Ott 2018
I have a matfile with size of: 5e6*50.
I want to write a code to load the specific column into my memory, but I found that the time reading a specified column is nearly the same with that reading the whole matfile.
below is the test code:
A=rand(5e6,50);
save A A
f=matfile('A');
tic
tmp=f.A;
toc
tic
tmp=f.A(:,1);
toc
is there anyway to improve the performance?
Thanks!
Yu
  2 Commenti
Walter Roberson
Walter Roberson il 3 Ott 2018
I notice you did not specifically save with -v7.3, so you might be getting a -v7 file.
When I test on my system with -v7.3, selecting one column comes out roughly 10% faster. Not as good as one might hope, though.
Yu Li
Yu Li il 3 Ott 2018
Yes, the expected result should be 2% of reading the whole file, since I have totally 50 columns.

Accedi per commentare.

Risposte (1)

Walter Roberson
Walter Roberson il 3 Ott 2018
In this case, you can do much better by using -nocompression
Save -v7.3 -nocompression
Elapsed time is 17.814604 seconds.
Done save -v7.3 -nocompression
Start matfile -v7.3 -nocompression
Elapsed time is 0.016278 seconds.
Done matfile -v7.3 -nocompression
Start recall entire variable -v7.3 -nocompression
Elapsed time is 2.195975 seconds.
Done recall entire variable -v7.3 -nocompression
Start recall one column -v7.3 -nocompression
Elapsed time is 1.089280 seconds.
Done recall one column -v7.3 -nocompression
Save -v7.3
Elapsed time is 58.543461 seconds.
Done save -v7.3
Start matfile -v7.3
Elapsed time is 0.077814 seconds.
Done matfile -v7.3
Start recall entire variable -v7.3
Elapsed time is 10.139135 seconds.
Done recall entire variable -v7.3
Start recall one column -v7.3
Elapsed time is 9.118167 seconds.
Done recall one column -v7.3
Source code:
A=rand(5e6,50);
time_it(A, {'-v7.3' '-nocompression'})
time_it(A, {'-v7.3'})
function time_it(A, saveoptions)
savedesc = strjoin(saveoptions, ' ');
fprintf('Save %s\n', savedesc);
tic
save('A', 'A', saveoptions{:});
toc
fprintf('Done save %s\n', savedesc);
fprintf('Start matfile %s\n', savedesc);
tic
f = matfile('A');
toc
fprintf('Done matfile %s\n', savedesc);
fprintf('Start recall entire variable %s\n', savedesc);
tic
tmp=f.A;
toc
fprintf('Done recall entire variable %s\n', savedesc);
fprintf('Start recall one column %s\n', savedesc);
tic
tmp=f.A(:,1);
toc
fprintf('Done recall one column %s\n', savedesc);
end
  7 Commenti
Yu Li
Yu Li il 3 Ott 2018
Hi:
Thanks for your test, I got much deeper understanding about this.
I think there should have some bottom neck here, I will contact Mathworks for further investigation.
Thanks!
Yu
Walter Roberson
Walter Roberson il 3 Ott 2018
You can refer them to this post and my test code.
One thing they are likely to point out is that the default of compression is intended for "real" data, not for rand(), and that when you read/write with compression, the performance would be expected to vary with how compressible the data is. Thus you should probably run this code with the rand() replaced by load of one of your actual matrices.

Accedi per commentare.

Categorie

Scopri di più su Data Type Conversion in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by