More Efficient ismember Calculation
Mostra commenti meno recenti
Hello, I'm working on an interpolation for a specific project that I'm working on and have managed to produce a very fast interpolation function for N-D array for042. There is one final bottleneck that is taking up 84% of the total run-time of the program. Inserted are the profiler and exemplar code, any and all optimizations are greatly appreciated, but the particular line is obvious from the profiler.

alpha = -5:5:30;
mach = 1:6:25;
beta = -5:2.5:5;
num = zeros(length(alpha),length(mach),length(beta));
for cnt = 1:numel(alpha)
for cnt1 = 1:numel(mach)
for cnt2 = 1:length(beta)
num(cnt,cnt1,cnt2) = 3*alpha(cnt)+mach(cnt1)/2+beta(cnt2);
end
end
end
for cnt2 = 1:length(beta)
data.for006{cnt2}.cd = num(:,:,cnt2);
end
Is an example of the data set that we're working with.
And the line in question from the profiler is part of this block:
a = zeros(1,2^size(CURRENT,1)-1);
curridx = zeros(1,size(CURRENT,1));
for cnt = 1:(2^size(CURRENT,1))
change = find((rem(cnt-1,2.^(0:size(CURRENT,1)-1))==0)==1);
curridx(change) = 1*curridx(change)==0;
idx = lidx.*(curridx==0) + uidx.*(curridx==1);
chkvals = zeros(1,length(idx));
for cnt1 = 1:length(idx)
B = RANGES{cnt1};
chkvals(cnt1) = B(idx(cnt1));
end
chkvals = [chkvals(2),chkvals(4:end)];
% INEFFICIENT LINE %
a(cnt) = DATA.for042{find(ismember(DATA.permutation,chkvals,'rows'))}.cn(sub2ind_a(siz,idx));
end
11 Commenti
Walter Roberson
il 16 Apr 2020
the find() can be skipped. ismember returns logical and that can be used as logical indexing.
I cannot read the rest of the line of code that is invoking ismember.
Walter Roberson
il 16 Apr 2020
is there a reason for for006 to be a cell array of struct instead of an array of struct?
Ayden Clay
il 16 Apr 2020
Ayden Clay
il 16 Apr 2020
Walter Roberson
il 16 Apr 2020
What is sub2ind_a() ?
Why are you not assigning size(CURRENT,1) to a variable? It does not change inside the loop.
2^size(CURRENT,1)-1 is used multiple times too, and should be assigned to a variable.
Ayden Clay
il 16 Apr 2020
Walter Roberson
il 16 Apr 2020
Guidelines for using find():
- If you have a relational test that is used exactly once, and you find(), and you use the result of the find() only to index an array: then skip the find() and use the result of the relational test as a logical index.
- if you are doing computation on the indices returned by find() then it might be worth retaining the find(). For example you might be wanting to compute the distance between events
- If you are updating corresponding locations, matrix(locations) = f(matrix(locations)) then one could hypothesize that taking find(locations) and using that might in some cases be faster than using logical indexing, because there would be less for the assignment to examine (just change a few locations directly, right?) . However in my tests with large arrays, using logical indexing is faster even for a small number of output locations, and is notably faster if there are a large number of output locations.
Note: this timing test takes a few minutes to execute due to the size of the arrays. For each test, a new output array the same size as the input has to be created.
data = rand(83,19,207,51,3);
data(12345) = -1;
data(876543) = -1;
N = 10;
t1 = zeros(N,1);
t2 = zeros(N,1);
t3 = zeros(N,1);
t4 = zeros(N,1);
f1 = @()fun1(data);
f2 = @()fun2(data);
f3 = @()fun3(data);
f4 = @()fun4(data);
for K = 1 : N; t1(K) = timeit(f1,0); end
for K = 1 : N; t2(K) = timeit(f2,0); end
for K = 1 : N; t3(K) = timeit(f3,0); end
for K = 1 : N; t4(K) = timeit(f4,0); end
plot([t1,t2,t3,t4]);
legend({'find0', 'mask0', 'find5', 'mask5'});
m1 = mean(t1); m2 = mean(t2); m3 = mean(t3); m4 = mean(t4);
ms = [m1;m2;m3;m4];
disp('timings')
disp(ms);
m = min(ms);
disp('ratios')
disp(ms ./ m);
function fun1(data)
%small number of changes, find
idx = find(data<0);
data(idx) = data(idx) * 2; %#ok<NASGU>
end
function fun2(data)
%small number of changes, logical indexing
idx = data<0;
data(idx) = data(idx) * 2; %#ok<NASGU>
end
function fun3(data)
%many places, find
idx = find(data<0.5);
data(idx) = data(idx) * 2; %#ok<NASGU>
end
function fun4(data)
%many places, logical indexing
idx = data<0.5;
data(idx) = data(idx) * 2; %#ok<NASGU>
end
Walter Roberson
il 16 Apr 2020
I cannot tell what the size() of lidx and uidx are ? And when could length(idx) be different than any other run? Anything that is calculated repeatedly in the loop should be moved out to as far out as it can go in the loops so it is calculated as few times as possible.
Ayden Clay
il 17 Apr 2020
Walter Roberson
il 17 Apr 2020
But the length of idx does not depend upon the contents of CURRENT, only on the size of CURRENT, right? So you can pre-compute it.
In the great majority of cases, if you can move a computation out of a loop, doing so will result in more efficient code. This is not always the case, but most of the time.
Ayden Clay
il 20 Apr 2020
Risposta accettata
Più risposte (0)
Categorie
Scopri di più su Shifting and Sorting Matrices in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

