Can the efficiency of this code be improved, either computationally or just in terms of lines of code?
Dumb question for a smart person who has a moment to kill.
Let's say I have data that will come in from n groups, and I know a priori those groups will be numbered 1 through n in some variable, A. I will have a second variable, B, that contains the data. Then, I want to get (for example) the mean of the data in each group. It is easy to pull off with a loop, but is there better code I could be using for this procedure? For a small example dataset, I might have
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
tic
%%% Can this be done better or in one line of code? %%%
C = NaN(max(A), 1);
for ii = 1:numel(C)
C(ii) = mean(B(A == ii));
end
%%% Can this be done better or in one line of code? %%%
toc
disp(C)
bar(C)
Is there a better way to do this?
Accepted Answer
Jan
on 5 Dec 2022
Edited: Jan
on 5 Dec 2022
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
A = repmat(A0, 1e6, 1); % Let Matlab work with more than tiny data
B = repmat(B0, 1e6, 1);
tic
C = NaN(max(A), 1);
for ii = 1:numel(C)
m = A == ii;           % logical mask selecting group ii
C(ii) = mean(B(m));    % group mean, reusing the mask
end
toc
Shorter but slower:
tic
D = accumarray(A, B, [], @mean);
toc
isequal(C, D)
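A likely reason the ACCUMARRAY call is slow is the function handle, which is evaluated once per group. A variant that is often faster is to use the default summation of accumarray twice and divide the group sums by the group counts. This is a minimal sketch on the small example data, not something timed in this thread; the names F0, groupSums and groupCounts are made up for illustration.

```matlab
% Small example data from the question
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

groupSums   = accumarray(A0, B0);   % default reduction is a sum per group
groupCounts = accumarray(A0, 1);    % number of elements per group
F0 = groupSums ./ groupCounts;      % per-group mean (NaN for an empty group, 0/0)

% Compare with the function-handle version using a tolerance
D0 = accumarray(A0, B0, [], @mean);
disp(max(abs(F0 - D0)))
```

Because the sums may be accumulated in a different order than inside mean, the two results can differ in the last bits.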
Another approach:
tic
S = zeros(max(A), 1);
N = zeros(size(S));
for k = 1:numel(A)
m = A(k);
S(m) = S(m) + B(k);
N(m) = N(m) + 1;
end
E = S ./ N;
toc
isequal(C, E) % Not equal!!!
% But the differences are caused by rounding only:
(C - E) ./ C
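The difference is caused by the numerical instability of sums: floating-point addition is not associative, so accumulating the same values in a different order changes the last bits of the result. Comparing the results with the group means of the original A0 and B0 shows that all methods have comparable accuracy.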
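Since isequal demands bit-identical results, a comparison with a relative tolerance is usually the more useful check here. The following is a minimal sketch on the small example data; the tolerance of 1e-12 is an assumption and should be chosen to suit the size of the data, and the names C0, E0, S0, N0 are made up so they do not clash with the variables above.

```matlab
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

% Method 1: logical indexing and mean per group
C0 = NaN(max(A0), 1);
for ii = 1:numel(C0)
    C0(ii) = mean(B0(A0 == ii));
end

% Method 2: running sums and counts in one pass over the input
S0 = zeros(max(A0), 1);
N0 = zeros(size(S0));
for k = 1:numel(A0)
    g = A0(k);
    S0(g) = S0(g) + B0(k);
    N0(g) = N0(g) + 1;
end
E0 = S0 ./ N0;

% Compare with a relative tolerance instead of exact equality
tol = 1e-12;                             % assumed tolerance
relErr = abs(C0 - E0) ./ abs(C0);
closeEnough = all(relErr <= tol);
disp(closeEnough)
```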
Locally under R2018b I get these timings:
Elapsed time is 0.205890 seconds. % Original
Elapsed time is 0.512173 seconds. % ACCUMARRAY
Elapsed time is 0.061097 seconds. % Loop over inputs
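A single tic/toc run can be noisy, so timeit, which calls the function several times, may give steadier numbers. This is only a sketch: the repetition factor 1e4 is picked to keep it quick, the arrayfun one-liner is an assumed functional form of the original loop, and the names A1, B1, fAccum and fArray are made up. No timings are claimed here.

```matlab
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
A1 = repmat(A0, 1e4, 1);   % smaller than 1e6 so the sketch runs quickly
B1 = repmat(B0, 1e4, 1);

% Wrap each candidate in a function handle with no inputs, as timeit expects
fAccum = @() accumarray(A1, B1, [], @mean);
fArray = @() arrayfun(@(g) mean(B1(A1 == g)), (1:max(A1)).');

tAccum = timeit(fAccum);
tArray = timeit(fArray);
fprintf('accumarray with @mean: %.6f s\n', tAccum);
fprintf('arrayfun over groups:  %.6f s\n', tArray);
```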
2 Comments
Torsten
on 5 Dec 2022
I took your repmat modification and added Steven Lord's answer, below, and the original loop looks like the clear winner.
Or "arrayfun" (see above).
More Answers (1)
Steven Lord
on 5 Dec 2022
A = [2; 4; 1; 2; 2; 4; 1; 2; 2; 4];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
[C, groupnumbers] = groupsummary(B, A, @mean)
The groupnumbers output can help if some elements in 1:n don't appear in A (as is the case using the modified A I used in this example where all the 3's are replaced by 4's.)
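To illustrate the point about missing groups, here is a minimal sketch that uses groupnumbers to place the means into a vector covering all of 1:n, with NaN for groups that never occur. It assumes a release in which groupsummary accepts array inputs, as in the call above; the names A4, fullMeans and G are made up. The accumarray call with a fill value is shown as an alternative way to get the same layout.

```matlab
A4 = [2; 4; 1; 2; 2; 4; 1; 2; 2; 4];   % group 3 never occurs
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

% groupsummary only reports the groups that are present
[means, groupnumbers] = groupsummary(B0, A4, @mean);

% Scatter the results into a vector indexed by group number
n = max(A4);
fullMeans = NaN(n, 1);
fullMeans(groupnumbers) = means;
disp(fullMeans)

% Alternative: accumarray with an explicit size and NaN as the fill value
G = accumarray(A4, B0, [n 1], @mean, NaN);
disp(G)
```

Without the fill value, accumarray would report 0 for the empty group, which is easy to mistake for a real mean.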