Accumarray for two functions?

I'm trying to compute both the mean and SD of a set of values by group, and I can either do it with for-loop:
M = zeros(1,ngroup);
SD = zeros(1,ngroup);
for i = 1:ngroup
M(i) = mean(data(ind==i));
SD(i) = std(data(ind==i));
end
Or, alternatively use `accumarray` twice.
M = accumarray(ind,data,[],@mean);
SD = accumarray(ind,data,[],@std);
But is there a way to just use accumarray once and compute both quantities? Since accumarray is faster than for-loop, but calling it twice will be slow. Is it possible to do something like:
[M, SD] = accumarray(ind,data,[],{@mean,@std})

 Risposta accettata

Matt J
Matt J il 19 Ott 2022
Modificato: Matt J il 19 Ott 2022
Since accumarray is faster than for-loop, but calling it twice will be slow.
I would argue that 3 calls to accumarray would be the most optimal.
data = rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.499977 seconds.
tic;
Out = accumarray(ind, data, [], @(x){{mean(x) std(x)}});
toc;
Elapsed time is 0.235066 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.047581 seconds.

4 Commenti

BRIAN XU
BRIAN XU il 19 Ott 2022
Wow that's interesting, many thanks!
Accumarray is optimized when doing sum.
Using function handle has great penalty.
Matt J
Matt J il 19 Ott 2022
Modificato: Matt J il 19 Ott 2022
I should mention though that there might be some sacrifice in numerical accuracy using the expansion,
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
The subtraction of S2 and 2*S.*M can give large floating point residuals.
data = 10000+rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.574901 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.048677 seconds.
errorMean=norm(M-M0)/norm(M0)
errorMean = 4.4567e-15
errorSTD=norm(SD-SD0)/norm(SD0)
errorSTD = 5.3453e-06
Small simplification
N=accumarray(ind,1);
S = accumarray(ind,data);
M = S./N;
S2=accumarray(ind,data.^2);
SD = sqrt((S2 - S.*M)./(N-1));

Accedi per commentare.

Più risposte (1)

The accumarray approach is certainly possible —
v = randn(250,1)
v = 250×1
0.0501 -0.2469 1.0253 1.1484 -0.9196 0.4947 -0.7221 1.2164 -0.3162 0.0461
Out = accumarray(ones(size(v)), v, [], @(x){{mean(x) std(x)}})
Out = 1×1 cell array
{1×2 cell}
meanv = Out{:}{1}
meanv = -0.0333
stdv = Out{:}{2}
stdv = 0.9419
Make appropriate changes to work with your data.
.

2 Commenti

BRIAN XU
BRIAN XU il 19 Ott 2022
that's a nice idea! thank you!
My pleasure!

Accedi per commentare.

Categorie

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by