Implicit expansion with arrayfun (CPU vs GPU)

Alessandro on 3 Jan 2026 at 16:22
Commented: Matt J on 7 Jan 2026 at 3:57
I find it very convenient that MATLAB has allowed implicit expansion since R2016b (for an explanation, see this nice article: https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/?s_tid=blogs_rc_1).
I was therefore puzzled to discover that arrayfun on the CPU does not allow it, while arrayfun called with gpuArrays does allow implicit expansion. Below is an MWE demonstrating this behavior.
Let me quickly explain: if I have a vector x of size [2,1] and a vector y of size [1,2], I can compute the sum x+y and get a 2-by-2 matrix, as intended. This is better than the uglier and more memory-intensive
repmat(x,1,2)+repmat(y,2,1)
Unfortunately, this does not work with arrayfun on the CPU!
Since I write code for both normal arrays and gpuArrays, I find this difference in arrayfun's behavior quite misleading. It would be great if MATLAB could support implicit expansion in CPU arrayfun as well. When I have large arrays, duplicating dimensions with repmat takes a lot of memory.
%% Demonstration of implicit expansion support in MATLAB and arrayfun
% This script shows that:
% 1) MATLAB supports implicit expansion for standard array operations.
% 2) arrayfun on the GPU supports implicit expansion.
% 3) arrayfun on the CPU does NOT support implicit expansion!!
%
% Implicit expansion allows a 2×1 vector to be added to a 1×2 vector,
% producing a 2×2 matrix.
clear; clc; close all;
% Define test vectors
x = [1; 2]; % Column vector (2×1)
y = [1, 2]; % Row vector (1×2)
%% Implicit expansion using standard MATLAB operations
F1 = myadd(x, y);
%% Implicit expansion using arrayfun on the GPU
F2 = arrayfun(@myadd, gpuArray(x), gpuArray(y));
%% Attempt implicit expansion using arrayfun on the CPU (expected to fail)
try
    F3 = arrayfun(@myadd, x, y);
catch ME
    fprintf('CPU arrayfun error (expected):\n%s\n\n', ME.message);
end
%% Function myadd
function F = myadd(x, y)
    % Element-wise addition
    F = x + y;
end
10 Comments
Joss Knight on 5 Jan 2026 at 9:42
Moved: Matt J on 5 Jan 2026 at 12:40
arrayfun with expansion, particularly for expanding scalars, is certainly very convenient syntactic sugar for a for loop: it makes code more compact and readable, for instance when setting a bunch of properties on an object array, where the for loop is not going to be optimized, so there is no advantage to it.
layers = arrayfun(@setHyperParams, layers, 0, [layers.L2Factor]); % Freeze learn rate
It's just easier to read, isn't it? Obviously there are other ways to do this particular operation on one line, but I certainly see your point. However, the others have good points: arrayfun is almost always slower than a for loop or an alternative approach, so taking action to encourage its use is something to do with caution.
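[Editor's note] For reference, a minimal sketch of the for loop that one-liner would replace; setHyperParams and the L2Factor property come from the example above and are assumptions, not a documented API.
for k = 1:numel(layers)
    % Hypothetical helper: set the learn rate to 0, keep the existing L2 factor
    layers(k) = setHyperParams(layers(k), 0, layers(k).L2Factor);
end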
Paul on 5 Jan 2026 at 12:30
Moved: Matt J on 5 Jan 2026 at 12:40
Hi Joss,
Any idea why the GPU version of arrayfun supports expansion (whereas the CPU version does not)?


Accepted Answer

Matt J on 4 Jan 2026 at 1:03
Edited: Matt J on 4 Jan 2026 at 20:22
Remember that arrayfun (on the CPU) is nothing more than an M-Code for-loop in disguise. Therefore, it is not hard to write your own arrayfun version -- one which supports implicit expansion -- using for-loops.
The implementation below is slower than explicit expansion (using repmat), but of course conserves a lot more memory. You would probably need very large data sizes and/or tight RAM limits before implicit expansion becomes faster.
% Define test vectors
N=100;
x = rand(N,1); % Column vector
y = rand(1,N); % Row vector
z = rand(1,1,N);
myadd=@(a,b,c) a+b+c;
tic;
out1=arrayfunImplExp(myadd,x,y,z); %implicit expansion
toc
Elapsed time is 0.898685 seconds.
tic;
out2=arrayfunExplExp(myadd,x,y,z); %explicit expansion (with repmat)
toc
Elapsed time is 0.548641 seconds.
isequal(out1,x+y+z)
ans = logical
1
isequal(out2,x+y+z)
ans = logical
1
function out = arrayfunImplExp(fun, varargin)
%ARRAYFUNIMPLEXP arrayfun with implicit expansion (CPU)
%
% NOTE:
%   Useful for CPU-only execution. On the GPU, use arrayfun instead,
%   which implements its own implicit expansion.

    % Number of inputs and maximum dimensionality
    numArgs = numel(varargin);
    numDims = max(cellfun(@ndims, varargin));

    % Collect per-input sizes (row-wise)
    sizes = cellfun(@(c) size(c, 1:numDims), varargin, 'UniformOutput', false);
    sizes = vertcat(sizes{:}); % [numArgs x numDims]

    % Output size is max along each dimension
    outSize = max(sizes, [], 1);

    % Precompute row-wise strides for linear indexing
    strides = [ones(numArgs, 1), cumprod(sizes(:, 1:end-1), 2)];

    % Convert sizes to zero-based limits
    sizes = sizes - 1;

    % Allocate output and per-argument caches
    out = nan(outSize);
    argsIdx = nan(numArgs, 1);
    args = cell(1, numArgs);

    % Main loop over output elements
    for linIdx = 1:numel(out)

        % Convert linear index to zero-based subscripts
        idx = linIdx - 1;
        subs = zeros(1, numDims);
        for d = 1:numDims
            sd = outSize(d);
            subs(d) = mod(idx, sd);
            idx = floor(idx / sd);
        end

        % Apply implicit expansion masking
        Subs = min(subs, sizes);

        % Row-wise sub2ind
        argsIdxNew = sum(Subs .* strides, 2) + 1;
        map = (argsIdxNew ~= argsIdx);

        % Gather only the arguments whose linear index changed
        for j = 1:numArgs
            if map(j)
                args{j} = varargin{j}(argsIdxNew(j));
            end
        end
        argsIdx = argsIdxNew;

        % Evaluate function
        out(linIdx) = fun(args{:});
    end
end
function out = arrayfunExplExp(fun, varargin)
%ARRAYFUNEXPLEXP arrayfun with explicit expansion using repmat
%   Provided for comparison with arrayfunImplExp.

    numArgs = numel(varargin);
    numDims = max(cellfun(@ndims, varargin));

    % Collect sizes
    sizes = cellfun(@(c) size(c, 1:numDims), varargin, 'UniformOutput', false);
    sizes = vertcat(sizes{:});

    % Output size
    outSize = max(sizes, [], 1);
    numOut = prod(outSize);

    % Preallocate expanded argument cell array
    args = cell(numArgs, numOut);

    % Explicit expansion
    for k = 1:numArgs
        v = varargin{k};
        vSz = size(v, 1:numDims);
        reps = outSize ./ vSz;
        % Expand and linearize
        vk = repmat(num2cell(v), reps);
        args(k, :) = vk(:).';
    end

    % Allocate output
    out = nan(outSize);

    % Evaluate function
    for j = 1:numOut
        out(j) = fun(args{:, j});
    end
end
17 Comments
Alessandro on 6 Jan 2026 at 16:44

Thanks for the explanation. But using parfor with threads instead of processes should reduce the overhead considerably. I will try it later if I have time.
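[Editor's note] A minimal sketch (an assumption, not code from this thread) of running the element-wise loop on a thread-based pool; thread workers share memory with the client, so they avoid the per-worker data copies of process-based workers.
% Start a thread-based pool if none exists (requires R2020a or later)
if isempty(gcp('nocreate'))
    parpool('Threads');
end
x = rand(100,1); y = rand(1,100); z = rand(1,1,100);
sz = [numel(x), numel(y), numel(z)];
out = nan(sz);
parfor linIdx = 1:prod(sz)
    [i, j, k] = ind2sub(sz, linIdx);      % subscripts into the expanded output
    out(linIdx) = x(i) + y(j) + z(k);     % "myadd" hard-coded for illustration
end
isequal(out, x + y + z)                   % should return logical 1
Whether this beats the serial loop depends on how much work the function does per element.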

Matt J on 7 Jan 2026 at 3:57
I made some improvements to my File Exchange submission, which now includes two versions: one that is RAM-conserving (arrayfunLowMem) and one that is liberal with RAM but somewhat faster (arrayfunHighMem). Both are significantly faster than native arrayfun (revised timeComparisons.m script attached).
1) arrayfun on GPU
2) Vectorized on CPU
3) Nested loops on CPU
4a) User-written arrayfunLowMem on CPU
4b) User-written arrayfunHighMem on CPU
5) Matlab arrayfun on CPU
Maximum absolute error vs baseline (F2):
gpuArray.arrayfun (F1) : 7.10543e-15
Vectorized CPU (F2) : 0
Nested loops CPU (F3) : 0
arrayfunLowMem (F4a): 0
arrayfunHighMem (F4b): 0
Matlab arrayfun CPU (F5) : 0
Timing summary (seconds):
1) arrayfun GPU : 0.010418
2) Vectorized CPU : 0.097721
3) Nested loops CPU : 0.329859
4a) arrayfunLowMem : 3.806847
4b) arrayfunHighMem : 3.187784
5) arrayfun Matlab : 5.101973


More Answers (3)

Torsten on 3 Jan 2026 at 18:05
Edited: Torsten on 3 Jan 2026 at 18:29
F3 = cellfun(@myadd, [{x}], [{y}], 'UniformOutput', false)
will work.
arrayfun tries to apply "myadd" to [first element of x, first element of y] and [second element of x, second element of y]. Thus x and y must be of the same size and, even if it worked here, the result would be [2, 4] or [2; 4].
I don't understand why it works for gpuArray and gives the result you want. Maybe input arrays of different sizes are automatically interpreted here as two separate single objects to which "myadd" is applied, as is done if you use cell arrays together with "cellfun".
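[Editor's note] A minimal illustration of the element-pairing behavior described above: with same-sized inputs, CPU arrayfun applies the function to corresponding elements, so no 2-by-2 result is produced.
x = [1; 2];
y = [1; 2];                         % same size as x
F = arrayfun(@(a, b) a + b, x, y)   % returns [2; 4], not a 2-by-2 matrix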
4 Comments
Alessandro on 3 Jan 2026 at 21:36
@Paul: I agree with you. Ideally, MATLAB should implement implicit expansion in CPU arrayfun as well, as is done in the GPU version. This improvement would also not break any existing code.
Torsten on 3 Jan 2026 at 22:57
Edited: Torsten on 3 Jan 2026 at 22:57
If your post was intended as a complaint rather than a question, submit a feature request:



Matt J on 3 Jan 2026 at 23:25
Edited: Matt J on 4 Jan 2026 at 3:18
I agree it is confusing, but the gpuArray version of arrayfun was never intended as a direct analogue of the CPU version. Additionally, there are all kinds of other instances where the gpuArray support for a function differs in capabilities from its CPU version. Usually, though, the GPU version is less flexible, not more so, as seems to be the case with arrayfun.
A more appropriate comparison of implicit expansion between the CPU and GPU (for binary functions) would probably be to use bsxfun instead:
% Define test vectors
x = rand(10000,1); % Column vector
y = rand(1,10000); % Row vector
xg=gpuArray(x);
yg=gpuArray(y);
myadd=@(a,b) a+b;
timeit( @() bsxfun(myadd, x, y) ) %CPU
ans =
0.3355
gputimeit( @() bsxfun(myadd, xg, yg) ) %GPU
ans =
0.0078

Joss Knight on 5 Jan 2026 at 13:15
Moved: Matt J on 5 Jan 2026 at 13:40
Well, there are a couple of answers to that. Firstly, and probably most importantly, GPU arrayfun is just an extension of gpuArray's existing compiler for element-wise operations, all of which support dimension expansion; it is also a natural strided memory access pattern for a GPU kernel to support, where each input has its own stride, and there is native support for this kind of memory access in GPU hardware. Secondly, it makes design sense and only isn't implemented for CPU for the reasons given. The historical explanation is that the idea of broadcast operations didn't even exist back when arrayfun was first created for the CPU, but it did exist by the time GPU arrayfun came along.
2 Comments
Matt J on 5 Jan 2026 at 13:23
Edited: Matt J on 5 Jan 2026 at 13:46
Secondly, it makes design sense and only isn't implemented for CPU for the reasons given.
Given where? In your initial post?
And are those reasons still an argument for not pursuing it now, or is it just because of historical legacy?
Joss Knight on 6 Jan 2026 at 10:00
The reasons given by various people. Should you really enhance something you want to discourage people from using?

