Efficient access and manipulation of arrays in nested cells
10 views (last 30 days)
I have nested cells of the form mycell{i}{j,k}, with an array in each of those cells. I have not found working examples of operations like taking a statistic (e.g., max) of every array without a loop, returning something like cellstat(i,j,k). Another example: I'm performing a fit with each array, and it would be nice to gather one of the goodness-of-fit stats from every fit into a single array, or to take stats of a goodness stat across i so I can see it at each j,k.
I think with an example of each of those, I could figure out anything else that comes up. Thanks!
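To make the structure concrete, here is a tiny made-up illustration of mycell and the kind of result I mean (the sizes here are arbitrary; the real arrays are much larger):
mycell = {{rand(5,1), rand(7,1); rand(3,1), rand(9,1)}}; % i = 1; j,k = 1:2
% desired: something like cellstat(i,j,k) = max(mycell{i}{j,k}), gathered into one numeric array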
**********************
Adding an example:
data = rand(2e5,1); % one data set, I have many
datay = rand(2e5,1); % y-coordinate of the data
dataz = rand(2e5,1); % z-coordinate of the data
The first task with this data is to create a grid of y,z pairs and sort each data set into those. Since rand is [0,1], say the grid is every 0.1. This only has to be done once, but I suppose how the data are stored could affect the speed of future steps.
After that, I'm doing a windowed fit on the points that are sorted into each y,z bin for each dataset. There may be some trial and error here, and, while I can test on subsets, it would be helpful if the data are structured in a way that makes the fitting routine as fast as possible. Would any more information be useful?
8 Comments
dpb
on 2 Apr 2025
OK, I let the "grid" and the initial structure stuff confuse me... @Voss got back before I did and answered the basics; as he points out, there's no reason to create excessively complex storage structures; use the data you have the way it comes. I'd still be looking into how the data are initially created and what the multiple cases are, with an eye to further consolidation, but if there really are 10E5 points per dataset, it's probably not practical to actually combine them until you summarize results.
It would also be worth seeing what
NY=10;
edges=linspace(0,1,NY+1);
iyz=discretize([datay dataz],edges);
does compared to histcounts2. It returns the indices by column in one output array and uses the same binning in both directions, so it isn't quite as flexible, but it might be a little faster. Although given the tasks so far, I don't see performance as being a big issue if you don't make things more difficult than need be... :)
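If it ever does matter, a quick timeit comparison on the actual data would settle it; a sketch (results will vary by machine and release):
NY = 10;
edges = linspace(0,1,NY+1);
t_disc = timeit(@() discretize([datay dataz], edges));            % one call, both columns
t_hist = timeit(@() histcounts2(datay, dataz, edges, edges), 5);  % request 5 outputs so the bin indices are computed
[t_disc t_hist]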
Accepted Answer
Voss
on 1 Apr 2025
data = rand(2e5,1); % one data set, I have many
datay = rand(2e5,1); % y-coordinate of the data
dataz = rand(2e5,1); % z-coordinate of the data
"The first task with this data, is to create a grid of y,z pairs and sort each data set into those. Since rand is [0,1], say the grid is every 0.1.... how the data are stored could affect the speed of future steps"
Store the bin index of each data point, so you know what bin each data point belongs to. (It's not necessary to make a new copy of the data with a different structure.)
NY = 10;
NZ = 10;
yedges = linspace(0,1,NY+1);
zedges = linspace(0,1,NZ+1);
[~,~,~,yidx,zidx] = histcounts2(datay,dataz,yedges,zedges);
"After that, I'm doing a windowed fit on the points that are sorted into each y,z bin for each dataset."
Maybe something like the following. groupsummary uses the bin indices found in the previous step:
function out = your_fit_function(d,y,z)
[f,gof] = fit([y,z],d,'poly11');
out = {{f,gof}};
end
[C,BG] = groupsummary({data,datay,dataz},[zidx,yidx],@your_fit_function);
Now you have an sfit object and goodness-of-fit struct, returned from fit, for each grid cell:
C{1}
C{1}{:}
And you can do what you want with those:
for ii = 1:3 % use 1:numel(C) to show all bins
fprintf(1,'region %0.1f<y<%0.1f, %0.1f<z<%0.1f:\n\n', ...
yedges(BG{2}(ii)),yedges(BG{2}(ii)+1),zedges(BG{1}(ii)),zedges(BG{1}(ii)+1));
fprintf(1,' fit object:\n');
disp(C{ii}{1})
fprintf(1,' goodness:\n');
disp(C{ii}{2})
fprintf(1,' \n');
end
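And if the goal is one goodness statistic across all bins as a plain numeric array (per the original question), something like this sketch could follow (rsq is just an illustrative name; rsquare is one of the fields of the gof struct returned by fit):
rsq = cellfun(@(c) c{2}.rsquare, C);   % C{ii}{2} is the gof struct for bin ii, in the same order as BG
[best_rsq, ibest] = max(rsq);          % e.g., pick out the best-fitting y,z bin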
0 Comments
More Answers (3)
Walter Roberson
on 31 Mar 2025
Example:
function gof = getgof(PAGE)
[~, gof] = fit(PAGE somehow); % fill in the fit() arguments appropriate to how each array is laid out
end
gof_stats = cellfun(@getgof, mycell, 'uniform', 0);
gof_stats = vertcat(gof_stats{:});
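If mycell really is nested as mycell{i}{j,k}, a second cellfun level is needed to reach the inner arrays, and a single goodness metric can then be collected into a plain numeric array. A sketch, assuming (purely for illustration) that each inner array is an N-by-3 matrix [y z d] so the fit call can be written out:
function r = getrsq(A)
% hypothetical helper: R^2 of a poly11 surface fit to one inner array A = [y z d]
[~, gof] = fit(A(:,1:2), A(:,3), 'poly11');
r = gof.rsquare;
end
rsq_inner = cellfun(@(inner) cellfun(@getrsq, inner), mycell, 'UniformOutput', false); % one j-by-k matrix per i
rsq = cat(3, rsq_inner{:});   % j-by-k-by-i numeric array of R^2 values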
0 Comments
Matt J
on 1 Apr 2025
Edited: Matt J
on 1 Apr 2025
There is no way to iterate over cells (nested or otherwise) without a loop, or something equivalent in performance to a loop (cellfun, arrayfun, cell2mat, etc...).
4 Comments
Matt J
on 1 Apr 2025
Edited: Matt J
on 1 Apr 2025
"Can you give an example without a loop, e.g., cellfun?"
How would an example of cellfun help you? You said you are looking for something more efficient than a loop, and as I have said, nothing is more efficient than a loop when dealing with cell arrays.
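For reference, the plain-loop version of the max example from the question is about as short as any cellfun construction would be; a sketch, assuming every mycell{i} has the same j-by-k size:
[nj, nk] = size(mycell{1});
cellstat = zeros(numel(mycell), nj, nk);
for i = 1:numel(mycell)
    for j = 1:nj
        for k = 1:nk
            cellstat(i,j,k) = max(mycell{i}{j,k});   % swap in any per-array statistic here
        end
    end
end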
dpb
on 1 Apr 2025
Edited: dpb
on 1 Apr 2025
To amplify on @Matt J's comment: at their heart, all the cell-, array-, and struct- functions are looping constructs internally; they are "syntactic sugar" that replaces the for ... end loop with a single line of source code. But their performance cannot exceed that of JIT-compiled looping code, and, given that they have not been subject to all the optimizations MathWorks has made to for loops over the years, including multi-threading, they will all be at least somewhat slower than a "deadahead" for loop.
Functionally, a cellfun is a wrapper for an arrayfun -- it passes the dereferenced cell to the function instead; you could construct the same with arrayfun if you did the dereferencing in the argument list for it. See this <recent post> for a general discussion and some pertinent remarks from TMW Staff members on the differences.
MORAL: Do NOT assume that fewer lines of source code equate to faster execution speed.
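To make the cellfun/arrayfun equivalence above concrete, a small sketch with made-up data:
c = {rand(5,1), rand(8,1), rand(3,1)};              % made-up cell of vectors
m1 = cellfun(@max, c);                              % cellfun dereferences each cell for you
m2 = arrayfun(@(k) max(c{k}), 1:numel(c));          % same thing, dereferencing done in the argument
isequal(m1, m2)                                     % true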
dpb
on 1 Apr 2025
Edited: dpb
on 1 Apr 2025
The other alternative to investigate is to turn the metadata you're segregating/tracking by cell indices into real data in a flat table or array. Ideally, those would be recognizable things like test number, date, whatever..., but for starters they could just be the indices. Then the power of <grouping variables> and/or grpstats and/or varfun could be brought to bear on the problem. Large datasets can be handled with tall arrays and/or memory mapping; see also findgroups.
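A sketch of what that flattening might look like (the [i j k value] column layout is just one possible choice, assuming mycell from the question):
rows = {};
for i = 1:numel(mycell)
    for j = 1:size(mycell{i},1)
        for k = 1:size(mycell{i},2)
            v = mycell{i}{j,k}(:);
            rows{end+1,1} = [repmat([i j k], numel(v), 1), v]; %#ok<AGROW> % one row per data point, with its i,j,k indices
        end
    end
end
flat = vertcat(rows{:});                                % columns: i, j, k, value
G = findgroups(flat(:,1), flat(:,2), flat(:,3));
groupmax = splitapply(@max, flat(:,4), G);              % one statistic per (i,j,k) group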
4 Comments
Walter Roberson
on 1 Apr 2025
"I believe I could reorganize the data into a table"
Accessing a range of table rows is notably less efficient than accessing a range of rows of a numeric array.
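A quick way to see that difference on a given machine (a sketch; the exact ratio varies by release):
A = rand(2e5, 3);
T = array2table(A, 'VariableNames', {'d','y','z'});
t_array = timeit(@() A(1000:2000, :));     % range of rows from a numeric array
t_table = timeit(@() T(1000:2000, :));     % same range of rows from a table
[t_array t_table]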
dpb
on 1 Apr 2025
Edited: dpb
on 1 Apr 2025
"... turn the metadata you're segregating/tracking by cell indices into real data in a flat table or array." (emphasis added...dpb)
The table is awfully convenient for display and is generally "fast enough" ...but, agreed, findgroups and splitapply to do the calculations will be faster on an array than will be varfun or grpstats on a table.
I was interpreting the question about speed as including the existing cell array structure as well, not just the comparison of an array to a table. Dereferencing a cell itself is generally quick, but by the time one calls cellfun() a number of times and then has to reconstruct/collect the results, who knows how it might compare?
But, it's pretty tough to attack @Dan Houck's real problem without an example to poke at...others may be able to write air code that might be applicable to his actual situation, but I'm not that clairvoyant and as @John D'Errico was complaining the other day, the Crystal Ball TB is notably dark these days.