Reducing a cell array of tables to a single table

Ingo Marquart on 25 Jun 2019
Commented: David Kelly on 18 Aug 2020
I am using a one-dimensional cell array to save a set of tables.
The necessity for this arises from using a parfor loop in the main part of the program, where each i'th output is a table of results, and outputs must be generated in parallel. I would like to save everything in one table, but the order must be preserved. Since parfor restricts indexing, the best way I have found is to create said cell array and loop through it afterwards. Since each table corresponds to a single index, MATLAB happily accepts this indexing in the parallel loop.
Each iteration returns a table with maxT rows and a number of columns that I determine dynamically. I then pre-allocate the table mainTable and loop over my cell array to fill it. To set the correct indices, I use a vector called asdf, which tells me which rows of mainTable belong to a given iteration i (there are other ways to do this; this just came out of trying to make parfor work). If that seems confusing, just think of me looping through the cell array and appending the table in cell i to mainTable.
The issue now is that the second loop becomes rather slow because it is not parallelized. Although the main work happens in the first parfor loop, and the current solution is therefore still better than running without parfor, I would very much like to make the reduction to a single table fast.
Even though I know the position of each table within mainTable (e.g. see variable "asdf"), I cannot index with such slices in a parfor loop. The code below, which does this without parfor, works.
Some things which do not work:
cell2table(resultCell) gives a table of tables. No join or union on this is successful.
resultCell{:} theoretically gives a list of all tables, but using [resultCell{:}] gives an error because of column duplication. Otherwise only the first table is extracted.
I did not find a way to parallelize the assignment to mainTable, because I always need to slice from a starting point to an ending point.
Any ideas?
parfor i = 1:NrSims
    %% Do something
    % resultTable is a table of length maxT
    resultCell{i} = resultTable;
end
%% Create main table
% Here I preallocate mainTable etc.
(...)
% Next, I create this index vector which allows me to slice mainTable for each i
asdf = kron(1:NrSims, ones(maxT,1)')';
for i = 1:NrSims
    slice = (asdf == i);
    mainTable(slice, :) = resultCell{i};
end
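A note on the [resultCell{:}] error above: square brackets around a comma-separated list concatenate horizontally (horzcat), and horizontal concatenation of tables requires unique variable names, hence the duplication error. Stacking the tables row-wise calls for vertcat instead. What follows is only a minimal sketch, assuming every table in the cell array has the same variable names and compatible types. The sizes and the column names simId, t and value are made-up stand-ins for the real simulation output.

```matlab
% Hypothetical stand-in data: NrSims tables, each with maxT rows.
NrSims = 4;
maxT   = 3;
resultCell = cell(NrSims, 1);   % one cell per iteration (sliced output in parfor)

parfor i = 1:NrSims
    % Placeholder for the real simulation; these columns are illustrative only.
    simId = repmat(i, maxT, 1);
    t     = (1:maxT)';
    value = rand(maxT, 1);
    resultCell{i} = table(simId, t, value, ...
        'VariableNames', {'simId', 't', 'value'});
end

% Stack all tables row-wise in a single call.
% The cell order, and therefore the iteration order, is preserved.
mainTable = vertcat(resultCell{:});
```

With this, neither the preallocation of mainTable nor the index vector asdf should be needed. Note that vertcat matches table variables by name, so it errors if an iteration returns different columns.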
  4 Comments
Ingo Marquart on 25 Jun 2019
Thank you. No, this is just during my messing around with the code. Normally, my variables follow a strict naming scheme.

Accepted Answer

Stephen on 25 Jun 2019
Edited: Stephen on 25 Jun 2019
  2 Comments
David Kelly on 18 Aug 2020
Stephen,
I just wanted to say thanks for contributing and answering so many of these questions on MATLAB Answers.
The number of times you have saved me is unbelievable!
Cheers
David

More Answers (1)

Campion Loong on 27 Jun 2019
Hi Ingo,
Glad you've found a solution. In case it may be useful in your workflow, I'd like to mention the various datastores available to you in base MATLAB.
PARFOR support is built in via partition, so you don't need to explicitly manage the chunking and remerging. It also lets you scale out to other resources, such as clusters, more easily.
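As a rough illustration of the partition pattern, here is a small sketch. It assumes the per-simulation results have been written to CSV files; the temporary folder, the file names and the columns simId and t are invented for the example and are not part of the original workflow.

```matlab
% Hypothetical setup: write three small CSV files to a temporary folder.
folder = tempname;
mkdir(folder);
for k = 1:3
    T = table(repmat(k, 2, 1), (1:2)', 'VariableNames', {'simId', 't'});
    writetable(T, fullfile(folder, sprintf('result_%d.csv', k)));
end

% Build a datastore over the files and ask for its default partition count.
ds = tabularTextDatastore(fullfile(folder, '*.csv'));
n  = numpartitions(ds);
rowCounts = zeros(n, 1);

parfor ii = 1:n
    subds = partition(ds, n, ii);   % ii-th of n disjoint pieces of the datastore
    cnt = 0;
    while hasdata(subds)
        chunk = read(subds);        % each chunk comes back as a table
        cnt = cnt + height(chunk);
    end
    rowCounts(ii) = cnt;
end

totalRows = sum(rowCounts);
```

Each worker reads only its own partition, so the chunking is handled by the datastore rather than by hand-built index vectors.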
Hope this helps.