Parfor loops indexing into table rows

Typically the most time-consuming part of my data analysis can be boiled down to "do thing to row of table for all rows of table", so it seemed pretty ideal for parfor looping (and is) but I'm wondering if there is a better way than the workaround I've been using.
Indexing seems problematic - my usual approach to table indexing is table.columnname(row) but this leads to an error: "Error: Unable to classify the variable 'tableParfor' in the body of the parfor-loop. For more information, see Parallel for Loops in MATLAB, "Solve Variable Classification Issues in parfor-Loops"."
The same thing happens if I try table{row, columnname}, and as far as I can tell from the docs on tables I'm kinda out of options for normal indexing at this point.
I assumed my usual approach failed because this page says that indexing in the form of a.b(c) fails:
Variable A on the left is not sliced; variable A on the right is sliced:
A.q(i,12) (left, not sliced)    A(i,12).q (right, sliced)
But the right-hand form of indexing is not valid for tables, and I'm not really sure why table{row, column} doesn't work. I did find a workaround (making a temporary one-row table and always indexing into that) that works but seems suboptimal. It still cuts down on time for a lot of my scripts but I still think
If anyone can shed some light or improve this code, I've made a simplified version of what my actual scripts generally do with parfor loops.
tableParfor = table('Size', [100 4], ...
    'VariableTypes', {'double', 'double', 'double', 'double'}, ...
    'VariableNames', {'first', 'second', 'third', 'final'});
for rows = 1:100
    for columns = 1:3
        tableParfor.(columns)(rows) = rand(1);
    end
end
a = 1.5;
b = 2.6;
c = 6.4;
% random broadcast variables
parfor cT = 1:height(tableParfor)
    % tableParfor.final(cT) = a*tableParfor.first(cT) + b*tableParfor.second(cT) + c*tableParfor.third(cT);
    % my usual syntax, this doesn't work with parfor
    % tableParfor{cT, 'final'} = a*tableParfor{cT, 'first'} + b*tableParfor{cT, 'second'} + c*tableParfor{cT, 'third'};
    % alternative syntax, this doesn't work with parfor
    % tableParfor(cT).final = a*tableParfor(cT).first + b*tableParfor(cT).second + c*tableParfor(cT).third;
    % my attempt to get something like what the docs recommend, but is invalid syntax for tables
    rowTable = tableParfor(cT, :);
    rowTable.final = a*rowTable.first + b*rowTable.second + c*rowTable.third;
    tableParfor(cT, :) = rowTable;
    % this workaround works, but adds two extra lines to the code and I think the extra creation of rowTable for each worker chews up memory
end

Accepted Answer

Edric Ellis on 20 Jul 2022
There's a few things conspiring against you here. Firstly, parfor analysis doesn't understand how to "slice" table data using variable names, but you can use variable indices, i.e. tableParfor{cT,4} = ... is allowed.
Secondly, you're trying to use tableParfor as a "sliced input/output", which further constrains what you're allowed to do - in particular the "fixed form of indexing" constraint stops you accessing different variables of your sliced row directly.
Your workaround (extract a slice, operate, put it back) would be my first choice, despite its awkwardness. The following is almost certainly a worse option since it duplicates and then broadcasts the input data table, but it does work:
inTable = tableParfor;
parfor cT = 1:height(tableParfor)
    tableParfor{cT, 4} = a*inTable{cT, 'first'} + b*inTable{cT, 'second'} + c*inTable{cT, 'third'};
end
Note that in that example, inTable gets broadcast, and so all indexing restrictions are removed, and I can use the variable-name indexing.
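If the per-row work grows beyond a line or two, the extract/operate/put-back pattern can be kept tidy by moving the work into a function, so that the loop body is a single sliced assignment. The following is only a sketch under assumptions: the names demoRowSlice and processRow are hypothetical (not from this thread), the file is assumed to be saved as demoRowSlice.m, and the table is filled with random inputs as in the question.

```matlab
function tableParfor = demoRowSlice()
    % Build a table like the one in the question: three random inputs
    % and one output column to be filled in.
    n = 100;
    tableParfor = table(rand(n,1), rand(n,1), rand(n,1), zeros(n,1), ...
        'VariableNames', {'first', 'second', 'third', 'final'});
    a = 1.5; b = 2.6; c = 6.4;   % broadcast constants

    parfor cT = 1:height(tableParfor)
        % tableParfor is only ever indexed as (cT, :), on both sides,
        % which is the same slice form the question's workaround uses.
        tableParfor(cT, :) = processRow(tableParfor(cT, :), a, b, c);
    end
end

function row = processRow(row, a, b, c)
    % Hypothetical helper: 'row' is an ordinary one-row table here, so
    % variable-name indexing is unrestricted inside this function.
    row.final = a*row.first + b*row.second + c*row.third;
end
```

This does not avoid the temporary one-row table; it only hides it behind a function call, so the memory behaviour should be much the same as in the original workaround.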
4 Comments
Andrew McCauley on 20 Jul 2022
Each row will at least have an entry in a column that holds cells of a vector of event timings, often around 1000x1. Often each row will have a vector of the rate of event timings over time, which can be as much as 200000x1 if I don't downsample it, but at least some thousands x 1 even if I do. 1673 bytes per row overhead is not really a concern.
Bruno Luong on 20 Jul 2022
OK I see now that looks big.


More Answers (1)

Bruno Luong on 20 Jul 2022
Edited: Bruno Luong on 20 Jul 2022
This works, but I'm not sure it is what you want.
IMO a table is not a well-suited data structure for doing calculations. A simple raw numerical array is.
EDIT: corrected typos
tableParfor = table('Size', [100 4], ...
    'VariableTypes', {'double', 'double', 'double', 'double'}, ...
    'VariableNames', {'first', 'second', 'third', 'final'});
for rows = 1:100
    for columns = 1:3
        tableParfor.(columns)(rows) = rand(1);
    end
end
a = 1.5;
b = 2.6;
c = 6.4;
% random broadcast variables
for cT = 1:height(tableParfor)
    rowTable = tableParfor{cT, :};
    rowTable(4) = a*rowTable(1) + b*rowTable(2) + c*rowTable(3);
    tableParfor(cT,:) = num2cell(rowTable);
    % this workaround works, but adds two extra lines to the code and I think the extra creation of rowTable for each worker chews up memory
end
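To make the "raw numerical array" suggestion concrete: when the columns involved are all numeric, they can be copied out of the table before the loop, sliced as ordinary vectors inside parfor, and written back afterwards. This is a minimal sketch, assuming the same table layout and constants as above; the names firstCol, secondCol, thirdCol and finalCol are made up for illustration.

```matlab
% Build the example table: three random inputs and one output column.
n = 100;
tableParfor = table(rand(n,1), rand(n,1), rand(n,1), zeros(n,1), ...
    'VariableNames', {'first', 'second', 'third', 'final'});
a = 1.5; b = 2.6; c = 6.4;   % broadcast constants

% Copy the columns out as plain vectors (illustrative names).
firstCol  = tableParfor.first;
secondCol = tableParfor.second;
thirdCol  = tableParfor.third;
finalCol  = zeros(n, 1);

parfor cT = 1:n
    % Each vector is indexed only by the loop variable, so all four
    % are sliced variables; the table itself is not used in the loop.
    finalCol(cT) = a*firstCol(cT) + b*secondCol(cT) + c*thirdCol(cT);
end

% Write the results back into the table in one step.
tableParfor.final = finalCol;
```

The table is never touched inside the loop, so none of the table indexing restrictions apply; the cost is a few extra lines before and after the loop.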
2 Comments
Andrew McCauley on 20 Jul 2022
Edited: Andrew McCauley on 20 Jul 2022
Thanks Bruno - I don't really see how that improves my existing workaround, and I think you've made a typo with "rowFinal(4) = a*rowTable(1) + b*rowTable(2) + c*rowTable(3);", your solution replaces the existing columns with zeros (which would be bad), and also I assume you mean that tables are not well-suited to calculation.
I agree, but my data requires each row to have strings, doubles and cells (among other things, for names of cells recorded from, some constant related to the recording, and raw data traces respectively), so raw numerical array is not possible - cell array works fine, but I don't think would be any better and is much more cumbersome for indexing.
Typically within a loop, I do curve fits, several functions etc etc all on the contents of one row of a table (which may be doubles or cells), and if it's time-consuming enough to justify the overhead, I'll turn that into a parfor loop. Basically, I'm hoping I can find a way to not have to create a temporary row and just slice into the row of the table I'm operating on for individual elements.
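For a mixed-type table like the one described here, one possible way to avoid the temporary row is to copy only the columns the loop needs into their own variables before the loop, including the cell column, and slice those instead. The sketch below rests on assumptions: the table T, its variable names (name, gain, trace, result) and the calculation (gain times the mean of the trace) are invented stand-ins for the real data and curve fits.

```matlab
% Hypothetical mixed-type table: a string name, a numeric constant,
% a cell holding a raw data trace, and a numeric result per row.
n = 8;
T = table("cell" + (1:n)', rand(n,1), cell(n,1), nan(n,1), ...
    'VariableNames', {'name', 'gain', 'trace', 'result'});
for k = 1:n
    T.trace{k} = rand(1000, 1);   % stand-in for a recorded trace
end

% Copy out just the columns the loop needs.
gains   = T.gain;     % n-by-1 double
traces  = T.trace;    % n-by-1 cell array
results = nan(n, 1);

parfor k = 1:n
    % gains(k), traces{k} and results(k) are all indexed by the loop
    % variable only, so each row's data is sliced, not broadcast.
    results(k) = gains(k) * mean(traces{k});
end

% Put the results back into the table after the loop.
T.result = results;
```

Columns the loop never reads (such as name here) stay in the table and are not sent to the workers at all.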
Bruno Luong on 20 Jul 2022
The table is a beast of an OOP class with all kinds of overloaded indexing. You are already lucky to be able to extract rows in parallel as sliced data.



Release

R2020a
