When is it better to use a multi-level-struct than a table?
cdlapoin
on 23 Oct 2023
Edited: Stephen23
on 24 Oct 2023
I am processing data logged in ~4000 text files. I initially read the data into a multi-level structure because the hierarchical nature seemed to make more sense for how I collected the data: 5 different configurations, each tested at 20 different positions, with each position containing 40 angles, each angle being a separate experiment with environmental parameters (time, temp, speed) and 42 data channels, each having a mean or RMS value, a tare value, and a standard deviation (I can calculate these as I read the files and then store only scalars in the struct).
I abandoned the struct because reading the data back out was too burdensome. For instance, if I want to plot data channel 5 mean against data channel 1 mean for a certain angle (say 10deg) at all locations of one config, I thought I would use something like:
% pseudo code just for illustration, haven't tried, wouldn't work
x = [data.config(3).pos(:).ang(10).chan(5).mean];
y = [data.config(3).pos(:).ang(10).chan(1).mean];
plot(x,y)
But what I learned is that you cannot address more than one level of a struct at a time. Instead, you must run a series of nested loops, one for each criterion you want to query by, and move its contents into a temporary variable for the next loop to operate on.
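To make the limitation concrete, here is a minimal sketch of the loop needed to pull one value per position out of a nested struct. The field names follow the pseudocode above; the sizes and the random values are placeholders, not the real dataset.

```matlab
% Build a small nested struct with made-up values (sizes are placeholders).
nPos = 4; nAng = 12; nChan = 6;
data = struct();
for p = 1:nPos
    for a = 1:nAng
        for c = 1:nChan
            data.config(3).pos(p).ang(a).chan(c).mean = rand();
        end
    end
end

% data.config(3).pos(:).ang(10) throws an error, because pos(:) yields a
% comma-separated list, so each position has to be visited explicitly.
x = zeros(1, nPos);
y = zeros(1, nPos);
for p = 1:nPos
    x(p) = data.config(3).pos(p).ang(10).chan(5).mean;
    y(p) = data.config(3).pos(p).ang(10).chan(1).mean;
end
plot(x, y, 'o')
```

The same extraction can be written as `arrayfun(@(s) s.ang(10).chan(5).mean, data.config(3).pos)`, but that is still a loop over the positions, only hidden inside `arrayfun`.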
With a table, on the other hand, I can store everything in one large flat table where each row is an angle (one row for each experiment) and just have a ton of columns. The downside to this, in my mind, is that the table now contains much more repetitive data. For instance, the struct could parent all of the sub-structs back to one of the five configurations, but the table must have ~4000 extra cells so that each row knows which config it is a member of. The upside is that querying out data is much simpler, e.g.:
% also example code which I haven't tried, may not be correct
x = data.mean(data.config==3 & data.ang==10 & data.chan==5);
y = data.mean(data.config==3 & data.ang==10 & data.chan==1);
plot(x,y)
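A runnable sketch of this kind of query, assuming a long-format table with one row per (config, pos, ang, chan) combination. That layout, the reduced sizes, and the random values are assumptions for illustration. Note that table variables have to be referenced through the table (`data.config`, not `config`).

```matlab
% Long-format table: one row per (config, pos, ang, chan) combination.
% Sizes and values are placeholders for illustration only.
[config, pos, ang, chan] = ndgrid(1:5, 1:4, 1:12, 1:6);
data = table(config(:), pos(:), ang(:), chan(:), rand(numel(config), 1), ...
    'VariableNames', {'config', 'pos', 'ang', 'chan', 'mean'});

% Logical mask shared by both queries: config 3, angle index 10.
base = data.config == 3 & data.ang == 10;
tx = sortrows(data(base & data.chan == 5, :), 'pos');
ty = sortrows(data(base & data.chan == 1, :), 'pos');

% After sorting by pos, row k of tx and ty refer to the same position.
x = tx.mean;
y = ty.mean;
plot(x, y, 'o')
```

Sorting both subsets by `pos` before plotting guards against the two selections coming back in different row orders.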
So I am guessing it is a matter of preference, but going through all of this is making me wonder: when and why do you choose a multi-level struct over a table, and are there other, even better options?
6 Comments
Stephen23
on 23 Oct 2023
Edited: Stephen23
on 23 Oct 2023
"If you have S(J).A(K).B(L) and you are doing sweeps over J K L..."
What relevance does that have to the specific example given by the OP? Not much.
"there would be another arrayfun version that iterates over structure members that just isn't coming to mind at the moment but I am sure is possible"
It is possible if you pass scalar structures as the function inputs. But warning: tectonic plates move much faster.
(hint: that approach is the partner to version 1, just like version 3 is the partner to version 2)
"So getfield() is one of the options that does not require creating temporary variables (other than internally)"
And yet... it is not really an option. None of those "versions" actually deliver what the OP requires: the numeric vectors x and y (for plotting, as the OP clearly states).
Versions 1 & 2 are the nested loops the OP already knows about. Version 3 (very slowly) creates nested cell arrays inside nested cell arrays inside another cell array. Flattening multiply nested cell arrays (to get the numeric vectors x & y, which are what the OP needs) requires either multiple comma-separated lists (with associated temporary variables) or more nested loops or recursion... or some other even worse kind of horror. So you are right back to square one.
@cdlapoin: these examples should make it quite clear why you should be using tables.
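If the data already lives in a nested struct, one option is to walk the hierarchy once and store one table row per leaf, so that later queries need no loops. The sketch below assumes the field names from the question and uses small made-up sizes and random values.

```matlab
% Small nested struct with made-up values (sizes are placeholders).
nCfg = 2; nPos = 3; nAng = 4; nChan = 5;
S = struct();
for c = 1:nCfg
    for p = 1:nPos
        for a = 1:nAng
            for k = 1:nChan
                S.config(c).pos(p).ang(a).chan(k).mean = rand();
            end
        end
    end
end

% Walk the hierarchy once and record one row per leaf.
nRows = nCfg * nPos * nAng * nChan;
rows = zeros(nRows, 5);          % columns: config, pos, ang, chan, mean
r = 0;
for c = 1:numel(S.config)
    for p = 1:numel(S.config(c).pos)
        for a = 1:numel(S.config(c).pos(p).ang)
            leaf = S.config(c).pos(p).ang(a).chan;
            for k = 1:numel(leaf)
                r = r + 1;
                rows(r, :) = [c, p, a, k, leaf(k).mean];
            end
        end
    end
end
T = array2table(rows, 'VariableNames', {'config', 'pos', 'ang', 'chan', 'mean'});
```

The nested loops are paid for once during conversion; after that, a selection such as `T.mean(T.config == 2 & T.chan == 5)` is a single indexing expression.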
Accepted Answer
Walter Roberson
on 23 Oct 2023
Edited: Walter Roberson
on 23 Oct 2023
We are discussing in https://www.mathworks.com/matlabcentral/answers/556024-what-frustrates-you-about-matlab-2#answer_1337061 why row-by-row access to a table can be much slower than some of the alternatives. A lot is going to depend on how you use the data after it has been put into the data structure.
If all of the data is numeric, using a numeric array will typically be fastest... but again it depends on the data access patterns. Sometimes cell arrays are faster, as recently explored in https://www.mathworks.com/matlabcentral/answers/2035921-access-time-of-data-in-cell-array-vs-matrix#answer_1336881
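As a rough sketch of the numeric-array option: because the experiment grid in the question is regular (5 configs, 20 positions, 40 angles, 42 channels), the scalars could be stored in a single 5-D array. The dimension order and the ordering of the statistics along the last dimension are assumptions made for this example, and the values are random placeholders.

```matlab
% 5-D numeric array: config x pos x ang x chan x stat (sizes from the question).
% stat index 1 = mean, 2 = tare, 3 = standard deviation (an assumed ordering).
M = rand(5, 20, 40, 42, 3);

% Channel 5 mean vs channel 1 mean, config 3, angle index 10, all positions.
x = squeeze(M(3, :, 10, 5, 1));
y = squeeze(M(3, :, 10, 1, 1));
plot(x, y, 'o')
```

This keeps the hierarchy implicit in the indices, so nothing is repeated per row, but it only works when every combination exists and the meaning of each dimension is documented somewhere.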
2 Comments
Stephen23
on 24 Oct 2023
Edited: Stephen23
on 24 Oct 2023
Use tables.
Most likely you will spend far more time writing, debugging, and maintaining your code than your code will spend running. Therefore, making sure that your data and code are clear and correct is of the utmost importance, and will save you time overall. Tables are a great way to achieve that clarity.
"I'm not really hearing that there is ever a time where the nested data structures would be the better option."
Something like this would be difficult without nested structures or a similar data type: