Matlab Table / Dataset type optimization

Question

Victor on 27 Jun 2017

https://it.mathworks.com/matlabcentral/answers/346434-matlab-table-dataset-type-optimization

Commented: dpb on 29 Jun 2017

I am looking for an optimized data type for an "observations-variables" table in MATLAB that can be accessed quickly and easily both by column (variables) and by row (observations).

Here is a comparison of the existing MATLAB data types:

Matrix is very fast; however, it has no built-in labels/enumerations for indexing its dimensions, and you can't always remember which variable a column index refers to.
Table has very poor performance, especially when reading individual rows/columns in a for loop (I suppose it runs some slow conversion methods and is designed to be more Excel-like).
Scalar structure (structure of column arrays): fast column-wise access to variables as vectors, but slow row-wise conversion to observations.
Nonscalar structure (array of structures): fast row-wise access to observations as vectors, but slow column-wise conversion to variables.

I wonder whether there is a simpler, optimized version of the table data type for when I just want to combine row-number and column-variable indexing, with only numerical variables or with any variable type.

See the same question on Stack Overflow.

--

Results of test script:

----
TEST1 - reading individual observations
Matrix: 0.072519 sec
Table: 18.014 sec
Array of structures: 0.49896 sec
Structure of arrays: 4.3865 sec
----
TEST2 - reading individual variables
Matrix: 0.0047834 sec
Table: 0.0017972 sec
Array of structures: 2.2715 sec
Structure of arrays: 0.0010529 sec

Test script:

Nobs = 1e5; % number of observations-rows
varNames={'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O'};
Nvar = numel(varNames); % number of variables-columns
M = randn(Nobs, Nvar); % matrix
T = array2table(M, 'VariableNames', varNames); % table
NS = struct; % nonscalar structure = array of structures
for i=1:Nobs
    for v=1:Nvar
        NS(i).(varNames{v}) = M(i,v);
    end
end
SS = struct; % scalar structure = structure of arrays
for v=1:Nvar
    SS.(varNames{v}) = M(:,v);
end
%% TEST 1 - reading individual observations (row-wise)
disp('----'); disp('TEST1 - reading individual observations');
tic; % matrix
for i=1:Nobs
   x = M(i,:); end
disp(['Matrix: ', num2str(toc()), ' sec']);
tic; % table
for i=1:Nobs
   x = T(i,:); end
disp(['Table: ', num2str(toc), ' sec']);
tic;% nonscalar structure = array of structures
for i=1:Nobs
    x = NS(i); end
disp(['Array of structures: ', num2str(toc()), ' sec']);
tic;% scalar structure = structure of arrays 
for i=1:Nobs
    for v=1:Nvar
        x.(varNames{v}) = SS.(varNames{v})(i);
    end
end
disp(['Structure of arrays: ', num2str(toc()), ' sec']);
%% TEST 2 - reading individual variables (column-wise)
disp('----'); disp('TEST2 - reading individual variables');
tic; % matrix
for v=1:Nvar
   x = M(:,v); end
disp(['Matrix: ', num2str(toc()), ' sec']);
tic; % table
for v=1:Nvar
   x = T.(varNames{v}); end
disp(['Table: ', num2str(toc()), ' sec']);
tic; % nonscalar structure = array of structures
for v=1:Nvar
    for i=1:Nobs
        x(i,1) = NS(i).(varNames{v});
    end
end
disp(['Array of structures: ', num2str(toc()), ' sec']);
tic; % scalar structure = structure of arrays
for v=1:Nvar
    x = SS.(varNames{v}); end
disp(['Structure of arrays: ', num2str(toc()), ' sec']);


Answer 1

dpb on 27 Jun 2017

https://it.mathworks.com/matlabcentral/answers/346434-matlab-table-dataset-type-optimization#answer_272106

"Matrix is very fast, [but] you can't always remember variable name by column index."

If speed is the goal, you can't beat native data types. It is unfortunate that the table class carries such a high performance penalty; it is extremely handy in many ways but just isn't up to handling large datasets, as of yet anyway. For the moment, we can only hope TMW (The MathWorks) can and will improve the implementation.

For arrays, however, the identification problem can be mitigated: define index variables to use as mnemonics...

M = randn(Nobs, Nvar); % matrix
date=1;                % 1st column is the date/time, say
accel=2;               % 2nd is acceleration
...
plot(M(:,date),M(:,accel)) % plot acceleration vs time...

By row, consider using a categorical indexing vector that correlates with the row. It's also not too difficult to write searching logic; often these can be packaged as anonymous functions for specific types of searches. Just what works best for a given instance depends on the kind of search one needs.
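The row-labelling idea above might look like the following minimal sketch. The data and the label vector `grp` are hypothetical; the assumption is one categorical label per row of the matrix.

```matlab
% Hypothetical data: 6 observations, 2 variables, one label per row
M   = [1 10; 2 20; 3 30; 4 40; 5 50; 6 60];
grp = categorical({'a';'b';'a';'c';'b';'a'});   % row labels (assumed)

% A logical mask selects all rows of one category at once
rowsA = M(grp == 'a', :);

% A search packaged as an anonymous function
% (it captures M and grp as they are when the function is created)
rowsOf = @(label) M(grp == label, :);
rowsB  = rowsOf('b');
```

Because the anonymous function captures its workspace at creation time, it has to be redefined if `M` or `grp` change afterwards.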

The main thing to do to minimize performance issues is to remove the loop, select the data to operate on by logical addressing, and use vector operations on that subset. With a table, using addressing modes that return the underlying data type rather than another table will help.
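As a rough illustration of the difference between the table addressing modes, here is a small sketch with made-up data; the variable names `A` and `B` are only placeholders.

```matlab
M = [1 -1; 2 5; 3 -2; 4 7];                      % hypothetical data
T = array2table(M, 'VariableNames', {'A','B'});

sub  = T(T.B > 0, :);      % parentheses: returns another table
vals = T{T.B > 0, 'A'};    % braces: returns the underlying double array
colA = T.A;                % dot: returns the whole column as a double vector

% Vectorized operation on the logically selected subset, no loop needed
meanA = mean(T.A(T.B > 0));
```

Brace and dot indexing hand back native arrays, so the arithmetic that follows runs on ordinary doubles rather than on table objects.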

In short, "there is no free lunch"; the ability to handle disparate data types in a higher-level abstraction is going to cost in additional overhead.

General timings like those above illustrate that the operations differ in cost, but they don't get to the heart of the actual problem to be solved; a specific problem you are trying to solve would be more useful to discuss.

dpb on 27 Jun 2017

It would help to see a specific dataset and how it's being acquired in order to write actual specific code, but it would seem that the above mnemonic naming scheme could be made general and dynamic for any particular collection setup. All it would need is the mapping of which column is which, and you would need that to make up the table column labels anyway.
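One way the naming scheme could be generated dynamically is sketched below. It assumes the column labels are already available as a cell array (`varNames`, as in the question's script); the struct name `col` is made up for illustration.

```matlab
varNames = {'A','B','C'};      % same labels a table would use
M = magic(3);                  % hypothetical data

% Scalar struct whose fields hold the column numbers: col.A = 1, col.B = 2, ...
col = cell2struct(num2cell(1:numel(varNames)), varNames, 2);

b   = M(:, col.B);             % column access by name
row = M(2, [col.A col.C]);     % one observation, two named variables
```

The data stays in a plain matrix, so access keeps native speed, while the field names of `col` carry the meaning of each column.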

I can't help but think there would also be ways to use the table more effectively than by looping, or perhaps to make the loop occur at a higher level while the data itself is handled at the native level.

dpb on 28 Jun 2017

" I can't remember each parameter name of a particular dataset by its index - I only remember their meaningful names."

You do recognize that, in implementing the aforementioned idea of named variables as stand-ins for column indices, categorical variables are simply integers underneath and their values can be used as indices, right? The value/name correspondence is set at creation, so you can make any arbitrary matching by position as needed.
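A small sketch of that point, under the assumption that the category order is set explicitly to the column order when the categorical is created (`varNames` and `pick` are illustrative names):

```matlab
varNames = {'A','B','C'};
M = magic(3);                  % hypothetical data

% Category order is fixed by the second argument,
% so the category codes match the column positions
pick = categorical({'C','A'}, varNames);
idx  = double(pick);           % underlying integer codes
sel  = M(:, idx);              % columns C and A, in that order
```

If the second argument is omitted, the categories are sorted alphabetically, which need not match the column order; that is why the value set is passed explicitly here.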

Answer 2

Peter Perkins on 29 Jun 2017

https://it.mathworks.com/matlabcentral/answers/346434-matlab-table-dataset-type-optimization#answer_272399

Edited: Peter Perkins on 29 Jun 2017

Victor, you are stating your problem as if you must always use either one or the other. As already discussed on a couple of other threads ( [1] [2] ) that you've contributed to, it is very often possible to use numeric matrices in the time-critical portions of the code, but still get the subscripting benefits of tables outside of tight loops.

Vectorizing the operations is, of course, the first thing you should try, but that isn't always possible.
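A minimal sketch of that mixed approach, with made-up data and a placeholder computation standing in for work that cannot be vectorized:

```matlab
T = array2table([1 2; 3 4; 5 6], 'VariableNames', {'A','B'});  % hypothetical data

M = T{:, :};                   % pull the numeric data out once
rowSum = zeros(height(T), 1);  % preallocate the result
for i = 1:size(M, 1)
    r = M(i, :);               % fast native row access inside the loop
    rowSum(i) = r(1) + 2*r(2); % stand-in for the non-vectorizable work
end
T.Score = rowSum;              % back in the table, with named access again
```

The table is touched only twice, once to extract the data and once to store the result, so its per-access overhead stays out of the loop.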

dpb on 29 Jun 2017

Good to emphasize that explicitly, Peter!
