Loop through the unique values of a very large column and extract data


Julian Williams on 15 Jun 2020


https://it.mathworks.com/matlabcentral/answers/548415-loop-through-the-unique-values-of-a-very-large-column-and-extract-data

Commented: Julian Williams on 16 Jun 2020


This is more a speed question than a "how to" question.

Assume I have the following problem:

Three variables A, B and C.

A is a series of IDs and B and C are data (e.g. dates and a measurement).

For various reasons I want to separate the data, so instead of three columns I have a structure with something like:

mystruct.First_ID_FROM_A = [B(indexFirstID,:) C(indexFirstID,:)]

Traditionally I just do the following:

[uA,IA,IB] = unique(A);
for i=1:length(uA)
    ii = find(i==IB);
    mystruct.(uA{i,1}) = [B(ii,:) C(ii,:)];
    %sometimes I do other stuff here with some cross referencing so the index ii is useful.
end

Job done. I have tried other methods, but this is pretty fast. The trouble is that I now have crazy big data (A, B and C each have the best part of a billion rows). So this is my second attempt, which I run on a server:

[uA,IA,IB] = unique(A);
N = length(uA);
temp = cell(N,1);
% do the indexing with a cell structure that can be cut.
parfor i=1:N
    ii = find(i==IB);
    temp{i,1} = [B(ii,:) C(ii,:)];
end
% do a second loop just to reallocate the data
for i=1:N
    mystruct.(uA{i,1}) = temp{i,1};  
end

So despite being two loops this can be quicker as the extraction is in parallel and the assignment is fast.
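Both versions above call find(i==IB) once per unique ID, so every iteration rescans the whole of IB. One possible way around that, offered only as a minimal sketch, is to sort IB once and then take contiguous slices of the sort order. The toy values for A, B and C below are hypothetical stand-ins for the real data, and the sketch assumes the IDs are valid field names.

```matlab
% Toy stand-ins for the real data (hypothetical values).
A = {'id_b'; 'id_a'; 'id_b'; 'id_c'; 'id_a'};
B = [10; 20; 30; 40; 50];
C = [1; 2; 3; 4; 5];

[uA, ~, IB] = unique(A);        % IB(k) is the group number of row k
[~, order] = sort(IB);          % one sort; ties keep their original row order
counts = accumarray(IB, 1);     % number of rows in each group
last   = cumsum(counts);        % end of each group's block within order
first  = last - counts + 1;     % start of each group's block within order

mystruct = struct();
for i = 1:numel(uA)
    ii = order(first(i):last(i));          % same rows as find(IB == i)
    mystruct.(uA{i}) = [B(ii,:) C(ii,:)];
end
```

The loop is still there, but each iteration now touches only its own rows instead of comparing against all of IB. Whether this is faster at a billion rows would need timing on the real data, since the sort itself needs memory for the order vector.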

Is there a fancy way of using something like an array-based version of a binary expansion function that can do this faster without the loop, in either step of the second process? Or should I write a C++ MEX routine to speed this tedious thing up? I think one problem here is that the size of the output array is not known in advance.

If so, does anyone have any experience or examples of how to create and map a MATLAB structure in C++ so the output can be read by MATLAB? I use str2doubleq a lot, which takes a cell array of strings and outputs doubles, which is quite vanilla, and I have written a few custom C and C++ codes for fast date and time pulls, when datenum was too slow.

But this is annoying me, and I am sure there is a neater way to do it. Once the data is in the structure, it is really fast to just use the fieldnames command and then loop through the sub data objects.
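For the loop-free variant asked about above, one option is to let accumarray collect the row indices of each group and cell2struct build the structure in a single call. This is a hedged sketch rather than a benchmarked answer: the toy A, B and C are hypothetical, the IDs are assumed to be valid field names, and cellfun is still a loop internally.

```matlab
% Toy stand-ins for the real data (hypothetical values).
A = {'id_b'; 'id_a'; 'id_b'; 'id_c'; 'id_a'};
B = [10; 20; 30; 40; 50];
C = [1; 2; 3; 4; 5];

[uA, ~, IB] = unique(A);

% One cell per group holding that group's row indices. accumarray does not
% promise the order of the values it hands to the function, hence the sort.
rows = accumarray(IB, (1:numel(IB)).', [], @(v) {sort(v)});

% Extract the data for every group (replaces the parfor step).
temp = cellfun(@(ii) [B(ii,:) C(ii,:)], rows, 'UniformOutput', false);

% Build the structure in one call (replaces the second, serial loop).
mystruct = cell2struct(temp, uA, 1);
```

Because the number of rows per group is worked out by accumarray, the unknown output size stops being an issue: each cell is simply as long as its group.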

7 Comments

Sindar on 16 Jun 2020


The point of tables is that they act like a more organized structure array. If you are naming each structure field, you already spend that memory. Depending on the shape of your data, something similar to:

mytable = array2table([B(IB,:) C(IB,:)],'RowNames',num2str(uA))

should work without any loops
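A related table-based pattern, sketched here under assumptions and not taken from the comment above, keeps the ID as an ordinary table variable and does per-ID work with findgroups and splitapply. The toy data and the variable names ID, nRows and meanC are hypothetical.

```matlab
% Toy stand-ins for the real data (hypothetical values).
A = {'id_b'; 'id_a'; 'id_b'; 'id_c'; 'id_a'};
B = [10; 20; 30; 40; 50];
C = [1; 2; 3; 4; 5];

% Keep everything in one table, with the ID as a regular variable.
T = table(A, B, C, 'VariableNames', {'ID', 'B', 'C'});

% G(k) is the group number of row k; ids lists the unique IDs in sorted order.
[G, ids] = findgroups(T.ID);

% Apply a function to each group without an explicit loop.
nRows = splitapply(@numel, T.C, G);   % rows per ID
meanC = splitapply(@mean,  T.C, G);   % mean of C per ID

summary = table(ids, nRows, meanC);
```

The rows for a single ID are then T(G == k, :), so no field name has to be created per ID and the IDs do not need to be valid identifiers.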

Julian Williams on 16 Jun 2020

Benjamin, that is very neat, much appreciated. Sindar, many thanks for the point on the tables.


Answers (0)


Categories

MATLAB Language Fundamentals Matrices and Arrays Matrix Indexing


Products

MATLAB

Release

R2019b
