Azzera filtri
Azzera filtri

How can I speed up this indexing code?

3 visualizzazioni (ultimi 30 giorni)
bhousden
bhousden il 28 Ott 2021
Commentato: bhousden il 29 Ott 2021
I have a cell array in which each cell contains a string. E.g....
a={'AAAAAA';'BBBBBB';'CCCCCC';'AAAAAA';'DDDDDD'};
Each cell in array a is associated with a row in a numerical array that contains 10 columns. E.g....
b=[0,0,0,1,1,0,0,0,0,0;
0,1,0,0,0,0,0,0,0,0;
1,0,1,0,1,0,2,0,0,0;
3,0,0,0,0,0,0,0,0,1;
0,0,0,0,0,0,2,0,0,1];
Some strings in a are repeated such as 'AAAAAA' as shown above. What I need to do is find all repeated cells in a and sum the assocated columns from b into a single row. This should result in two new arrays (unia and bnew) which have equal numbers of rows but every string in unia is unique.
Easy enough to do with a loop such as:
unia=unique(a);
bnew=zeros(numel(unia),10);
for n=1:numel(unia)
pos=find(strcmp(a,unia{n}));
bnew(n,:)=sum(b(pos,:),1);
end
This works fine for small arrays but I have a case where a has 6 million cells and unia has 300,000 cells so I need something much faster. Any ideas?
Thanks!

Risposta accettata

Ive J
Ive J il 28 Ott 2021
Avoid comparing strings within the loop and instead take advantage of the index vector from unique:
a = ["A", "B2", "A", "C", "AA", "B2", "B2"]; % use strings instead of cell array of characters, they're much more efficinet to work with
b = randi([0 2], numel(a), 3)
b = 7×3
1 0 1 0 1 2 0 1 2 0 2 1 2 0 2 2 2 0 0 2 1
[anew, ~, idx] = unique(a);
bnew = arrayfun(@(x) sum(b(x == idx, :), 1), 1:numel(anew), 'uni', false);
bnew = vertcat(bnew{:})
bnew = 4×3
1 1 3 2 0 2 2 5 3 0 2 1
anew
anew = 1×4 string array
"A" "AA" "B2" "C"
Also, you can use tall arrays when dealing with large arrays.

Più risposte (0)

Prodotti


Release

R2017a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by