Adding values in vector according to labels in another vector
4 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
Murat Azizoglu
il 6 Nov 2020
Commentato: Murat Azizoglu
il 14 Nov 2020
I have two arrays of same size E x 1, let's call them Values and Labels. Values have real-valued data points, while Labels have integer-valued group labels for each data point, ranging from 1 to N. In my application E could be on the order of 1e6, while N is on the order of 1e5, so there are on average ~10 data points sharing the same group label.
I would like to generate the accumulated sum of values sharing the same group label. That is, I want to generate an N x 1 vector GroupSum where GroupSum(i) = sum(Values(Labels == i)). This will be done for many instances of (Values, Labels) combinations in an outer loop, so it is important to do this quickly. Therefore, I would like to find a faster alternative to
for i = 1:N
GroupSum(i) = sum(Values(Labels == i));
end
Any suggestions would be appreciated.
Thanks in advance.
Murat
0 Commenti
Risposta accettata
Raunak Gupta
il 13 Nov 2020
Hi,
From the current implementation I assume that the labels run from 1 to N without missing any value in between. The best-case scenario I thought of is to at least traverse the Values array once and have an accumulating count for each label. This way you can save the overhead of doing logical indexing for all the Labels. Below code may help.
E = 1e6;
N = 1e5;
Labels = randi([1 N],E,1);
Values = randi([1 100],E,1);
GroupSum = zeros(1,N);
for i = 1:E
GroupSum(Labels(i)) = GroupSum(Labels(i)) + Values(i);
end
This doesn’t use (Labels == i) which can be time saving.
5 Commenti
Raunak Gupta
il 13 Nov 2020
Hi Murat,
As I suggested the time complexity here is suggested by the how many indexes from Value matrix you need to traverse. Since the sum of the values is required so the whole Value vector is needed to be traversed. So, even if we have a property which can group the number of labels based on there frequencies still in the end the Value needs to be traverse in full.
Thus as long as Sum is the final result, I don't see any further enhancement in terms of time complexity because the solution I suggested is having linear time complexity.
Più risposte (0)
Vedere anche
Categorie
Scopri di più su Matrix Indexing in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!