Determining quantities of different integers in a large data set

I am currently working with a large data set, specifically 500 integers. The values are randomly generated between two specified values, x1 and x2. There is not a large range between x1 and x2 and therefore the same values are repeated a number of times. What I am trying to do is to come up with a way of determining how often each integer is repeated.
To clarify, suppose x1 = 5 and x2 = 10, and 500 integers are randomly generated between these values. Within the randomly generated integers, how many 5s were generated, how many 6s, how many 7s, and so on?
I am attempting to find the quickest way of doing this without having lines and lines of code.

3 Comments

Hello A^2
The histcounts function will probably get the job done. It lets you set bin edges and/or widths, and you might have to experiment around a little to get the result you want.
David, put this in the Answers section so you might get credit for it. histogram() is also a related function. By the way, it's funny how he thinks 500 elements is "large".
Hi all, thanks for the input. I'm new to working with Matlab so for me 500 is large but I'm sure for you guys it's probably not even close :)


Accepted Answer

I'd normally recommend either sparse or accumarray to do the counts. accumarray is arguably best here, because there is no need for a sparse result.
The only concern is if your limits x1 and x2 are VERY large numbers. The result has one element for every integer from 1 up to the largest value in the data, so most of the elements of the array will be zero.
x1 = 10;
x2 = 20;
X = randi([x1,x2],500,1);
counts = accumarray(X(:),1,[],@sum)
counts =
0
0
0
0
0
0
0
0
0
40
39
35
44
56
39
55
53
41
51
47
Since you know that the result will be zero below x1, just extract the counts you expect to see.
counts = counts(x1:x2)
counts =
40
39
35
44
56
39
55
53
41
51
47
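To see which count belongs to which integer, the values and counts can be placed side by side. This is only a minimal sketch: the variable name tally is hypothetical, and the explicit size argument [x2, 1] is an added safeguard (not part of the code above) so that indexing with x1:x2 still works if x2 itself is never drawn.

```matlab
x1 = 10;
x2 = 20;
X = randi([x1, x2], 500, 1);

% Force the result to have x2 rows, so counts(x1:x2) is valid
% even if the value x2 never happens to be generated.
counts = accumarray(X, 1, [x2, 1]);

% Column 1: the integer value, column 2: how often it occurred.
tally = [(x1:x2).', counts(x1:x2)];
disp(tally)
```

Each row of tally then reads as "value, number of occurrences".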
Or use sparse, which creates the matrix as a sparse one.
sparse(X,1,1,20,1)
ans =
(10,1) 40
(11,1) 39
(12,1) 35
(13,1) 44
(14,1) 56
(15,1) 39
(16,1) 55
(17,1) 53
(18,1) 41
(19,1) 51
(20,1) 47
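Or, you can use histcounts (or the older histc), but you need to be careful! If you get sloppy and just pass x1:x2 as the bin edges, you see that the last bin has too many counts in it. The last bin includes both of its edges, so it collects the counts for both 19 and 20 (51 + 47 = 98).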
histcounts(X,x1:x2)
ans =
40 39 35 44 56 39 55 53 41 98
You need to use histcounts like this to make it work properly:
histcounts(X,x1:x2+1)
ans =
40 39 35 44 56 39 55 53 41 51 47
Note that histcounts is a good choice if x1 is a very large number. Then the accumarray solution would generate an array with a huge number of zero elements at the start.
So the best solution must be based on the problem. This is often the case.
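If x1 is large and you still prefer accumarray, one possible workaround is to shift the data so that x1 maps to index 1. The sketch below is an illustration under assumed values (x1 = 1e6 is chosen only to make the point), and the names idx and values are hypothetical.

```matlab
x1 = 1e6;          % assumed large lower limit, for illustration only
x2 = x1 + 10;
X = randi([x1, x2], 500, 1);

% Shift the data so that x1 becomes subscript 1, x1+1 becomes 2, etc.
idx = X - x1 + 1;

% One row per possible value; no leading block of zeros is created.
counts = accumarray(idx, 1, [x2 - x1 + 1, 1]);

% values(k) is the integer whose count is stored in counts(k).
values = (x1:x2).';
```

The result should agree with histcounts(X,x1:x2+1), only stored as a column instead of a row.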

2 Comments

If all the elements in X are integer values, and the bounds of that array are not too far apart, you could tell the histogram or histcounts functions to use the 'integers' BinMethod.
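A minimal sketch of that suggestion, assuming the same x1, x2 and X as in the answer. As far as I know, the 'integers' method uses bins of width 1 with edges halfway between integers, so the value for each bin is recovered here by adding 0.5 to its left edge. The name values is hypothetical.

```matlab
x1 = 10;
x2 = 20;
X = randi([x1, x2], 500, 1);

% Let histcounts choose unit-width bins centered on the integers.
[N, edges] = histcounts(X, 'BinMethod', 'integers');

% Each bin is expected to span k-0.5 to k+0.5, so its left edge
% plus 0.5 gives the integer k that the bin counts.
values = edges(1:end-1) + 0.5;

disp([values.', N.'])
```

Note that the bins are derived from the data, so they cover the values actually present in X rather than the nominal limits x1 and x2.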
+1
Great review of multiple methods.
