MATLAB Answers

Is it possible to create a histogram with fractional entries for each bin?

Asked by Leonard
on 6 Aug 2015
Latest activity Commented on by Leonard
on 7 Aug 2015
Thank you for looking at my question! I have included a brief introduction below; any suggestions or comments would be greatly appreciated!
A traditional histogram is generated from an array (e.g. sample_array = [1,1,1,2,2,3,3,3,3,4]) using h = histogram(sample_array,nbins);. In this example, with nbins = 4, the height of each column is the number of times the corresponding value is observed in the sample array.
However, in my work some entries are themselves arrays rather than single values. For example:
sample_array = [1,1,[1,2],2,2,3,[2,3,4,5],3,4];
I am aware that MATLAB would simply flatten this into an ordinary array and lose the grouping. For convenience I am instead using a cell array to contain the data:
sample_cell = {1,1,[1,2],2,2,3,[2,3,4,5],3,4};
What I need to do is generate the histogram of sample_cell such that EACH ENTRY of the cell has EQUAL WEIGHT, with that weight shared equally among the values inside the entry. The corresponding weights would be as follows:
sample_weight = {1,1,[1/2,1/2],1,1,1,[1/4,1/4,1/4,1/4],1,1};
From this, the resulting histogram would have the following counts in the bins for 1 through 4 (the single 5 would add 0.25 to a fifth bin):
Bin: Count
1: 2.5
2: 2.75
3: 2.25
4: 1.25
I am looking for a way to generate this histogram that does not rely on the least common multiple of the sizes of the entries. (I have a temporary solution that uses this quantity, but I am unable to scale it up properly because the entry sizes include very large prime numbers, which result in LCM > 10^9.)
Again, any help or suggestions that you might have would be greatly appreciated!

1 Answer

Answer by David Young
on 6 Aug 2015
Edited by David Young
on 6 Aug 2015
 Accepted Answer

If all the samples are positive integers, and the bins are all centred on the positive integers and with unit width, as in the initial example, you can just do this:
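% data: each cell entry is a row vector of positive integers
sample_cell = {1,1,[1,2],2,2,3,[2,3,4,5],3,4};
% put all the samples into one row vector
samples = cat(2, sample_cell{:});
% every sample in an entry gets the weight 1/(number of samples in that entry)
weight_cell = cellfun(@(a) ones(size(a))/length(a), sample_cell, ...
    'UniformOutput', false);
weights = cat(2, weight_cell{:});
% sum the weights per integer value; counts(k) is the weighted count for value k
counts = accumarray(samples(:), weights(:)).';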
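As a quick check of the snippet above against the numbers in the question, here is a minimal sketch. It builds the weights with repelem instead of cellfun, which is a substitution for illustration rather than part of the original answer, and it assumes a MATLAB release that provides repelem (R2015a or later, as far as I know).

```matlab
% Same data as in the question
sample_cell = {1,1,[1,2],2,2,3,[2,3,4,5],3,4};

% Number of values inside each cell entry
n = cellfun(@numel, sample_cell);

% Flatten the samples and repeat each entry's weight 1/n once per value
samples = cat(2, sample_cell{:});
weights = repelem(1 ./ n, n);

% Sum the weights per integer value
counts = accumarray(samples(:), weights(:)).';

% counts is [2.5 2.75 2.25 1.25 0.25]
disp(counts)
```

The first four values match the table in the question, the fifth comes from the single 5 in [2,3,4,5], and the total is 9, the number of cell entries. All of this relies on the samples being positive integers that can serve directly as bin indices.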
If this isn't the case (as in your more accurate example in the comments), you have to modify the code above by putting the samples into bins before weighting and counting them. This then looks like this:
% data and histogram parameters
sample_cell = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};
edges = 0:0.1:1;
% put all the samples into one vector, and make a vector of their weights
samples = cat(2, sample_cell{:});
weight_cell = cellfun(@(a) ones(size(a))/length(a), sample_cell, ...
    'UniformOutput', false);
weights = cat(2, weight_cell{:});
% work out which bin of the histogram each sample falls into
bins = discretize(samples, edges);
% Now form the counts, applying the weights for each sample
% (the size argument keeps one entry per bin even if the last bins are empty)
wtdcounts = accumarray(bins(:), weights(:), [numel(edges)-1, 1]).';
% and normalise to probabilities
normcounts = wtdcounts/sum(wtdcounts); % normalise to sum to 1
% plot like histogram
centres = conv(edges, [0.5 0.5], 'valid');
bar(centres, normcounts, 1);
This gives the same results as the code in your comment, but will be a great deal more economical I think.
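The claimed agreement with the LCM-based code can be checked on the test data with a sketch like the one below. It is a reconstruction for illustration, not the asker's exact script: it uses the built-in two-argument lcm in a loop in place of the File Exchange lcms function, and repelem in place of the nested loops. The variable names p_weighted, p_repl, L and reps are made up here.

```matlab
sample_cell = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};
edges = 0:0.1:1;
nbins = numel(edges) - 1;

n = cellfun(@numel, sample_cell);      % values per cell entry
samples = cat(2, sample_cell{:});

% Weighted approach: each value gets 1/(size of its entry)
weights = repelem(1 ./ n, n);
bins = discretize(samples, edges);
wtd = accumarray(bins(:), weights(:), [nbins, 1]).';
p_weighted = wtd / sum(wtd);

% Replication approach: copy each value L/(size of its entry) times
L = 1;
for k = n
    L = lcm(L, k);                     % running least common multiple
end
reps = repelem(L ./ n, n);             % number of copies for each value
replicated = repelem(samples, reps);
p_repl = histcounts(replicated, edges, 'Normalization', 'probability');

% Largest difference between the two results (zero up to rounding)
disp(max(abs(p_weighted - p_repl)))
```

The replicated array grows with the LCM of the entry sizes, while the weighted version only ever stores one weight per sample, which is where the saving comes from.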

  3 Comments

Thank you for your response! To answer your question: No, my full problem begins with a cell array of sets of unique values in the interval [0,1], which will require binning. For example:
sample_cell = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};
For the time being, my histogram is generated using:
length_list = [];
for i = 1:length(sample_cell)
    length_list = [length_list,length(sample_cell{i})];
end
LCM_length_list = lcms(length_list); % I got this program from MFEX
final_array = [];
for i = 1:length(sample_cell)
    array = sample_cell{i};
    for j = 1:length(array)
        for k = 1:(LCM_length_list/length(array))
            final_array = [final_array,array(j)];
        end
    end
end
h = histogram(final_array,0:0.1:1,'Normalization','Probability');
While this works as a temporary solution, I am ultimately looking to combine the histograms of many "sample_cell" data sets while keeping the number of entries in each "sample_cell" as the "integral" of its histogram. For example, in my code above "sample_cell" has 5 entries of equal weight. Another cell, sample_cell_2, could have 8 entries of equal weight. I am not able to combine the two resulting "final_array" arrays, however, because the least common multiple could potentially result in upwards of 10^5 entries (due to large prime numbers).
I've modified my answer to deal with the more general case. The second piece of code in the answer gives the same results as your lcm code above on the test data.
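For the combining requirement mentioned in the comment above, one possible sketch builds on the weighted counts before normalisation. Each cell entry contributes a total weight of 1, so the un-normalised counts of one cell sum to its number of entries, and counts from several cells on the same edges can simply be added. The contents of sample_cell_2 and the names cells and total are invented for illustration, and repelem is assumed to be available.

```matlab
edges = 0:0.1:1;
nbins = numel(edges) - 1;

% First data set from the thread, second one hypothetical (8 entries)
sample_cell_1 = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};
sample_cell_2 = {0.05,[0.5,0.55],0.7,[0.1,0.2,0.35],0.95,0.45,[0.8,0.85],0.6};
cells = {sample_cell_1, sample_cell_2};

total = zeros(1, nbins);
for c = 1:numel(cells)
    sc = cells{c};
    n = cellfun(@numel, sc);           % values per cell entry
    samples = cat(2, sc{:});
    weights = repelem(1 ./ n, n);      % each value gets 1/(size of its entry)
    bins = discretize(samples, edges);
    % accumulate un-normalised weighted counts, one slot per bin
    total = total + accumarray(bins(:), weights(:), [nbins, 1]).';
end

% The combined histogram integrates to 5 + 8 = 13 entries (up to rounding)
disp(sum(total))

centres = conv(edges, [0.5 0.5], 'valid');
bar(centres, total, 1);
```

Dividing total by sum(total) afterwards would give probabilities again if those are wanted instead.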
Exactly what I was looking for! Thank you so much!
