Asked by Leonard
on 6 Aug 2015

Thank you for looking at my question! I have included a brief introduction below; any suggestions or comments would be greatly appreciated!

A traditional histogram is generated from an array (e.g. sample_array = [1,1,1,2,2,3,3,3,3,4]) using h = histogram(sample_array,nbins). In this example, with nbins = 4, I would get a simple histogram in which each column height is the number of times a particular value is observed in the sample array.

However, in my work I have come upon the need to instead use an array in place of a single value. For example:

sample_array = [1,1,[1,2],2,2,3,[2,3,4,5],3,4];

I am aware that this does not keep the grouping, since MATLAB simply flattens it into an ordinary array. For convenience I am instead using a cell to contain the data:

sample_cell = {1,1,[1,2],2,2,3,[2,3,4,5],3,4};

What I need to do is generate the resulting histogram of sample_cell where I give EACH ENTRY of the cell EQUAL WEIGHT. The corresponding weights would be as follows:

sample_weight = {1,1,[1/2,1/2],1,1,1,[1/4,1/4,1/4,1/4],1,1};

From this, the resulting histogram would have the following counts in the bins for 1 thru 4:

Bin: Count
1: 2.5
2: 2.75
3: 2.25
4: 1.25
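As a cross-check on these numbers, a brute-force tally can be sketched as follows. It is only meant to confirm the arithmetic, it assumes every value is a positive integer no larger than 5, and the names counts, entry and w are introduced here for illustration.

```matlab
sample_cell = {1,1,[1,2],2,2,3,[2,3,4,5],3,4};

counts = zeros(1, 5);            % the values in this example run from 1 to 5
for i = 1:numel(sample_cell)
    entry = sample_cell{i};
    w = 1/numel(entry);          % each cell entry carries a total weight of 1
    for j = 1:numel(entry)
        % add this value's share of the entry's weight to its bin
        counts(entry(j)) = counts(entry(j)) + w;
    end
end

% counts(1:4) holds the bins 1 thru 4 listed above;
% counts(5) holds the remaining 1/4 contributed by the value 5
```

The elements of counts add up to 9, the number of entries in sample_cell.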

I am looking for a way to generate this resulting histogram which does not include using the least common multiple of the sizes of each entry. (I have a temporary solution to the problem including this quantity, however, I am unable to scale it up properly as I am dealing with very large prime numbers which result in LCM > 10^9.)

Again, any help or suggestions that you might have would be greatly appreciated!

Answer by David Young
on 6 Aug 2015

Edited by David Young
on 6 Aug 2015

Accepted Answer

If all the samples are positive integers, and the bins are all centred on the positive integers and with unit width, as in the initial example, you can just do this:

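% data
sample_cell = {1,1,[1,2],2,2,3,[2,3,4,5],3,4};

% put all the samples into one vector; each sample gets the weight
% 1/(number of values in its cell entry)
samples = cat(2, sample_cell{:});
weight_cell = cellfun(@(a) ones(size(a))/length(a), sample_cell, ...
    'UniformOutput', false);
weights = cat(2, weight_cell{:});

% add up the weights of all samples that share the same value
counts = accumarray(samples(:), weights(:)).';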
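To make the role of accumarray clearer, here is a minimal sketch on made-up numbers (subs, vals and demo_counts are illustrative names, not part of the code above). The first argument gives the bin index of each sample and the second its weight; weights that share an index are added together.

```matlab
subs = [1; 1; 2; 4];           % bin index of each sample (made-up values)
vals = [0.5; 1; 0.25; 1];      % weight of each sample (made-up values)

demo_counts = accumarray(subs, vals).';

% demo_counts is [1.5 0.25 0 1]:
%   index 1 collects 0.5 + 1, index 2 collects 0.25,
%   index 3 never occurs and so gets 0, index 4 collects 1
```

For the data in the question, counts has five elements because the value 5 occurs once with weight 1/4, so counts(1:4) corresponds to the bins 1 thru 4 listed in the question.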

If this isn't the case (as in your more accurate example in the comments), you have to modify the code above by putting the samples into bins before weighting and counting them. The result looks like this:

% data and histogram parameters
sample_cell = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};
edges = 0:0.1:1;

% put all the samples into one vector, and make a vector of their weights
samples = cat(2, sample_cell{:});
weight_cell = cellfun(@(a) ones(size(a))/length(a), sample_cell, ...
    'UniformOutput', false);
weights = cat(2, weight_cell{:});

% work out which bin of the histogram each sample falls into
bins = discretize(samples, edges);

% Now form the counts, applying the weights for each sample
% (the size argument keeps empty bins at the top end, so that the counts
% always line up with the bin centres used for plotting)
wtdcounts = accumarray(bins(:), weights(:), [numel(edges)-1, 1]).';

% and normalise to probabilities
normcounts = wtdcounts/sum(wtdcounts); % normalise to sum to 1

% plot like histogram
centres = conv(edges, [0.5 0.5], 'valid');
bar(centres, normcounts, 1);

This gives the same results as the code in your comment, but will be a great deal more economical I think.
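One caveat, sketched here under assumptions rather than taken from the thread: discretize returns NaN for samples that lie outside the range of edges, and accumarray does not accept NaN as a subscript. A possible way to handle this is to drop such samples before counting. The data below are made up so that one sample (1.2) falls outside the edges, and the names keep, b, w and nbins are illustrative.

```matlab
% hypothetical data: 1.2 lies outside the bin edges
sample_cell = {[0.05, 1.2], 0.32, [0.13, 0.67, 0.25]};
edges = 0:0.1:1;
nbins = numel(edges) - 1;

samples = cat(2, sample_cell{:});
weight_cell = cellfun(@(a) ones(size(a))/numel(a), sample_cell, ...
    'UniformOutput', false);
weights = cat(2, weight_cell{:});

bins = discretize(samples, edges);   % NaN for samples outside the edges
keep = ~isnan(bins);                 % logical mask of in-range samples
b = bins(keep);
w = weights(keep);

% fixed output size, so empty bins at the top end are kept as zeros
wtdcounts = accumarray(b(:), w(:), [nbins, 1]).';
```

Note that a dropped sample takes its weight with it: here the counts add up to 2.5 rather than 3, because half of the first entry was out of range. Whether to drop such samples or widen the edges instead depends on the application.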

Leonard
on 6 Aug 2015

Thank you for your response! To answer your question: no, my full problem begins with a cell of sets of unique values in the interval [0,1], which will require binning. For example:

sample_cell = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};

For the time being, my histogram is generated using:

length_list = [];
for i = 1:length(sample_cell)
    length_list = [length_list,length(sample_cell{i})];
end
LCM_length_list = lcms(length_list); % I got this program from MFEX
final_array = [];
for i = 1:length(sample_cell)
    array = sample_cell{i};
    for j = 1:length(array)
        for k = 1:(LCM_length_list/length(array))
            final_array = [final_array,array(j)];
        end
    end
end
h = histogram(final_array,0:0.1:1,'Normalization','Probability');

While this works as a temporary solution, I am ultimately looking to combine the histograms of many "sample_cell" sets of data while keeping the overall number of entries in each "sample_cell" as the "integral" of its histogram. For example, in my code above "sample_cell" has 5 entries of equal weight. Another cell, sample_cell_2, could have 8 entries of equal weight. I am not able to combine the two resulting "final_array" arrays, however, because the least common multiple could potentially result in upwards of 10^5 entries (due to large prime numbers).
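One possible way to combine several cells without replicating any samples is to compute weighted counts for each cell and add them, so that each cell contributes exactly its number of entries to the total. The sketch below assumes every entry is a row vector with all values inside the edges. The contents of sample_cell_2 are made up for illustration (the thread only says it has 8 entries), and all_cells, total and nbins are illustrative names.

```matlab
edges = 0:0.1:1;
nbins = numel(edges) - 1;

sample_cell   = {[0,0.41],0.32,[0.13,0.67,0.2],0.9,[0.3,1,0.89]};   % 5 entries
% hypothetical second data set with 8 entries
sample_cell_2 = {0.15,[0.25,0.75],0.55,[0.05,0.45,0.95],0.35,0.65,[0.85,0.12],0.52};

all_cells = {sample_cell, sample_cell_2};
total = zeros(1, nbins);

for k = 1:numel(all_cells)
    c = all_cells{k};
    samples = cat(2, c{:});
    weight_cell = cellfun(@(a) ones(size(a))/numel(a), c, ...
        'UniformOutput', false);
    weights = cat(2, weight_cell{:});
    bins = discretize(samples, edges);
    % weighted counts of this cell, added to the running total
    total = total + accumarray(bins(:), weights(:), [nbins, 1]).';
end

% sum(total) equals 5 + 8 = 13 up to rounding error
```

Because the weights are fractions rather than repeated samples, the cost grows with the number of samples and does not depend on any least common multiple.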

David Young
on 6 Aug 2015

Leonard
on 7 Aug 2015

Exactly what I was looking for! Thank you so much!
