Finding entropy from a probability distribution

Dear all,
I am trying to find the distribution of a random variable using the "hist" command. I get the distribution, but I want to calculate the entropy from that histogram. Can anyone please help me solve this? Any help is greatly appreciated. Thanks in advance.

Accepted Answer

x = randn(100000,1);
[counts,binCenters] = hist(x,100);
binWidth = diff(binCenters);
binWidth = [binWidth(end),binWidth]; % Replicate last bin width for first, which is indeterminate.
nz = counts>0; % Index to non-zero bins
frequency = counts(nz)/sum(counts(nz)); % Normalize counts to probabilities
H = -sum(frequency.*log(frequency./binWidth(nz))) % Differential entropy estimate, in nats
It seems that the most common references (e.g., Wikipedia) assume a discrete random variable with a specified probability mass function, rather than a discrete approximation to a continuous variable. In that case, the "bin width" is effectively 1. Here is a reference that discusses the case of non-unit bin width, and gives the formula I used as the basis of the calculation above: http://www2.warwick.ac.uk/fac/soc/economics/staff/academic/wallis/publications/entropy.pdf

6 Comments

Realized that my bin width calculation is a bit sloppy. One should really calculate the midpoints between bin centers and get the widths from those. (That would leave both end widths indeterminate, as the outer edges technically go to infinity.) A better approach for the histogram would probably be to use the histc() function, which takes the bin edges directly, so calculating the widths is more straightforward. In most circumstances, though, this solution will do just fine.
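To make that concrete, here is one sketch of the edge-based variant using histc(). The number of bins and the way the very last count (values exactly equal to the final edge) is folded back in are my own illustrative choices, not part of the answer above:

```matlab
x = randn(100000,1);
edges = linspace(min(x), max(x), 101);        % 100 bins, with explicit edges
counts = histc(x, edges);                     % counts(end) holds values equal to edges(end)
counts(end-1) = counts(end-1) + counts(end);  % fold that edge case into the last real bin
counts(end) = [];
binWidth = diff(edges)';                      % widths are now unambiguous (column, to match counts)
nz = counts > 0;                              % index to non-zero bins
p = counts(nz) / sum(counts(nz));             % normalize counts to probabilities
H = -sum(p .* log(p ./ binWidth(nz)))         % differential entropy estimate, in nats
```

In newer MATLAB releases, histcounts() supersedes histc() and returns exactly nbins counts for nbins+1 edges, which makes the fold-in step unnecessary.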
This makes sense - I was heading to the same place! I've deleted my incorrect answer.
Hi cyclist,
Thank you very much for your valuable help. But what is the physical meaning of the entropy value? I have tried various sources, but I am still not clear about it. What range of values can it take (-inf to +inf)? How does this value relate to the probability distribution? Please point me to some resources where I can learn more about this.
Hi,
If I use the histc() command for the histogram and test with different bin widths, I get different entropy values for the same data. Can you please explain how to use the code with histc()? Thank you very much.
I frankly cannot explain the entire concept of entropy to you here. I think that a careful reading of this Wikipedia page is a good start: http://en.wikipedia.org/wiki/Entropy_(information_theory)
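One concrete point about the range, though: the differential entropy estimated this way can be negative, unlike the discrete Shannon entropy (which is never below zero). A quick sketch, where the narrow sigma is just an illustrative choice, compares the histogram estimate against the known analytic value for a Gaussian:

```matlab
sigma = 0.1;                                   % a narrow distribution
x = sigma*randn(100000,1);
[counts,binCenters] = hist(x,100);
binWidth = diff(binCenters);
binWidth = [binWidth(1),binWidth];             % replicate a spacing for the first bin
nz = counts > 0;
p = counts(nz)/sum(counts(nz));
H_est = -sum(p .* log(p ./ binWidth(nz)))      % histogram-based estimate, in nats
H_true = 0.5*log(2*pi*exp(1)*sigma^2)          % analytic Gaussian value, about -0.88 nats
```

The estimate should land close to the analytic value, and both are negative here because the density is concentrated on a narrow interval.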
Rather than my guessing at what you may have done incorrectly with histc(), maybe you could post a new question specifically about this? If you have not done so, I suggest you carefully read the documentation for hist() and histc(). The help files are very precise about how the calculations are done. For example, the help file for histc() is very specific about how it treats cases that land exactly on a bin edge.
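For example, histc() counts a value that lands exactly on an interior edge into the bin on its right, and a value exactly equal to the final edge goes into a separate trailing count:

```matlab
counts = histc([0 0.5 1 1.5 2], 0:2)
% counts is [2 2 1]: 0 and 0.5 fall in [0,1); 1 and 1.5 fall in [1,2);
% the trailing 1 counts the value exactly equal to the last edge, 2.
```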

Sign in to comment.

More Answers (0)
