Function 'pdf' doesn't return pdf values
I have a problem with the function pdf. I have this code:
estim_KDE = fitdist(data, 'kernel');
x = low:(abs(low-high)/(obs-1)):high;
y = pdf(estim_KDE,x);
plot(x,y,'r'), xlabel('xxx'), ylabel('yyy'),...
title('title'), legend('xyz');
but the function pdf returns values that make no sense to me. They are neither between 0 and 1 nor values between 0 and 1 multiplied by the length of x, and I expected pdf to return one of these two. For example, it gives me numbers like 20.something or 5.something, with length(x) = 1000 or more. This happens for every distribution I have tried (always fitted with fitdist). I only discovered the problem because I plotted a histogram of the frequencies against the kernel density estimator.
Can someone help me, please?
Answers (2)
John D'Errico
on 6 Feb 2015
Edited: John D'Errico on 6 Feb 2015
I think you are under a common misperception about the PDF of a random variable. My guess is that the letter P in PDF is what confuses people, and yes, it is called a Probability Density Function.
The thing is, it does not actually return a probability. Consider a PDF with a very narrow spread, for example a Gaussian with mean 0 and standard deviation 0.001:
normpdf(0,0,.001)
ans =
398.94
See that the PDF at 0 is 398.94, vastly larger than 1.
What matters is that the PDF integrates to 1: the integral of the function over its whole domain is 1.
It is the CDF that returns something you can interpret as a probability. Equivalently, you can integrate the PDF over an interval to compute the probability of that interval, and that integral is exactly what the CDF gives you.
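A minimal sketch of the three points above, assuming base MATLAB only: the Gaussian PDF and CDF are written out by hand as anonymous functions (the names gauss_pdf and gauss_cdf are hypothetical helpers, used here instead of normpdf/normcdf so that no toolbox is needed). It shows that the density at the mean is far above 1, that the PDF still integrates to about 1, and that a probability comes from a difference of CDF values.

```matlab
% Gaussian with mean 0 and standard deviation 0.001, written out by hand
% (hypothetical helpers; normpdf/normcdf would give the same values).
mu = 0;
sigma = 0.001;
gauss_pdf = @(x) exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
gauss_cdf = @(x) 0.5 * (1 + erf((x - mu) ./ (sigma*sqrt(2))));

peak = gauss_pdf(0);                        % density at the mean, about 398.94
x = linspace(-10*sigma, 10*sigma, 20001);   % fine grid covering +/-10 std deviations
area = trapz(x, gauss_pdf(x));              % numerical integral of the PDF, about 1
p = gauss_cdf(sigma) - gauss_cdf(-sigma);   % P(-sigma < X < sigma), about 0.6827

fprintf('peak = %.2f, area = %.6f, P(|X| < sigma) = %.4f\n', peak, area, p);
```

The density value (about 398.94) is not a probability; only the area under the curve, or a CDF difference, is.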
4 Comments
John D'Errico
on 10 Feb 2015
A plot of the PDF IS a graph of the relative frequency, to the extent that this makes any sense. Why do you care about the y-axis scaling? If that is what bothers you, then just turn off the y-axis labels.
The fact is, you CAN create a histogram of the frequency in each "bin". You would do this either by integrating the PDF over that sub-interval or by subtracting successive values of the CDF, which gives the relative fraction expected to fall in that bin.
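A minimal sketch of the CDF-difference approach, assuming a standard normal distribution and bins of width 0.5 on [-4, 4] (both hypothetical choices for illustration). The CDF is written with erf so only base MATLAB is needed; for a fitted object such as estim_KDE, cdf(estim_KDE, edges) would play the same role. The per-bin probabilities are compared with the PDF at each bin midpoint times the bin width.

```matlab
% Standard normal PDF and CDF written out by hand (hypothetical helpers).
gauss_pdf = @(x) exp(-x.^2 / 2) / sqrt(2*pi);
gauss_cdf = @(x) 0.5 * (1 + erf(x / sqrt(2)));

edges = -4:0.5:4;                             % 17 bin edges -> 16 bins
binProb = diff(gauss_cdf(edges));             % probability of each bin, from successive CDF values
mids = edges(1:end-1) + diff(edges)/2;        % bin midpoints
approxProb = gauss_pdf(mids) .* diff(edges);  % PDF at the midpoint times the bin width

totalProb = sum(binProb);                     % about 1 (tails beyond +/-4 are excluded)
maxErr = max(abs(binProb - approxProb));      % error of the midpoint approximation
disp([mids; binProb]);                        % bin midpoints and their probabilities
```

The bin probabilities sum to (almost) 1, whereas the PDF values themselves do not; multiplying a density by a bin width is what turns it into an approximate bin probability.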
If you used a tiny enough bin interval, then the curve would look very nice and smooth. But the probability of a point falling in any single such tiny bin would be vanishingly small, so the y-axis values would all be tiny numbers. This reflects the fact that any single number has probability ZERO of arising.
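A minimal sketch of the shrinking-bin effect, again assuming a standard normal distribution whose CDF is written by hand with erf (a hypothetical helper, base MATLAB only). As the bin width decreases, the largest single-bin probability shrinks roughly in proportion to the width, while that probability divided by the width approaches the peak density 1/sqrt(2*pi), about 0.3989.

```matlab
% Standard normal CDF written with erf (hypothetical helper, no toolbox).
gauss_cdf = @(x) 0.5 * (1 + erf(x / sqrt(2)));

widths = [1, 0.1, 0.01, 0.001];                   % progressively narrower bins
maxBinProb = zeros(size(widths));
for k = 1:numel(widths)
    edges = -4:widths(k):4;                       % bins covering [-4, 4]
    maxBinProb(k) = max(diff(gauss_cdf(edges)));  % largest single-bin probability
end
ratio = maxBinProb ./ widths;                     % tends to the peak density, about 0.3989
disp([widths; maxBinProb; ratio]);
```

The bin probabilities go to zero, but probability per unit width does not; that limiting quantity is the density, which is why it can exceed 1.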
So, just plot the PDF, and don't worry about the y-axis, or turn it off completely.
Rob Keeton
on 3 Sep 2019
Multiply by the bandwidth of the kernel density estimate:
y = pdf(estim_KDE,x)*estim_KDE.BandWidth;