Plotting a boxplot for data that are not normally distributed

Hi,
I was wondering if it's possible to use boxplot or a similar plotting technique to plot data that are not normally distributed?
Thanks!

Accepted answer

"[is it] possible to use boxplot or a similar plotting technique to plot data that are not normally distributed? "
Yes.
Is it the best way to summarize a non-normal distribution? Probably not.
Below, a skewed distribution is shown as a histogram and as a boxplot. The median in the boxplot is accurate, and the quartile markers (the edges of the box) show the skew. The outliers also indicate the skew. However, because the distribution is nowhere near normal, the median does not indicate the expected value (the mean). The histogram is much more descriptive: the viewer can see the shape of the distribution, and roughly where its expected value lies, without needing to know how to read a boxplot. But if you're more interested in the median and quartiles, a boxplot may suit your needs better.
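x = pearsrnd(0,1,1,4,1000,1);   % 1000 samples: mean 0, std 1, skewness 1, kurtosis 4
med = median(x);
clf()
s(1) = subplot(4,1,1:3);        % top three quarters of the figure: histogram
histogram(x)
xline(med,'r-','Median', 'linewidth',2)   % xline requires R2018b or later
grid on
s(2) = subplot(4,1,4);          % bottom quarter: horizontal boxplot
boxplot(x, 'Orientation','Horizontal')
grid on
linkaxes(s, 'x')                % share the x-axis between the two plots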
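To make the difference between the median and the expected value concrete, here is a minimal sketch (assuming the Statistics and Machine Learning Toolbox for pearsrnd) that computes both for the same kind of skewed sample and draws them on the histogram. The seed rng(0) is an arbitrary choice for reproducibility, and the lines are drawn with plot() instead of xline so the sketch also runs on releases before R2018b.

```matlab
% Sketch: for a right-skewed sample the mean (estimate of the expected value)
% and the median differ; boxplot draws only the median.
rng(0)                                   % arbitrary seed, for reproducibility
x = pearsrnd(0, 1, 1, 4, 1000, 1);       % mean 0, std 1, skewness 1, kurtosis 4

med = median(x);                         % middle value of the sorted sample
mu  = mean(x);                           % sample estimate of the expected value

fprintf('median = %.3f, mean = %.3f\n', med, mu)

figure
histogram(x)
hold on
yl = ylim();                             % current y-limits of the histogram
plot([med med], yl, 'r-',  'LineWidth', 2)   % median (what boxplot shows)
plot([mu mu],   yl, 'b--', 'LineWidth', 2)   % mean (not shown by boxplot)
ylim(yl)                                 % keep limits fixed so the lines span the axes
legend('Data', 'Median', 'Mean')
hold off
```

For a right-skewed sample like this one, the mean lies to the right of the median, pulled toward the long tail.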

5 Comments

Thanks for your answer. I think combining the two, as you've done, is also a very nice way of showing the data. You're right that a histogram is much more descriptive. However, a histogram is not ideal for showing many data sets in one plot.
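When many data sets need to share one plot, a boxplot can still be used and the mean of each set overlaid, since boxplot itself draws only the median. A minimal sketch, assuming the Statistics and Machine Learning Toolbox; the three data sets and the labels are hypothetical, generated with pearsrnd purely for illustration.

```matlab
% Sketch: several skewed data sets side by side in one boxplot,
% with each set's sample mean overlaid as a black diamond.
rng(1)                                   % arbitrary seed
n = 500;
X = [pearsrnd(0, 1, 0,   3, n, 1), ...   % symmetric (normal)
     pearsrnd(0, 1, 1,   4, n, 1), ...   % moderately right-skewed
     pearsrnd(0, 1, 1.5, 6, n, 1)];      % strongly right-skewed

figure
boxplot(X, 'Labels', {'skew 0', 'skew 1', 'skew 1.5'})   % one box per column
hold on
groupMeans = mean(X);                    % 1-by-3 row vector, one mean per column
plot(1:size(X, 2), groupMeans, 'kd', 'MarkerFaceColor', 'k')   % boxes sit at x = 1, 2, 3
hold off
ylabel('Value')
```

The gap between each diamond and the median line inside its box grows with the skewness, which is the information a plain boxplot hides.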
An issue I see with the way you have compared the boxplot and the histogram is that the median in both plots is calculated based on a normal distribution, which means that it will naturally overlap in both plots. However, if you find the best-fitting distribution (which is most probably not a normal distribution, due to the skewness), the median will be different.
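pd = fitdist(x,'Normal')     % fit a normal distribution to the data
h = chi2gof(x,'CDF',pd)      % chi-square goodness-of-fit test against that fit
If you run the above code on your data, the result is h = 1, which means the null hypothesis that the data come from a normal distribution is rejected (at the default 5% significance level).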
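The same test can return its p-value as a second output. As a cross-check, the sketch below also runs jbtest (Jarque-Bera), a normality test based on sample skewness and kurtosis; jbtest is a swapped-in addition, not part of the comment above. It assumes the Statistics and Machine Learning Toolbox, and the seed is arbitrary.

```matlab
% Sketch: two normality tests on the same skewed sample.
rng(0)                                   % arbitrary seed
x = pearsrnd(0, 1, 1, 4, 1000, 1);       % skewness 1, kurtosis 4

pd = fitdist(x, 'Normal');               % normal distribution fitted to x
[h, p] = chi2gof(x, 'CDF', pd);          % chi-square goodness-of-fit test
[hJB, pJB] = jbtest(x);                  % Jarque-Bera test (skewness and kurtosis)

fprintf('chi2gof: h = %d, p = %.3g\n', h, p)
fprintf('jbtest:  h = %d, p = %.3g\n', hJB, pJB)
```

For strongly non-normal data jbtest may warn that the p-value is below its smallest tabulated value; that warning still means normality is rejected.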
What command can I use instead of xline to draw a vertical line? I am using MATLAB R2018a, and xline is unfortunately not available.
To extend a line from the bottom to the top of the plot:
hold on
set(gca, 'YLimMode', 'Manual') % freeze the y-limits (or set them with ylim())
plot([med, med],ylim(), 'r-')  % vertical line at the median, spanning the axes
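The plot() fallback can be combined with a release check so the same script runs on R2018a and on newer releases. A minimal sketch, assuming that checking exist('xline') is an acceptable way to detect the function (xline was introduced in R2018b); the variable names are illustrative.

```matlab
% Sketch: use xline when it exists, otherwise fall back to plot().
rng(0)                                   % arbitrary seed
x = pearsrnd(0, 1, 1, 4, 1000, 1);
med = median(x);

figure
histogram(x)
ax = gca;
if exist('xline') > 0                    % xline is available in R2018b and later
    hLine = xline(ax, med, 'r-', 'LineWidth', 2);
else                                     % fallback for R2018a and earlier
    hold(ax, 'on')
    yl = ylim(ax);                       % current y-limits of the histogram
    hLine = plot(ax, [med med], yl, 'r-', 'LineWidth', 2);
    ylim(ax, yl)                         % keep the limits fixed so the line spans the axes
    hold(ax, 'off')
end
```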
"An issue I see with the way you have compared the boxplot and the histogram is that the median in both plots is calculated based on a normal distribution. Which means that naturally it's going to overlap in both plots. However, if you find the best fit for the distribution (which is most probably not a normal distribution due to the skewness) the median will be different."
Two parts of your comment are incorrect: that the median is calculated based on a normal distribution, and that the median will be different for a better-fitting distribution. The median value has nothing to do with the shape of the distribution. In fact, that's the point I was making by showing the histogram and the boxplot together. The median doesn't indicate the expected value of a skewed distribution, which is why the boxplot isn't the best representation of a skewed distribution.
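The sample median is read directly from the sorted data, so no distribution fit is involved. The sketch below checks this by hand and contrasts it with the median of a fitted normal distribution, which always equals that fit's mean. It assumes the Statistics and Machine Learning Toolbox; the seed and the even sample size (which makes the "two middle values" rule apply) are illustrative choices.

```matlab
% Sketch: the sample median does not depend on any fitted distribution.
rng(0)                                   % arbitrary seed
x = pearsrnd(0, 1, 1, 4, 1000, 1);       % 1000 values, so n is even

xs = sort(x);
n = numel(xs);
medByHand  = (xs(n/2) + xs(n/2 + 1)) / 2;   % average of the two middle values
medBuiltin = median(x);

% A normal distribution is symmetric, so its median is its mean (mu);
% for skewed data this differs from the sample median.
pdNormal = fitdist(x, 'Normal');
medOfNormalFit = icdf(pdNormal, 0.5);

fprintf('sample median = %.3f, median of normal fit = %.3f\n', ...
        medBuiltin, medOfNormalFit)
```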
Yes, you are right, the expected value would be different!
Yes, and that's something the histogram shows but the boxplot does not.
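If you'd like to use a boxplot for other reasons, note that you could locate the peak of the distribution (for example by fitting it, as you mentioned) and then add a marker to the boxplot at the peak. Keep in mind that for a skewed distribution the peak (the mode) is not the same as the expected value (the mean); either one can be marked in the same way.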
This demo just marks the center of the tallest bin.
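% t is the histogram object returned by histogram(); in the code above,
% replace histogram(x) with t = histogram(x);
[~, maxIdx] = max(t.Values);                            % index of the tallest bin
peakBinCenter = t.BinEdges(maxIdx+1) - t.BinWidth/2;   % center of that bin
hold on                                                % add to the current (boxplot) axes
plot(peakBinCenter, 1, 'g*')                           % the single horizontal box sits at y = 1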
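The tallest-bin estimate depends on how the bins are chosen. One smoother alternative, swapped in here rather than taken from the answer, is to take the maximum of a kernel density estimate from ksdensity and mark it, together with the mean, on the boxplot. A minimal sketch, assuming the Statistics and Machine Learning Toolbox and ksdensity's default bandwidth; the seed and marker styles are arbitrary.

```matlab
% Sketch: estimate the peak (mode) with a kernel density estimate and mark
% it, along with the mean, on a horizontal boxplot.
rng(0)                                   % arbitrary seed
x = pearsrnd(0, 1, 1, 4, 1000, 1);

[f, xi] = ksdensity(x);                  % density estimate f evaluated at points xi
[~, iPeak] = max(f);
peakEstimate = xi(iPeak);                % location of the highest estimated density
mu = mean(x);                            % estimate of the expected value

figure
boxplot(x, 'Orientation', 'horizontal')  % single box drawn at y = 1
hold on
hPeak = plot(peakEstimate, 1, 'g*', 'MarkerSize', 10);
hMean = plot(mu, 1, 'bd', 'MarkerSize', 8);
hold off
legend([hPeak hMean], {'Peak (KDE)', 'Mean'})   % only the two markers in the legend
```

For right-skewed data the peak lies to the left of the median and the mean lies to its right, so the two markers bracket the median line.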
