Question on how to delete the data (outliers) in the boxplot

Hello everyone.
I need to know how to stop outliers (> 95%) from being shown in a boxplot.
Any suggestion?
Thank you
  Comments
Adam Danz on 28 Aug 2020 (edited 28 Aug 2020)
Nice! So, you wanted to remove the outlier markers completely.
Just be careful in how you interpret those plots. Outliers are often very informative, and it should be indicated somewhere that they were removed from the visualization.
Binbin Qi on 28 Aug 2020
I think you can use
rmoutliers(A,'percentiles',threshold) % threshold is a two-element [lower upper] percentile range, e.g. [0 95]
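A minimal sketch of that suggestion, assuming "outliers (> 95%)" means values above the 95th percentile. The data are hypothetical. Note that rmoutliers drops the flagged elements (for a matrix it removes whole rows by default), so the result is shorter than the input, and boxplot will still compute its own whiskers on what remains.

```matlab
% Hypothetical example data: 100 evenly spaced values
x = (1:100)';
% Treat anything outside the [0th, 95th] percentile range as an outlier and
% drop it; the second output marks which elements were removed
[xTrim, removed] = rmoutliers(x, 'percentiles', [0 95]);
fprintf('Removed %d of %d values\n', nnz(removed), numel(x));
% boxplot applies its own 1.5*IQR whisker rule to the trimmed data
boxplot(xTrim)
```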


Accepted Answer

Adam Danz on 5 Jan 2021 (edited 5 Jan 2021)
> how to eliminate the representation of outliers in a boxplot representation.
Removing outliers from the raw data
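By default, boxplot draws a point as an outlier if it is more than 1.5*IQR above the upper quartile (q3) or more than 1.5*IQR below the lower quartile (q1), where IQR = q3 - q1 is the interquartile range, computed by iqr(). The 1.5 multiplier is set by boxplot's 'Whisker' parameter, and isoutlier(x,'quartiles') uses the same 1.5*IQR rule.
If the goal is to remove outliers from the raw data based on this definition, you can replace their values with NaNs to preserve the size and shape of the variable: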
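To make the rule concrete, here is a minimal sketch on a small, hypothetical vector. It computes the quartiles with quantile() using its default linear interpolation. That is an assumption standing in for boxplot's internal quartile computation, which can differ slightly for points very close to a fence. It then flags the points beyond the 1.5*IQR fences.

```matlab
% Hypothetical data: ten well-behaved values plus one extreme value
x = [1 2 3 4 5 6 7 8 9 10 100];
w = 1.5;                                % boxplot's default 'Whisker' value
q = quantile(x, [0.25 0.75]);           % lower (q1) and upper (q3) quartiles
iqrX = q(2) - q(1);                     % interquartile range, same as iqr(x)
lowerFence = q(1) - w*iqrX;
upperFence = q(2) + w*iqrX;
isout = x < lowerFence | x > upperFence;
disp(x(isout))                          % points beyond the fences: 100
```

Here q1 = 3.25 and q3 = 8.75, so the fences are -5 and 17, and only 100 falls outside them.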
rng default % For reproducibility
x = [randn(25,4);rand(2,4)-6;rand(2,4)+6];
x = reshape(x(randperm(numel(x))),size(x)); % scrambles all elements of x across rows and columns; for demo purposes only
isout = isoutlier(x,'quartiles'); % flags outliers separately in each column
xClean = x;
xClean(isout) = NaN;
However, this won't necessarily remove outlier markers from the plot. Replacing values with NaN changes each column's quartiles and IQR, so points that were not outliers before may now fall outside the new whiskers and be drawn as outliers.
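A minimal sketch of that effect, using a hypothetical vector chosen so that 14.5 sits just inside the upper fence at first. It applies the same 1.5*IQR rule twice, using quantile() as a stand-in for boxplot's quartile computation (an assumption). quantile ignores NaN values, so the second pass is computed on the cleaned data only.

```matlab
% Lower and upper quartiles of a vector (NaNs are ignored by quantile)
quartiles = @(v) quantile(v, [0.25 0.75]);
% 1.5*IQR outlier test given a vector and its quartiles
isOut = @(v, qq) v < qq(1) - 1.5*diff(qq) | v > qq(2) + 1.5*diff(qq);

x = [1 2 3 4 5 6 7 8 14.5 30];
pass1 = isOut(x, quartiles(x));            % fences -4.5 and 15.5: flags only 30
xClean = x;
xClean(pass1) = NaN;                       % replace the outlier with NaN
pass2 = isOut(xClean, quartiles(xClean));  % fences shift to -4 and 14: flags 14.5
```

Removing 30 pulls the upper quartile down from 8 to 7.25, which tightens the upper fence enough for 14.5 to become a new outlier.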
Removing outlier markers from the boxplot
If the goal is to remove outlier markers from the plot, produce the boxplots using an empty outlier marker style as Luís Barbosa suggested above.
% Create data
rng default % For reproducibility
x = [randn(25,4);rand(2,4)-6;rand(2,4)+6];
x = reshape(x(randperm(numel(x))),size(x)); % scrambles all elements of x across rows and columns; for demo purposes only
% Plot with and without outlier markers
figure()
ax(1) = subplot(1,2,1);
boxplot(ax(1), x)
title(ax(1), 'With outlier markers')
grid(ax(1),'on')
ax(2) = subplot(1,2,2);
boxplot(ax(2), x, 'symbol', '')
title(ax(2), 'Without outlier markers')
grid(ax(2),'on')
However, removing outlier markers should usually be avoided and can be very deceptive. It's easy to view a figure at some point in the future and forget that outliers were removed. Outliers can be very informative and are often just as important as the median and IQR. Therefore, it should be indicated somewhere on the figure that they were removed from the visualization.
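One way to follow that advice is sketched below, under the assumption that hiding (rather than deleting) the markers is acceptable. boxplot tags its outlier markers 'Outliers', so they can be found with findobj. The sketch counts them from their YData, hides them, and states the count in the title. It assumes that a box with no outliers stores NaN in YData, so only non-NaN values are counted.

```matlab
rng default % For reproducibility
x = [randn(25,4); rand(2,4)-6; rand(2,4)+6];
figure()
ax = axes();
boxplot(ax, x)                           % draw with outlier markers first
o = findobj(ax, 'Tag', 'Outliers');      % one line object per box
y = get(o, 'YData');                     % cell array when there are several boxes
if iscell(y)
    y = [y{:}];                          % concatenate the per-box row vectors
end
nHidden = nnz(~isnan(y));                % number of outlier markers drawn
set(o, 'Visible', 'off')                 % hide the markers instead of deleting them
title(ax, {'Outlier markers hidden', sprintf('(%d outliers not shown)', nHidden)})
```

Hiding keeps the marker objects in the figure, so setting 'Visible' back to 'on' restores them.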

More Answers (1)

Baldvin about 7 hours ago
As per boxplot's documentation, each aspect of a boxplot is tagged and we can find them with findobj:
ax=axes;
boxplot(ax,[-3,repelem(3:8,10),14]) % Should contain two outliers at y=-3 and y=14.
% Obtain graphics handle for outliers:
o=findobj(ax.Children,'Tag','Outliers');
% We can now change all aspects of the markers:
set(o,'Marker','h','MarkerSize',12)
We could now make the outliers invisible or delete them from the plot, but I encourage folks to read Adam's post above!
% set(o,'Marker','none') % Misleading
% o.Visible = 'off' % Misleading
% delete(o) % Danger, danger!
Let me repeat Adam's words of wisdom that "... removing outlier markers should usually be avoided and can be very deceptive."
