
One-way analysis of variance

`p = anova1(y)`

`p = anova1(y,group)`

`p = anova1(y,group,displayopt)`

`[p,tbl] = anova1(___)`

`[p,tbl,stats] = anova1(___)`

`p = anova1(y)` returns the *p*-value for a balanced one-way ANOVA. It also displays the standard ANOVA table (`tbl`) and a box plot of the columns of `y`. `anova1` tests the hypothesis that the samples in `y` are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same.

`p = anova1(y,group,displayopt)` enables the ANOVA table and box plot displays when `displayopt` is `'on'` (default) and suppresses the displays when `displayopt` is `'off'`.

`[p,tbl,stats] = anova1(___)` returns a structure, `stats`, which you can use to perform a multiple comparison test. A multiple comparison test enables you to determine which pairs of group means are significantly different. To perform this test, use `multcompare`, providing the `stats` structure as an input argument.

Create sample data matrix `y` with columns that are constants, plus random normal disturbances with mean 0 and standard deviation 1.

y = meshgrid(1:5);

rng default; % For reproducibility

y = y + normrnd(0,1,5,5)

`y = `*5×5*
1.5377 0.6923 1.6501 3.7950 5.6715
2.8339 1.5664 6.0349 3.8759 3.7925
-1.2588 2.3426 3.7254 5.4897 5.7172
1.8622 5.5784 2.9369 5.4090 6.6302
1.3188 4.7694 3.7147 5.4172 5.4889

Perform one-way ANOVA.

p = anova1(y)

p = 0.0023

The ANOVA table shows the between-groups variation (`Columns`) and within-groups variation (`Error`). `SS` is the sum of squares, and `df` is the degrees of freedom. The total degrees of freedom is the total number of observations minus one, which is 25 - 1 = 24. The between-groups degrees of freedom is the number of groups minus one, which is 5 - 1 = 4. The within-groups degrees of freedom is the total degrees of freedom minus the between-groups degrees of freedom, which is 24 - 4 = 20.

`MS` is the mean square, which is `SS/df` for each source of variation. The *F*-statistic is the ratio of the between-groups mean square to the within-groups mean square (13.4309/2.2204). The *p*-value is the probability that the test statistic takes a value greater than the computed value of the test statistic, i.e., P(F > 6.05). The small *p*-value of 0.0023 indicates that differences between column means are significant.
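The arithmetic in this paragraph can be checked directly. The following is a minimal sketch, assuming Statistics and Machine Learning Toolbox is available for `fcdf`; the variable names are illustrative, and the mean squares are the rounded values quoted above, so the result only approximates the reported *p*-value.

```matlab
% Mean squares quoted in the text (rounded values from the ANOVA table)
msColumns = 13.4309;   % between-groups mean square
msError   = 2.2204;    % within-groups mean square
dfColumns = 4;         % number of groups minus one
dfError   = 20;        % total df (24) minus between-groups df (4)

% F-statistic: ratio of the two mean squares
F = msColumns / msError;

% p-value: upper-tail probability of the F-distribution, P(F_{4,20} > F)
p = fcdf(F, dfColumns, dfError, 'upper');

fprintf('F = %.2f, p = %.4f\n', F, p);
```

The `'upper'` option asks `fcdf` for the upper-tail probability directly, which avoids the loss of precision that `1 - fcdf(...)` can suffer for very small *p*-values.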

Input the sample data.

strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];

alloy = {'st','st','st','st','st','st','st','st','al1','al1','al1','al1','al1','al1','al2','al2','al2','al2','al2','al2'};

The data are from a study of the strength of structural beams in Hogg (1987). The vector `strength` measures deflections of beams in thousandths of an inch under 3000 pounds of force. The vector `alloy` identifies each beam as steel (`'st'`), alloy 1 (`'al1'`), or alloy 2 (`'al2'`). Although `alloy` is sorted in this example, grouping variables do not need to be sorted.

Test the null hypothesis that the steel beams are equal in strength to the beams made of the two more expensive alloys. Turn the figure display off and return the ANOVA results in a cell array.

`[p,tbl] = anova1(strength,alloy,'off')`

p = 1.5264e-04

`tbl = `*4x6 cell array*
Columns 1 through 5
{'Source'} {'SS' } {'df'} {'MS' } {'F' }
{'Groups'} {[184.8000]} {[ 2]} {[ 92.4000]} {[ 15.4000]}
{'Error' } {[102.0000]} {[17]} {[ 6.0000]} {0x0 double}
{'Total' } {[286.8000]} {[19]} {0x0 double} {0x0 double}
Column 6
{'Prob>F' }
{[1.5264e-04]}
{0x0 double }
{0x0 double }

The total degrees of freedom is total number of observations minus one, which is $20-1=19$. The between-groups degrees of freedom is number of groups minus one, which is $3-1=2$. The within-groups degrees of freedom is total degrees of freedom minus the between groups degrees of freedom, which is $19-2=17$.

`MS` is the mean square, which is `SS/df` for each source of variation. The *F*-statistic is the ratio of the between-groups mean square to the within-groups mean square. The *p*-value is the probability that the test statistic takes a value greater than or equal to the computed value of the test statistic. The *p*-value of 1.5264e-04 suggests rejection of the null hypothesis.
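To make the table entries concrete, the sums of squares can be rebuilt from the raw data. This is a sketch of the textbook one-way ANOVA decomposition, not a description of how `anova1` is implemented internally; all variable names are illustrative, and `fcdf` assumes Statistics and Machine Learning Toolbox.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];
alloy = [repmat({'st'},1,8), repmat({'al1'},1,6), repmat({'al2'},1,6)];

% Map each group name to an integer index (1 = 'st', 2 = 'al1', 3 = 'al2')
[groupNames, ~, idx] = unique(alloy, 'stable');
idx = idx(:);
y = strength(:);

n          = accumarray(idx, 1);         % observations per group
groupMeans = accumarray(idx, y) ./ n;    % mean of each group
grandMean  = mean(y);

% Between-groups and within-groups sums of squares
ssGroups = sum(n .* (groupMeans - grandMean).^2);
ssError  = sum((y - groupMeans(idx)).^2);

dfGroups = numel(n) - 1;                 % k - 1
dfError  = numel(y) - numel(n);          % N - k

F = (ssGroups / dfGroups) / (ssError / dfError);
p = fcdf(F, dfGroups, dfError, 'upper');
```

Under these assumptions, `ssGroups`, `ssError`, `F`, and `p` should agree with the `Groups` and `Error` rows of `tbl` shown above.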

You can retrieve the values in the ANOVA table by indexing into the cell array. Save the *F*-statistic value and the *p*-value in the new variables `Fstat` and `pvalue`.

Fstat = tbl{2,5}

Fstat = 15.4000

pvalue = tbl{2,6}

pvalue = 1.5264e-04

Input the sample data.

strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];

alloy = {'st','st','st','st','st','st','st','st','al1','al1','al1','al1','al1','al1','al2','al2','al2','al2','al2','al2'};

The data are from a study of the strength of structural beams in Hogg (1987). The vector `strength` measures deflections of beams in thousandths of an inch under 3000 pounds of force. The vector `alloy` identifies each beam as steel (`st`), alloy 1 (`al1`), or alloy 2 (`al2`). Although `alloy` is sorted in this example, grouping variables do not need to be sorted.

Perform one-way ANOVA using `anova1`. Return the structure `stats`, which contains the statistics `multcompare` needs for performing multiple comparisons.

[~,~,stats] = anova1(strength,alloy);

The small *p*-value of 0.0002 suggests that the strength of the beams is not the same.

Perform a multiple comparison of the mean strength of the beams.

[c,~,~,gnames] = multcompare(stats);

Display the comparison results with the corresponding group names.

[gnames(c(:,1)), gnames(c(:,2)), num2cell(c(:,3:6))]

`ans = `*3x6 cell array*
Columns 1 through 5
{'st' } {'al1'} {[ 3.6064]} {[ 7]} {[10.3936]}
{'st' } {'al2'} {[ 1.6064]} {[ 5]} {[ 8.3936]}
{'al1'} {'al2'} {[-5.6280]} {[-2]} {[ 1.6280]}
Column 6
{[1.6831e-04]}
{[ 0.0040]}
{[ 0.3560]}

The first two columns show the pair of groups that are compared. The fourth column shows the difference between the estimated group means. The third and fifth columns show the lower and upper limits for the 95% confidence intervals of the true difference of means. The sixth column shows the *p*-value for a hypothesis that the true difference of means for the corresponding groups is equal to zero.

The first two rows show that both comparisons involving the first group (steel) have confidence intervals that do not include zero. Because the corresponding *p*-values (1.6831e-04 and 0.0040, respectively) are small, those differences are significant.

The third row shows that the difference in strength between the two alloys is not significant. A 95% confidence interval for the difference is [-5.6,1.6], so you cannot reject the hypothesis that the true difference is zero. The corresponding *p*-value of 0.3560 in the sixth column confirms this result.
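The reading of the confidence intervals can also be expressed as a small check. This sketch hard-codes the numeric part of the comparison matrix from the output above (rather than calling `multcompare` again) and flags the rows whose interval excludes zero; the variable names are illustrative.

```matlab
% Rows copied from the multcompare output above:
% [group1, group2, lower limit, difference, upper limit, p-value]
c = [1 2  3.6064  7 10.3936 1.6831e-04
     1 3  1.6064  5  8.3936 0.0040
     2 3 -5.6280 -2  1.6280 0.3560];

% An interval excludes zero when both limits have the same sign
excludesZero = c(:,3) > 0 | c(:,5) < 0;

% At the 5% level, this should agree with comparing the p-values to 0.05
significant = c(:,6) < 0.05;

disp([excludesZero, significant])
```

For these values the two criteria agree: the first two comparisons (both involving steel) are flagged, and the alloy 1 versus alloy 2 comparison is not.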

In the figure, the blue bar represents the comparison interval for mean material strength for steel. The red bars represent the comparison intervals for the mean material strength for alloy 1 and alloy 2. Neither of the red bars overlaps with the blue bar, which indicates that the mean material strength for steel is significantly different from that of alloy 1 and alloy 2. You can confirm the significant difference by clicking the bars that represent alloy 1 and alloy 2.

`y` — Sample data

vector | matrix

Sample data, specified as a vector or a matrix.

- If `y` is a vector, you must specify the `group` input argument. `group` must be a categorical variable, numeric vector, logical vector, character array, string array, or cell array of character vectors, with one name for each element of `y`. The `anova1` function treats the `y` values corresponding to the same value of `group` as part of the same group. Use this design when groups have different numbers of elements (unbalanced ANOVA).
- If `y` is a matrix and you do not specify `group`, `anova1` treats each column of `y` as a separate group. In this design, the function evaluates whether the population means of the columns are equal. Use this design when each group has the same number of elements (balanced ANOVA).
- If `y` is a matrix and you specify `group`, then `group` must be a character array, string array, or cell array of character vectors, with one name for each column of `y`. The `anova1` function treats the columns that have the same group name as part of the same group.

If `group` contains empty or `NaN`-valued elements, `anova1` disregards the corresponding observations in `y`.

**Data Types:** `single` | `double`

`group` — Grouping variable

numeric vector | logical vector | character array | string array | cell array of character vectors

Grouping variable, specified as a numeric or logical vector, character array, string array, or cell array of character vectors, containing group names.

- If `y` is a vector, `group` must be a categorical variable, numeric vector, logical vector, character array, string array, or cell array of character vectors, with one name for each of the *N* elements of `y`, where *N* is the total number of observations. The `anova1` function treats the `y` values corresponding to the same value of `group` as part of the same group.
- If `y` is a matrix, then `group` must be a character array, string array, or cell array of character vectors, with one group name for each column of `y`. The `anova1` function treats the columns of `y` that have the same group name as part of the same group.
- If you do not want to specify group names, enter an empty array (`[]`) or omit this argument.

If `group` contains empty or `NaN`-valued elements, the corresponding observations in `y` are disregarded.

For more information on grouping variables, see Grouping Variables.

For example, if `y` is a vector, with observations categorized into groups 1, 2, and 3, then you can specify the grouping variables as follows.

**Example:** `'group',[1,2,1,3,1,...,3,1]`

For example, if `y` is a matrix, with five columns categorized into groups red, white, and black, then you can specify the grouping variables as follows.

**Example:** `'group',{'white','red','white','black','red'}`

**Data Types:** `single` | `double` | `logical` | `char` | `string` | `cell`
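As an illustration of the matrix form, the sketch below passes one group name per column, so that columns sharing a name are pooled into a single group as described above. The data matrix is random and purely hypothetical, and the call assumes Statistics and Machine Learning Toolbox.

```matlab
rng default                      % for reproducibility
y = randn(10,5);                 % hypothetical data: 10 observations in each of 5 columns

% One name per column; columns 1 and 3 share 'white', columns 2 and 5 share 'red'
group = {'white','red','white','black','red'};

% Suppress the table and box plot, and return the ANOVA table
[p, tbl] = anova1(y, group, 'off');

dfGroups = tbl{2,3};             % between-groups degrees of freedom
dfTotal  = tbl{4,3};             % total degrees of freedom
```

Because the five columns collapse into three named groups, `dfGroups` is expected to be 3 - 1 = 2 rather than 5 - 1 = 4, while `dfTotal` remains 50 - 1 = 49.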

`displayopt` — Indicator to display ANOVA table and box plot

`'on'` (default) | `'off'`

Indicator to display ANOVA table and box plot, specified as `'on'` or `'off'`. When `displayopt` is `'off'`, `anova1` returns only the output arguments. It does not display the standard ANOVA table or the box plot of the columns of `y`.

**Example:** `p = anova1(x,group,'off')`

`p` — *p*-value for the *F*-test

scalar value

*p*-value for the *F*-test, returned as a scalar value. The *p*-value is the probability that the *F*-statistic can take a value larger than the computed test-statistic value. `anova1` tests the null hypothesis that all group means are equal to each other against the alternative hypothesis that at least one group mean is different from the others. The function derives the *p*-value from the cdf of the *F*-distribution.

A *p*-value that is smaller than the significance level indicates that at least one of the sample means is significantly different from the others. Common significance levels are 0.05 or 0.01.
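In symbols, writing $k$ for the number of groups, $N$ for the total number of observations, and $F_{\text{obs}}$ for the computed test statistic (notation introduced here for illustration), the relationship between the *p*-value and the cdf of the *F*-distribution described above can be written as:

$$
p = P\left(F_{k-1,\,N-k} > F_{\text{obs}}\right) = 1 - \mathrm{cdf}_{F}\left(F_{\text{obs}};\ k-1,\ N-k\right)
$$

The null hypothesis is rejected at significance level $\alpha$ when $p < \alpha$.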

`tbl` — ANOVA table

cell array

ANOVA table, returned as a cell array. `tbl` has six columns.

Column | Definition |
---|---|
`Source` | The source of the variability. |
`SS` | The sum of squares due to each source. |
`df` | The degrees of freedom associated with each source. Suppose $N$ is the total number of observations and $k$ is the number of groups. Then $N - k$ is the within-groups degrees of freedom (`Error`), $k - 1$ is the between-groups degrees of freedom (`Columns`), and $N - 1$ is the total degrees of freedom, so that $N - 1 = (N - k) + (k - 1)$. |
`MS` | The mean squares for each source, which is the ratio `SS/df`. |
`F` | *F*-statistic, which is the ratio of the mean squares. |
`Prob>F` | The *p*-value, which is the probability that the *F*-statistic can take a value larger than the computed test-statistic value. `anova1` derives this probability from the cdf of the *F*-distribution. |

The rows of the ANOVA table show the variability in the data, divided by source.

Row | Definition |
---|---|
`Groups` | Variability due to the differences among the group means (variability between groups) |
`Error` | Variability due to the differences between the data in each group and the group mean (variability within groups) |
`Total` | Total variability |
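The row and column layout above determines how to index into `tbl`. The following sketch, which reuses the beam-strength data from the examples and assumes Statistics and Machine Learning Toolbox, pulls out individual cells and checks the relationships stated in the column definitions. The variable names are illustrative.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];
alloy = [repmat({'st'},1,8), repmat({'al1'},1,6), repmat({'al2'},1,6)];

[~, tbl] = anova1(strength, alloy, 'off');

% Row 1 holds the column headers; rows 2-4 are Groups, Error, and Total
dfGroups = tbl{2,3};
dfError  = tbl{3,3};
dfTotal  = tbl{4,3};

% MS = SS/df for each source, and F is the ratio of the mean squares
msGroups = tbl{2,2} / dfGroups;
msError  = tbl{3,2} / dfError;
Fcheck   = msGroups / msError;

fprintf('df: %d + %d = %d, F = %.4f\n', dfGroups, dfError, dfTotal, Fcheck);
```

The cells that have no meaning for a given row (for example, `F` in the `Error` row) are empty, so guard with `isempty` before using them in arithmetic.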

`stats` — Statistics for multiple comparison tests

structure

Statistics for multiple comparison tests, returned as a structure. `stats` has six fields.

Field name | Definition |
---|---|
`gnames` | Names of the groups |
`n` | Number of observations in each group |
`source` | Source of the `stats` output |
`means` | Estimated values of the means |
`df` | Error (within-groups) degrees of freedom ($N - k$, where $N$ is the total number of observations and $k$ is the number of groups) |
`s` | Square root of the mean squared error |
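A short sketch of how these fields relate to the ANOVA table, using the beam-strength data from the examples (Statistics and Machine Learning Toolbox assumed; variable names are illustrative):

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];
alloy = [repmat({'st'},1,8), repmat({'al1'},1,6), repmat({'al2'},1,6)];

[~, tbl, stats] = anova1(strength, alloy, 'off');

groupSizes = stats.n(:)';        % observations per group
groupMeans = stats.means(:)';    % estimated group means
errorDf    = stats.df;           % N - k
rootMSE    = stats.s;            % square root of the Error mean square

% stats.s squared should reproduce the MS entry of the Error row in tbl
msError = tbl{3,4};
fprintf('s^2 = %.4f, MS(Error) = %.4f\n', rootMSE^2, msError);
```

For these data the group sizes are 8, 6, and 6, so `errorDf` is 20 - 3 = 17.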

`anova1` returns box plots of the observations in `y`, by group. Box plots provide a visual comparison of the group location parameters.

If `y` is a vector, then the plot shows one box for each value of `group`. If `y` is a matrix and you do not specify `group`, then the plot shows one box for each column of `y`. On each box, the central mark is the median and the edges of the box are the 25th and 75th percentiles (first and third quartiles). The whiskers extend to the most extreme data points that are not considered outliers. The outliers are plotted individually. The interval endpoints are the extremes of the notches. The extremes correspond to $q_2 - 1.57(q_3 - q_1)/\sqrt{n}$ and $q_2 + 1.57(q_3 - q_1)/\sqrt{n}$, where $q_2$ is the median (50th percentile), $q_1$ and $q_3$ are the 25th and 75th percentiles, respectively, and $n$ is the number of observations without any `NaN` values.

Two medians are significantly different at the 5% significance level if their intervals do not overlap. This test is different from the *F*-test that ANOVA performs, but large differences in the center lines of the boxes correspond to large *F*-statistic values and correspondingly small *p*-values. For more information about box plots, see `boxplot`.
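The notch formula can be evaluated by hand for a single group. The sketch below applies it to the steel observations from the examples; it illustrates the formula as stated here and makes no claim about how `boxplot` computes or draws the notches internally. It assumes `quantile` is available (Statistics and Machine Learning Toolbox), and the variable names are illustrative.

```matlab
x = [82 86 79 83 84 85 86 87];          % steel group from the examples
x = x(~isnan(x));                       % n counts only non-NaN observations

q = quantile(x, [0.25 0.50 0.75]);      % q(1) = q1, q(2) = median, q(3) = q3
n = numel(x);

% Half-width of the notch interval: 1.57*(q3 - q1)/sqrt(n)
halfWidth = 1.57 * (q(3) - q(1)) / sqrt(n);
notch = [q(2) - halfWidth, q(2) + halfWidth];

fprintf('median = %.2f, notch = [%.2f, %.2f]\n', q(2), notch(1), notch(2));
```

Repeating the calculation for another group and checking whether the two intervals overlap reproduces the visual comparison of medians described above.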

[1] Hogg, R. V., and J. Ledolter. *Engineering Statistics*. New York: MacMillan, 1987.

`anova2` | `anovan` | `boxplot` | `multcompare`
