Wilkinson Notation

Overview

Wilkinson notation provides a way to describe regression and repeated measures models without specifying coefficient values. This specialized notation identifies the response variable and which predictor variables to include or exclude from the model. You can also include squared and higher-order terms, interaction terms, and grouping variables in the model formula.

Specifying a model using Wilkinson notation allows you to include or exclude individual predictors and interaction terms from the model, and change the model formula without specifying new input data.

Basic Formula Specification

You can specify a formula in Wilkinson notation as a string scalar or character vector of the form y ~ terms. In the formula, y is the name or names of the response variable or response variables, and terms contains the predictor terms in the model. Specify terms by adding and subtracting the following terms in Wilkinson notation.

Term in Wilkinson Notation    Term Added to Model
1                             Intercept
x1                            x1
x1+x2                         x1, x2
x1/x2                         x1, x1*x2
x1*x2                         x1, x2, x1*x2
x1:x2                         x1*x2
x1^k                          x1, x1^2, x1^3, …, x1^k

In the table, x1 and x2 are the names of any two predictor variables, and x1*x2 in the right column denotes the product of x1 and x2. The formula includes an intercept term by default. To remove it, include a -1 term in terms.
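
The expansion rules in the table can be sketched in a few lines of Python. This is only an illustration of how the operators combine, not how any particular statistics package parses formulas. The helper names (crossing, interaction, power, model_terms) are hypothetical, and product terms are written with a colon (x1:x2) to keep them distinct from the * operator.

```python
from itertools import combinations

def crossing(*names):
    """x1*x2*...: every main effect plus every interaction among the names."""
    return [":".join(combo)
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]

def interaction(*names):
    """x1:x2:...: only the single product term."""
    return [":".join(names)]

def power(name, k):
    """x1^k: x1, x1^2, ..., x1^k, following the table above."""
    return [name if p == 1 else f"{name}^{p}" for p in range(1, k + 1)]

def model_terms(added, removed=(), intercept=True):
    """Drop removed terms from the added ones, then append the default intercept."""
    removed = set(removed)
    kept = [t for t in dict.fromkeys(added) if t not in removed]
    return kept + ["1"] if intercept else kept

print(crossing("x1", "x2"))                         # ['x1', 'x2', 'x1:x2']
print(model_terms(["x1", "x2"], intercept=False))   # ['x1', 'x2']
print(model_terms(power("x1", 3), power("x1", 2)))  # ['x1^3', '1']
```

The last call shows why subtracting x1^2 from x1^3 leaves only the cubic term and the intercept: x1^2 expands to x1 and x1^2, and both are removed.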

Examples

The following table includes some examples of formulas in Wilkinson notation and the corresponding terms added to the regression model.

Formula in Wilkinson Notation    Model Terms                                     Equation
"y ~ x1+x2-1"                    x1, x2                                          $y = c_1 x_1 + c_2 x_2$
"y ~ x1:x2:x3"                   x1*x2*x3, 1                                     $y = c_1 x_1 x_2 x_3 + c_2$
"y ~ x1*x2*x3"                   x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3, 1    $y = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_1 x_2 + c_5 x_1 x_3 + c_6 x_2 x_3 + c_7 x_1 x_2 x_3 + c_8$
"y ~ x1^3-x1^2"                  x1^3, 1                                         $y = c_1 x_1^3 + c_2$

In the table, $y$ is the response variable, each $x_i$ is a predictor variable, and each $c_j$ is a model coefficient.
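
To connect the Model Terms column to the Equation column, the sketch below evaluates a list of terms for a single observation. It is a minimal illustration under assumed names (term_value, predict) and made-up coefficient values. Product terms are written with a colon, as in x1:x2:x3.

```python
def term_value(term, row):
    """Value of one model term for one observation (row maps names to numbers)."""
    if term == "1":
        return 1.0  # the intercept contributes a constant column of ones
    value = 1.0
    for factor in term.split(":"):              # "x1:x2" is the product of its factors
        name, _, exponent = factor.partition("^")
        value *= row[name] ** int(exponent or 1)
    return value

def predict(terms, coefficients, row):
    """Sum of c_j times the j-th term, mirroring the Equation column."""
    return sum(c * term_value(t, row) for c, t in zip(coefficients, terms))

row = {"x1": 2.0, "x2": 3.0, "x3": 4.0}               # hypothetical observation
print(predict(["x1:x2:x3", "1"], [0.5, 1.0], row))    # 0.5*24 + 1 = 13.0
print(predict(["x1^3", "1"], [2.0, -1.0], row))       # 2*8 - 1 = 15.0
```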

Specify Random Effects

For random- and mixed-effects models, a random effect term also specifies the corresponding grouping variable. When you specify a random effect term using Wilkinson notation, the software does not automatically add a corresponding fixed effect term. You can represent random effects in Wilkinson notation using the following terms:

Term in Wilkinson Notation    Description
(1|g1)                        Random effect for the intercept for each level of the grouping variable g1.
(x1|g1)                       Random intercept and slope for each level of g1, with possible correlation between them. This term is equivalent to (1+x1|g1).
(x1+x2|g1)                    Random intercept and slopes for x1 and x2, with possible correlation between them, for each level of g1. This term is equivalent to (1+x1+x2|g1).
(x1|g1)+(x2|g2)               Random intercept and slope for x1 grouped by g1, and random intercept and slope for x2 grouped by g2. This term is equivalent to (1+x1|g1)+(1+x2|g2).
(x1|g1:g2)                    Random intercept and slope for x1 for each level of the interaction between g1 and g2. In other words, each unique combination of the levels of g1 and g2 corresponds to a different random intercept and slope.

In the table, g1 and g2 are the names of any two grouping variables.
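
The implicit intercept in a random-effects term can be made explicit with a small parser. The sketch below is an assumption-laden illustration, not a real library routine: parse_random_term is a hypothetical name, it handles only single terms of the form (effects|group), and it treats -1 as removing the intercept, in line with the intercept rule for fixed effects.

```python
def parse_random_term(term):
    """Split "(effects|group)" into (random-effect terms, grouping variables)."""
    effects_part, group_part = term.strip().strip("()").split("|")
    # Turn "-1+x1" into "+-1+x1" so that a plain split on "+" keeps the sign.
    tokens = [t.strip() for t in effects_part.replace("-", "+-").split("+")]
    tokens = [t for t in tokens if t]
    slopes = [t for t in tokens if t not in ("1", "-1")]
    has_intercept = "-1" not in tokens      # implicit unless removed with -1
    effects = (["1"] if has_intercept else []) + slopes
    groups = [g.strip() for g in group_part.split(":")]
    return effects, groups

print(parse_random_term("(x1|g1)"))      # (['1', 'x1'], ['g1'])
print(parse_random_term("(1+x1|g1)"))    # (['1', 'x1'], ['g1'])
print(parse_random_term("(x1|g1:g2)"))   # (['1', 'x1'], ['g1', 'g2'])
```

The first two calls return the same result, which is the equivalence stated in the table.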

Examples

The following table includes some examples of formulas in Wilkinson notation that include random effects, and their corresponding fixed and random effects terms.

Formula in Wilkinson Notation    Fixed Effect Model Terms    Random Effect Model Terms                      Equation
"y ~ 1+(1|g1)"                   1                           1                                              $y_{ij} = c_0 + b_{01j} + \varepsilon_{ij}$
"y ~ x1+(1|g1)"                  x1                          1                                              $y_{ij} = c_0 + c_1 x_{1ij} + b_{01j} + \varepsilon_{ij}$
"y ~ (x1|g1)+(x2|g2)"            1                           x1 grouped by g1; x2 grouped by g2             $y_{ijk} = c_0 + b_{01j} + b_{11j} x_{1ijk} + b_{02k} + b_{12k} x_{2ijk} + \varepsilon_{ijk}$
"y ~ x1+(1+x1|g1)"               x1                          1 and x1 grouped by g1, possibly correlated    $y_{ij} = c_0 + c_1 x_{1ij} + b_{01j} + b_{11j} x_{1ij} + \varepsilon_{ij}$
"y ~ (1|g1)+(-1+x1|g1)"          1                           1 and x1 grouped by g1, uncorrelated           $y_{ij} = c_0 + b_{01j} + b_{11j} x_{1ij} + \varepsilon_{ij}$

In the above equations, $i$ denotes the index of the observation, $j$ denotes the level of the first grouping variable g1, and $k$ denotes the level of the second grouping variable g2. The coefficient $c_m$ corresponds to the mth fixed-effect term. The coefficient $b_{mn}$ corresponds to the mth random-effect term for the nth grouping variable.
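
As a worked illustration of the indices, the following sketch evaluates the equation for "y ~ x1+(1+x1|g1)" without the error term. The function name conditional_mean and all numeric values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def conditional_mean(c0, c1, b0, b1, x1, g1):
    """c0 + c1*x1 + b0[j] + b1[j]*x1 for each observation, where j is its g1 level."""
    return [c0 + c1 * x + b0[j] + b1[j] * x for x, j in zip(x1, g1)]

x1 = [1.0, 2.0, 1.0, 2.0]        # predictor value for each observation i
g1 = ["a", "a", "b", "b"]        # level j of the grouping variable for each observation
b0 = {"a": 0.5, "b": -0.5}       # random intercept for each level of g1
b1 = {"a": 0.25, "b": 0.0}       # random slope on x1 for each level of g1

print(conditional_mean(1.0, 2.0, b0, b1, x1, g1))
# [3.75, 6.0, 2.5, 4.5]
```

Observations in the same group share the same random intercept and slope, while the fixed-effect coefficients c0 and c1 are common to every observation.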

Specify Repeated Measures

For repeated measures models, you can specify response variables using the following terms.

Response Terms in Model    Response Variables Added to Model
y1-yk                      y1, y2, …, yk
y1, y2, y3                 y1, y2, y3

In the table, each yi is the name of any response variable.
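
The two response forms can be expanded with a short helper. This sketch assumes that a range such as y1-y5 joins names with a shared prefix and consecutive integer suffixes. That is an assumption made for illustration, and real software may resolve the range differently (for example, by column position in the data). The name expand_responses is hypothetical.

```python
import re

def expand_responses(spec):
    """Expand "y1-y5" or "y1,y4,y5" into a list of response names."""
    names = []
    for part in spec.split(","):
        part = part.strip()
        match = re.fullmatch(r"([A-Za-z_]+)(\d+)\s*-\s*([A-Za-z_]+)(\d+)", part)
        if match and match.group(1) == match.group(3):
            prefix = match.group(1)
            first, last = int(match.group(2)), int(match.group(4))
            names.extend(f"{prefix}{n}" for n in range(first, last + 1))
        else:
            names.append(part)   # a single response name, kept as written
    return names

print(expand_responses("y1-y5"))      # ['y1', 'y2', 'y3', 'y4', 'y5']
print(expand_responses("y1,y4,y5"))   # ['y1', 'y4', 'y5']
```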

Examples

The following table includes some examples of formulas in Wilkinson notation that include repeated measures, and their corresponding response variables.

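Formula in Wilkinson Notation    Model Response Variables    Equations
"y1-y5 ~ x1:x2"                  y1, y2, y3, y4, y5          $y_1 = c_1 x_1 x_2 + c_2$
                                                             $y_2 = c_3 x_1 x_2 + c_4$
                                                             $y_3 = c_5 x_1 x_2 + c_6$
                                                             $y_4 = c_7 x_1 x_2 + c_8$
                                                             $y_5 = c_9 x_1 x_2 + c_{10}$
"y1,y4,y5 ~ x1^3"                y1, y4, y5                  $y_1 = c_1 x_1 + c_2 x_1^2 + c_3 x_1^3 + c_4$
                                                             $y_4 = c_5 x_1 + c_6 x_1^2 + c_7 x_1^3 + c_8$
                                                             $y_5 = c_9 x_1 + c_{10} x_1^2 + c_{11} x_1^3 + c_{12}$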

Specify Nested Factors for anova objects

You can specify nested factors for an anova object using the following terms.

Term in Wilkinson Notation    Description
x2(x1)                        Factor x2 is nested within factor x1.
x2(x1)+x3(x2)                 Factor x2 is nested within factor x1, and factor x3 is nested within x2.
x3(x1,x2)                     Factor x3 is nested within factors x1 and x2.

You cannot specify an interaction term in Wilkinson notation such as x1:x2(x1) where the second factor in the term is nested within the first.
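
The nesting syntax and the restriction above can be illustrated with a small parser. This is a sketch under stated assumptions rather than any package's actual validation logic: parse_nested and check_interaction are hypothetical names, and the check compares only adjacent factors in a colon-separated term.

```python
import re

def parse_nested(term):
    """Parse "x3(x1,x2)" into (factor, [factors it is nested within])."""
    match = re.fullmatch(r"\s*(\w+)\s*(?:\(([^()]*)\))?\s*", term)
    if match is None:
        raise ValueError(f"not a simple or nested factor: {term!r}")
    factor, outer = match.group(1), match.group(2)
    parents = [p.strip() for p in outer.split(",")] if outer else []
    return factor, parents

def check_interaction(term):
    """Reject a:b(a), where the second factor is nested within the first."""
    parsed = [parse_nested(part) for part in term.split(":")]
    for (first, _), (second, parents) in zip(parsed, parsed[1:]):
        if first in parents:
            raise ValueError(f"{second} is nested within {first} in {term!r}")
    return parsed

print(parse_nested("x3(x1,x2)"))        # ('x3', ['x1', 'x2'])
print(check_interaction("x3:x2(x1)"))   # [('x3', []), ('x2', ['x1'])]
```

Calling check_interaction("x1:x2(x1)") raises a ValueError, because x2 is nested within the first factor of the term.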

Examples

The following table includes some examples of formulas in Wilkinson notation that include nested factors.

Formula in Wilkinson Notation    Model Terms                                                  Equation
"y~x3:x2(x1)"                    x3*x2 where x2 is nested within x1.                          $y = x_3 x_{2(1)}$
"y~x2(x1)+x3(x1)"                x2 and x3 where both factors are nested within factor x1.    $y = x_{2(1)} + x_{3(1)}$

References

[1] Wilkinson, G. N., and C. E. Rogers. "Symbolic description of factorial models for analysis of variance." J. Royal Statistics Society 22, pp. 392–399, 1973.