Wilkinson Notation
Overview
Wilkinson notation provides a way to describe regression and repeated measures models without specifying coefficient values. This specialized notation identifies the response variable and which predictor variables to include or exclude from the model. You can also include squared and higher-order terms, interaction terms, and grouping variables in the model formula.
Specifying a model using Wilkinson notation allows you to include or exclude individual predictors and interaction terms from the model, and change the model formula without specifying new input data.
Basic Formula Specification
You can specify a formula in Wilkinson notation as a string scalar or character vector of the form y ~ terms. In the formula, y is the name of the response variable (or the names of the response variables), and terms contains the predictor terms in the model. Specify terms by adding and subtracting the following terms in Wilkinson notation.
Term in Wilkinson Notation | Term Added to Model |
---|---|
1 | Intercept |
x1 | x1 |
x1+x2 | x1, x2 |
x1/x2 | x1, x1*x2 |
x1*x2 | x1, x2, x1*x2 |
x1:x2 | x1*x2 |
x1^k | x1, x1^2, x1^3, …, x1^k |
In the table, x1 and x2 are the names of any two predictor variables. The formula includes an intercept term by default. To remove it, include a -1 term in terms.
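The expansion rules in the table can be sketched as set operations on terms. The following Python snippet is only an illustration under simplifying assumptions: a term is modeled as a tuple of variable names, a model as a set of terms, and every helper name (var, interact, cross, nest, power, show) is hypothetical rather than part of any library.

```python
from itertools import product

# A term is a sorted tuple of variable names; the empty tuple () is the intercept.
# A model is a set of terms. All helper names here are hypothetical.
INTERCEPT = {()}

def var(name):
    """Single predictor, e.g. x1."""
    return {(name,)}

def interact(a, b):
    """a:b -> product of every term in a with every term in b."""
    return {tuple(sorted(s + t)) for s, t in product(a, b)}

def cross(a, b):
    """a*b -> a, b, and a:b."""
    return a | b | interact(a, b)

def nest(a, b):
    """a/b -> a and a:b."""
    return a | interact(a, b)

def power(name, k):
    """x^k -> x, x^2, ..., x^k."""
    return {(name,) * d for d in range(1, k + 1)}

def label(term):
    """Render a term the way the table writes it, e.g. x1*x2 or x1^3."""
    if not term:
        return "1"
    names = sorted(set(term))
    return "*".join(n if term.count(n) == 1 else f"{n}^{term.count(n)}" for n in names)

def show(model):
    """List the labels of a model's terms, lowest order first."""
    return [label(t) for t in sorted(model, key=lambda t: (len(t), t))]

x1, x2, x3 = var("x1"), var("x2"), var("x3")

# "+" is modeled as set union and "-" as set difference;
# the intercept is present unless it is subtracted.
print(show((INTERCEPT | x1 | x2) - INTERCEPT))              # y ~ x1+x2-1
print(show(INTERCEPT | interact(interact(x1, x2), x3)))     # y ~ x1:x2:x3
print(show(INTERCEPT | cross(cross(x1, x2), x3)))           # y ~ x1*x2*x3
print(show((INTERCEPT | power("x1", 3)) - power("x1", 2)))  # y ~ x1^3-x1^2
```

Modeling subtraction as a set difference is a design choice of this sketch. It removes every term that the subtracted expression would have added, which is why subtracting x1^2 from x1^3 leaves only x1^3 and the intercept.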
Examples
The following table includes some examples of formulas in Wilkinson notation and the corresponding terms added to the regression model.
Formula in Wilkinson Notation | Model Terms | Equation |
---|---|---|
"y ~ x1+x2-1" | x1, x2 | |
"y ~ x1:x2:x3" | x1*x2*x3, 1 | |
"y ~ x1*x2*x3" | x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3, 1 | |
"y ~ x1^3-x1^2" | x1^3, 1 | |
In the table above, y represents the response variable, each x_i represents a predictor variable, and each c_j is a model coefficient.
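Written out with the coefficients described above, the four example formulas correspond to the equations sketched below. The coefficient numbering (c_0 for the intercept, then c_1, c_2, … in the order the terms are listed) is an assumption made for this sketch.

```latex
% Assumed indexing: c_0 is the intercept; c_1, c_2, ... follow the listed term order.
\begin{align*}
% "y ~ x1+x2-1"
y &= c_1 x_1 + c_2 x_2 \\
% "y ~ x1:x2:x3"
y &= c_0 + c_1 x_1 x_2 x_3 \\
% "y ~ x1*x2*x3"
y &= c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_1 x_2 + c_5 x_1 x_3 + c_6 x_2 x_3 + c_7 x_1 x_2 x_3 \\
% "y ~ x1^3-x1^2"
y &= c_0 + c_1 x_1^3
\end{align*}
```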
Specify Random Effects
For random- and mixed-effects models, a random effect term also specifies the corresponding grouping variable. When you specify a random effect term using Wilkinson notation, the software does not automatically add a corresponding fixed effect term. You can represent random effects in Wilkinson notation using the following terms:
Term in Wilkinson Notation | Description |
---|---|
(1|g1) | Random effect for the intercept for each level of the grouping variable g1. |
(x1|g1) | Random intercept and slope for each level of g1, with possible correlation between them. This term is equivalent to (1+x1|g1). |
(x1+x2|g1) | Random intercept and slopes for x1 and x2, with possible correlation between them, for each level of g1. This term is equivalent to (1+x1+x2|g1). |
(x1|g1)+(x2|g2) | Random intercept and slope for x1 grouped by g1, and random intercept and slope for x2 grouped by g2. This term is equivalent to (1+x1|g1)+(1+x2|g2). |
(x1|g1:g2) | Random intercept and slope for x1 for each level of the interaction between g1 and g2. In other words, each unique combination of the levels of g1 and g2 corresponds to a different random intercept and slope. |
In the table, g1 and g2 are the names of any two grouping variables.
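Two points from this section lend themselves to a small illustration: a random-effect term such as (x1|g1) carries an implied intercept, and it adds nothing to the fixed-effect part of the model. The Python sketch below splits a formula string accordingly. It is a toy splitter built on a regular expression, not the parser used by any fitting function, and the names with_intercept and split_formula are hypothetical.

```python
import re

# Matches one random-effect term "(terms|group)"; nested parentheses are not handled.
RANDOM_TERM = re.compile(r"\(([^()|]*)\|([^()|]*)\)")

def with_intercept(expr):
    """Split 'a+b-1' into terms, adding the implied intercept unless '-1' removes it."""
    terms = [t.strip() for t in expr.replace("-", "+-").split("+") if t.strip()]
    if "-1" in terms:
        return [t for t in terms if t not in ("-1", "1")]
    return ["1"] + [t for t in terms if t != "1"]

def split_formula(formula):
    """Return (response, fixed-effect terms, [(random-effect terms, grouping), ...])."""
    response, rhs = (part.strip() for part in formula.split("~", 1))
    random_effects = [(with_intercept(inner), group.strip())
                      for inner, group in RANDOM_TERM.findall(rhs)]
    # Whatever remains after removing the random-effect terms is the fixed part.
    fixed_expr = RANDOM_TERM.sub("", rhs).replace(" ", "")
    fixed_expr = re.sub(r"\++", "+", fixed_expr).strip("+")
    return response, with_intercept(fixed_expr), random_effects

for f in ["y ~ x1+(1+x1|g1)", "y ~ (x1|g1)+(x2|g2)", "y ~ (1|g1)+(-1+x1|g1)"]:
    print(f, "->", split_formula(f))
```

For "y ~ (x1|g1)+(x2|g2)", the fixed-effect list contains only the intercept, which reflects that a random slope for x1 does not bring a fixed effect for x1 with it.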
Examples
The following table includes some examples of formulas in Wilkinson notation that include random effects, and their corresponding fixed and random effects terms.
Formula in Wilkinson Notation | Fixed Effect Model Terms | Random Effect Model Terms | Equation |
---|---|---|---|
"y ~ 1+(1|g1)" | 1 | 1 grouped by g1. | |
"y ~ x1+(1|g1)" | 1, x1 | 1 grouped by g1. | |
"y ~ (x1|g1)+(x2|g2)" | 1 | 1 and x1 grouped by g1, and 1 and x2 grouped by g2. | |
"y ~ x1+(1+x1|g1)" | 1, x1 | 1 and x1 grouped by g1, where their random effects can be correlated. | |
"y ~ (1|g1)+(-1+x1|g1)" | 1 | 1 and x1 grouped by g1, where their random effects are uncorrelated. | |
In the equations above, i denotes the index of the observation, j denotes the level of the first grouping variable g1, and k denotes the level of the second grouping variable g2. The coefficient c_m corresponds to the m-th fixed-effect term. The coefficient b_mn corresponds to the m-th random-effect term for the n-th grouping variable.
Specify Repeated Measures
For repeated measures models, you can specify response variables using the following terms.
Response Terms in Model | Response Variables Added to Model |
---|---|
y1-yk | y1, y2, …, yk |
y1,y2,y3 | y1, y2, y3 |
In the table, each yi is the name of a response variable.
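The two ways of listing responses can be sketched in a few lines of Python. This is an illustration only: it assumes that a range such as y1-y5 selects the consecutive columns of the input table from y1 through y5, and the function name expand_responses and the column list are hypothetical.

```python
def expand_responses(spec, columns):
    """Expand a response specification into a list of response variable names.

    spec    -- left-hand side of the formula, e.g. "y1-y5" or "y1,y4,y5"
    columns -- column names of the input table, in order (assumed input)
    """
    spec = spec.replace(" ", "")
    if "-" in spec:
        # Range form: take every column from the first name through the last.
        first, last = spec.split("-", 1)
        start, stop = columns.index(first), columns.index(last)
        return columns[start:stop + 1]
    # List form: the names are given explicitly, separated by commas.
    return spec.split(",")

columns = ["y1", "y2", "y3", "y4", "y5", "x1", "x2"]
print(expand_responses("y1-y5", columns))
print(expand_responses("y1,y4,y5", columns))
```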
Examples
The following table includes some examples of formulas in Wilkinson notation that include repeated measures, and their corresponding response variables.
Formula in Wilkinson Notation | Model Response Variables | Equations |
---|---|---|
"y1-y5 ~ x1:x2" | y1, y2, y3, y4, y5 | |
"y1,y4,y5 ~ x1^3" | y1, y4, y5 | |
Specify Nested Factors for anova objects
You can specify nested factors for an anova object using the following terms.
Term in Wilkinson Notation | Description |
---|---|
x2(x1) | Factor x2 is nested within factor x1. |
x2(x1)+x3(x2) | Factor x2 is nested within factor x1, and factor x3 is nested within x2. |
x3(x1,x2) | Factor x3 is nested within factors x1 and x2. |
You cannot specify an interaction term, such as x1:x2(x1), in which the second factor is nested within the first.
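The nesting syntax and the restriction on interaction terms can be illustrated with a short Python sketch. It reads the nesting relationships out of a term string and flags an interaction whose nested factor sits inside a factor that appears earlier in the same term. The function names nesting_map and violates_nesting_rule are hypothetical, and the check is a simplified reading of the rule, not the validation that the software performs.

```python
import re

# Matches "factor(parent1,parent2,...)".
NESTED = re.compile(r"(\w+)\(([^()]*)\)")

def nesting_map(terms):
    """Map each nested factor to the list of factors it is nested within."""
    return {factor: [p.strip() for p in parents.split(",")]
            for factor, parents in NESTED.findall(terms)}

def violates_nesting_rule(term):
    """True if a factor in an interaction is nested within an earlier factor of that term."""
    seen = set()
    for part in term.split(":"):
        match = NESTED.fullmatch(part.strip())
        if match:
            parents = {p.strip() for p in match.group(2).split(",")}
            if parents & seen:
                return True
            seen.add(match.group(1))
        else:
            seen.add(part.strip())
    return False

print(nesting_map("x2(x1)+x3(x2)"))
print(nesting_map("x3(x1,x2)"))
print(violates_nesting_rule("x1:x2(x1)"))
print(violates_nesting_rule("x3:x2(x1)"))
```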
Examples
The following table includes some examples of formulas in Wilkinson notation that include nested factors.
Formula in Wilkinson Notation | Model Terms | Equation |
---|---|---|
"y~x3:x2(x1)" | x3*x2, where x2 is nested within x1. | |
"y~x2(x1)+x3(x1)" | x2 and x3, where both factors are nested within factor x1. | |
References
[1] Wilkinson, G. N., and C. E. Rogers. "Symbolic Description of Factorial Models for Analysis of Variance." Journal of the Royal Statistical Society, Series C (Applied Statistics) 22, pp. 392–399, 1973.