
# dummyvar

Create dummy variables

## Syntax

`D = dummyvar(group)`

## Description


`D = dummyvar(group)` returns a matrix `D` containing zeros and ones, whose columns are dummy variables for the grouping variables in `group`. Each column of `group` is a single grouping variable, with values indicating category levels. The rows of `group` represent observations across all variables.

## Examples


Create a column vector of categorical data specifying color types.

```
Colors = {'Red';'Blue';'Green';'Red';'Green';'Blue'};
Colors = categorical(Colors);
```

Create dummy variables for each color type.

`D = dummyvar(Colors)`
```
D = 6×3

     0     0     1
     1     0     0
     0     1     0
     0     0     1
     0     1     0
     1     0     0
```

The columns in `D` correspond to the levels in `Colors`. For example, the first column of `D` corresponds to the first level, `'Blue'`, in `Colors`.

Display the category levels of `Colors`.

`categories(Colors)`
```
ans = 3x1 cell
    {'Blue' }
    {'Green'}
    {'Red'  }
```
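To make the column-to-level mapping explicit, one option is to label the columns of `D` with the category names. The following is a minimal sketch that assumes the data from this example; the table name `T` is introduced here for illustration only.

```matlab
% Recreate the categorical data from the example above.
Colors = categorical({'Red';'Blue';'Green';'Red';'Green';'Blue'});
D = dummyvar(Colors);

% Label each dummy variable with its category level.
% categories returns a column cell array, so transpose it into a row.
T = array2table(D,'VariableNames',categories(Colors)');
disp(T)
```

Each table variable then carries the name of the level it indicates, which avoids having to remember the category order.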

Create a matrix `group` of data containing the effects of two machines and three operators on a process.

```
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator]
```

```
group = 8×2

     1     1
     1     2
     1     3
     1     1
     2     2
     2     3
     2     1
     2     2
```

Create dummy variables of the data in `group`.

`D = dummyvar(group)`
```
D = 8×5

     1     0     1     0     0
     1     0     0     1     0
     1     0     0     0     1
     1     0     1     0     0
     0     1     0     1     0
     0     1     0     0     1
     0     1     1     0     0
     0     1     0     1     0
```

The first two columns of `D` represent observations of machine 1 and machine 2, respectively. The remaining columns represent observations of the three operators.

Create a cell array of phone types and a numeric vector of area codes.

```
phone = {'mobile';'landline';'mobile';'mobile';'mobile';'landline';'landline'};
codes = [802 802 603 603 802 603 802]';
```

The area code data has only two levels, 603 and 802. If passed as a numeric vector, `codes` would instead produce 802 dummy variables, one for each integer in `1:802`. To avoid this, convert `codes` to a categorical vector.

`newcodes = categorical(codes);`

Combine the `phone` and `newcodes` grouping variables into the cell array `group`.

`group = {phone,newcodes};`

Create dummy variables for the groups in `group`.

`D = dummyvar(group)`
```
D = 7×4

     1     0     0     1
     0     1     0     1
     1     0     1     0
     1     0     1     0
     1     0     0     1
     0     1     1     0
     0     1     0     1
```

The first two columns of `D` correspond to the phone types, and the last two columns correspond to the area codes.
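As a check on why the conversion to categorical matters, the following sketch compares the number of dummy variables produced from the numeric area codes with the number produced from their categorical version. The names `Dnumeric` and `Dcategorical` are introduced here for illustration.

```matlab
codes = [802 802 603 603 802 603 802]';

% Numeric input: dummyvar assumes the levels are 1:max(codes).
Dnumeric = dummyvar(codes);
size(Dnumeric)          % 7 rows, 802 columns

% Categorical input: one column per category in the data.
Dcategorical = dummyvar(categorical(codes));
size(Dcategorical)      % 7 rows, 2 columns
```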

## Input Arguments


`group`: Grouping variables, specified as a positive integer vector or categorical column vector representing levels within a single variable, a positive integer matrix representing levels within multiple variables, or a cell array containing one or more grouping variables.

If `group` is a categorical vector, then the groups and their order match the output of the `categories` function applied to `group`. If `group` is a numeric vector, then `dummyvar` assumes that the groups and their order are `1:max(group)`. In this respect, `dummyvar` treats a numeric grouping variable differently from `grp2idx`. For information on the order of groups within grouping variables, see Grouping Variables.

Example: `[2 1 1 1 2 3 3 2]'`

Example: `{Origin,Cylinders}`

Data Types: `single` | `double` | `categorical` | `cell`
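The difference between `dummyvar` and `grp2idx` for numeric input can be seen with a small vector that skips a level. This is a minimal sketch; the vector `g` is a made-up input chosen for illustration.

```matlab
g = [1 3 3]';           % level 2 never occurs

% dummyvar assumes the levels 1:max(g), so the second column is all zeros.
D = dummyvar(g)

% grp2idx assigns indices only to the values that appear in g (1 and 3).
[idx,names] = grp2idx(g)
```

Here `D` has three columns even though only two distinct values occur, whereas `grp2idx` reports two groups.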

## Output Arguments


`D`: Dummy variables, returned as an n-by-s numeric matrix, where n is the number of rows of `group` and s is the sum of the number of levels in each column of `group`. From left to right, the columns of `D` are dummy variables created from the first column of `group`, followed by dummy variables created from the second column of `group`, and so on.

Data Types: `single` | `double`
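The stated dimensions can be checked for a positive integer matrix, where the number of levels in each column is that column's maximum. The sketch below reuses the machine and operator data from the examples; `n` and `s` follow the notation above.

```matlab
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator];
D = dummyvar(group);

n = size(group,1);            % number of observations
s = sum(max(group,[],1));     % levels per column are 1:max, so s = 2 + 3
isequal(size(D),[n s])        % expected to be true for this input
```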

## Tips

• Use dummy variables in regression analysis and ANOVA to indicate values of categorical predictors.

• `dummyvar` treats `NaN` values and undefined categorical levels in `group` as missing data and returns `NaN` values in `D`.

• If a column of ones is introduced in the matrix `D`, then the resulting matrix `X = [ones(size(D,1),1) D]` is rank deficient. If `group` has multiple columns, then the matrix `D` itself is rank deficient because dummy variables produced from any column of `group` always sum to a column of ones. Regression and ANOVA calculations often address this issue by eliminating one dummy variable (implicitly setting the coefficients for dropped columns to zero) from each group of dummy variables produced by a column of `group`.

• If `group` is a numeric vector with levels that do not correspond exactly to the integers `1:max(group)`, first convert the data to a categorical vector by using `categorical`. You can then pass the result to `dummyvar`. For an example, see Create Dummy Variables from Multiple Grouping Variables.
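The rank deficiency described in the tips can be illustrated with the machine and operator data from the examples. This is a sketch only: dropping the first dummy variable of each group is one possible convention, and the name `Xreduced` is introduced here for illustration.

```matlab
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);     % 8-by-5

% Columns 1:2 and columns 3:5 each sum to a column of ones,
% so D has fewer independent columns than it has columns.
rank(D) < size(D,2)

% Drop the first dummy variable of each group, then add an intercept.
Xreduced = [ones(size(D,1),1) D(:,[2 4 5])];
rank(Xreduced) == size(Xreduced,2)
```

With one dummy variable removed per group, the intercept column no longer duplicates information already in the design matrix.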

## See Also

### Topics

Introduced before R2006a
