## Data with Missing Values

Many data sets have one or more missing values. It is convenient to code missing values as NaN (Not a Number) to preserve the structure of data sets across multiple variables and observations.

Normal MATLAB® arithmetic operations yield NaN values when operands are NaN. Removing only the NaN values would destroy the matrix structure, and removing the rows that contain them would discard valid data. The Statistics and Machine Learning Toolbox™ functions in the following table instead remove NaN values only for the purposes of computation.

| Function | Description |
| --- | --- |
| nancov | Covariance matrix, ignoring NaN values |
| nanmax | Maximum, ignoring NaN values |
| nanmean | Mean, ignoring NaN values |
| nanmedian | Median, ignoring NaN values |
| nanmin | Minimum, ignoring NaN values |
| nanstd | Standard deviation, ignoring NaN values |
| nansum | Sum, ignoring NaN values |
| nanvar | Variance, ignoring NaN values |

Other Statistics and Machine Learning Toolbox functions also ignore NaN values. These include iqr, kurtosis, mad, prctile, range, skewness, and trimmean.
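The trade-offs described above can be sketched with core MATLAB functions only. The matrix `A` below is hypothetical sample data, and the masking approach is an illustration of what "removing NaN values only for the purposes of computation" means, not a description of how the toolbox functions are implemented.

```matlab
% Hypothetical 4-by-2 data set with one missing value
A = [1 10; 2 NaN; 3 30; 4 40];

% Ordinary arithmetic propagates NaN into the result for column 2
m = mean(A);                            % second element is NaN

% Deleting rows that contain NaN keeps the matrix rectangular,
% but also discards the valid value 2 in column 1
B = A(~any(isnan(A), 2), :);            % 3-by-2 matrix

% Ignoring NaN only during the computation: mask the missing entries,
% sum the valid ones, and divide by the per-column count of valid values
valid = ~isnan(A);
Az = A;
Az(~valid) = 0;                         % working copy; A itself is unchanged
colMean = sum(Az, 1) ./ sum(valid, 1);  % column means over valid entries only
```

Here `colMean` uses all four values in the first column and the three valid values in the second, while `A` keeps its original shape and its NaN entry. If the toolbox is available, `nanmean(A)` should give the same column means.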

### Working with Data with Missing Values

Create a 3-by-3 matrix of sample data. Remove two data values by replacing them with NaN.

```matlab
X = magic(3);
X([1 5]) = [NaN NaN]
```

```
X = 3×3

   NaN     1     6
     3   NaN     7
     4     9     2
```

Compute the sum for each column of the sample data matrix using the sum function.

```matlab
s1 = sum(X)
```

```
s1 = 1×3

   NaN   NaN    15
```

If a column contains a NaN value, then the sum function returns NaN as the sum of the data in that column.
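To see in advance which column sums are affected, one option is to locate the missing values with the core `isnan` function. This is a minimal sketch that rebuilds the same sample matrix; the variable names `hasNaN` and `nMissing` are illustrative.

```matlab
% Rebuild the sample matrix with two missing values
X = magic(3);
X([1 5]) = NaN;

% Flag the columns that contain at least one NaN
hasNaN = any(isnan(X), 1);      % logical row vector, one entry per column

% Count the missing values in each column
nMissing = sum(isnan(X), 1);

% The NaN entries of the plain column sums line up with the flagged columns
s1 = sum(X);
sameColumns = isequal(isnan(s1), hasNaN);
```

For this matrix, the first two columns are flagged and the third is not, which matches the pattern of NaN values in `s1`.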

For comparison, compute the sum for each column of the sample data matrix using the nansum function.

```matlab
s2 = nansum(X)
```

```
s2 = 1×3

     7    10    15
```

If a column contains a NaN value, then the nansum function ignores the NaN value and returns the sum of the remaining values in the column.
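As a point of comparison, MATLAB releases whose core `sum` function accepts the `'omitnan'` option can produce the same result without the toolbox. The sketch below assumes such a release and rebuilds the same sample matrix.

```matlab
% Rebuild the sample matrix with two missing values
X = magic(3);
X([1 5]) = NaN;

% Column sums, skipping NaN values (assumes sum supports 'omitnan')
s3 = sum(X, 'omitnan');         % [7 10 15]

% Row sums, skipping NaN values, by summing along dimension 2
r = sum(X, 2, 'omitnan');       % [7; 10; 15]
```

In both calls the NaN entries are skipped only while summing; `X` itself still contains them.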