Sample covariance matrix
Definition
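For a vector $z \in \mathbf{R}^m$, the sample variance $\sigma^2$ measures the average squared deviation of its coefficients around the sample average $\hat{z}$:
$$\hat{z} = \frac{1}{m}(z_1 + \ldots + z_m), \quad \sigma^2 = \frac{1}{m}\left((z_1-\hat{z})^2 + \ldots + (z_m-\hat{z})^2\right).$$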
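Now consider a matrix $X = [x_1, \ldots, x_m] \in \mathbf{R}^{n \times m}$, where each column $x_k$ represents a data point in $\mathbf{R}^n$. We are interested in describing the amount of variance in this data set. To this end, we look at the numbers we obtain by projecting the data along a line defined by the direction $u \in \mathbf{R}^n$. This corresponds to the (row) vector in $\mathbf{R}^m$
$$z = u^T X = (u^T x_1, \ldots, u^T x_m).$$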
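The corresponding sample mean and variance are
$$\hat{z} = u^T\hat{x}, \quad \sigma^2(u) = \frac{1}{m}\sum_{k=1}^m \left(u^T x_k - u^T\hat{x}\right)^2,$$
where $\hat{x} = \frac{1}{m}(x_1 + \ldots + x_m) \in \mathbf{R}^n$ is the sample mean of the vectors $x_1, \ldots, x_m$.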
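The sample variance along direction $u$ can be expressed as a quadratic form in $u$:
$$\sigma^2(u) = \frac{1}{m}\sum_{k=1}^m \left(u^T(x_k-\hat{x})\right)^2 = u^T\Sigma u,$$
where $\Sigma$ is a symmetric $n \times n$ matrix, called the sample covariance matrix of the data points:
$$\Sigma = \frac{1}{m}\sum_{k=1}^m (x_k-\hat{x})(x_k-\hat{x})^T.$$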
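The quadratic-form identity follows directly from the definitions. As a sketch, write $x_k$ for the data points, $\hat{x}$ for their sample mean, $u$ for the direction, and assume the $1/m$ normalization (the convention that matches the `cov(X',1)` call in the Matlab section). The symbol $X_c$ for the centered data matrix is introduced here for illustration only.

```latex
\begin{align*}
\sigma^2(u)
  &= \frac{1}{m}\sum_{k=1}^m \left(u^T x_k - u^T\hat{x}\right)^2
   = \frac{1}{m}\sum_{k=1}^m \left(u^T(x_k-\hat{x})\right)\left((x_k-\hat{x})^T u\right) \\
  &= u^T\left(\frac{1}{m}\sum_{k=1}^m (x_k-\hat{x})(x_k-\hat{x})^T\right)u
   = u^T\Sigma u .
\end{align*}
% Equivalent matrix form, with X_c = [x_1-\hat{x}, \ldots, x_m-\hat{x}] (n-by-m):
\[
\Sigma = \frac{1}{m} X_c X_c^T .
\]
```

The matrix form makes the dimensions explicit: $X_c$ is $n \times m$, so $X_c X_c^T$ is $n \times n$, one row and one column per coordinate of the data.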
Properties
The covariance matrix satisfies the following properties.
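The sample covariance matrix allows us to find the variance along any direction $u$ in data space, as the value $u^T\Sigma u$ of the quadratic form.
The diagonal element $\Sigma_{ii}$ gives the variance of the $i$-th coordinate of the data points (the $i$-th row of $X$).
The trace of $\Sigma$ gives the sum of these coordinate variances, that is, the total variance in the data.
The matrix $\Sigma$ is positive semi-definite, since the associated quadratic form $u \mapsto u^T\Sigma u$ is a variance and hence non-negative everywhere.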
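A small worked example may help make these properties concrete. The three data points below are hypothetical, chosen only so that the arithmetic is easy to follow, and the $1/m$ normalization is assumed.

```latex
\[
x_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\quad
x_2 = \begin{pmatrix} 2 \\ 0 \end{pmatrix},\quad
x_3 = \begin{pmatrix} 1 \\ 3 \end{pmatrix},\qquad
\hat{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
% Centered points: (-1,-1), (1,-1), (0,2)
\[
\Sigma = \frac{1}{3}\left[
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} +
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} +
\begin{pmatrix} 0 & 0 \\ 0 & 4 \end{pmatrix}
\right]
= \begin{pmatrix} 2/3 & 0 \\ 0 & 2 \end{pmatrix}.
\]
% Diagonal: variance of first coordinates (0,2,1) is 2/3; of second coordinates (0,0,3) is 2.
% Trace: 2/3 + 2 = 8/3, the total variance.
% Direction u = (1,1)/\sqrt{2}:
\[
u^T \Sigma u = \tfrac{1}{2}\left(\tfrac{2}{3} + 2\right) = \tfrac{4}{3},
\]
% which matches the variance of the projections (0, \sqrt{2}, 2\sqrt{2}):
\[
\tfrac{1}{3}\left((-\sqrt{2})^2 + 0^2 + (\sqrt{2})^2\right) = \tfrac{4}{3}.
\]
```

Both eigenvalues of this $\Sigma$ ($2/3$ and $2$) are non-negative, consistent with positive semi-definiteness.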
Matlab syntax
The following Matlab syntax assumes that the data points in $\mathbf{R}^n$ are collected as columns in an $n \times m$ matrix $X$: $X = [x_1, \ldots, x_m]$.
>> [n,m] = size(X); % n = dimension, m = number of data points
>> xhat = mean(X,2); % sample mean of the columns of X (n-by-1 vector)
>> Xc = X-xhat*ones(1,m); % centered data matrix
>> Sigma = (1/m)*(Xc*Xc'); % n-by-n sample covariance matrix
>> Sigma = cov(X',1); % built-in command produces the same thing
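As a rough sanity check, the script below compares the variance computed directly from the projections with the quadratic form, and compares the hand-built covariance matrix with the built-in command. The data matrix `X`, the direction `u`, and the names `var_direct` and `var_quad` are hypothetical choices for illustration. The covariance is formed as the n-by-n product `Xc*Xc'` with the 1/m normalization.

```matlab
X = [0 2 1; 0 0 3];             % hypothetical data: n = 2, m = 3, one point per column
[n,m] = size(X);
xhat = mean(X,2);               % sample mean of the data points (n-by-1)
Xc = X - xhat*ones(1,m);        % centered data matrix
Sigma = (1/m)*(Xc*Xc');         % n-by-n sample covariance matrix

u = [1; 1]/sqrt(2);             % hypothetical unit direction
z = u'*X;                       % projections of the data points on u
var_direct = mean((z - mean(z)).^2);   % sample variance of the projections
var_quad = u'*Sigma*u;                 % same quantity via the quadratic form

disp(abs(var_direct - var_quad) < 1e-12)           % quadratic form matches
disp(norm(Sigma - cov(X',1)) < 1e-12)              % agrees with built-in cov
disp(abs(trace(Sigma) - sum(var(X,1,2))) < 1e-12)  % trace = sum of coordinate variances
disp(min(eig(Sigma)) >= -1e-12)                    % positive semi-definite (up to rounding)
```

Each `disp` call is expected to print 1 (logical true). The tolerances are there because the two computations agree only up to floating-point rounding.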