Sample covariance matrix

The average of $N$ given real numbers $x^{(1)},\ldots,x^{(N)}$ is the quantity $\hat{x} := (1/N)(x^{(1)}+\ldots+x^{(N)})$. Their variance is the non-negative number $\sigma_x^2 := (1/N) \sum_{k=1}^N (x^{(k)}-\hat{x})^2$, and is a measure of the variability of the numbers around their average; in particular, the variance is zero if and only if all the numbers are equal.
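
As a small illustration of the two definitions above, here is a minimal Python sketch. The function name `mean_and_variance` and the sample numbers are chosen for illustration only; the code uses the 1/N normalization of the text, not the 1/(N-1) convention found in some libraries.

```python
from statistics import fmean


def mean_and_variance(xs):
    """Return (average, variance) of a non-empty list of real numbers.

    The variance uses the 1/N normalization, as in the text.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one number")
    x_hat = fmean(xs)
    variance = sum((x - x_hat) ** 2 for x in xs) / n
    return x_hat, variance


# Hypothetical sample data.
x_hat, variance = mean_and_variance([2.0, 4.0, 4.0, 6.0])
print(x_hat, variance)  # 4.0 2.0

# When all the numbers are equal, the variance is zero.
print(mean_and_variance([3.0, 3.0, 3.0])[1])  # 0.0
```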

Now consider two series of real numbers, $x^{(k)}, y^{(k)}$, $k=1,\ldots,N$. We can define the means $\hat{x},\hat{y}$ and variances $\sigma_x^2,\sigma_y^2$ of these numbers as before. In addition, we can define the covariance between these two collections of numbers as the value
 $$ \sigma_{xy}^2 := (1/N) \sum_{k=1}^N (x^{(k)}-\hat{x}) (y^{(k)}-\hat{y}). $$
The covariance measures how the two series vary together: if the covariance is large and positive, then on average, when $x^{(k)}$ is far above (or below) its mean, so is $y^{(k)}$; the series tend to move in the same direction simultaneously. If the covariance is large and negative, then the two series tend to move in opposite directions. If the covariance is zero, then the products of the deviations of the two series from their means cancel out on average, so the series show no linear tendency to move together or in opposite directions. The symmetric matrix
 $$ \Sigma_{xy} := \left(\begin{array}{cc} \sigma_x^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_{y}^2 \end{array}\right) $$
is called the covariance matrix associated with the two series $x,y$.
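
The covariance of two series and the associated 2 x 2 covariance matrix can be sketched as follows. The helper names `covariance` and `covariance_matrix_2x2`, and the sample series, are illustrative assumptions; the 1/N normalization follows the text.

```python
def covariance(xs, ys):
    """Covariance of two equal-length series, with 1/N normalization."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("series must be non-empty and of equal length")
    x_hat = sum(xs) / n
    y_hat = sum(ys) / n
    return sum((x - x_hat) * (y - y_hat) for x, y in zip(xs, ys)) / n


def covariance_matrix_2x2(xs, ys):
    """Covariance matrix of two series, as a nested list.

    The covariance of a series with itself is its variance.
    """
    return [
        [covariance(xs, xs), covariance(xs, ys)],
        [covariance(ys, xs), covariance(ys, ys)],
    ]


# Hypothetical data: ys_up moves with xs, ys_down moves against it.
xs = [1.0, 2.0, 3.0, 4.0]
ys_up = [2.0, 4.0, 6.0, 8.0]
ys_down = [8.0, 6.0, 4.0, 2.0]

print(covariance_matrix_2x2(xs, ys_up))  # [[1.25, 2.5], [2.5, 5.0]]
print(covariance(xs, ys_down))           # -2.5
```

The positive off-diagonal entry in the first case and the negative covariance in the second match the "same direction" and "opposite directions" readings given above.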

Consider now a collection of vectors $x^{(k)} \in \mathbf{R}^n$, $k=1,\ldots,N$. We can define, as before, the average as the vector $\hat{x} = (1/N) \sum_{k=1}^N x^{(k)}$. We then define the covariance matrix as the $n \times n$ matrix $\Sigma$ whose $(i,j)$ element is the covariance between the series $x_i^{(k)}$ and $x_j^{(k)}$, $k=1,\ldots,N$, for $i,j=1,\ldots,n$. More compactly:
 $$ \Sigma := \frac{1}{N} \sum_{k=1}^N (x^{(k)}-\hat{x})(x^{(k)}-\hat{x})^T . $$
A diagonal element of $\Sigma$, $\Sigma_{ii}$, is the variance of the $i$-th component of the vectors, that is, of the series $x_i^{(k)}$, $k=1,\ldots,N$. The covariance matrix is symmetric by construction. In addition, it is positive semi-definite, since it is a sum, with non-negative weights, of symmetric dyads of the form $zz^T$: for any $u \in \mathbf{R}^n$, $u^T \Sigma u = (1/N) \sum_{k=1}^N (u^T(x^{(k)}-\hat{x}))^2 \ge 0$.
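
The compact formula and the two properties just stated can be checked numerically on a small example. The sketch below is one possible plain-Python implementation, not a reference one; the names `sample_covariance` and `quadratic_form`, and the data points, are hypothetical.

```python
def sample_covariance(points):
    """Covariance matrix of N vectors in R^n, given as a list of lists.

    Accumulates (1/N) * d d^T over the points, where d is the deviation
    of a point from the average vector.
    """
    N = len(points)
    n = len(points[0])
    x_hat = [sum(p[i] for p in points) / N for i in range(n)]
    sigma = [[0.0] * n for _ in range(n)]
    for p in points:
        d = [p[i] - x_hat[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                sigma[i][j] += d[i] * d[j] / N
    return sigma


def quadratic_form(sigma, u):
    """Return u^T Sigma u."""
    n = len(u)
    return sum(u[i] * sigma[i][j] * u[j] for i in range(n) for j in range(n))


# Hypothetical data: N = 4 points in R^3; the second component is
# exactly twice the first.
points = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
sigma = sample_covariance(points)

# Symmetry: Sigma_ij equals Sigma_ji.
print(all(sigma[i][j] == sigma[j][i] for i in range(3) for j in range(3)))

# Positive semi-definiteness: u^T Sigma u is non-negative (up to rounding)
# for each test direction u.
for u in ([1.0, 0.0, 0.0], [1.0, -1.0, 2.0], [2.0, -1.0, 0.0]):
    print(u, quadratic_form(sigma, u))
```

For the last direction, u = (2, -1, 0), the quadratic form vanishes because 2 x_1 - x_2 is constant across the points. This shows why the matrix is only semi-definite in general: it is singular whenever some linear combination of the components does not vary.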