Least-Squares Approximation by Natural Cubic Splines

The construction of a least-squares approximant usually requires that one have in hand a basis for the space from which the data are to be approximated. As the example of the space of “natural” cubic splines illustrates, the explicit construction of a basis is not always straightforward.

This section makes clear that an explicit basis is not actually needed; it is sufficient to have available some means of interpolating in some fashion from the space of approximants. For this, the fact that the Curve Fitting Toolbox™ spline functions support work with vector-valued functions is essential.

This section discusses these aspects of least-squares approximation by “natural” cubic splines.

Problem

You want to construct the least-squares approximation to given data (x,y) from the space S of “natural” cubic splines with given breaks b(1) < ...< b(l+1).

General Resolution

If you know a basis, (f1,f2,...,fm), for the linear space S of all “natural” cubic splines with break sequence b, then you have learned to find the least-squares approximation in the form c(1)f1+ c(2)f2+ ... + c(m)fm, with the vector c the least-squares solution to the linear system A*c = y, whose coefficient matrix is given by

A(i,j) = fj(x(i)),   i=1:length(x),  j=1:m .

In other words, c = A\y.

Need for a Basis Map

The general solution seems to require that you know a basis. However, in order to construct the coefficient sequence c, you only need to know the matrix A. For this, it is sufficient to have at hand a basis map, namely a function F say, so that F(c) returns the spline given by the particular weighted sum c(1)f1+c(2)f2+... +c(m)fm. For, with that, you can obtain, for j=1:m, the j-th column of A as fnval(F(ej),x), with ej the j-th column of eye(m), the identity matrix of order m.

Better yet, the Curve Fitting Toolbox spline functions can handle vector-valued functions, so you should be able to construct the basis map F to handle vector-valued coefficients c(i) as well. However, by agreement, in this toolbox, a vector-valued coefficient is a column vector, hence the sequence c is necessarily a row vector of column vectors, i.e., a matrix. With that, F(eye(m)) is the vector-valued spline whose i-th component is the basis element fi, i=1:m. Hence, assuming the vector x of data sites to be a row vector, fnval(F(eye(m)),x) is the matrix whose (i,j)-entry is the value of fi at x(j), i.e., the transpose of the matrix A you are seeking. On the other hand, as just pointed out, your basis map F expects the coefficient sequence c to be a row vector, i.e., the transpose of the vector A\y. Hence, assuming, correspondingly, the vector y of data values to be a row vector, you can obtain the least-squares approximation from S to data (x,y) as

F(y/fnval(F(eye(m)),x))

To be sure, if you wanted to be prepared for x and y to be arbitrary vectors (of the same length), you would use instead

F(y(:).'/fnval(F(eye(m)),x(:).'))

A Basis Map for “Natural” Cubic Splines

What exactly is required of a basis map F for the linear space S of “natural” cubic splines with break sequence b(1) < ... < b(l+1)? Assuming the dimension of this linear space is m, the map F should set up a linear one-to-one correspondence between m-vectors and elements of S. But that is exactly what csape(b, . ,'var') does.

To be explicit, consider the following function F:

function s = F(c)
s = csape(b,c,'var');

For given vector c (of the same length as b), it provides the unique “natural” cubic spline with break sequence b that takes the value c(i) at b(i), i=1:l+1. The uniqueness is key. It ensures that the correspondence between the vector c and the resulting spline F(c) is one-to-one. In particular, m equals length(b). More than that, because the value f(t) of a function f at a point t depends linearly on f, this uniqueness ensures that F(c) depends linearly on c (because c equals fnval(F(c),b) and the inverse of an invertible linear map is again a linear map).

The One-line Solution

Putting it all together, you arrive at the following code

csape(b,y(:).'/fnval(csape(b,eye(length(b)),'var'),x(:).'),...
'var')

for the least-squares approximation by “natural” cubic splines with break sequence b.

The Need for Proper Extrapolation

Let's try it on some data, the census data, say, which is provided in MATLAB^® by the command

load census

and which supplies the years, 1790:10:1990, as cdate and the values as pop. Use the break sequence 1810:40:1970.

b = 1810:40:1970;
s = csape(b, ...
pop(:)'/fnval(csape(b,eye(length(b)),'var'),cdate(:)'),'var');
fnplt(s, [1750,2050],2.2);
hold on
plot(cdate,pop,'or');
hold off

Have a look at Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks which shows, in thick blue, the resulting approximation, along with the given data.

This looks like a good approximation, -- except that it doesn't look like a “natural” cubic spline. A “natural” cubic spline, to recall, must be linear to the left of its first break and to the right of its last break, and this approximation satisfies neither condition. This is due to the following facts.

The “natural” cubic spline interpolant to given data is provided by csape in ppform, with the interval spanned by the data sites its basic interval. On the other hand, evaluation of a ppform outside its basic interval is done, in MATLAB ppval or Curve Fitting Toolbox spline function fnval, by using the relevant polynomial end piece of the ppform, i.e., by full-order extrapolation. In case of a “natural” cubic spline, you want instead second-order extrapolation. This means that you want, to the left of the first break, the straight line that agrees with the cubic spline in value and slope at the first break. Such an extrapolation is provided by fnxtr. Because the “natural” cubic spline has zero second derivative at its first break, such an extrapolation is even third-order, i.e., it satisfies three matching conditions. In the same way, beyond the last break of the cubic spline, you want the straight line that agrees with the spline in value and slope at the last break, and this, too, is supplied by fnxtr.

Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks

The Correct One-Line Solution

The following one-line code provides the correct least-squares approximation to data (x,y) by “natural” cubic splines with break sequence b:

fnxtr(csape(b,y(:).'/ ...
    fnval(fnxtr(csape(b,eye(length(b)),'var')),x(:).'),'var'))

But it is, admittedly, a rather long line.

The following code uses this correct formula and plots, in a thinner, red line, the resulting approximation on top of the earlier plots, as shown in Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks.

 ss = fnxtr(csape(b,pop(:)'/ ...
  fnval(fnxtr(csape(b,eye(length(b)),'var')),cdate(:)'),'var'));
hold on, fnplt(ss,[1750,2050],1.2,'r'),grid, hold off
legend('incorrect approximation','population', ...
'correct approximation')

Least-Squares Approximation by Cubic Splines

The one-line solution works perfectly if you want to approximate by the space S of all cubic splines with the given break sequence b. You don't even have to use the Curve Fitting Toolbox spline functions for this because you can rely on the MATLAB spline. You know that, with c a sequence containing two more entries than does b, spline(b,c) provides the unique cubic spline with break sequence b that takes the value c(i+1) at b(i), all i, and takes the slope c(1) at b(1), and the slope c(end) at b(end). Hence, spline(b,.) is a basis map for S.

More than that, you know that spline(b,c,xi) provides the value(s) at xi of this interpolating spline. Finally, you know that spline can handle vector-valued data. Therefore, the following one-line code constructs the least-squares approximation by cubic splines with break sequence b to data (x,y) :

spline(b,y(:)'/spline(b,eye(length(b)),x(:)'))