Least-Squares Approximation by Natural Cubic Splines
The construction of a least-squares approximant usually requires that one have in hand a basis for the space from which the data are to be approximated. As the example of the space of “natural” cubic splines illustrates, the explicit construction of a basis is not always straightforward.
This section makes clear that an explicit basis is not actually needed; it is sufficient to have available some means of interpolating in some fashion from the space of approximants. For this, the fact that the Curve Fitting Toolbox™ spline functions support work with vector-valued functions is essential.
This section discusses these aspects of least-squares approximation by “natural” cubic splines.
Problem
You want to construct the least-squares approximation to given
data (x
,y
) from the space S of
“natural” cubic splines with given breaks b(1)
<
...< b(l+1)
.
General Resolution
If you know a basis, (f1,f2,...,fm), for the linear space S of
all “natural” cubic splines with break sequence b
,
then you have learned to find the least-squares approximation in the
form c(1)
f1+ c(2)
f2+ ... + c(m)
fm,
with the vector c
the least-squares solution to
the linear system A*c = y
, whose coefficient matrix
is given by
A(i,j) = fj(x(i)), i=1:length(x), j=1:m .
In other words, c = A\y
.
Need for a Basis Map
The general solution seems to require that you know a basis.
However, in order to construct the coefficient sequence c
,
you only need to know the matrix A
. For this, it
is sufficient to have at hand a basis map, namely a function F
say,
so that F(c)
returns the spline given by the particular
weighted sum c(1)
f1+c(2)
f2+
... +c(m)
fm.
For, with that, you can obtain, for j=1:m
, the j
-th
column of A
as fnval(F(ej),x)
,
with ej
the j
-th column of eye(m)
,
the identity matrix of order m
.
Better yet, the Curve Fitting Toolbox spline functions can
handle vector-valued functions, so you should
be able to construct the basis map F
to handle
vector-valued coefficients c(i)
as well. However,
by agreement, in this toolbox, a vector-valued coefficient is a column vector,
hence the sequence c is necessarily a row vector of column vectors,
i.e., a matrix. With that, F(eye(m))
is
the vector-valued spline whose i
-th component is
the basis element fi
, i=1:m
.
Hence, assuming the vector x
of data sites to be
a row vector, fnval(F(eye(m)),x)
is the matrix
whose (i,j)
-entry is the value of fi
at x(j)
,
i.e., the transpose of the matrix A
you
are seeking. On the other hand, as just pointed out, your basis map F
expects
the coefficient sequence c
to be a row vector,
i.e., the transpose of the vector A\y
. Hence,
assuming, correspondingly, the vector y
of data
values to be a row vector, you can obtain the least-squares approximation
from S to data (x
,y
)
as
F(y/fnval(F(eye(m)),x))
To be sure, if you wanted to be prepared for x
and y
to
be arbitrary vectors (of the same length), you would use instead
F(y(:).'/fnval(F(eye(m)),x(:).'))
A Basis Map for “Natural” Cubic Splines
What exactly is required of a basis map F
for
the linear space S of “natural” cubic
splines with break sequence b(1) < ... < b(l+1)
?
Assuming the dimension of this linear space is m
,
the map F
should set up a linear one-to-one correspondence
between m
-vectors and elements of S.
But that is exactly what csape(b, . ,'var')
does.
To be explicit, consider the following function F
:
function s = F(c) s = csape(b,c,'var');
For given vector c
(of the same length as
b), it provides the unique “natural”
cubic spline with break sequence b that takes the value c(i)
at b(i)
, i=1:l+1
.
The uniqueness is key. It ensures that the correspondence between
the vector c
and the resulting spline F(c)
is
one-to-one. In particular, m
equals length(b)
.
More than that, because the value f(t)
of a function f at a point t depends
linearly on f, this uniqueness ensures that F(c)
depends
linearly on c
(because c
equals fnval(F(c),b)
and
the inverse of an invertible linear map is again a linear map).
The One-line Solution
Putting it all together, you arrive at the following code
csape(b,y(:).'/fnval(csape(b,eye(length(b)),'var'),x(:).'),... 'var')
for the least-squares approximation by “natural”
cubic splines with break sequence b
.
The Need for Proper Extrapolation
Let's try it on some data, the census data, say, which is provided in MATLAB® by the command
load census
and which supplies the years, 1790:10:1990
,
as cdate
and the values as pop
.
Use the break sequence 1810:40:1970
.
b = 1810:40:1970; s = csape(b, ... pop(:)'/fnval(csape(b,eye(length(b)),'var'),cdate(:)'),'var'); fnplt(s, [1750,2050],2.2); hold on plot(cdate,pop,'or'); hold off
Have a look at Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks which shows, in thick blue, the resulting approximation, along with the given data.
This looks like a good approximation, -- except that it doesn't look like a “natural” cubic spline. A “natural” cubic spline, to recall, must be linear to the left of its first break and to the right of its last break, and this approximation satisfies neither condition. This is due to the following facts.
The “natural” cubic spline interpolant to given
data is provided by csape
in ppform, with the interval
spanned by the data sites its basic interval. On the other hand, evaluation
of a ppform outside its basic interval is done, in MATLAB ppval
or Curve Fitting Toolbox spline
function fnval
, by using the relevant polynomial
end piece of the ppform, i.e., by full-order extrapolation. In case
of a “natural” cubic spline, you want instead second-order
extrapolation. This means that you want, to the left of the first
break, the straight line that agrees with the cubic spline in value
and slope at the first break. Such an extrapolation is provided by fnxtr
. Because the “natural” cubic
spline has zero second derivative at its first break, such an extrapolation
is even third-order, i.e., it satisfies three matching conditions.
In the same way, beyond the last break of the cubic spline, you want
the straight line that agrees with the spline in value and slope at
the last break, and this, too, is supplied by fnxtr
.
Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks
The Correct One-Line Solution
The following one-line code provides the correct least-squares
approximation to data (x
,y
)
by “natural” cubic splines with break sequence b
:
fnxtr(csape(b,y(:).'/ ... fnval(fnxtr(csape(b,eye(length(b)),'var')),x(:).'),'var'))
But it is, admittedly, a rather long line.
The following code uses this correct formula and plots, in a thinner, red line, the resulting approximation on top of the earlier plots, as shown in Least-Squares Approximation by “Natural” Cubic Splines With Three Interior Breaks.
ss = fnxtr(csape(b,pop(:)'/ ... fnval(fnxtr(csape(b,eye(length(b)),'var')),cdate(:)'),'var')); hold on, fnplt(ss,[1750,2050],1.2,'r'),grid, hold off legend('incorrect approximation','population', ... 'correct approximation')
Least-Squares Approximation by Cubic Splines
The one-line solution works perfectly if you want to approximate
by the space S of all cubic splines with the given
break sequence b
. You don't even have to use the Curve Fitting Toolbox spline
functions for this because you can rely on the MATLAB spline
.
You know that, with c
a sequence containing two
more entries than does b
, spline(b,c)
provides
the unique cubic spline with break sequence b
that
takes the value c(i+1)
at b(i)
,
all i
, and takes the slope c(1)
at b(1)
,
and the slope c(end)
at b(end)
.
Hence, spline(b,.)
is a basis map for S
.
More than that, you know that spline(b,c,xi)
provides
the value(s) at xi
of this interpolating spline.
Finally, you know that spline
can handle vector-valued
data. Therefore, the following one-line code constructs the least-squares
approximation by cubic splines with break sequence b
to
data (x
,y
) :
spline(b,y(:)'/spline(b,eye(length(b)),x(:)'))