Linear Functions and Maps

  • Linear and affine functions

  • Linear and affine maps

  • First-order approximation of non-linear functions

  • Other sources of linear models

Linear and affine functions

Definition

Linear functions are functions that preserve scaling and addition of the input argument. Affine functions are “linear plus constant” functions.

Formal definition, linear and affine functions. A function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is linear if and only if $f$ preserves scaling and addition of its arguments:

  • for every $x \in \mathbf{R}^n$ and $\alpha \in \mathbf{R}$, $f(\alpha x) = \alpha f(x)$; and

  • for every $x_1, x_2 \in \mathbf{R}^n$, $f(x_1 + x_2) = f(x_1) + f(x_2)$.

A function $f$ is affine if and only if the function $\tilde{f}: \mathbf{R}^n \rightarrow \mathbf{R}$ with values $\tilde{f}(x) = f(x) - f(0)$ is linear. $\spadesuit$

An alternative characterization of linear functions is given here.

Examples: Consider the functions $f_1, f_2, f_3 : \mathbf{R}^2 \rightarrow \mathbf{R}$ with values

  1. $f_1(x) = 3.2x_1 + 2x_2$,

  2. $f_2(x) = 3.2x_1 + 2x_2 + 0.15$,

  3. $f_3(x) = 0.001x_2^2 + 2.3x_1 + 0.3x_2$.

The function $f_1$ is linear; $f_2$ is affine (it is $f_1$ plus the constant $0.15$); and $f_3$ is neither, because of the quadratic term $0.001x_2^2$. $\diamondsuit$
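The two defining properties can be probed numerically. The sketch below is only an illustration: the helper names (`looks_linear`, `looks_affine`) are hypothetical, and a randomized test can disprove linearity (one failed trial suffices) but can only provide evidence for it, never a proof. Affinity is tested through the definition above, by checking whether $f(x) - f(0)$ passes the linearity test.

```python
import random

# The three example functions on R^2; x is a pair (x1, x2).
def f1(x):
    return 3.2 * x[0] + 2 * x[1]

def f2(x):
    return 3.2 * x[0] + 2 * x[1] + 0.15

def f3(x):
    return 0.001 * x[1] ** 2 + 2.3 * x[0] + 0.3 * x[1]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(alpha, u):
    return (alpha * u[0], alpha * u[1])

def looks_linear(f, trials=100, tol=1e-9, seed=0):
    """Randomized check of f(alpha x) = alpha f(x) and f(x1 + x2) = f(x1) + f(x2).

    A failed trial proves f is not linear; passing every trial is only evidence.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        x2 = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        alpha = rng.uniform(-10, 10)
        if abs(f(scale(alpha, x1)) - alpha * f(x1)) > tol:
            return False
        if abs(f(add(x1, x2)) - (f(x1) + f(x2))) > tol:
            return False
    return True

def looks_affine(f, **kwargs):
    # f is affine iff x -> f(x) - f(0) is linear.
    f0 = f((0.0, 0.0))
    return looks_linear(lambda x: f(x) - f0, **kwargs)

for name, f in [("f1", f1), ("f2", f2), ("f3", f3)]:
    print(name, "linear:", looks_linear(f), "affine:", looks_affine(f))
```

The absolute tolerance `tol` is an arbitrary choice that absorbs floating-point rounding for inputs of moderate size; it would need rescaling for much larger inputs.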

Connection with vectors via the scalar product

The following shows the connection between linear functions and scalar products.

Theorem: Representation of affine functions via the scalar product. A function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is affine if and only if it can be expressed via a scalar product:
$$f(x) = a^Tx + b,$$
for some unique pair $(a,b)$, with $a \in \mathbf{R}^{n}$ and $b \in \mathbf{R}$. The function is linear if and only if $b = 0$. $\spadesuit$

The theorem shows that a vector can be seen as a (linear) function from the “input” space $\mathbf{R}^n$ to the “output” space $\mathbf{R}$. Both points of view (vectors as simple collections of numbers, or as linear functions) are useful.
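One way to see why the pair $(a, b)$ is unique: evaluating $f(x) = a^Tx + b$ at $x = 0$ gives $b = f(0)$, and evaluating it at the $i$-th unit vector $e_i$ gives $f(e_i) = a_i + b$. The sketch below recovers $(a, b)$ this way; the helper name `affine_coefficients` is hypothetical, and the function is assumed to be affine already (for a non-affine input it returns meaningless numbers).

```python
def affine_coefficients(f, n):
    """Recover (a, b) with f(x) = a^T x + b, assuming f is affine on R^n.

    b = f(0), and a_i = f(e_i) - f(0), where e_i is the i-th unit vector.
    """
    b = f([0.0] * n)
    a = []
    for i in range(n):
        e_i = [0.0] * n
        e_i[i] = 1.0
        a.append(f(e_i) - b)
    return a, b

# The affine example f_2 from above.
def f2(x):
    return 3.2 * x[0] + 2 * x[1] + 0.15

a, b = affine_coefficients(f2, 2)
print(a, b)  # approximately [3.2, 2.0] and 0.15, up to rounding
```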

Gradient of an affine function

The gradient of a function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ at a point $x$, denoted $\nabla f(x)$, is the vector of first derivatives with respect to $x_1, \ldots, x_n$ (see here for a formal definition and examples).

An affine function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ with values $f(x) = a^Tx + b$ has a very simple gradient: the constant vector $a$, since $\partial f / \partial x_j = a_j$ for every $j$. That is, for an affine function $f$, we have for every $x$:
$$\nabla f(x) = a.$$
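A quick numerical illustration: approximating the gradient by central finite differences at several points returns (up to rounding) the same vector $a$ every time. The coefficients and the step size `h` below are illustrative assumptions, and `numerical_gradient` is a hypothetical helper, not a library function.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# Illustrative affine function f(x) = a^T x + b.
a = [3.2, 2.0]
b = 0.15

def f(x):
    return sum(ai * xi for ai, xi in zip(a, x)) + b

# The estimated gradient should be (close to) a, wherever it is evaluated.
for x in ([0.0, 0.0], [1.5, -4.0], [100.0, 7.0]):
    print(x, numerical_gradient(f, x))
```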

Interpretations

The interpretations of $a$ and $b$ are as follows.

  • The scalar $b = f(0)$ is the constant term. For this reason, it is sometimes referred to as the bias, or intercept (it is the height at which the graph of $f$ crosses the vertical axis, that is, the value of $f$ at $x = 0$).

  • The coefficients $a_j$, $j = 1, \ldots, n$, measure the influence of $x_j$ on $f$: increasing $x_j$ by one unit, with the other components held fixed, changes $f(x)$ by $a_j$. For example, if $|a_1| \gg |a_3|$, then the first component of $x$ has much greater influence on the value of $f(x)$ than the third.

Example: Beer-Lambert law in absorption spectrometry.

Linear and affine maps

Definition

A map $f: \mathbf{R}^n \rightarrow \mathbf{R}^m$ is linear (respectively, affine) if and only if every one of its components is. The formal definition we saw above for functions applies verbatim to maps (see here).

To an $m \times n$ matrix $A$, we can associate a linear map $f : \mathbf{R}^n \rightarrow \mathbf{R}^m$, with values $f(x) = Ax$. Conversely, to any linear map, we can uniquely associate a matrix $A$ which satisfies $f(x) = Ax$ for every $x$.

Indeed, if the components $f_i$, $i = 1, \ldots, m$, of $f$ are linear, then they can be expressed as $f_i(x) = a_i^Tx$ for some $a_i \in \mathbf{R}^n$. The matrix $A$ is the matrix that has $a_i^T$ as its $i$-th row:
$$f(x) = \left(\begin{array}{c} a_1^Tx \\ \vdots \\ a_m^Tx \end{array}\right) = Ax, \quad \mbox{ with } A := \left(\begin{array}{c} a_1^T \\ \vdots \\ a_m^T \end{array}\right) \in \mathbf{R}^{m \times n}.$$
Hence, there is a one-to-one correspondence between matrices and linear maps.

This is summarized as follows.

Theorem: Representation of affine maps via the matrix-vector product. A function $f: \mathbf{R}^n \rightarrow \mathbf{R}^m$ is affine if and only if it can be expressed via a matrix-vector product:
$$f(x) = Ax + b,$$
for some unique pair $(A,b)$, with $A \in \mathbf{R}^{m \times n}$ and $b \in \mathbf{R}^m$. The function is linear if and only if $b = 0$. $\spadesuit$

The theorem shows that a matrix can be seen as a (linear) map from the “input” space $\mathbf{R}^n$ to the “output” space $\mathbf{R}^m$. Both points of view (matrices as simple collections of vectors, or as linear maps) are useful.
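The same unit-vector argument used for functions recovers the pair $(A, b)$ of an affine map: $b = f(0)$, and the $j$-th column of $A$ is $f(e_j) - f(0)$. The sketch below assumes the map is affine; the map `g` and the helper names are hypothetical, chosen only for illustration, and plain lists stand in for vectors and matrices.

```python
def affine_map_coefficients(f, n):
    """Recover (A, b) with f(x) = A x + b, assuming f: R^n -> R^m is affine.

    b = f(0); the j-th column of A is f(e_j) - f(0).
    """
    b = f([0.0] * n)
    m = len(b)
    A = [[0.0] * n for _ in range(m)]
    for j in range(n):
        e_j = [0.0] * n
        e_j[j] = 1.0
        col = f(e_j)
        for i in range(m):
            A[i][j] = col[i] - b[i]
    return A, b

def matvec(A, x):
    """Matrix-vector product: the i-th entry is the scalar product of row i with x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

# A hypothetical affine map from R^3 to R^2.
def g(x):
    return [2 * x[0] - x[2] + 1, 0.5 * x[1] + 3 * x[2] - 4]

A, b = affine_map_coefficients(g, 3)
x = [1.0, -2.0, 0.5]
y = [yi + bi for yi, bi in zip(matvec(A, x), b)]
print(A, b)     # [[2.0, 0.0, -1.0], [0.0, 0.5, 3.0]] [1.0, -4.0]
print(g(x), y)  # the two evaluations agree
```

Note that $A_{21} = 0$ here: the second output does not depend on $x_1$.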

Interpretations

Consider an affine map $x \mapsto y = Ax + b$. An element $A_{ij}$ gives the coefficient of influence of $x_j$ over $y_i$. In this sense, if $|A_{13}| \gg |A_{14}|$, we can say that $x_3$ has much more influence on $y_1$ than $x_4$. Likewise, $A_{24} = 0$ says that $y_2$ does not depend at all on $x_4$. Often the constant term $b = f(0)$ is referred to as the “bias” vector.

First-order approximation of non-linear functions

Many maps are non-linear. A common engineering practice is to approximate a given non-linear map with a linear (or affine) one, by taking derivatives. This is the main reason why linearity is such a ubiquitous tool in engineering.

One-dimensional case

Consider a function of one variable $f : \mathbf{R} \rightarrow \mathbf{R}$, and assume it is differentiable everywhere. Then we can approximate the value of the function at a point $x$ near a point $x_0$ as follows:
$$f(x) \simeq l(x) := f(x_0) + f'(x_0)(x - x_0),$$
where $f'(x)$ denotes the derivative of $f$ at $x$.
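To see how good the approximation is, the sketch below linearizes $f(x) = e^x$ around $x_0 = 1$ (an illustrative choice, not from the text) and prints the error $|f(x) - l(x)|$ as $x$ approaches $x_0$. For a twice-differentiable $f$ the error shrinks roughly like $(x - x_0)^2$, so the last column should settle near $f''(x_0)/2 = e/2$. The helper `linearize` is hypothetical.

```python
import math

def linearize(f, fprime, x0):
    """Return the affine function l(x) = f(x0) + f'(x0) (x - x0)."""
    f0, d0 = f(x0), fprime(x0)
    return lambda x: f0 + d0 * (x - x0)

# Illustrative choice: f = exp, whose derivative is exp itself.
x0 = 1.0
l = linearize(math.exp, math.exp, x0)

for dx in (1e-1, 1e-2, 1e-3):
    err = abs(math.exp(x0 + dx) - l(x0 + dx))
    print(f"dx={dx:g}  error={err:.3e}  error/dx^2={err / dx**2:.4f}")
```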

Multi-dimensional case

With more than one variable, we have a similar result. Let us approximate a differentiable function $f : \mathbf{R}^n \rightarrow \mathbf{R}$ by an affine function $l$, so that $f$ and $l$ coincide at $x_0$ up to and including the first derivatives. The corresponding approximation $l$ is called the first-order approximation to $f$ at $x_0$.

The approximate function $l$ must be of the form
$$l(x) = a^Tx + b,$$
where $a \in \mathbf{R}^n$ and $b \in \mathbf{R}$. Our condition that $l$ coincides with $f$ at $x_0$ up to and including the first derivatives shows that we must have
$$\nabla l(x) = a = \nabla f(x_0), \quad a^Tx_0 + b = f(x_0),$$
where $\nabla f(x_0)$ is the gradient of $f$ at $x_0$. Solving for $a, b$ gives $a = \nabla f(x_0)$ and $b = f(x_0) - \nabla f(x_0)^Tx_0$, which yields the following result:

Theorem: First-order expansion of a function. The first-order approximation of a differentiable function $f$ at a point $x_0$ is of the form
$$f(x) \approx l(x) = f(x_0) + \nabla f(x_0)^T(x - x_0),$$
where $\nabla f(x_0) \in \mathbf{R}^n$ is the gradient of $f$ at $x_0$. $\clubsuit$
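A minimal numerical sketch of the theorem, assuming the smooth function $f(x) = x_1^2 + x_1x_2 + e^{x_2}$ and the point $x_0 = (1, 0)$, both chosen only for illustration. Moving away from $x_0$ along a fixed unit direction by a distance $t$, the gap $|f(x) - l(x)|$ shrinks roughly like $t^2$, as expected from a first-order approximation. The helper `first_order` is hypothetical.

```python
import math

# Illustrative smooth function on R^2 and its (hand-computed) gradient.
def f(x):
    return x[0] ** 2 + x[0] * x[1] + math.exp(x[1])

def grad_f(x):
    return [2 * x[0] + x[1], x[0] + math.exp(x[1])]

def first_order(f, grad_f, x0):
    """Return l(x) = f(x0) + grad f(x0)^T (x - x0)."""
    f0, g0 = f(x0), grad_f(x0)
    return lambda x: f0 + sum(gi * (xi - x0i) for gi, xi, x0i in zip(g0, x, x0))

x0 = [1.0, 0.0]
l = first_order(f, grad_f, x0)
direction = [0.6, -0.8]  # unit-length direction of approach

for t in (1e-1, 1e-2, 1e-3):
    x = [x0i + t * di for x0i, di in zip(x0, direction)]
    print(f"t={t:g}  |f(x) - l(x)| = {abs(f(x) - l(x)):.3e}")
```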

The case of maps

The above can be extended to maps. If $f : \mathbf{R}^n \rightarrow \mathbf{R}^m$ is differentiable, then we can approximate the values of $f$ near a given point $x_0 \in \mathbf{R}^n$ by an affine function $\tilde{f}$:
$$f(x) \approx \tilde{f}(x) := f(x_0) + A(x - x_0),$$
where $A_{ij} = \frac{\partial f_i}{\partial x_j}(x_0)$ is the derivative of the $i$-th component of $f$ with respect to $x_j$, evaluated at $x_0$. ($A$ is referred to as the Jacobian of $f$ at $x_0$.)
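When the partial derivatives are tedious to derive by hand, the Jacobian can be estimated column by column with finite differences. The sketch below does this for a hypothetical map $f(x) = (x_1x_2,\ \sin x_1 + x_2^2)$, compares the estimate with the analytic Jacobian, and evaluates the affine approximation $\tilde{f}$ at a nearby point. The helper names and the step size `h` are illustrative assumptions.

```python
import math

# A hypothetical map f: R^2 -> R^2, used only for illustration.
def f(x):
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def jacobian_fd(f, x0, h=1e-6):
    """Central-difference estimate of the Jacobian A_ij = d f_i / d x_j at x0."""
    m, n = len(f(x0)), len(x0)
    A = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x0)
        xp[j] += h
        xm = list(x0)
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            A[i][j] = (fp[i] - fm[i]) / (2 * h)
    return A

def affine_approx(f, A, x0):
    """Return x -> f(x0) + A (x - x0)."""
    f0 = f(x0)
    return lambda x: [f0[i] + sum(A[i][j] * (x[j] - x0[j]) for j in range(len(x0)))
                      for i in range(len(f0))]

x0 = [0.5, 2.0]
A = jacobian_fd(f, x0)
# Analytic Jacobian for comparison: [[x2, x1], [cos x1, 2 x2]] at x0.
A_exact = [[x0[1], x0[0]], [math.cos(x0[0]), 2 * x0[1]]]
f_tilde = affine_approx(f, A, x0)
x = [0.51, 1.98]
print(A)
print(f(x), f_tilde(x))  # close, since x is near x0
```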

Example: Navigation by range measurement.

Other sources of linear models

Change of variables

Linearity can arise from a simple change of variables. This is best illustrated with a specific example.

Example: Power laws.
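As an illustration of how a change of variables produces linearity (the details of the example above are not reproduced here), assume a power law of the form $y = \alpha x_1^{p_1} x_2^{p_2}$ with positive inputs. Taking logarithms gives $\log y = \log\alpha + p_1\log x_1 + p_2\log x_2$, which is affine in the new variables $z_i = \log x_i$. The constants below are hypothetical; the sketch only checks this identity numerically.

```python
import math

# Hypothetical power-law model y = alpha * x1**p1 * x2**p2, for x1, x2 > 0.
alpha, p1, p2 = 2.5, 1.5, -0.7

def power_law(x1, x2):
    return alpha * x1 ** p1 * x2 ** p2

def log_model(z1, z2):
    """Same model in the variables z = log x, w = log y: affine in (z1, z2)."""
    return math.log(alpha) + p1 * z1 + p2 * z2

for x1, x2 in [(1.0, 1.0), (2.0, 3.0), (0.1, 40.0)]:
    w = math.log(power_law(x1, x2))
    print(w, log_model(math.log(x1), math.log(x2)))  # the two columns agree
```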

Linear dynamical systems

Many linear systems arise from models of dynamical systems, which describe how the state of a system evolves over time. The state is in general a vector fully describing the system at a given time (comprising, say, the temperature at several locations of a furnace, the positions and velocities of several locations on a bridge, or various economic indicators). A linear dynamical model postulates that the state at the next instant is a linear function of the state at past instants, and possibly of other “exogenous” inputs. Such systems can result from the linearization of a non-linear dynamical model.
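A minimal sketch, assuming the common form $x(t+1) = Ax(t) + Bu(t)$ in which the next state depends only on the current state and a scalar exogenous input $u(t)$. The matrices are hypothetical numbers chosen for illustration. Because each step is linear, the final state is a linear function of the initial state and the input sequence together, so responses add (superposition), which the last two lines check.

```python
# Hypothetical 2-state linear model x[t+1] = A x[t] + B u[t] (illustrative numbers).
A = [[0.9, 0.1],
     [0.0, 0.8]]
B = [0.0, 1.0]

def step(x, u):
    """One time step: return A x + B u."""
    return [sum(A[i][j] * x[j] for j in range(2)) + B[i] * u for i in range(2)]

def simulate(x0, inputs):
    """Return the states x[0], x[1], ..., x[T] driven by inputs u[0], ..., u[T-1]."""
    states = [list(x0)]
    for u in inputs:
        states.append(step(states[-1], u))
    return states

traj1 = simulate([1.0, 0.0], [0.0, 1.0, 0.0, 0.0])
traj2 = simulate([0.0, 2.0], [1.0, 0.0, -1.0, 0.5])
traj_sum = simulate([1.0, 2.0], [1.0, 1.0, -1.0, 0.5])  # summed initial states and inputs

# Superposition: the response to the sums equals the sum of the responses.
print(traj_sum[-1])
print([p + q for p, q in zip(traj1[-1], traj2[-1])])
```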

Example: Population dynamics.