By Cleve Moler, MathWorks
When I first heard about compressed sensing, I was skeptical. There were claims that it reduced the amount of data required to represent signals and images by huge factors and then restored the originals exactly. I knew from the Nyquist-Shannon sampling theorem that this is impossible. But after learning more about compressed sensing, I’ve come to realize that, under the right conditions, both the claims and the theorem are true.
The Nyquist-Shannon sampling theorem states that to restore a signal exactly and uniquely, you need to have sampled with at least twice its frequency. Of course, this theorem is still valid; if you skip one byte in a signal or image of white noise, you can’t restore the original. But most interesting signals and images are not white noise. When represented in terms of appropriate basis functions, such as trig functions or wavelets, many signals have relatively few non-zero coefficients. In compressed (or compressive) sensing terminology, they are sparse.
Naturally, the aspect of compressed sensing that first caught my attention was the underlying matrix computation. A raw signal or image can be regarded as a vector \(f\) with millions of components. We assume that \(f\) can be represented as a linear combination of certain basis functions:
\[f = \Psi c\]
The basis functions must be suited to a particular application. In our example, \(\Psi\) is the discrete cosine transform. We also assume that most of the coefficients \(c\) are effectively zero, so that \(c\) is sparse. Sampling the signal involves another linear operator,
\[b = \Phi f\]
In our example, \(b\) is a few random samples of \(f\), so \(\Phi\) is a subset of the rows of the identity operator. But more complicated sampling operators are possible.
To reconstruct the signal, we must try to recover the coefficients by solving
\[Ax = b\text{, where }A = \Phi \Psi\]
Once we have the coefficients, we can recover the signal itself by computing
\[f \approx \Psi x\]
Since this is a compression, \(A\) is rectangular, with many more columns than rows. Computing the coefficients \(x\) involves solving an underdetermined system of simultaneous linear equations, \(Ax = b\). In this situation, there are many more unknowns than equations. The key to the almost magical reconstruction process is to impose a nonlinear regularization involving the \(\ell_{1}\) (that’s “ell-sub-1”) norm.
I described a tiny version of this situation in a 1990 Cleve’s Corner column entitled “The World’s Simplest Impossible Problem.” I am thinking of two numbers whose average is 3. What are the numbers? After complaining that I haven’t given you enough information, you might answer 2 and 4. If you do, you have unconsciously imposed a kind of regularization that requires the result to be two distinct integers.
MATLAB^{®} can provide two different answers. Algebraically, the problem is a 1-by-2 system of linear equations with matrix
A = [1/2 1/2]
and right-hand side
b = 3
We want to find a 2-vector \(y\) that solves \(Ay = b\). The minimum norm least squares solution is computed by the pseudoinverse,
y = pinv(A)*b y = 3 3
But backslash generates a different solution:
x = A\b x = 6 0
Both solutions are valid, but human puzzle-solvers rarely mention them. Notice that the solution computed by backslash is sparse; one of its components is zero.
The signal or image restoration problem is a larger instance of the same task. We are given thousands of weighted averages of millions of signal or pixel values. Our job is to re-generate the original signal or image. We have the compressed sample, \(b\), and we need to solve \(Ax = b\). There is a huge number of possible solutions. The basis for picking the right one involves vector norms. The familiar Euclidean distance norm, \(\ell_{2}\), is
norm(x,2) = sqrt(sum(x.^2))
The “Manhattan” norm, \(\ell_{1}\), is named after travel time along the square grid formed by city streets.
norm(x,1) = sum(abs(x))
The notation \(\ell_{0}\) is used for a function that only counts the number of nonzero components. It’s actually not a norm.
norm0(x) = sum(x~=0)
These norms impose three different nonlinear regularizations, or constraints, on our underdetermined linear system.
From all \(x\) that satisfy \(Ax = b\), find an \(x\) that minimizes \(\ell_{p}(x)\)
For our 20-year-old example, \(\ell_{2}(y)\) is less than \(\ell_{2}(x)\), and \(\ell_{1}(y)\) and \(\ell_{1}(x)\) happen to be equal, but \(\ell_{0}(y)\) is greater than \(\ell_{0}(x)\).
The keys to compressed sensing are sparsity and the \(\ell_{1}\) norm. If the expansion of the original signal or image as a linear combination of the selected basis functions has many zero coefficients, then it’s often possible to reconstruct the signal exactly. In principle, computing this reconstruction should involve counting nonzeros with\(\ell_{0}\). This is a combinatorial problem whose computational complexity makes it impractical. (It’s NP-hard.) Fortunately, as David Donoho, Emmanuel Candés, and Terence Tao have shown, \(\ell_{0}\) can be replaced by \(\ell_{1}\). They explain that “with overwhelming probability” the two problems have the same solution. The \(\ell_{1}\) computation is practical because it can be posed as a linear programming problem and solved with the traditional simplex algorithm or modern interior point methods.
Our example uses the discrete cosine transform (DCT) as the basis. The signal generated by the “A
” key on a touch-tone telephone is the sum of two sinusoids with incommensurate frequencies,
\[ f(t) = \sin(1394\pi t) + \sin(3266\pi t) \]
If we sample this tone for 1/8 of a second at 40000 Hz, the result is a vector \(f\) of length \(n = 5000\). The upper plot in Figure 1 shows a portion of this signal, together with some of the \(m = 500\) random samples that we have taken. The lower plot shows the coefficients \(c\), obtained by taking the inverse discrete cosine transform of \(f\), with two spikes at the appropriate frequencies. Because the two frequencies are incommensurate, this signal does not fall exactly within the space spanned by the DCT basis functions, and so there are a few dozen significant nonzero coefficients.
For our example, the condensed signal is a vector \(b\) of \(m\) random samples of the original signal. We construct a matrix \(A\) by extracting \(m\) rows from the \(n \times n\) DCT matrix
D = dct(eye(n,n)); A = D(k,:)
where k
is the vector of indices used for the sample b
. The resulting linear system, \(Ax = b\), is \(m \times n\), which is 500-by-5000. There are 10 times as many unknowns as equations.
To reconstruct the signal, we need to find the solution to \(Ax = b\) that minimizes the \(\ell_{1}\) norm of \(x\). This is a nonlinear optimization problem, and there are several MATLAB based programs available to solve it. I have chosen to use \(\ell_{1}\)-magic, written by Justin Romberg and Emmanuel Candès when they were at Caltech. The upper plot in Figure 2 shows the resulting solution, \(x\). We see that it has relatively few large components and that it closely resembles the DCT of the original signal. Moreover, the discrete cosine transform of \(x\), shown in the lower plot, closely resembles the original signal. If audio was available, you would be able to hear that the two commands sound(f)
and sound(dct(x))
are nearly the same.
For comparison, Figure 3 shows the traditional \(\ell_{2}\), or least squares, solution to \(Ay = b\), computed by
y = pinv(A)*b
There is a slight hint of the original signal in the plots, but sound(dct(y))
is just a noisy buzz.
It is too early to predict when, or if, we might see compressed sensing in our cell phones, digital cameras, and magnetic resonance scanners, but I find the underlying mathematics and software fascinating.
Compressed sensing promises, in theory, to reconstruct a signal or image from surprisingly few samples. Discovered just five years ago by Candès and Tao and by Donoho, the subject is a very active research area. Practical devices that implement the theory are just now being developed.
It is important to realize that compressed sensing can be done only by a compressing sensor, and that it requires new recording technology and file formats. The MP3 and JPEG files used by today’s audio systems and digital cameras are already compressed in such a way that exact reconstruction of the original signals and images is impossible. Some of the Web postings and magazine articles about compressed sensing fail to acknowledge this fact.
91850v00
A twenty-year-old Cleve’s Corner:
Two survey papers for general technical audiences:
One of many tutorials:
MATLAB based software:
The two original papers:
An audio demonstration: