Perceptron Neural Networks

Rosenblatt [Rose61] created many variations of the perceptron. One of the simplest was a single-layer network whose weights and biases could be trained to produce a correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and learn from initially randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification. They are fast and reliable networks for the problems they can solve. In addition, an understanding of the operations of the perceptron provides a good basis for understanding more complex networks.

The discussion of perceptrons in this section is necessarily brief. For a more thorough discussion, see Chapter 4, “Perceptron Learning Rule,” of [HDB1996], which discusses the use of multiple layers of perceptrons to solve more difficult problems beyond the capability of one layer.

Neuron Model

A perceptron neuron, which uses the hard-limit transfer function hardlim, is shown below.

Diagram of a perceptron neuron showing input data being multiplied by individual weights, a bias value being added, and the hard-limit transfer function being applied to the result.

Each external input is weighted with an appropriate weight w_1j, and the sum of the weighted inputs is sent to the hard-limit transfer function, which also has an input of 1 transmitted to it through the bias. The hard-limit transfer function, which returns a 0 or a 1, is shown below.

Plot of the hard-limit transfer function. For inputs greater than or equal to 0, the function returns 1. For inputs less than 0, the function returns 0.

The perceptron neuron produces a 1 if the net input into the transfer function is equal to or greater than 0; otherwise it produces a 0.

The hard-limit transfer function gives a perceptron the ability to classify input vectors by dividing the input space into two regions. Specifically, outputs will be 0 if the net input n is less than 0, or 1 if the net input n is 0 or greater. The following figure show the input space of a two-input hard limit neuron with the weights w_1,1 = −1, w_1,2 = 1 and a bias b = 1.

Plot of the input space of a two-input hard-limit neuron showing a decision boundary.

Two classification regions are formed by the decision boundary line L at Wp + b = 0. This line is perpendicular to the weight matrix W and shifted according to the bias b. Input vectors above and to the left of the line L will result in a net input greater than 0 and, therefore, cause the hard-limit neuron to output a 1. Input vectors below and to the right of the line L cause the neuron to output 0. You can pick weight and bias values to orient and move the dividing line so as to classify the input space as desired.

Hard-limit neurons without a bias will always have a classification line going through the origin. Adding a bias allows the neuron to solve problems where the two sets of input vectors are not located on different sides of the origin. The bias allows the decision boundary to be shifted away from the origin, as shown in the plot above.

Perceptron Architecture

The perceptron network consists of a single layer of S perceptron neurons connected to R inputs through a set of weights w_i,j, as shown below in two forms. As before, the network indices i and j indicate that w_i,j is the strength of the connection from the jth input to the ith neuron.

Network diagram of a perceptron network.

The perceptron learning rule described shortly is capable of training only a single layer. Thus only one-layer networks are considered here. This restriction places limitations on the computation a perceptron can perform. The types of problems that perceptrons are capable of solving are discussed in Limitations and Cautions.

Create a Perceptron

You can create a perceptron with the following:

net = perceptron;
net = configure(net,P,T);

where input arguments are as follows:

P is an R-by-Q matrix of Q input vectors of R elements each.
T is an S-by-Q matrix of Q target vectors of S elements each.

Commonly, the hardlim function is used in perceptrons, so it is the default.

The following commands create a perceptron network with a single one-element input vector with the values 0 and 2, and one neuron with outputs that can be either 0 or 1:

P = [0 2];
T = [0 1];
net = perceptron;
net = configure(net,P,T);

You can see what network has been created by executing the following command:

inputweights = net.inputweights{1,1}

which yields

inputweights = 
        delays: 0
       initFcn: 'initzero'
         learn: true
      learnFcn: 'learnp'
    learnParam: (none)
          size: [1 1]
     weightFcn: 'dotprod'
   weightParam: (none)
      userdata: (your custom info)

The default learning function is learnp, which is discussed in Perceptron Learning Rule (learnp). The net input to the hardlim transfer function is dotprod, which generates the product of the input vector and weight matrix and adds the bias to compute the net input.

The default initialization function initzero is used to set the initial values of the weights to zero.

Similarly,

biases = net.biases{1}

gives

biases = 
       initFcn: 'initzero'
         learn: 1
      learnFcn: 'learnp'
    learnParam: []
          size: 1
      userdata: [1x1 struct]

You can see that the default initialization for the bias is also 0.

Perceptron Learning Rule (learnp)

Perceptrons are trained on examples of desired behavior. The desired behavior can be summarized by a set of input, output pairs

$p_{1} t_{1}, p_{2} t_{1}, \dots, p_{Q} t_{Q}$

where p is an input to the network and t is the corresponding correct (target) output. The objective is to reduce the error e, which is the difference t − a between the neuron response a and the target vector t. The perceptron learning rule learnp calculates desired changes to the perceptron's weights and biases, given an input vector p and the associated error e. The target vector t must contain values of either 0 or 1, because perceptrons (with hardlim transfer functions) can only output these values.

Each time learnp is executed, the perceptron has a better chance of producing the correct outputs. The perceptron rule is proven to converge on a solution in a finite number of iterations if a solution exists.

If a bias is not used, learnp works to find a solution by altering only the weight vector w to point toward input vectors to be classified as 1 and away from vectors to be classified as 0. This results in a decision boundary that is perpendicular to w and that properly classifies the input vectors.

There are three conditions that can occur for a single neuron once an input vector p is presented and the network's response a is calculated:

CASE 1. If an input vector is presented and the output of the neuron is correct (a = t and e = t – a = 0), then the weight vector w is not altered.

CASE 2. If the neuron output is 0 and should have been 1 (a = 0 and t = 1, and e = t – a = 1), the input vector p is added to the weight vector w. This makes the weight vector point closer to the input vector, increasing the chance that the input vector will be classified as a 1 in the future.

CASE 3. If the neuron output is 1 and should have been 0 (a = 1 and t = 0, and e = t – a = –1), the input vector p is subtracted from the weight vector w. This makes the weight vector point farther away from the input vector, increasing the chance that the input vector will be classified as a 0 in the future.

The perceptron learning rule can be written more succinctly in terms of the error e = t – a and the change to be made to the weight vector Δw:

CASE 1. If e = 0, then make a change Δw equal to 0.

CASE 2. If e = 1, then make a change Δw equal to p^T.

CASE 3. If e = –1, then make a change Δw equal to –p^T.

All three cases can then be written with a single expression:

$Δ w = (t - α) p^{T} = e p^{T}$

You can get the expression for changes in a neuron's bias by noting that the bias is simply a weight that always has an input of 1:

$Δ b = (t - α) (1) = e$

For the case of a layer of neurons you have

$Δ W = (t - a) {(p)}^{T} = e {(p)}^{T}$

and

$Δ b = (t - a) = e$

The perceptron learning rule can be summarized as follows:

$W^{n e w} = W^{o l d} + e p^{T}$

and

$b^{n e w} = b^{o l d} + e$

where e = t – a.

Now try a simple example. Start with a single neuron having an input vector with just two elements.

net = perceptron;
net = configure(net,[0;0],0);

To simplify matters, set the bias equal to 0 and the weights to 1 and -0.8:

net.b{1} =  [0];
w = [1 -0.8];
net.IW{1,1} = w;

The input target pair is given by

p = [1; 2];
t = [1];

You can compute the output and error with

a = net(p)
a =
     0
e = t-a
e =
     1

and use the function learnp to find the change in the weights.

dw = learnp(w,p,[],[],[],[],e,[],[],[],[],[])
dw =
     1     2

The new weights, then, are obtained as

w = w + dw
w =
    2.0000    1.2000

The process of finding new weights (and biases) can be repeated until there are no errors. Recall that the perceptron learning rule is guaranteed to converge in a finite number of steps for all problems that can be solved by a perceptron. These include all classification problems that are linearly separable. The objects to be classified in such cases can be separated by a single line.

Training (train)

If sim and learnp are used repeatedly to present inputs to a perceptron, and to change the perceptron weights and biases according to the error, the perceptron will eventually find weight and bias values that solve the problem, given that the perceptron can solve it. Each traversal through all the training input and target vectors is called a pass.

The function train carries out such a loop of calculation. In each pass the function train proceeds through the specified sequence of inputs, calculating the output, error, and network adjustment for each input vector in the sequence as the inputs are presented.

Note that train does not guarantee that the resulting network does its job. You must check the new values of W and b by computing the network output for each input vector to see if all targets are reached. If a network does not perform successfully you can train it further by calling train again with the new weights and biases for more training passes, or you can analyze the problem to see if it is a suitable problem for the perceptron. Problems that cannot be solved by the perceptron network are discussed in Limitations and Cautions.

To illustrate the training procedure, work through a simple problem. Consider a one-neuron perceptron with a single vector input having two elements:

Diagram of a one-neuron perceptron network.

This network, and the problem you are about to consider, are simple enough that you can follow through what is done with hand calculations if you want. The problem discussed below follows that found in [HDB1996].

Suppose you have the following classification problem and would like to solve it with a single vector input, two-element perceptron network.

${p_{1} = [\begin{array}{l} 2 \\ 2 \end{array}], t_{1} = 0} {p_{2} = [\begin{matrix} 1 \\ - 2 \end{matrix}], t_{2} = 1} {p_{3} = [\begin{matrix} - 2 \\ 2 \end{matrix}], t_{3} = 0} {p_{4} = [\begin{matrix} - 1 \\ 1 \end{matrix}], t_{4} = 1}$

Use the initial weights and bias. Denote the variables at each step of this calculation by using a number in parentheses after the variable. Thus, above, the initial values are W(0) and b(0).

$\begin{matrix} W (0) = [\begin{matrix} 0 & 0 \end{matrix}] & b (0) = 0 \end{matrix}$

Start by calculating the perceptron’s output a for the first input vector p₁, using the initial weights and bias.

$\begin{matrix} α = h a r d l i m (W (0) p_{1} + b (0)) \\ = h a r d l i m ([\begin{matrix} 0 & 0 \end{matrix}] [\begin{array}{l} 2 \\ 2 \end{array}] + 0) = h a r d l i m (0) = 1 \end{matrix}$

The output a does not equal the target value t₁, so use the perceptron rule to find the incremental changes to the weights and biases based on the error.

$\begin{array}{l} e = t_{1} - α = 0 - 1 = - 1 \\ Δ W = e p_{1}^{T} = (- 1) [\begin{matrix} 2 & 2 \end{matrix}] = [\begin{matrix} - 2 & - 2 \end{matrix}] \\ Δ b = e = (- 1) = - 1 \end{array}$

You can calculate the new weights and bias using the perceptron update rules.

$\begin{array}{l} W^{n e w} = W^{o l d} + e p^{T} = [\begin{matrix} 0 & 0 \end{matrix}] + [\begin{matrix} - 2 & - 2 \end{matrix}] = [\begin{matrix} - 2 & - 2 \end{matrix}] = W (1) \\ b^{n e w} = b^{o l d} + e = 0 + (- 1) = - 1 = b (1) \end{array}$

Now present the next input vector, p₂. The output is calculated below.

$\begin{matrix} α = h a r d l i m (W (1) p_{2} + b (1)) \\ = h a r d l i m ([\begin{matrix} - 2 & - 2 \end{matrix}] [\begin{array}{r} 1 \\ - 2 \end{array}] - 1) = h a r d l i m (1) = 1 \end{matrix}$

On this occasion, the target is 1, so the error is zero. Thus there are no changes in weights or bias, so W(2) = W(1) = [−2 −2] and b(2) = b(1) = −1.

You can continue in this fashion, presenting p₃ next, calculating an output and the error, and making changes in the weights and bias, etc. After making one pass through all of the four inputs, you get the values W(4) = [−3 −1] and b(4) = 0. To determine whether a satisfactory solution is obtained, make one pass through all input vectors to see if they all produce the desired target values. This is not true for the fourth input, but the algorithm does converge on the sixth presentation of an input. The final values are

W(6) = [−2 −3] and b(6) = 1.

This concludes the hand calculation. Now, how can you do this using the train function?

The following code defines a perceptron.

net = perceptron;

Consider the application of a single input

p = [2; 2];

having the target

t = [0];

Set epochs to 1, so that train goes through the input vectors (only one here) just one time.

net.trainParam.epochs = 1;
net = train(net,p,t);

The new weights and bias are

w = net.iw{1,1}, b = net.b{1}
w =
    -2    -2
b =
    -1

Thus, the initial weights and bias are 0, and after training on only the first vector, they have the values [−2 −2] and −1, just as you hand calculated.

Now apply the second input vector p₂. The output is 1, as it will be until the weights and bias are changed, but now the target is 1, the error will be 0, and the change will be zero. You could proceed in this way, starting from the previous result and applying a new input vector time after time. But you can do this job automatically with train.

Apply train for one epoch, a single pass through the sequence of all four input vectors. Start with the network definition.

net = perceptron;
net.trainParam.epochs = 1;

The input vectors and targets are

p = [[2;2] [1;-2] [-2;2] [-1;1]]
t = [0 1 0 1]

Now train the network with

net = train(net,p,t);

The new weights and bias are

w = net.iw{1,1}, b = net.b{1}
w =
    -3    -1
b =
     0

This is the same result as you got previously by hand.

Finally, simulate the trained network for each of the inputs.

a = net(p)
a = 
      0     0     1     1

The outputs do not yet equal the targets, so you need to train the network for more than one pass. Try more epochs. This run gives a mean absolute error performance of 0 after two epochs:

net.trainParam.epochs = 1000;
net = train(net,p,t);

Thus, the network was trained by the time the inputs were presented on the third epoch. (As you know from hand calculation, the network converges on the presentation of the sixth input vector. This occurs in the middle of the second epoch, but it takes the third epoch to detect the network convergence.) The final weights and bias are

w = net.iw{1,1}, b = net.b{1}
w =
    -2    -3
b =
     1

The simulated output and errors for the various inputs are

a = net(p)
a =
             0             1             0             1
error = a-t
error =
             0             0             0             0

You confirm that the training procedure is successful. The network converges and produces the correct target outputs for the four input vectors.

The default training function for networks created with perceptron is trainc. (You can find this by executing net.trainFcn.) This training function applies the perceptron learning rule in its pure form, in that individual input vectors are applied individually, in sequence, and corrections to the weights and bias are made after each presentation of an input vector. Thus, perceptron training with train will converge in a finite number of steps unless the problem presented cannot be solved with a simple perceptron.

The function train can be used in various ways by other networks as well. Type help train to read more about this basic function.

You might want to try various example programs. For instance, Classification with a Two-Input Perceptron illustrates classification and training of a simple perceptron.

Limitations and Cautions

Perceptron networks should be trained with adapt, which presents the input vectors to the network one at a time and makes corrections to the network based on the results of each presentation. Use of adapt in this way guarantees that any linearly separable problem is solved in a finite number of training presentations.

As noted in the previous pages, perceptrons can also be trained with the function train. Commonly when train is used for perceptrons, it presents the inputs to the network in batches, and makes corrections to the network based on the sum of all the individual corrections. Unfortunately, there is no proof that such a training algorithm converges for perceptrons. On that account the use of train for perceptrons is not recommended.

Perceptron networks have several limitations. First, the output values of a perceptron can take on only one of two values (0 or 1) because of the hard-limit transfer function. Second, perceptrons can only classify linearly separable sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. However, it has been proven that if the vectors are linearly separable, perceptrons trained adaptively will always find a solution in finite time. You might want to try Linearly Non-separable Vectors. It shows the difficulty of trying to classify input vectors that are not linearly separable.

It is only fair, however, to point out that networks with more than one perceptron can be used to solve more difficult problems. For instance, suppose that you have a set of four vectors that you would like to classify into distinct groups, and that two lines can be drawn to separate them. A two-neuron network can be found such that its two decision boundaries classify the inputs into four categories. For additional discussion about perceptrons and to examine more complex perceptron problems, see [HDB1996].

Outliers and the Normalized Perceptron Rule

Long training times can be caused by the presence of an outlier input vector whose length is much larger or smaller than the other input vectors. Applying the perceptron learning rule involves adding and subtracting input vectors from the current weights and biases in response to error. Thus, an input vector with large elements can lead to changes in the weights and biases that take a long time for a much smaller input vector to overcome. You might want to try Outlier Input Vectors to see how an outlier affects the training.

By changing the perceptron learning rule slightly, you can make training times insensitive to extremely large or small outlier input vectors.

Here is the original rule for updating weights:

$Δ w = (t - α) p^{T} = e p^{T}$

As shown above, the larger an input vector p, the larger its effect on the weight vector w. Thus, if an input vector is much larger than other input vectors, the smaller input vectors must be presented many times to have an effect.

The solution is to normalize the rule so that the effect of each input vector on the weights is of the same magnitude:

$Δ w = (t - α) \frac{p^{T}}{‖ p ‖} = e \frac{p^{T}}{‖ p ‖}$

The normalized perceptron rule is implemented with the function learnpn, which is called exactly like learnp. The normalized perceptron rule function learnpn takes slightly more time to execute, but reduces the number of epochs considerably if there are outlier input vectors. You might try Normalized Perceptron Rule to see how this normalized training rule works.