Data Mining - Mehmed Kantardzic [126]


function at the node level. Using the notation δj(n) = ej(n)·φ′(vj[n]), where δj(n) is the local gradient, the final equation for wji(n) corrections is

$$\Delta w_{ji}(n) = \eta \cdot \delta_j(n) \cdot x_i(n)$$

The local gradient δj(n) points to the required changes in synaptic weights. According to its definition, the local gradient δj(n) for output neuron j is equal to the product of the corresponding error signal ej(n) for that neuron and the derivative φ′(vj[n]) of the associated activation function.

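The derivative φ′(vj[n]) can be computed easily for any standard activation function; differentiability is the only requirement on the function. If the activation function is the sigmoid, that is, of the form

$$y_j(n) = \varphi(v_j(n)) = \frac{1}{1 + e^{-v_j(n)}}$$

the first derivative is

$$\varphi'(v_j(n)) = \frac{e^{-v_j(n)}}{\left(1 + e^{-v_j(n)}\right)^{2}} = y_j(n)\,\bigl(1 - y_j(n)\bigr)$$

and the final weight correction is

$$\Delta w_{ji}(n) = \eta \cdot e_j(n) \cdot y_j(n)\,\bigl(1 - y_j(n)\bigr) \cdot x_i(n)$$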

The final correction Δwji(n) is proportional to the learning rate η and to the error value ej(n) at this node, and otherwise depends only on the corresponding input and output values xi(n) and yj(n). Therefore, the computation for a given sample n is relatively simple and straightforward.
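The sigmoid weight correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample numbers are hypothetical, the error signal is taken as desired minus computed output, and no bias term is modeled.

```python
import math

def sigmoid(v):
    """Logistic activation: phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def output_weight_corrections(weights, inputs, desired, eta):
    """Return (y, corrections) for one sigmoid output neuron and one sample.

    Assumption: the error signal is e = desired - y, and no bias is used.
    """
    v = sum(w * x for w, x in zip(weights, inputs))  # net input v_j(n)
    y = sigmoid(v)                                   # output y_j(n)
    e = desired - y                                  # error signal e_j(n)
    delta = e * y * (1.0 - y)                        # local gradient delta_j(n)
    corrections = [eta * delta * x for x in inputs]  # delta-rule corrections
    return y, corrections

# Hypothetical sample: the weights cancel, so v = 0 and y = 0.5.
y, dw = output_weight_corrections([0.5, -0.5], [1.0, 1.0], desired=1.0, eta=0.1)
```

Note that the derivative is obtained from y alone, so the net input v does not need to be stored after the output has been computed.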

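If the activation function is a hyperbolic tangent, a similar computation gives the first derivative φ′(vj[n]):

$$\varphi'(v_j(n)) = \bigl(1 - y_j(n)\bigr)\bigl(1 + y_j(n)\bigr)$$

and

$$\Delta w_{ji}(n) = \eta \cdot e_j(n) \cdot \bigl(1 - y_j(n)\bigr)\bigl(1 + y_j(n)\bigr) \cdot x_i(n)$$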

Again, the practical computation of Δwji(n) is very simple because the derivative in the local gradient depends only on the output value yj(n) of the node.
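As a quick sanity check of the claim that the hyperbolic-tangent derivative can be obtained from the node output alone, the following sketch compares (1 − y)(1 + y) with a central finite difference. It assumes an unscaled tanh activation; the helper names and the test point v = 0.7 are hypothetical.

```python
import math

def tanh_derivative_from_output(y):
    """Derivative of tanh expressed through the output y = tanh(v)."""
    return (1.0 - y) * (1.0 + y)

def numerical_derivative(f, v, h=1e-6):
    """Central finite-difference approximation of f'(v)."""
    return (f(v + h) - f(v - h)) / (2.0 * h)

v = 0.7                      # arbitrary net input, chosen for illustration
y = math.tanh(v)             # node output y_j(n)
analytic = tanh_derivative_from_output(y)
numeric = numerical_derivative(math.tanh, v)
```

The two values should agree to within the accuracy of the finite-difference approximation.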

In general, we may identify two different cases of computation for Δwji(n), depending on where in the network neuron j is located. In the first case, neuron j is an output node. This case is simple to handle because each output node of the network is supplied with a desired response, making it a straightforward matter to calculate the associated error signal. All previously developed relations are valid for output nodes without any modifications.

In the second case, neuron j is a hidden node. Even though hidden neurons are not directly accessible, they share responsibility for any error made at the output of the network. We may redefine the local gradient δj(n) for a hidden neuron j as the product of the associated derivative φ′(vj[n]) and the weighted sum of the local gradients computed for the neurons in the next layer (hidden or output) that are connected to neuron j:

$$\delta_j(n) = \varphi'(v_j(n)) \sum_{k \in D} \delta_k(n)\, w_{kj}(n)$$

where D denotes the set of all nodes in the next layer that are connected to node j. Because the computation proceeds backward, all δk(n) for the nodes in the next layer are already known when the local gradient δj(n) is computed for a node in a layer closer to the inputs.
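The hidden-node rule can be illustrated with a minimal sketch. It assumes a sigmoid hidden neuron, so that the derivative is y(1 − y); the function name and the numbers are hypothetical, and `next_weights[k]` stands for the weight on the connection from neuron j to neuron k of the next layer.

```python
def hidden_local_gradient(y_j, next_deltas, next_weights):
    """Local gradient of a hidden sigmoid neuron j.

    y_j          -- output of neuron j for the current sample
    next_deltas  -- local gradients delta_k of the next-layer neurons in D
    next_weights -- weights w_kj from neuron j to each of those neurons
    """
    # Weighted sum of the next-layer local gradients propagated back to j.
    back_sum = sum(d * w for d, w in zip(next_deltas, next_weights))
    # Multiply by the sigmoid derivative, expressed through the output y_j.
    return y_j * (1.0 - y_j) * back_sum

# Hypothetical values: two next-layer neurons connected to neuron j.
delta_j = hidden_local_gradient(0.5, [0.1, -0.2], [0.4, 0.3])
```

Here the weighted sum is 0.1·0.4 + (−0.2)·0.3 = −0.02 and the derivative is 0.25, so the sketch yields a local gradient of −0.005.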

Let us analyze once more the application of the backpropagation-learning algorithm, which consists of two distinct passes of computation for each training example. In the first pass, referred to as the forward pass, the function signals of the network are computed on a neuron-by-neuron basis, starting with the nodes in the first hidden layer (the input layer has no computational nodes), then the second, and so on, until the computation finishes with the final output layer of nodes. In this pass, based on the given input values of each learning sample, the network computes the corresponding output. Synaptic weights remain unaltered during this pass.
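The forward pass can be sketched as a loop over layers. This is a simplified illustration, not a full implementation: it assumes sigmoid activations throughout, omits bias terms, and uses a hypothetical nested-list layout in which `layers[l][j][i]` is the weight from node i of the previous layer to neuron j of layer l.

```python
import math

def sigmoid(v):
    """Logistic activation: phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def forward_pass(layers, inputs):
    """Compute the function signals layer by layer; weights are not changed.

    Returns a list whose first entry is the input vector and whose
    remaining entries are the outputs of each computational layer.
    """
    signals = [list(inputs)]           # the input layer performs no computation
    for weight_matrix in layers:
        prev = signals[-1]
        layer_out = [
            sigmoid(sum(w * x for w, x in zip(row, prev)))
            for row in weight_matrix   # one row of weights per neuron
        ]
        signals.append(layer_out)
    return signals

# Hypothetical 2-2-1 network and one input sample.
layers = [
    [[0.0, 0.0], [1.0, -1.0]],         # hidden layer: two neurons
    [[1.0, -1.0]],                     # output layer: one neuron
]
signals = forward_pass(layers, [1.0, 1.0])
```

Keeping the outputs of every layer, rather than only the final one, is a design choice that makes them available when the local gradients are later computed from the node outputs.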

The second, backward pass, on the other hand, starts at the output layer, passing the error signal (the difference between the computed and the desired output value) leftward through the network, layer by layer, and recursively computing the local gradient δ for each neuron. This recursive process permits the synaptic weights of the network to undergo changes in accordance with the delta rule. For a neuron located in the output layer, δ is equal to the error signal of that neuron multiplied by the first derivative of its nonlinearity, represented by the activation function. Based on the local gradients δ, it is straightforward to compute Δw for each connection to the output nodes. Given the δ values for all neurons in the output layer, we use them in the preceding layer (usually a hidden layer) to compute the modified local gradients for its nodes, which are not output nodes, and then the corrections Δw for the input connections of that layer. The backward procedure is repeated until all
