Difference between revisions of "Backpropagation"

From emergent
(New page: {{gendoc}})
 
{{gendoc}}
 
== Introduction ==
 
Backpropagation is perhaps the most commonly used neural network learning algorithm. Several different "flavors" of backpropagation have been developed over the years, and a number of them are implemented in the software. These include alternative error functions such as cross-entropy, and recurrent backprop, ranging from the simple recurrent network through the Almeida-Pineda algorithm to real-time continuous recurrent backprop. The implementation allows the user to extend the unit types to use different activation and error functions in a straightforward manner.
 
Note that simple recurrent networks (SRNs, a.k.a. Elman networks) are described in the feedforward backprop section, as they are more like feedforward networks than the fully recurrent ones.
 
The backpropagation algorithm consists of two phases: an activation propagation phase and an error backpropagation phase. In the simplest version of Bp, the activation phase is strictly feed-forward and the error phase is strictly feed-back, and each is computed sequentially, layer by layer. The implementation therefore assumes that the layers are organized sequentially in the order that activation flows.
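
The two phases can be sketched in a few lines of Python. This is only an illustration of the general algorithm, not the software's code: the function names, the list-based layer layout, the sigmoid activation, and the summed-squared error are all assumptions made for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, inputs):
    """Activation phase: propagate activations layer by layer, input to output."""
    acts = [list(inputs)]
    for weights, biases in layers:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
                     for row, b in zip(weights, biases)])
    return acts

def backward(layers, acts, targets):
    """Error phase: propagate dE/dNet backward, output layer to first layer."""
    # dE/dA at the output, assuming E = 0.5 * sum((a - t)^2).
    dEdA = [a - t for a, t in zip(acts[-1], targets)]
    dEdNet_per_layer = [None] * len(layers)
    for i in reversed(range(len(layers))):
        weights, _ = layers[i]
        # The sigmoid derivative is a * (1 - a).
        dEdNet = [d * a * (1.0 - a) for d, a in zip(dEdA, acts[i + 1])]
        dEdNet_per_layer[i] = dEdNet
        # Each sending unit sums the error arriving over its outgoing weights.
        dEdA = [sum(weights[r][s] * dEdNet[r] for r in range(len(weights)))
                for s in range(len(acts[i]))]
    return dEdNet_per_layer

# 2 inputs -> 2 hidden -> 1 output; each layer is (weights[recv][send], biases).
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
acts = forward(layers, [1.0, 0.0])
deltas = backward(layers, acts, [1.0])
```

Because each layer reads only from the layer processed just before it, a single ordered sweep in each direction is enough, which is why the layer ordering matters.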
 
In the recurrent versions, both the activation and the error propagation are computed in two steps, so that each unit is effectively updated simultaneously with the other units. In the activation phase, the net input to each unit is first computed from the other units' current activation values, and the activation values are then updated based on this net input. Similarly, in the error phase, the derivative of the error with respect to the activation (dEdA) of each unit is first computed from the current dEdNet values, and the dEdNet values are then updated based on the new dEdA.
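
A minimal sketch of this two-step (synchronous) update is given below. It is a discrete-time simplification written for illustration: the function names, the dense weights[recv][send] layout, the sigmoid activation, and the external input and error terms are assumptions, and the time constants used by the continuous recurrent algorithms are left out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def activation_step(weights, acts, ext_input):
    """One synchronous activation update for a fully recurrent network."""
    n = len(acts)
    # Step 1: every net input is computed from the current activations.
    nets = [ext_input[r] + sum(weights[r][s] * acts[s] for s in range(n))
            for r in range(n)]
    # Step 2: only now are the activations replaced.
    return [sigmoid(net) for net in nets]

def error_step(weights, acts, dEdNet, dEdA_ext):
    """One synchronous error update, mirroring the activation update."""
    n = len(acts)
    # Step 1: dEdA of each unit from the current dEdNet of its receivers.
    dEdA = [dEdA_ext[s] + sum(weights[r][s] * dEdNet[r] for r in range(n))
            for s in range(n)]
    # Step 2: dEdNet is then refreshed from the new dEdA.
    return [dEdA[u] * acts[u] * (1.0 - acts[u]) for u in range(n)]

weights = [[0.0, 1.0], [-1.0, 0.0]]   # weights[recv][send]
acts = [0.5, 0.5]
acts = activation_step(weights, acts, ext_input=[1.0, 0.0])
dEdNet = error_step(weights, acts, dEdNet=[0.0, 0.0],
                    dEdA_ext=[acts[0] - 1.0, 0.0])
```

Had the second unit been updated after the first in place, it would have seen the first unit's new activation rather than 0.5; computing all net inputs before any activation changes avoids that order dependence.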
 
== Feedforward Bp Reference ==
 
The classes defined in the basic feedforward Bp implementation include:
* {{gendoc|class=BpConSpec}}
* {{gendoc|class=BpCon}}
* {{gendoc|class=BpUnit}}
* {{gendoc|class=BpUnitSpec}}
* {{gendoc|class=BpLayer}}
* {{gendoc|class=BpNetwork}}
 
Bias weights are implemented by adding a BpCon object directly to the BpUnit, rather than by allocating a self projection or some similar scheme. In addition, the BpUnitSpec has a pointer to a BpConSpec that controls the updating (etc.) of the bias weight. Thus, while some code was written to support the special bias weights on units, it amounts to simply calling the appropriate function on the BpConSpec.
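
The arrangement described above might look roughly like the following Python sketch. The classes here are hypothetical stand-ins, not the software's BpCon, BpConSpec, BpUnit, or BpUnitSpec; the method names, the gradient-descent sign convention, and the learning-rate handling are assumptions chosen only to show how the unit delegates bias handling to a connection spec.

```python
from dataclasses import dataclass, field

@dataclass
class Con:
    """Stand-in for a connection: a weight and its accumulated change."""
    wt: float = 0.0
    dwt: float = 0.0

@dataclass
class ConSpec:
    """Stand-in for a connection spec: parameters plus the update rules."""
    lrate: float = 0.1

    def compute_dwt(self, con, recv_dEdNet, send_act):
        # Assumed convention: accumulate the negative error gradient.
        con.dwt += -recv_dEdNet * send_act

    def update_weights(self, con):
        con.wt += self.lrate * con.dwt
        con.dwt = 0.0

@dataclass
class UnitSpec:
    """Stand-in for a unit spec that points at the spec used for bias weights."""
    bias_spec: ConSpec = field(default_factory=ConSpec)

@dataclass
class Unit:
    spec: UnitSpec
    bias: Con = field(default_factory=Con)   # held directly on the unit
    dEdNet: float = 0.0

    def compute_bias_dwt(self):
        # The bias behaves like a connection from a sender whose activation is 1.
        self.spec.bias_spec.compute_dwt(self.bias, self.dEdNet, 1.0)

    def update_bias(self):
        self.spec.bias_spec.update_weights(self.bias)

unit = Unit(spec=UnitSpec(bias_spec=ConSpec(lrate=0.5)))
unit.dEdNet = -0.2
unit.compute_bias_dwt()
unit.update_bias()
```

The unit-level methods contain no learning logic of their own; they forward the bias connection to the same kind of spec that handles ordinary connections.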
 
=== Variations on the Standard ===
 
* {{gendoc|class=LinearBpUnitSpec}} implements a linear activation function.
* {{gendoc|class=ThreshLinBpUnitSpec}} implements a threshold linear activation function with the threshold set by the parameter <code>threshold</code>. Activation is zero when the net input is below the threshold, and net - threshold above it.
* {{gendoc|class=NoisyBpUnitSpec}} adds noise to the activations of units. The noise is specified by the <code>noise</code> member.
* {{gendoc|class=StochasticBpUnitSpec}} computes a binary activation, with the probability of being active given by a sigmoidal function of the net input (e.g., like a Boltzmann Machine unit).
* {{gendoc|class=RBFBpUnitSpec}} computes activation as a Gaussian function of the distance between the weights and the activations. The Gaussian is spherical (its variance is the same for all weights), and the variance is given by the parameter <code>var</code>.
* {{gendoc|class=BumpBpUnitSpec}} computes activation as a Gaussian function of the standard dot-product net input (not the distance, as in the RBF). The mean of the effectively uni-dimensional Gaussian is specified by the <code>mean</code> parameter, with a standard deviation of <code>std_dev</code>.
* {{gendoc|class=ExpBpUnitSpec}} computes activation as an exponential function of the net input (e^net). This is useful for implementing SoftMax units, among other things.
* {{gendoc|class=SoftMaxBpUnitSpec}} takes one-to-one input from a corresponding exponential unit, and another input from a LinearBpUnitSpec unit that computes the sum over all the exponential units, and divides the first by the second. This results in a SoftMax unit. Note that the LinearBpUnitSpec unit must have fixed weights, all of value 1, and that the SoftMax units must have the one-to-one projection from the exp units first, followed by the projection from the sum units. See <code>demo/bp_misc/bp_softmax.proj</code> for a demonstration of how to configure a SoftMax network.
* {{gendoc|class=HebbBpConSpec}} computes very simple Hebbian learning instead of backpropagation. It is useful for making comparisons between delta-rule and Hebbian learning. The rule is simply <code>dwt = ru->act * su->act</code>, where <code>ru->act</code> is the target value if present.
* {{gendoc|class=ErrScaleBpConSpec}} scales the error sent back to the sending units by the factor <code>err_scale</code>. This can be used in cases where there are multiple output layers, some of which are not supposed to influence learning in the hidden layer, for example.
* {{gendoc|class=DeltaBarDeltaBpConSpec}} implements the delta-bar-delta learning rate adaptation scheme (Jacobs, 1988). It should only be used with batch-mode weight updating. The connection type must be {{gendoc|class=DeltaBarDeltaBpCon}}, which contains a connection-wise learning rate parameter. This learning rate is incremented additively by <code>lrate_incr</code> when the signs of the current and previous weight changes agree, and decremented multiplicatively by <code>lrate_decr</code> when they do not. The demo project <code>demo/bp_misc/bp_ft_dbd.proj</code> provides an example of how to set up delta-bar-delta learning.
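
Several of the variations listed above reduce to short formulas, sketched here in Python for illustration only. The function names and argument layouts are hypothetical, the 1/(2 var) scaling inside the RBF exponent is an assumption, and the learning-rate decrement follows the form given by Jacobs (1988), lrate - lrate_decr * lrate; the software may define any of these details differently.

```python
import math

def thresh_lin(net, threshold):
    # Zero below the threshold, net - threshold above it.
    return max(0.0, net - threshold)

def rbf(weights, acts, var):
    # Gaussian of the squared distance between weights and activations,
    # with one shared (spherical) variance; the scaling is an assumption.
    dist_sq = sum((w - a) ** 2 for w, a in zip(weights, acts))
    return math.exp(-dist_sq / (2.0 * var))

def bump(net, mean, std_dev):
    # Gaussian of the ordinary dot-product net input.
    z = (net - mean) / std_dev
    return math.exp(-0.5 * z * z)

def softmax_from_exp_units(nets):
    exp_acts = [math.exp(n) for n in nets]   # the exponential units
    total = sum(1.0 * a for a in exp_acts)   # linear sum unit, weights fixed at 1
    return [a / total for a in exp_acts]     # SoftMax units divide the two inputs

def delta_bar_delta_lrate(lrate, dwt, prev_dwt, lrate_incr, lrate_decr):
    # Additive increase when the signs agree, multiplicative decrease otherwise.
    if dwt * prev_dwt > 0.0:
        return lrate + lrate_incr
    if dwt * prev_dwt < 0.0:
        return lrate - lrate_decr * lrate
    return lrate

probs = softmax_from_exp_units([0.1, 0.2, 0.7])
faster = delta_bar_delta_lrate(0.1, 0.5, 0.2, lrate_incr=0.01, lrate_decr=0.2)
slower = delta_bar_delta_lrate(0.1, 0.5, -0.2, lrate_incr=0.01, lrate_decr=0.2)
```

Splitting SoftMax into exponential units, a sum unit, and a dividing unit mirrors the projection layout described in the list, which is why the sum unit's weights have to stay fixed at 1.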

Revision as of 19:41, 16 August 2007
