Difference between revisions of "Backpropagation"


Revision as of 20:41, 16 August 2007

Introduction


Backpropagation is perhaps the most commonly used neural network learning algorithm. Several "flavors" of backpropagation have been developed over the years, and a number of them are implemented in the software. These include variants with different error functions, such as cross-entropy, and recurrent backprop, ranging from the simple recurrent network through the Almeida-Pineda algorithm to real-time continuous recurrent backprop. The implementation allows the user to extend the unit types with different activation and error functions in a straightforward manner.

Note that the simple recurrent networks (SRN, a.k.a. Elman networks) are described in the feedforward backprop section, as they are more like feedforward networks than the fully recurrent ones.

The basic structure of the backpropagation algorithm consists of two phases: an activation propagation phase and an error backpropagation phase. In the simplest version of Bp, the activation phase is strictly feed-forward and the error phase is strictly feed-back, and both are computed sequentially, layer by layer. Thus, the implementation assumes that the layers are organized sequentially in the order that activation flows.
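The two phases can be illustrated with a minimal Python sketch. It is not emergent's code: the function names, the nested-list weight layout, the sigmoid activation, and the summed squared error are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, inputs):
    """Activation phase: propagate layer by layer, in the order activation flows."""
    acts = [list(inputs)]
    for w in weights:                      # w[j][i]: weight from sender i to receiver j
        prev = acts[-1]
        net = [sum(wji * a for wji, a in zip(row, prev)) for row in w]
        acts.append([sigmoid(n) for n in net])
    return acts

def backward(weights, acts, targets):
    """Error phase: propagate error layer by layer, in the reverse order."""
    # Assumed error function: E = 0.5 * sum((a - t)^2), so dE/dA = a - t at the output.
    dEdA = [a - t for a, t in zip(acts[-1], targets)]
    grads = [None] * len(weights)
    for k in reversed(range(len(weights))):
        out, prev = acts[k + 1], acts[k]
        # The sigmoid derivative is a * (1 - a).
        dEdNet = [d * a * (1.0 - a) for d, a in zip(dEdA, out)]
        # Gradient for each weight: receiver dEdNet times sender activation.
        grads[k] = [[dn * p for p in prev] for dn in dEdNet]
        # Error sent back to the layer below.
        dEdA = [sum(weights[k][j][i] * dEdNet[j] for j in range(len(dEdNet)))
                for i in range(len(prev))]
    return grads
```

Because each layer depends only on the one before it (or after it, in the error phase), a single sweep in each direction is enough, which is why the layer ordering matters.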

In the recurrent versions, both the activation and the error propagation are computed in two steps so that each unit is effectively being updated simultaneously with the other units. This is done in the activation phase by first computing the net input to each unit based on the other units' current activation values, and then updating the activation values based on this net input. Similarly, in the error phase, first the derivative of the error with respect to the activation (dEdA) of each unit is computed based on current dEdNet values, and then the dEdNet values are updated based on the new dEdA.
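A minimal Python sketch of the two-step scheme follows. It assumes a small fully connected network, a sigmoid activation, and hypothetical names (activation_step, error_step, ext, dEdA_ext) that are not emergent identifiers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def activation_step(w, act, ext):
    """One synchronous activation update; w[j][i] is the weight from unit i to unit j."""
    n = len(act)
    # Step 1: net input for every unit, using only the current activations.
    net = [ext[j] + sum(w[j][i] * act[i] for i in range(n)) for j in range(n)]
    # Step 2: only now are the activations replaced.
    return [sigmoid(x) for x in net]

def error_step(w, act, dEdNet, dEdA_ext):
    """One synchronous error update."""
    n = len(act)
    # Step 1: dEdA for every unit, using only the current dEdNet values.
    dEdA = [dEdA_ext[i] + sum(w[j][i] * dEdNet[j] for j in range(n))
            for i in range(n)]
    # Step 2: dEdNet from the new dEdA, through the sigmoid derivative.
    return [dEdA[i] * act[i] * (1.0 - act[i]) for i in range(n)]
```

Splitting each update in two means no unit sees a neighbor's freshly updated value within the same step, so the result does not depend on the order in which units are visited.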

Feedforward Bp Reference

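The classes defined in the basic feedforward Bp implementation include BpUnit, BpUnitSpec, BpLayer, and BpNetwork, along with the BpCon and BpConSpec connection classes described below.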

Bias weights are implemented by adding a BpCon object to the BpUnit directly, rather than by allocating a self-projection or some similar scheme. In addition, the BpUnitSpec has a pointer to a BpConSpec that controls the updating, etc., of the bias weight. Thus, while some code was written to support the special bias weights on units, it amounts to simply calling the appropriate function on the BpConSpec.

Variations on the Standard

Threshold-linear units: a threshold-linear activation function with the threshold set by the parameter threshold. Activation is zero when the net input is below threshold, and net - threshold above that.
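As a small illustration, the threshold rule can be written as a Python function. The function name is hypothetical; only the threshold parameter comes from the text.

```python
def thresh_lin(net, threshold):
    """Zero below the threshold, net - threshold above it."""
    return net - threshold if net > threshold else 0.0
```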

Noisy units: the noise is specified by the noise member.

Stochastic units: a binary activation, with the probability of being active a sigmoidal function of the net input (e.g., like a Boltzmann Machine unit).
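A minimal Python sketch of such a stochastic unit, assuming a standard logistic sigmoid with no gain or temperature parameter. The function name and the explicit random generator argument are illustrative choices, not emergent's interface.

```python
import math
import random

def stochastic_act(net, rng):
    """Return 1.0 with probability sigmoid(net), otherwise 0.0."""
    p_active = 1.0 / (1.0 + math.exp(-net))
    return 1.0 if rng.random() < p_active else 0.0
```

For example, with net = 0 the unit should be active about half the time: `sum(stochastic_act(0.0, random.Random(0)) for _ in range(10))` draws ten samples from one freshly seeded generator per call, whereas sharing a single `random.Random` instance across calls gives an independent sequence of samples.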

Radial basis function (RBF) units: activation is a Gaussian function of the distance between the weights and the activations. The variance of the Gaussian is spherical (the same for all weights), and is given by the parameter var.
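A sketch of this Gaussian-of-distance activation in Python. The squared Euclidean distance and the exp(-d^2 / (2 var)) scaling are assumptions; the exact normalization used by emergent may differ, and the function name is hypothetical.

```python
import math

def rbf_act(weights, acts, var):
    """Gaussian of the distance between weights and sending activations."""
    # Assumed: squared Euclidean distance, one shared (spherical) variance.
    dist_sq = sum((w - a) ** 2 for w, a in zip(weights, acts))
    return math.exp(-dist_sq / (2.0 * var))
```

The unit responds maximally when the sending activations match its weight vector, and the response falls off equally in every direction because var is shared across all weights.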

Gaussian units on the net input: activation is a Gaussian function of the standard dot-product net input (not the distance, as in the RBF). The mean of the effectively uni-dimensional Gaussian is specified by the mean parameter, with a standard deviation of std_dev.
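For contrast with the distance-based case, here is a Python sketch of a Gaussian applied to the dot-product net input. The mean and std_dev parameters come from the text; the function name and the unnormalized exp(-z^2 / 2) form are assumptions.

```python
import math

def gaussian_net_act(weights, acts, mean, std_dev):
    """Gaussian of the ordinary dot-product net input."""
    net = sum(w * a for w, a in zip(weights, acts))
    z = (net - mean) / std_dev
    return math.exp(-0.5 * z * z)
```

Because the Gaussian is applied to a single scalar (the net input), it is effectively one-dimensional: any input pattern that produces a net input near mean activates the unit.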

Exponential units: activation is an exponential function of the net input (e^net). This is useful for implementing SoftMax units, among other things.

SoftMax units: each unit takes one-to-one input from a corresponding exponential unit, and another input from a LinearBpUnitSpec unit that computes the sum over all the exponential units, and computes the division between these two. This results in a SoftMax unit. Note that the LinearBpUnitSpec must have fixed weights all of value 1, and that the SoftMax units must have the one-to-one projection from the exp units first, followed by the projection from the sum unit. See demo/bp_misc/bp_softmax.proj for a demonstration of how to configure a SoftMax network.
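The arithmetic behind this arrangement can be sketched in Python. The three steps mirror the exponential units, the linear sum unit with weights fixed at 1, and the final division; the function name is hypothetical, and this is not how the network itself is configured (see the demo project for that).

```python
import math

def softmax_layer(nets):
    """SoftMax built from exponential units, a sum unit, and a division."""
    exp_acts = [math.exp(n) for n in nets]        # exponential units: e^net
    sum_act = sum(1.0 * e for e in exp_acts)      # linear unit, weights fixed at 1
    return [e / sum_act for e in exp_acts]        # one-to-one exp input / sum input
```

The resulting activations are positive and sum to 1, so they can be read as a probability distribution over the units.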

Hebbian connections: a very simple Hebbian learning rule used in place of backpropagation. It is useful for making comparisons between delta-rule and Hebbian learning. The rule is simply dwt = ru->act * su->act, where ru->act is the target value if present.
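The rule can be written out as a short Python function. The argument names follow the ru (receiving unit) and su (sending unit) notation of the text; passing the target as an optional ru_targ argument is an assumption made for illustration.

```python
def hebb_dwt(ru_act, su_act, ru_targ=None):
    """dwt = ru->act * su->act, using the target in place of ru->act if present."""
    r = ru_targ if ru_targ is not None else ru_act
    return r * su_act
```

For an output unit with a target of 1.0, `hebb_dwt(0.2, 0.5, ru_targ=1.0)` uses the target rather than the unit's own activation of 0.2.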

Error-scaling connections: the error sent back through these connections is scaled by the factor err_scale. This can be used in cases where there are multiple output layers, some of which are not supposed to influence learning in the hidden layer, for example.
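A minimal Python sketch of error scaling, assuming err_scale simply multiplies the error that a group of connections sends back to its sending unit. The function name and argument layout are hypothetical.

```python
def send_error(weights, dEdNet, err_scale=1.0):
    """Error (dEdA contribution) sent back to one sending unit.

    weights[j] is the weight to receiving unit j and dEdNet[j] is that
    unit's error derivative; err_scale scales the total sent back.
    """
    return err_scale * sum(w * d for w, d in zip(weights, dEdNet))
```

With err_scale set to 0, an output layer still computes its own error but contributes nothing to learning in the hidden layer that feeds it.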

Delta-bar-delta connections: these implement the delta-bar-delta learning rate adaptation scheme (Jacobs, 1988). It should only be used in batch mode weight updating. The connection type must be the matching delta-bar-delta connection type, which contains a connection-wise learning rate parameter. This learning rate is additively incremented by lrate_incr when the signs of the current and previous weight changes are in agreement, and is multiplicatively decremented by lrate_decr when they are not. The demo project demo/bp_misc/bp_ft_dbd.proj provides an example of how to set up delta-bar-delta learning.
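The learning rate adjustment can be sketched in Python as follows. The lrate_incr and lrate_decr names come from the text; the function name, treating lrate_decr as a factor between 0 and 1, and leaving the rate unchanged when either weight change is zero are assumptions.

```python
def dbd_update(lrate, dwt, prev_dwt, lrate_incr, lrate_decr):
    """Adapt one connection's learning rate from the signs of its weight changes."""
    agreement = dwt * prev_dwt
    if agreement > 0.0:
        # Same sign as last time: increase the rate additively.
        return lrate + lrate_incr
    if agreement < 0.0:
        # Sign flipped: shrink the rate multiplicatively.
        return lrate * lrate_decr
    # Assumed: no change when either weight change is zero.
    return lrate
```

The additive increase and multiplicative decrease make the rate grow slowly along consistent directions and back off quickly when the weight starts to oscillate.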