Leabra
Contents
Introduction
Leabra stands for Local, Errordriven and Associative, Biologically Realistic Algorithm, and it implements a balance between associative (Hebbian) and errordriven learning on top of a biologicallybased pointneuron activation function with inhibitory competition dynamics (either via inhibitory interneurons or an approximation thereof), which produce kWinnersTakeAll (kWTA) sparse distributed representations. Extensive documentation is available from the online textbook: http://ccnbook.colorado.edu which serves as a second edition to the original book: Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain, O'Reilly and Munakata, 2000, Cambridge, MA: MIT Press. (Computational Explorations..)
As of November 3, 2014: Please see Leabra 8.0 Update for important information about updating to the 8.0 version of emergent (still soon to be released as of 1/2016). See emergent7 for continuing support for version 7.0. The documentation below describes the 8.0 version.
Overview
The basic activation dynamics of Leabra are based on standard electrophysiological principles of real neurons, and in discrete spiking mode we implement exactly the AdEx (adapting exponential) model of Gerstner and colleagues (Scholarpedia article on AdEx). The rate code mode (which runs faster and allows for smaller networks) provides a very close approximation to the AdEx model behavior, in terms of a graded activation signal matching the actual instantaneous rate of spiking across a population of AdEx neurons. We generally conceive of a single ratecode neuron as representing a microcolumn of roughly 100 spiking pyramidal neurons in the neocortex. There are also shortterm plasticity dynamics that produce synaptic depression and potentiation, as described in Hennig (2013).
The excitatory synaptic input conductances (i.e., net input) is computed as an average, not a sum, over connections, based on normalized, sigmoidaly transformed weight values, which are subject to scaling on a connectiongroup level to alter relative contributions. Automatic scaling is performed to compensate for differences in expected activity level in the different projections. See Leabra Netin Scaling for details.
Inhibition is computed using a feedforward (FF) and feedback (FB) inhibition function (FFFB) that closely approximates the behavior of inhibitory interneurons in the neocortex. FF is based on a multiplicative factor applied to the average net input coming into a layer, and FB is based on a multiplicative factor applied to the average activation within the layer. These simple linear functions do an excellent job of controlling the overall activation levels in bidirectionally connected networks, producing behavior very similar to the more abstract computational implementation of kWTA dynamics implemented in previous versions.
There is a single learning equation, derived from a very detailed model of spike timing dependent plasticity (STDP) by Urakubo et al, 2008, that produces a combination of Hebbian associative and errordriven learning. For historical reasons, we call this the XCAL equation (eXtended Contrastive Attractor Learning), and it is functionally very similar to the BCM learning rule developed by Bienenstock, Cooper & Munro (1982). The essential learning dynamic involves a Hebbian coproduct of sending neuron activation times receiving neuron activation, which biologically reflects the amount of calcium entering through NMDA channels, and this coproduct is then compared against a floating threshold value. To produce the Hebbian learning dynamic, this floating threshold is based on a longterm running average of the receiving neuron activation. This is the key idea for the BCM algorithm. To produce errordriven learning, the floating threshold is based on a much faster running average of activation coproducts, which reflects an expectation or prediction, against which the instantaneous, later outcome is compared.
Weights are subject to a contrast enhancement function, which compensates for the soft (exponential) weight bounding that keeps weights within the normalized 01 range. Contrast enhancement is important for enhancing the selectivity of selforganizing learning, and generally results in faster learning with better overall results. Learning operates on the underlying internal linear weight value. Biologically, we associate the underlying linear weight value with internal synaptic factors such as actin scaffolding, CaMKII phosphorlation level, etc, while the contrast enhancement operates at the level of AMPA receptor expression. We also support (optionally) a distinction between fast and slow weights  biologically fast weights reflect the early LTP dynamics, which rise and fall over roughly 15 minutes after induction, while slow weights reflect the stable longterm weight values that the fast weights decay back to.
There are various extensions to the algorithm that implement special neural mechanisms associated with the prefrontal cortex and basal ganglia (PBWM), dopamine systems PVLV, the Hippocampus, and temporal integration dynamics associated with the thalamocortical circuits (LeabraTI, and CIFER).
Leabra Algorithm Equations
The pseudocode for Leabra is given here, showing exactly how the pieces of the algorithm fit together, using the equations and variables from the actual code. The emergent implementation contains a number of optimizations (including vectorization and GPU code), but this provides the core math in simple form.
See the Matlab
directory in the emergent svn source directory for a complete implementation of these equations in Matlab, coded by Sergio VerduzcoFlores  this can be a lot simpler to read than the highly optimized C++ source code.
Variables
LeabraUnits are organized into LeabraLayers, which sometimes have unit groups (which are now typically purely virtual, not actual Unit_Group objects). The LeabraUnit has the following key parameters, along with a number of others that are used for other nondefault algorithms and various optimizations, etc.
 act = activation sent to other units
 act_nd = nondepressed activation  prior to application of any shortterm plasticity
 net_raw = raw netinput, prior to timeaveraging
 net = timeaveraged excitatory conductance (net input)
 gc_i = inhibitory conductance, computed from FFFB inhibition function typically
 I_net = net current, combining excitatory, inhibitory, and leak channels
 v_m = membrane potential
 v_m_eq = equilibrium membrane potential  not reset by spikes  just keeps integrating
 adapt = adaptation current
 avg_ss = supershort term running average activation
 avg_s = shortterm running average activation, integrates over avg_ss, represents plus phase learning signal
 avg_m = mediumterm running average activation, integrates over avg_s, represents minus phase learning signal
 avg_l = longterm running average activation, integrates over avg_m, drives longterm floating average for Hebbian learning
 avg_l_lrn = how much to use the avg_lbased Hebbian learning for this receiving unit's learning  in addition to the basic errordriven learning  this can optionally be dynamically updated based on the avg_l factor and average level of error in the receiving layer, so that this Hebbian learning constraint can be stronger as a unit gets too active and needs to be regulated more strongly, and in proportion to average error levels in the layer.
 avg_s_eff = effective avg_s value used in learning  includes a small fraction (.1) of the avg_m value, for reasons explained below.
Units are connected via synapses parameterized with the following variables. These are actually stored in an optimized vector format, but the LeabraCon object contains the variables as a template.
 wt = net effective synaptic weight between objects  subject to contrast enhancement compared to fwt and swt
 dwt = deltawt  change in synaptic weights due to learning
 fwt = fast weight  used for advanced fast and slow weight learning dynamic  otherwise equal to swt  stored as noncontrast enhanced value
 swt = slow weight  standard learning rate weight  stored as noncontrast enhanced value
LeabraCycle: Net input, Inhibition, Activation
For every cycle of activation updating, compute the net input, inhibition, membrane potential, and activation:
 Net input (see LeabraUnitSpec.cpp for code):

net_raw += (sum over recv connections of:) scale_eff * act * wt
 scale_eff = Leabra Netin Scaling factor that includes 1/N to compute an average, plus wt_scale.rel and abs relative and absolute scaling terms.
 act = sending unit activation
 wt = receiving connection weight value between sender and receiver
 emergent does this very efficiently by using a senderbased computation, that only sends changes (deltas) in activation values  typically only a few percent of neurons send on any given cycle.

net += dt.integ * dt.net_dt * (net_raw  net)
 time integration of net input, using net_dt (1/1.4 default), and global integration time constant, dt.integ (1 = 1 msec default)

 Inhibition (see LeabraLayerSpec.cpp for code):

ffi = ff * MAX(netin.avg  ff0, 0)
 feedforward component of inhibition with ff multiplier (1 by default)  has ff0 offset and can't be negative (that's what the MAX(.. ,0) part does).
 netin.avg is average of net variable across unit group or layer, depending on what level this is being computed at (both are supported)

fbi += fb_dt * (fb * acts.avg  fbi)
 feedback component of inhibition with fb multiplier (1 by default)  requires time integration to dampen oscillations that otherwise occur  fb_dt = 1/1.4 default

gc_i = gi * (ffi + fbi)
 total inhibitory conductance, with global gi multiplier  default of gi=1.8 typically produces good sparse distributed representations in reasonably large layers (25 units or more)

 Membrane potential (see LeabraUnitSpec.cpp for code)

I_net = net * (e_rev.e  v_m) + gc_l * (e_rev.l  v_m) + gc_i * (e_rev.i  v_m) + noise
 net current = sum of individual ionic channels: e = excitatory, l = leak (gc_l is a constant, 0.1 default), and i = inhibitory
 e_rev are reversal potentials: in normalized values derived from biophysical values, e_rev.e = 1, .l = 0.3, i = 0.25
 noise is typically gaussian if added
 if ex:
I_net += g_bar.l * exp_slope * exp((v_m  thr) / exp_slope)
 this is the exponential component of AdEx, if in use (typically only for discrete spiking), exp_slope = .02 default

v_m += dt.integ * dt.vm_dt * (I_net  adapt)
 in emergent, we use a simple midpoint method that evaluates v_m with a halfstep time constant, and then uses this halfstep v_m to compute full step in above I_net equation. vm_dt = 1/3.3 default.
 v_m is always computed as in discrete spiking, even when using rate code, with v_m reset to vm_r etc  this provides a more natural way to integrate adaptation and shortterm plasticity mechanisms, which drive off of the discrete spiking.

v_m_eq += dt.integ * dt.vm_dt * (I_net  adapt)
 the equilibrium version of the membrane potential does not reset with spikes, and is important for rate code per below

 Activation (see LeabraUnitSpec.cpp for code)

g_e_thr = (gc_i * (e_rev_i  thr) + gc_l * (e_rev_l  thr)  adapt) / (thr  e_rev.e)
 the amount of excitatory conductance required to put the neuron exactly at the firing threshold, thr = .5 default.

if(v_m > spk_thr) { spike = 1; v_m = vm_r } else { spike = 0 }
 spk_thr is spiking threshold (1.2 default, different from rate code thr), vm_r = .3 is the reset value of the membrane potential after spiking  we also have an optional refractory period after spiking, default = 3 cycles, where the vm equations are simply not computed, and vm remains at vm_r.
 if using spiking mode, then act = spike, otherwise, rate code function is below

if(v_m_eq <= thr) { new_act = NXX1(v_m_eq  thr) } else { new_act = NXX1(net  g_e_thr) }
 it is important that the time to first "spike" be governed by v_m integration dynamics, but after that point, it is essential that activation drive directly from the excitatory conductance (g_e or net) relative to the g_e_thr threshold  activation rates are linear in this term, but not even a welldefined function of v_m_eq  earlier versions of Leabra only used the v_m_eqbased term, and this led to some very strange behavior.
 NXX1 = noisyxoverx+1 function, which is implemented using a lookup table due to the convolving of the XX1 function with a gaussian noise kernel

XX1(x) = gain * x / (gain * x + 1)
 gain = 100 default

act_nd += dt.integ * dt.vm_dt * (new_act  act_nd)
 nondepressed rate code activation is timeintegrated using same vm_dt time constant as used in v_m, from the new activation value

act = act_nd * syn_tr (or just act_nd)
 if shortterm plasticity is in effect, then syn_tr variable reflects the synaptic transmission efficacy, and this product provides the net signal sent to the receiving neurons. otherwise syn_tr = 1.

adapt += dt.integ * (adapt.dt * (vm_gain * (v_m  e_rev.l)  adapt) + spike * spike_gain)
 adaptation current  causes rate of activation / spiking to decrease over time, adapt.dt = 1/144, vm_gain = 0.04, spike_gain = .00805 defaults

Learning
Learning is organized around the following timing, based on an internallygenerated alphafrequency (10 Hz, 100 msec periods) cycle of expectation followed by outcome, supported by neocortical circuitry in the deep layers and the thalamus, as hypothesized in the DeepLeabra extension to standard Leabra:
 A Trial lasts 100 msec (10 Hz, alpha frequency), and comprises one sequence of expectation  outcome learning, organized into 4 quarters.
 Biologically, the deep neocortical layers (layers 5, 6) and the thalamus have a natural oscillatory rhythm at the alpha frequency. Specific dynamics in these layers organize the cycle of expectation vs. outcome within the alpha cycle.
 A Quarter lasts 25 msec (40 Hz, gamma frequency)  the first 3 quarters (75 msec) form the expectation / minus phase, and the final quarter are the outcome / plus phase.
 Biologically, the superficial neocortical layers (layers 2, 3) have a gamma frequency oscillation, supporting the quarterlevel organization.
 A Cycle represents 1 msec of processing, where each neuron updates its membrane potential etc according to the above equations.
Learning is based on runningaverages of activation variables, described first:
 Running averages computed continuously every cycle, and note the compounding form (see LeabraUnitSpec.cpp for code)

avg_ss += dt.integ * ss_dt * (act_nd  avg_ss)
 supershort time scale running average, ss_dt = 1/2 default  this was introduced to smooth out discrete spiking signal, but is also useful for rate code

avg_s += dt.integ * act_avg.s_dt * (avg_ss  avg_s)
 short time scale running average, s_dt = 1/2 default  this represents the "plus phase" or actual outcome signal in comparison to avg_m

avg_m += dt.integ * act_avg.m_dt * (avg_s  avg_m)
 medium timescale running average, m_dt = 1/10 average  this represents the "minus phase" or expectation signal in comparison to avg_s

avg_l += (avg_m > 0.2) ? dt * (avg_l.max  avg_m) : dt * (avg_l.min  avg_m)
 longterm running average  this is computed just once per learning trial, not every cycle like the ones above  max = 1.5, min = .1, dt = .1 by default
 this is a basic exponential approach to max / min depending on whether unit is active or not  having a clear max / min is important for the learning modulation shown next  x ? y : z terminology is C syntax for if x is true, then y, else z

avg_s_eff = m_in_s * avg_m + (1  m_in_s) * avg_s
 mix in some of the mediumterm factor into the shortterm factor  this is important for ensuring that when neuron turns off in the plus phase (short term), that enough trace of earlier minusphase activation remains to drive it into the LTD weight decrease region  m_in_s = .1 default.
 this is now done at the unit level  previously was done at the connection level which is much less efficient!

 Optional: dynamic modulation of amount of Hebbian learning, based on avg_l value and level of err in a given layer  these factors make a small (few percent) but reliable difference in overall performance across various challenging tasks  they can readily be omitted in favor of a fixed avg_l_lrn factor of around 0.0004 (with 0 for target layers  it doesn't make sense to have any Hebbian learning at output layers):

avg_l_lrn = avg_l.lrn_min + (avg_l  avg_l.min) * ((avg_l.lrn_max  avg_l.lrn_min) / avg_l.max  avg_l.min))
 learning strength factor for how much to learn based on avg_l floating threshold  this is dynamically modulated by strength of avg_l itself, and this turns out to be critical  the amount of this learning increases as units are more consistently active all the time (i.e., "hog" units). avg_l.lrn_min = 0.005, avg_l.lrn_max = 0.05. Note that this depends on having a clear max to avg_l, which is an advantage of the exponential bounding form above.

avg_l_lrn *= MAX(1  cos_diff_avg, 0.01)
 also modulate by timeaveraged cosine (normalized dot product) between minus and plus phase activation states in given receiving layer (cos_diff_avg), (time constant 100)  if error signals are small in a given layer, then Hebbian learning should also be relatively weak so that it doesn't overpower it  and conversely, layers with higher levels of error signals can handle (and benefit from) more Hebbian learning. The MAX(0.01) factor ensures that there is a minimum level of .01 Hebbian (multiplying the previouslycomputed factor above). The .01 * .05 factors give an upperlevel value of .0005 to use for a fixed constant avg_l_lrn value  just slightly less than this (.0004) seems to work best if not using these adaptive factors.

 Learning equation (see LeabraConSpec.h for code)  most of these are intermediate variables used in computing final dwt value

srs = ru>avg_s_eff * su>avg_s_eff
 shortterm senderreceiver coproduct  this is the intracellular calcium from NMDA and other channels

srm = ru>avg_m * su>avg_m
 mediumterm senderreceiver coproduct  this drives dynamic threshold for errordriven learning

dwt += lrate * [ m_lrn * XCAL(srs, srm) + ru>avg_l_lrn * XCAL(srs, ru>avg_l)]
 weight change is sum of two factors: errordriven based on mediumterm threshold (srm), and BCM Hebbian based on longterm threshold of the recv unit (ru>avg_l)
 in earlier versions, the two factors were combined into a single threshold value, using normalized weighting factors  this was more elegant, but by separating the two apart, we allow the hebbian component to use the full range of the XCAL function (as compared to the relatively small avg_l_lrn factor applied inside the threshold computation). By multiplying by avg_l_lrn outside the XCAL equation, we get the desired contrast enhancement property of the XCAL function, where values close to the threshold are pushed either higher (above threshold) or lower (below threshold) most strongly, and values further away are less strongly impacted.
 m_lrn is a constant and is typically 1.0 when errordriven learning is employed (but can be set to 0 to have a completely Hebbian model).
 XCAL is the "check mark" linearized BCMstyle learning function (see figure) that was derived from the Urakubo Et Al (2008) STDP model, as described in more detail in the CCN textbook: http://ccnbook.colorado.edu
XCAL(x, th) = (x < d_thr) ? 0 : (x > th * d_rev) ? (x  th) : (x * ((1d_rev)/d_rev))
 d_thr = 0.0001, d_rev = 0.1 defaults
 x ? y : z terminology is C syntax for: if x is true, then y, else z

 Weight update equation (see LeabraConSpec.h for code) (see below for alternative version using fast weights)
 The fwt value here is the linear, noncontrast enhanced version of the weight value, while wt is the sigmoidal contrastenhanced version, which is used for sending netinput to other neurons. One can compute fwt from wt and viceversa, but numerical errors can accumulate in going backand forth more than necessary, and it is generally faster to just store these two weight values (and they are needed for the slow vs. fast weights version show below).

dwt *= (dwt > 0) ? (1fwt) : fwt
 soft weight bounding  weight increases exponentially decelerate toward upper bound of 1, and decreases toward lower bound of 0. based on linear, noncontrast enhanced fwt weights.

fwt += dwt
 increment the linear weights with the bounded dwt term

wt = SIG(fwt)
 new weight value is sigmoidal contrast enhanced version of fast weight

SIG(x) = 1 / (1 + (off * (1w)/w)^gain)

dwt = 0
 reset weight changes now that they have been applied.
 Optional: Slow vs. Fast Weights version of the weight update equation  this is not widely used (as of yet), but enables rapid but more transient learning to coexist with slower more enduring learning at each synapse, which can have important behavioral implications:

dwt *= (dwt > 0) ? (1fwt) : fwt
 soft weight bounding  weight increases exponentially decelerate toward upper bound of 1, and decreases toward lower bound of 0. based on fast weights (in linear, noncontrast enhanced form), so that fast weights can drive learning saturation if in effect.

fwt += dwt
 fast weights learn according to standard learning mechanism  to make them faster, turn up the std learning rate  important to not change the basic learning rule here by doing something else with the fast weights

eff_wt = swt_pct * swt + (1  swt_pct) * fwt
 effective weight value is blend of slow weights and fast weights  swt_pct can be .5 or .8 typically (eff_wt is not stored  just a tmp variable)

nwt = SIG(eff_wt)
 new weight value is sigmoidal contrast enhanced version of effective weight (nwt is not stored  just a tmp variable)

SIG(x) = 1 / (1 + (off * (1w)/w)^gain)

wt += wt_dt * (nwt  wt)
 weight moves toward new weight value with a time constant  reflects slowed rise time seen in early LTP dynamics

swt += slow_dt * (fwt  swt)
 slow weight moves slowly toward fast weights slow_dt = 1/100 typically but can be as long as 1/3000

dwt = 0
 reset weight changes now that they have been applied.

 Optional: Weight Balance  this option attempts to maintain more balanced weights across units, to prevent some units from hogging the representational space, by changing the rates of weight increase and decrease in the soft weight bounding function, as a function of the average receiving weights:

dwt *= (dwt > 0) ? wb_inc * (1fwt) : wb_dec * fwt
 wb_inc = weight increase modulator, and wb_dec = weight decrease modulator (when these are both 1, this is same as standard, and this is the default value of these factors)

wt_avg = <wt>
 average of all the receiving weights  computed per projection (corresponding to a dendritic branch perhaps)

if (wt_avg > hi_thr) then wbi = gain * (wt_avg  hi_thr); wb_inc = 1  wbi; wb_dec = 1 + wbi
 If the average weights are higher than a high threshold (hi_thr = .4 default) then the increase factor wb_inc is reduced, and the decrease factor wb_dec is increased, by a factor wbi that is determined by how far above the threshold the average is. Thus, the higher the weights get, the less quickly they can increase, and the more quickly they decrease, pushing them back into balance.

if (wt_avg < lo_thr) then wbd = gain * (wt_avg  lo_thr); wb_inc = 1  wbd; wb_dec = 1 + wbd
 This is the symmetric version for case when weight averages are below a low threshold (lo_thr = .2), and the weight balance factors go in the opposite direction (wbd is negative), causing weight increases to be favored over decreases.
 The hi_thr and lo_thr parameters are specified in terms of a target weight average value
trg = .3
with a thresholdthr=.1
around that target value, with these defaults producing the default .4 and .2 hi and lo thresholds respectively.  A key feature of this mechanism is that it does not change the sign of any weight changes, including not causing weights to change that are otherwise not changing due to the learning rule. This is not true of an alternative mechanism that has been used in various models, which normalizes the total weight value by subtracting the average. Overall this weight balance mechanism is important for larger networks on harder tasks, where the hogging problem can be a significant problem.

Specific Leabra Object Information
The following provide more details for the main class objects used in Leabra.
Network structure
Specs
As discussed in Specs, the specs contain all the parameters and algorithmspecific code, while the above objects contain the dynamic state information.
Special Algorithms
See LeabraWizard for special functions for configuring several of these special architectural features.
Simple Recurrent Network Context Layer
 LeabraContextLayerSpec  copies activation state of hidden layer
Scalar, Graded Value Representation
 ScalarValLayerSpec  encodes and decodes scalar, realnumbered values based on a coarse coded distributed representation (e.g., a Gaussian bump) across multiple units. This provides a very efficient and effective way of representing scalar values  individual Leabra units do not do a very good job of that, as they have a strong binary bias.
 TwoDValLayerSpec  twodimensional version of scalar val
 MotorForceLayerSpec  represents motor output and input forces in a distributed manner across unit groups representing position and velocity.
Temporal Differences and General Da (dopamine) Modulation
Temporal differences (TD) is widely used as a model of midbrain dopaminergic firing. Also included are Leabra Units that respond to simulated dopaminergic modulation (DaMod)
See Leabra TD for details.
PVLV  Pavlovian Conditioning
Simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards (an alternative to TD). It is described in these papers: O'Reilly, Frank, Hazy & Watz, 2007 Hazy, Frank & O'Reilly, 2010. The current version (described here) is as described in the 2010 paper. A PVLV model can be made through the LeabraWizard  under Networks menu.
See Leabra PVLV for full details.
PBWM  Prefrontal Cortex Basal Ganglia Working Memory
Uses PVLV to train PFC working memory updating system, based on the biology of the prefrontal cortex and basal ganglia. For complete details, see O'Reilly and Frank, 2006.
Described in Leabra PBWM.
Other Misc Classes
 MarkerConSpec  a "null" connection that doesn't do anything, but serves as a marker for special computations (e.g., the temporal derivative computation instead of standard net input).
 LeabraLinUnitSpec  linear activation function
 LeabraNegBiasSpec  negative bias learning (for refractory neurons)
 TrialSynDepConSpec, TrialSynDepCon  synaptic depression on a trialwise time scale
 FastWtConSpec, FastWtCon  transient fast weights in addition to standard slowly adapting weights
 ActAvgHebbConSpec  hebbian learning includes proportion of timeaveraged activation (as in the "trace" rule)
Parameter Setting Tips and Tricks
See Leabra Params for detailed discussion about adjusting parameters.
Software Version Update Issues
 Leabra 4.0.15 Notes  Version 4.0.15 release notes for Leabra