From emergent
Jump to: navigation, search


Leabra stands for Local, Error-driven and Associative, Biologically Realistic Algorithm, and it implements a balance between associative (Hebbian) and error-driven learning on top of a biologically-based point-neuron activation function with inhibitory competition dynamics (either via inhibitory interneurons or an approximation thereof), which produce k-Winners-Take-All (kWTA) sparse distributed representations. Extensive documentation is available from the online textbook: http://ccnbook.colorado.edu which serves as a second edition to the original book: Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain, O'Reilly and Munakata, 2000, Cambridge, MA: MIT Press. (Computational Explorations..)

As of November 3, 2014: Please see Leabra 8.0 Update for important information about updating to the 8.0 version of emergent (still soon to be released as of 1/2016). See emergent7 for continuing support for version 7.0. The documentation below describes the 8.0 version.


The basic activation dynamics of Leabra are based on standard electrophysiological principles of real neurons, and in discrete spiking mode we implement exactly the AdEx (adapting exponential) model of Gerstner and colleagues (Scholarpedia article on AdEx). The rate code mode (which runs faster and allows for smaller networks) provides a very close approximation to the AdEx model behavior, in terms of a graded activation signal matching the actual instantaneous rate of spiking across a population of AdEx neurons. We generally conceive of a single rate-code neuron as representing a microcolumn of roughly 100 spiking pyramidal neurons in the neocortex. There are also short-term plasticity dynamics that produce synaptic depression and potentiation, as described in Hennig (2013).

The excitatory synaptic input conductances (i.e., net input) is computed as an average, not a sum, over connections, based on normalized, sigmoidaly transformed weight values, which are subject to scaling on a connection-group level to alter relative contributions. Automatic scaling is performed to compensate for differences in expected activity level in the different projections. See Leabra Netin Scaling for details.

Inhibition is computed using a feed-forward (FF) and feed-back (FB) inhibition function (FFFB) that closely approximates the behavior of inhibitory interneurons in the neocortex. FF is based on a multiplicative factor applied to the average net input coming into a layer, and FB is based on a multiplicative factor applied to the average activation within the layer. These simple linear functions do an excellent job of controlling the overall activation levels in bidirectionally connected networks, producing behavior very similar to the more abstract computational implementation of kWTA dynamics implemented in previous versions.

There is a single learning equation, derived from a very detailed model of spike timing dependent plasticity (STDP) by Urakubo et al, 2008, that produces a combination of Hebbian associative and error-driven learning. For historical reasons, we call this the XCAL equation (eXtended Contrastive Attractor Learning), and it is functionally very similar to the BCM learning rule developed by Bienenstock, Cooper & Munro (1982). The essential learning dynamic involves a Hebbian co-product of sending neuron activation times receiving neuron activation, which biologically reflects the amount of calcium entering through NMDA channels, and this co-product is then compared against a floating threshold value. To produce the Hebbian learning dynamic, this floating threshold is based on a long-term running average of the receiving neuron activation. This is the key idea for the BCM algorithm. To produce error-driven learning, the floating threshold is based on a much faster running average of activation co-products, which reflects an expectation or prediction, against which the instantaneous, later outcome is compared.

Weights are subject to a contrast enhancement function, which compensates for the soft (exponential) weight bounding that keeps weights within the normalized 0-1 range. Contrast enhancement is important for enhancing the selectivity of self-organizing learning, and generally results in faster learning with better overall results. Learning operates on the underlying internal linear weight value. Biologically, we associate the underlying linear weight value with internal synaptic factors such as actin scaffolding, CaMKII phosphorlation level, etc, while the contrast enhancement operates at the level of AMPA receptor expression. We also support (optionally) a distinction between fast and slow weights -- biologically fast weights reflect the early LTP dynamics, which rise and fall over roughly 15 minutes after induction, while slow weights reflect the stable long-term weight values that the fast weights decay back to.

There are various extensions to the algorithm that implement special neural mechanisms associated with the prefrontal cortex and basal ganglia (PBWM), dopamine systems PVLV, the Hippocampus, and temporal integration dynamics associated with the thalamocortical circuits (LeabraTI, and CIFER).

Leabra Algorithm Equations

The pseudocode for Leabra is given here, showing exactly how the pieces of the algorithm fit together, using the equations and variables from the actual code. The emergent implementation contains a number of optimizations (including vectorization and GPU code), but this provides the core math in simple form.

See the Matlab directory in the emergent svn source directory for a complete implementation of these equations in Matlab, coded by Sergio Verduzco-Flores -- this can be a lot simpler to read than the highly optimized C++ source code.


LeabraUnits are organized into LeabraLayers, which sometimes have unit groups (which are now typically purely virtual, not actual Unit_Group objects). The LeabraUnit has the following key parameters, along with a number of others that are used for other non-default algorithms and various optimizations, etc.

  • act = activation sent to other units
  • act_nd = non-depressed activation -- prior to application of any short-term plasticity
  • net_raw = raw netinput, prior to time-averaging
  • net = time-averaged excitatory conductance (net input)
  • gc_i = inhibitory conductance, computed from FFFB inhibition function typically
  • I_net = net current, combining excitatory, inhibitory, and leak channels
  • v_m = membrane potential
  • v_m_eq = equilibrium membrane potential -- not reset by spikes -- just keeps integrating
  • adapt = adaptation current
  • avg_ss = super-short term running average activation
  • avg_s = short-term running average activation, integrates over avg_ss, represents plus phase learning signal
  • avg_m = medium-term running average activation, integrates over avg_s, represents minus phase learning signal
  • avg_l = long-term running average activation, integrates over avg_m, drives long-term floating average for Hebbian learning
  • avg_l_lrn = how much to use the avg_l-based Hebbian learning for this receiving unit's learning -- in addition to the basic error-driven learning -- this is dynamically updated based on the avg_l factor, so that this Hebbian learning constraint can be stronger as a unit gets too active and needs to be regulated more strongly.
  • avg_s_eff = effective avg_s value used in learning -- includes a small fraction (.1) of the avg_m value, for reasons explained below.

Units are connected via synapses parameterized with the following variables. These are actually stored in an optimized vector format, but the LeabraCon object contains the variables as a template.

  • wt = net effective synaptic weight between objects -- subject to contrast enhancement compared to fwt and swt
  • dwt = delta-wt -- change in synaptic weights due to learning
  • fwt = fast weight -- used for advanced fast and slow weight learning dynamic -- otherwise equal to swt -- stored as non-contrast enhanced value
  • swt = slow weight -- standard learning rate weight -- stored as non-contrast enhanced value

LeabraCycle: Net input, Inhibition, Activation

For every cycle of activation updating, compute the net input, inhibition, membrane potential, and activation:

  • Net input (see LeabraUnitSpec.cpp for code):
  • net_raw += (sum over recv connections of:) scale_eff * act * wt
  • scale_eff = Leabra Netin Scaling factor that includes 1/N to compute an average, plus wt_scale.rel and abs relative and absolute scaling terms.
  • act = sending unit activation
  • wt = receiving connection weight value between sender and receiver
  • emergent does this very efficiently by using a sender-based computation, that only sends changes (deltas) in activation values -- typically only a few percent of neurons send on any given cycle.
  • net += dt.integ * dt.net_dt * (net_raw - net)
  • time integration of net input, using net_dt (1/1.4 default), and global integration time constant, dt.integ (1 = 1 msec default)
  • Inhibition (see LeabraLayerSpec.cpp for code):
  • ffi = ff * MAX(netin.avg - ff0, 0)
  • feedforward component of inhibition with ff multiplier -- has ff0 offset and can't be negative (that's what the MAX(.. ,0) part does).
  • netin.avg is average of net variable across unit group or layer, depending on what level this is being computed at (both are supported)
  • fbi += fb_dt * (fb * acts.avg - fbi)
  • feedback component of inhibition with fb multiplier -- requires time integration to dampen oscillations that otherwise occur -- fb_dt = 1/1.4 default
  • gc_i = gi * (ffi + fbi)
  • Membrane potential (see LeabraUnitSpec.cpp for code)
  • I_net = net * (e_rev.e - v_m) + gc_l * (e_rev.l - v_m) + gc_i * (e_rev.i - v_m) + noise
  • net current = sum of individual ionic channels: e = excitatory, l = leak (gc_l is a constant, 0.1 default), and i = inhibitory
  • e_rev are reversal potentials: in normalized values derived from biophysical values, e_rev.e = 1, .l = 0.3, i = 0.25
  • noise is typically gaussian if added
  • if ex: I_net += g_bar.l * exp_slope * exp((v_m - thr) / exp_slope)
  • this is the exponential component of AdEx, if in use (typically only for discrete spiking), exp_slope = .02 default
  • v_m += dt.integ * dt.vm_dt * (I_net - adapt)
  • in emergent, we use a simple midpoint method that evaluates v_m with a half-step time constant, and then uses this half-step v_m to compute full step in above I_net equation. vm_dt = 1/3.3 default.
  • v_m is always computed as in discrete spiking, even when using rate code, with v_m reset to vm_r etc -- this provides a more natural way to integrate adaptation and short-term plasticity mechanisms, which drive off of the discrete spiking.
  • v_m_eq += dt.integ * dt.vm_dt * (I_net - adapt)
  • the equilibrium version of the membrane potential does not reset with spikes, and is important for rate code per below
  • Activation (see LeabraUnitSpec.cpp for code)
  • g_e_thr = (gc_i * (e_rev_i - thr) + gc_l * (e_rev_l - thr) - adapt) / (thr - e_rev.e)
  • the amount of excitatory conductance required to put the neuron exactly at the firing threshold, thr = .5 default.
  • if(v_m > spk_thr) { spike = 1; v_m = vm_r } else { spike = 0 }
  • spk_thr is spiking threshold (1.2 default, different from rate code thr), vm_r = .3 is the reset value of the membrane potential after spiking -- we also have an optional refractory period after spiking, default = 3 cycles, where the vm equations are simply not computed, and vm remains at vm_r.
  • if using spiking mode, then act = spike, otherwise, rate code function is below
  • if(v_m_eq <= thr) { new_act = NXX1(v_m_eq - thr) } else { new_act = NXX1(net - g_e_thr) }
  • it is important that the time to first "spike" be governed by v_m integration dynamics, but after that point, it is essential that activation drive directly from the excitatory conductance (g_e or net) relative to the g_e_thr threshold -- activation rates are linear in this term, but not even a well-defined function of v_m_eq -- earlier versions of Leabra only used the v_m_eq-based term, and this led to some very strange behavior.
  • NXX1 = noisy-x-over-x+1 function, which is implemented using a lookup table due to the convolving of the XX1 function with a gaussian noise kernel
  • XX1(x) = gain * x / (gain * x + 1)
  • gain = 100 default
  • act_nd += dt.integ * dt.vm_dt * (new_act - act_nd)
  • non-depressed rate code activation is time-integrated using same vm_dt time constant as used in v_m, from the new activation value
  • act = act_nd * syn_tr (or just act_nd)
  • if short-term plasticity is in effect, then syn_tr variable reflects the synaptic transmission efficacy, and this product provides the net signal sent to the receiving neurons. otherwise syn_tr = 1.
  • adapt += dt.integ * (adapt.dt * (vm_gain * (v_m - e_rev.l) - adapt) + spike * spike_gain)
  • adaptation current -- causes rate of activation / spiking to decrease over time, adapt.dt = 1/144, vm_gain = 0.04, spike_gain = .00805 defaults


The XCAL dWt function, showing direction and magnitude of synaptic weight changes (dWt) as a function of the short-term average activity of the sending neuron (x) times the receiving neuron (y). This quantity is a simple mathematical approximation to the level of postsynaptic Ca++, reflecting the dependence of the NMDA channel on both sending and receiving neural activity. This function was extracted directly from the detailed biophysical Urakubo et al. (2008) model, by fitting a piecewise linear function to the synaptic weight change behavior that emerges from it as a function of a wide range of sending and receiving spiking patterns.

After iterating over some number of cycles, with typically the first set of 50 to 75 cycles being in the "minus phase" (no target values clamped on output layers, or other sources of plus phase signals as in LeabraTI), and the final 25 cycles being the "plus phase", where targets or other training signals are present, then learning equations are computed. These are based on running averages of activations as shown first.

  • Running averages computed continuously every cycle, and note the compounding form (see LeabraUnitSpec.cpp for code)
  • avg_ss += dt.integ * ss_dt * (act_nd - avg_ss)
  • super-short time scale running average, ss_dt = 1/2 default -- this was introduced to smooth out discrete spiking signal, but is also useful for rate code
  • avg_s += dt.integ * act_avg.s_dt * (avg_ss - avg_s)
  • short time scale running average, s_dt = 1/2 default -- this represents the "plus phase" or actual outcome signal in comparison to avg_m
  • avg_m += dt.integ * act_avg.m_dt * (avg_s - avg_m)
  • medium time-scale running average, m_dt = 1/10 average -- this represents the "minus phase" or expectation signal in comparison to avg_s
  • avg_l += (avg_m > 0.2) ? dt * (avg_l.max - avg_m) : dt * (avg_l.min - avg_m)
  • long-term running average -- this is computed just once per learning trial, not every cycle like the ones above -- max = 1.5, min = .1, dt = .1 by default
  • this is a basic exponential approach to max / min depending on whether unit is active or not -- having a clear max / min is important for the learning modulation shown next -- x ? y : z terminology is C syntax for if x is true, then y, else z
  • avg_l_lrn = avg_l.lrn_min + (avg_l - avg_l.min) * ((avg_l.lrn_max - avg_l.lrn_min) / avg_l.max - avg_l.min))
  • learning strength factor for how much to learn based on avg_l floating threshold -- this is dynamically modulated by strength of avg_l itself, and this turns out to be critical -- the amount of this learning increases as units are more consistently active all the time (i.e., "hog" units). avg_l.lrn_min = 0.005, avg_l.lrn_max = 0.05.
  • this depends on having a clear max to avg_l, which is why we use the exponential bounding form above.
  • avg_s_eff = m_in_s * avg_m + (1 - m_in_s) * avg_s
  • mix in some of the medium-term factor into the short-term factor -- this is important for ensuring that when neuron turns off in the plus phase (short term), that enough trace of earlier minus-phase activation remains to drive it into the LTD weight decrease region -- m_in_s = .1 default.
  • this is now done at the unit level -- previously was done at the connection level which is much less efficient!
  • Learning equation (see LeabraConSpec.h for code) -- most of these are intermediate variables used in computing final dwt value
  • srs = ru->avg_s_eff * su->avg_s_eff
  • short-term sender-receiver co-product -- this is the intracellular calcium from NMDA and other channels
  • srm = ru->avg_m * su->avg_m
  • medium-term sender-receiver co-product -- this drives dynamic threshold for error-driven learning
  • dwt += lrate * [ m_lrn * XCAL(srs, srm) + ru->avg_l_lrn * XCAL(srs, ru->avg_l)]
  • weight change is sum of two factors: error-driven based on medium-term threshold (srm), and BCM Hebbian based on long-term threshold of the recv unit (ru->avg_l)
  • in earlier versions, the two factors were combined into a single threshold value, using normalized weighting factors -- this was more elegant, but by separating the two apart, we allow the hebbian component to use the full range of the XCAL function (as compared to the relatively small avg_l_lrn factor applied inside the threshold computation). By multiplying by avg_l_lrn outside the XCAL equation, we get the desired contrast enhancement property of the XCAL function, where values close to the threshold are pushed either higher (above threshold) or lower (below threshold) most strongly, and values further away are less strongly impacted.
  • m_lrn is a constant and is typically 1.0 when error-driven learning is employed (but can be set to 0 to have a completely Hebbian model).
  • XCAL is the "check mark" linearized BCM-style learning function (see figure) that was derived from the Urakubo Et Al (2008) STDP model, as described in more detail in the CCN textbook: http://ccnbook.colorado.edu
  • XCAL(x, th) = (x < d_thr) ? 0 : (x > th * d_rev) ? (x - th) : (-x * ((1-d_rev)/d_rev))
  • d_thr = 0.0001, d_rev = 0.1 defaults
  • x ? y : z terminology is C syntax for: if x is true, then y, else z
  • Weight update equation (see LeabraConSpec.h for code) (see below for alternative version using fast weights)
  • The fwt value here is the linear, non-contrast enhanced version of the weight value, while wt is the sigmoidal contrast-enhanced version, which is used for sending netinput to other neurons. One can compute fwt from wt and vice-versa, but numerical errors can accumulate in going back-and forth more than necessary, and it is generally faster to just store these two weight values (and they are needed for the fast weights version show below).
  • dwt *= (dwt > 0) ? (1-fwt) : fwt
  • soft weight bounding -- weight increases exponentially decelerate toward upper bound of 1, and decreases toward lower bound of 0. based on linear, non-contrast enhanced fwt weights.
  • fwt += dwt
  • increment the linear weights with the bounded dwt term
  • wt = SIG(fwt)
  • new weight value is sigmoidal contrast enhanced version of fast weight
  • SIG(x) = 1 / (1 + (off * (1-w)/w)^gain)
  • dwt = 0
  • reset weight changes now that they have been applied.
  • Fast Weights version of the weight update equation -- this is optional and not widely used (as of yet)
  • dwt *= (dwt > 0) ? (1-fwt) : fwt
  • soft weight bounding -- weight increases exponentially decelerate toward upper bound of 1, and decreases toward lower bound of 0. based on fast weights (in linear, non-contrast enhanced form), so that fast weights can drive learning saturation if in effect.
  • fwt += dwt + decay_dt * (swt - fwt)
  • fast weights, if in effect, decay back to slow weights -- otherwise fwt = swt and so it just takes the increment from dwt
  • swt += dwt * slow_lrate
  • slow weights, if using fast weights, learn at a slower learning rate based on same dwt term -- otherwise swt = fwt
  • nwt = SIG(fwt)
  • new weight value is sigmoidal contrast enhanced version of fast weight (nwt is not stored -- just a tmp variable)
  • SIG(x) = 1 / (1 + (off * (1-w)/w)^gain)
  • wt += wt_dt * (nwt - wt)
  • weight moves toward new weight value with a time constant -- reflects slowed rise time seen in early LTP dynamics -- if not using fast weights, then wt = nwt directly (as if wt_dt = 1)
  • dwt = 0
  • reset weight changes now that they have been applied.

Specific Leabra Object Information

The following provide more details for the main class objects used in Leabra.

Network structure


As discussed in Specs, the specs contain all the parameters and algorithm-specific code, while the above objects contain the dynamic state information.

Special Algorithms

See LeabraWizard for special functions for configuring several of these special architectural features.

Simple Recurrent Network Context Layer

Scalar, Graded Value Representation

  • ScalarValLayerSpec -- encodes and decodes scalar, real-numbered values based on a coarse coded distributed representation (e.g., a Gaussian bump) across multiple units. This provides a very efficient and effective way of representing scalar values -- individual Leabra units do not do a very good job of that, as they have a strong binary bias.
  • TwoDValLayerSpec -- two-dimensional version of scalar val
  • MotorForceLayerSpec -- represents motor output and input forces in a distributed manner across unit groups representing position and velocity.

Temporal Differences and General Da (dopamine) Modulation

Temporal differences (TD) is widely used as a model of midbrain dopaminergic firing. Also included are Leabra Units that respond to simulated dopaminergic modulation (DaMod)

See Leabra TD for details.

PVLV -- Pavlovian Conditioning

Simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards (an alternative to TD). It is described in these papers: O'Reilly, Frank, Hazy & Watz, 2007 Hazy, Frank & O'Reilly, 2010. The current version (described here) is as described in the 2010 paper. A PVLV model can be made through the LeabraWizard -- under Networks menu.

See Leabra PVLV for full details.

PBWM -- Prefrontal Cortex Basal Ganglia Working Memory

Uses PVLV to train PFC working memory updating system, based on the biology of the prefrontal cortex and basal ganglia. For complete details, see O'Reilly and Frank, 2006.

Described in Leabra PBWM.

Other Misc Classes

  • MarkerConSpec -- a "null" connection that doesn't do anything, but serves as a marker for special computations (e.g., the temporal derivative computation instead of standard net input).
  • LeabraLinUnitSpec -- linear activation function
  • LeabraNegBiasSpec -- negative bias learning (for refractory neurons)
  • TrialSynDepConSpec, TrialSynDepCon -- synaptic depression on a trial-wise time scale
  • FastWtConSpec, FastWtCon -- transient fast weights in addition to standard slowly adapting weights
  • ActAvgHebbConSpec -- hebbian learning includes proportion of time-averaged activation (as in the "trace" rule)

Parameter Setting Tips and Tricks

See Leabra Params for detailed discussion about adjusting parameters.

Software Version Update Issues