Leabra


Introduction

Leabra stands for Local, Error-driven and Associative, Biologically Realistic Algorithm. It implements a balance between associative (Hebbian) and error-driven learning on top of a biologically based point-neuron activation function with inhibitory competition dynamics (either via inhibitory interneurons or an approximation thereof), which produce k-Winners-Take-All (kWTA) sparse distributed representations. Extensive documentation is available from the online textbook: http://ccnbook.colorado.edu, which serves as a second edition to the original book: Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain, O'Reilly and Munakata, 2000, Cambridge, MA: MIT Press.

As of November 3, 2014: Please see Leabra 8.0 Update for important information about updating to the 8.0 version of emergent (as of 1/2016, still to be released). See emergent7 for continuing support for version 7.0. The documentation below describes the 8.0 version.

Overview

The basic activation dynamics of Leabra are based on standard electrophysiological principles of real neurons, and in discrete spiking mode we implement exactly the AdEx (adaptive exponential) model of Gerstner and colleagues (Scholarpedia article on AdEx). The rate code mode (which runs faster and allows for smaller networks) provides a very close approximation to the AdEx model behavior, in terms of a graded activation signal matching the actual instantaneous rate of spiking across a population of AdEx neurons. We generally conceive of a single rate-code neuron as representing a microcolumn of roughly 100 spiking pyramidal neurons in the neocortex. There are also short-term plasticity dynamics that produce synaptic depression and potentiation, as described in Hennig (2013).

The excitatory synaptic input conductance (i.e., net input) is computed as an average, not a sum, over connections, based on normalized, sigmoidally transformed weight values, which are subject to scaling on a connection-group level to alter relative contributions. Automatic scaling is performed to compensate for differences in expected activity level in the different projections. See Leabra Netin Scaling for details.
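
As an illustration, here is a minimal sketch of the average-based net input with projection-level scaling, in simplified C++ (hypothetical names, not the actual emergent API; the automatic compensation for expected activity levels is omitted):

    // Sketch: average-based net input with per-projection scaling.
    // Hypothetical names -- not the emergent API. Automatic compensation
    // for expected sending activity (see Leabra Netin Scaling) is omitted.
    #include <vector>

    struct Projection {
      std::vector<float> acts;  // sending unit activations
      std::vector<float> wts;   // normalized, sigmoidally transformed weights
      float abs = 1.0f;         // absolute scaling (wt_scale.abs)
      float rel = 1.0f;         // relative scaling (wt_scale.rel)
    };

    float NetInput(const std::vector<Projection>& prjns) {
      float net = 0.0f;
      for (const auto& p : prjns) {
        float avg = 0.0f;
        for (std::size_t i = 0; i < p.acts.size(); ++i)
          avg += p.acts[i] * p.wts[i];
        avg /= (float)p.acts.size();  // average, not sum (the 1/N factor)
        net += p.abs * p.rel * avg;   // projection-level scaling
      }
      return net;
    }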

Inhibition is computed using a feed-forward (FF) and feed-back (FB) inhibition function (FFFB) that closely approximates the behavior of inhibitory interneurons in the neocortex. FF is based on a multiplicative factor applied to the average net input coming into a layer, and FB is based on a multiplicative factor applied to the average activation within the layer. These simple linear functions do an excellent job of controlling the overall activation levels in bidirectionally connected networks, producing behavior very similar to the more abstract computational implementation of kWTA dynamics implemented in previous versions.
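
As a minimal sketch (parameter names and defaults follow the FFFB equations given in the algorithm section below; the ff0 default shown here is an assumption):

    // Sketch of the FFFB inhibition function, per the equations below.
    // netin_avg = average net input across the layer (or unit group);
    // acts_avg = average activation. fbi is persistent, time-integrated state.
    #include <algorithm>

    struct FFFB {
      float ff = 1.0f, fb = 1.0f, gi = 1.8f;  // defaults from the text
      float ff0 = 0.1f;                       // feedforward offset (assumed default)
      float fb_dt = 1.0f / 1.4f;              // feedback integration rate
      float fbi = 0.0f;                       // integrated feedback inhibition

      float Compute(float netin_avg, float acts_avg) {
        float ffi = ff * std::max(netin_avg - ff0, 0.0f);  // feedforward term
        fbi += fb_dt * (fb * acts_avg - fbi);              // integrated feedback term
        return gi * (ffi + fbi);                           // total inhibitory conductance gc_i
      }
    };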

There is a single learning equation, derived from a very detailed model of spike timing dependent plasticity (STDP) by Urakubo et al, 2008, that produces a combination of Hebbian associative and error-driven learning. For historical reasons, we call this the XCAL equation (eXtended Contrastive Attractor Learning), and it is functionally very similar to the BCM learning rule developed by Bienenstock, Cooper & Munro (1982). The essential learning dynamic involves a Hebbian co-product of sending neuron activation times receiving neuron activation, which biologically reflects the amount of calcium entering through NMDA channels, and this co-product is then compared against a floating threshold value. To produce the Hebbian learning dynamic, this floating threshold is based on a long-term running average of the receiving neuron activation. This is the key idea for the BCM algorithm. To produce error-driven learning, the floating threshold is based on a much faster running average of activation co-products, which reflects an expectation or prediction, against which the instantaneous, later outcome is compared.

Weights are subject to a contrast enhancement function, which compensates for the soft (exponential) weight bounding that keeps weights within the normalized 0-1 range. Contrast enhancement is important for enhancing the selectivity of self-organizing learning, and generally results in faster learning with better overall results. Learning operates on the underlying internal linear weight value. Biologically, we associate the underlying linear weight value with internal synaptic factors such as actin scaffolding, CaMKII phosphorylation level, etc., while the contrast enhancement operates at the level of AMPA receptor expression. We also support (optionally) a distinction between fast and slow weights -- biologically, fast weights reflect the early LTP dynamics, which rise and fall over roughly 15 minutes after induction, while slow weights reflect the stable long-term weight values that the fast weights decay back to.

There are various extensions to the algorithm that implement special neural mechanisms associated with the prefrontal cortex and basal ganglia (PBWM), dopamine systems (PVLV), the Hippocampus, and temporal integration dynamics associated with the thalamocortical circuits (LeabraTI and CIFER).

Leabra Algorithm Equations

The pseudocode for Leabra is given here, showing exactly how the pieces of the algorithm fit together, using the equations and variables from the actual code. The emergent implementation contains a number of optimizations (including vectorization and GPU code), but this provides the core math in simple form.

See the Matlab directory in the emergent svn source directory for a complete implementation of these equations in Matlab, coded by Sergio Verduzco-Flores -- this can be a lot simpler to read than the highly optimized C++ source code.

Variables

LeabraUnits are organized into LeabraLayers, which sometimes have unit groups (which are now typically purely virtual, not actual Unit_Group objects). The LeabraUnit has the following key parameters, along with a number of others that are used for other non-default algorithms and various optimizations, etc.

  • act = activation sent to other units
  • act_nd = non-depressed activation -- prior to application of any short-term plasticity
  • net_raw = raw netinput, prior to time-averaging
  • net = time-averaged excitatory conductance (net input)
  • gc_i = inhibitory conductance, computed from FFFB inhibition function typically
  • I_net = net current, combining excitatory, inhibitory, and leak channels
  • v_m = membrane potential
  • v_m_eq = equilibrium membrane potential -- not reset by spikes -- just keeps integrating
  • adapt = adaptation current
  • avg_ss = super-short term running average activation
  • avg_s = short-term running average activation, integrates over avg_ss, represents plus phase learning signal
  • avg_m = medium-term running average activation, integrates over avg_s, represents minus phase learning signal
  • avg_l = long-term running average activation, integrates over avg_m, drives long-term floating average for Hebbian learning
  • avg_l_lrn = how much to use the avg_l-based Hebbian learning for this receiving unit's learning -- in addition to the basic error-driven learning -- this can optionally be dynamically updated based on the avg_l factor and average level of error in the receiving layer, so that this Hebbian learning constraint can be stronger as a unit gets too active and needs to be regulated more strongly, and in proportion to average error levels in the layer.
  • avg_s_eff = effective avg_s value used in learning -- includes a small fraction (.1) of the avg_m value, for reasons explained below.

Units are connected via synapses parameterized with the following variables. These are actually stored in an optimized vector format, but the LeabraCon object contains the variables as a template.

  • wt = net effective synaptic weight between units -- subject to contrast enhancement compared to fwt and swt
  • dwt = delta-wt -- change in synaptic weights due to learning
  • fwt = fast weight -- used for advanced fast and slow weight learning dynamic -- otherwise equal to swt -- stored as non-contrast enhanced value
  • swt = slow weight -- standard learning rate weight -- stored as non-contrast enhanced value

LeabraCycle: Net input, Inhibition, Activation

For every cycle of activation updating, compute the net input, inhibition, membrane potential, and activation (a consolidated code sketch follows this list):

  • Net input (see LeabraUnitSpec.cpp for code):
      • net_raw += (sum over recv connections of:) scale_eff * act * wt
          • scale_eff = Leabra Netin Scaling factor that includes 1/N to compute an average, plus the wt_scale.rel and wt_scale.abs relative and absolute scaling terms.
          • act = sending unit activation
          • wt = receiving connection weight value between sender and receiver
          • emergent does this very efficiently by using a sender-based computation that only sends changes (deltas) in activation values -- typically only a few percent of neurons send on any given cycle.
      • net += dt.integ * dt.net_dt * (net_raw - net)
          • time integration of net input, using net_dt (1/1.4 default) and the global integration time constant, dt.integ (1 = 1 msec default)
  • Inhibition (see LeabraLayerSpec.cpp for code):
      • ffi = ff * MAX(netin.avg - ff0, 0)
          • feedforward component of inhibition with ff multiplier (1 by default) -- has ff0 offset and can't be negative (that's what the MAX(.., 0) part does).
          • netin.avg is the average of the net variable across the unit group or layer, depending on what level this is being computed at (both are supported)
      • fbi += fb_dt * (fb * acts.avg - fbi)
          • feedback component of inhibition with fb multiplier (1 by default) -- requires time integration to dampen oscillations that otherwise occur -- fb_dt = 1/1.4 default
      • gc_i = gi * (ffi + fbi)
          • total inhibitory conductance, with global gi multiplier -- the default of gi = 1.8 typically produces good sparse distributed representations in reasonably large layers (25 units or more)
  • Membrane potential (see LeabraUnitSpec.cpp for code):
      • I_net = net * (e_rev.e - v_m) + gc_l * (e_rev.l - v_m) + gc_i * (e_rev.i - v_m) + noise
          • net current = sum of individual ionic channels: e = excitatory, l = leak (gc_l is a constant, 0.1 default), and i = inhibitory
          • e_rev are reversal potentials, in normalized values derived from biophysical values: e_rev.e = 1, e_rev.l = 0.3, e_rev.i = 0.25
          • noise is typically gaussian, if added
      • if ex: I_net += g_bar.l * exp_slope * exp((v_m - thr) / exp_slope)
          • this is the exponential component of AdEx, if in use (typically only for discrete spiking), exp_slope = .02 default
      • v_m += dt.integ * dt.vm_dt * (I_net - adapt)
          • in emergent, we use a simple midpoint method that evaluates v_m with a half-step time constant, and then uses this half-step v_m to compute the full step in the above I_net equation. vm_dt = 1/3.3 default.
          • v_m is always computed as in discrete spiking, even when using rate code, with v_m reset to vm_r etc. -- this provides a more natural way to integrate the adaptation and short-term plasticity mechanisms, which drive off of the discrete spiking.
      • v_m_eq += dt.integ * dt.vm_dt * (I_net - adapt)
          • the equilibrium version of the membrane potential does not reset with spikes, and is important for the rate code, per below
  • Activation (see LeabraUnitSpec.cpp for code):
      • g_e_thr = (gc_i * (e_rev.i - thr) + gc_l * (e_rev.l - thr) - adapt) / (thr - e_rev.e)
          • the amount of excitatory conductance required to put the neuron exactly at the firing threshold, thr = .5 default.
      • if(v_m > spk_thr) { spike = 1; v_m = vm_r } else { spike = 0 }
          • spk_thr is the spiking threshold (1.2 default, different from the rate code thr), and vm_r = .3 is the reset value of the membrane potential after spiking -- there is also an optional refractory period after spiking, default = 3 cycles, during which the vm equations are simply not computed and vm remains at vm_r.
          • if using spiking mode, then act = spike; otherwise, the rate code function below applies
      • if(v_m_eq <= thr) { new_act = NXX1(v_m_eq - thr) } else { new_act = NXX1(net - g_e_thr) }
          • it is important that the time to first "spike" be governed by the v_m integration dynamics, but after that point, it is essential that activation drive directly from the excitatory conductance (g_e or net) relative to the g_e_thr threshold -- activation rates are linear in this term, but not even a well-defined function of v_m_eq -- earlier versions of Leabra used only the v_m_eq-based term, and this led to some very strange behavior.
          • NXX1 = noisy-x-over-x+1 function, which is implemented using a lookup table due to the convolving of the XX1 function with a gaussian noise kernel
          • XX1(x) = gain * x / (gain * x + 1)
          • gain = 100 default
      • act_nd += dt.integ * dt.vm_dt * (new_act - act_nd)
          • the non-depressed rate code activation is time-integrated using the same vm_dt time constant as used for v_m, from the new activation value
      • act = act_nd * syn_tr (or just act_nd)
          • if short-term plasticity is in effect, then the syn_tr variable reflects the synaptic transmission efficacy, and this product provides the net signal sent to the receiving neurons; otherwise syn_tr = 1.
      • adapt += dt.integ * (adapt.dt * (vm_gain * (v_m - e_rev.l) - adapt) + spike * spike_gain)
          • adaptation current -- causes the rate of activation / spiking to decrease over time; adapt.dt = 1/144, vm_gain = 0.04, spike_gain = .00805 defaults
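
Putting the cycle equations together, the following is a simplified rate-code sketch of a single unit update (a sketch of the equations above, not the optimized emergent code: the midpoint method, discrete spiking, refractory period, adaptation update, and the noise convolution in NXX1 are all omitted):

    // Single-cycle rate-code update sketch, following the equations above.
    struct UnitState {  // minimal subset of the unit state
      float net_raw = 0;  // accumulated from sending units elsewhere
      float net = 0, gc_i = 0;
      float v_m = 0.3f, v_m_eq = 0.3f, act_nd = 0, act = 0, adapt = 0;
    };

    const float dt_integ = 1.0f, net_dt = 1.0f/1.4f, vm_dt = 1.0f/3.3f;
    const float gc_l = 0.1f, e_rev_e = 1.0f, e_rev_l = 0.3f, e_rev_i = 0.25f;
    const float thr = 0.5f, gain = 100.0f, syn_tr = 1.0f;

    // plain XX1; the real NXX1 convolves this with gaussian noise (lookup table)
    float XX1(float x) { return (x > 0.0f) ? gain*x / (gain*x + 1.0f) : 0.0f; }

    void Cycle(UnitState& u) {
      u.net += dt_integ * net_dt * (u.net_raw - u.net);  // time-averaged net input
      float I_net = u.net * (e_rev_e - u.v_m) + gc_l * (e_rev_l - u.v_m)
                  + u.gc_i * (e_rev_i - u.v_m);
      u.v_m    += dt_integ * vm_dt * (I_net - u.adapt);
      u.v_m_eq += dt_integ * vm_dt * (I_net - u.adapt);  // no spike reset
      // excitatory conductance that would put the unit exactly at threshold:
      float g_e_thr = (u.gc_i * (e_rev_i - thr) + gc_l * (e_rev_l - thr) - u.adapt)
                    / (thr - e_rev_e);
      // below threshold: v_m-driven onset; above: conductance-driven firing rate
      float new_act = (u.v_m_eq <= thr) ? XX1(u.v_m_eq - thr) : XX1(u.net - g_e_thr);
      u.act_nd += dt_integ * vm_dt * (new_act - u.act_nd);
      u.act = u.act_nd * syn_tr;  // syn_tr = 1 unless short-term plasticity is on
    }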

Learning

Figure: The XCAL dWt function, showing direction and magnitude of synaptic weight changes (dWt) as a function of the short-term average activity of the sending neuron (x) times the receiving neuron (y). This quantity is a simple mathematical approximation to the level of postsynaptic Ca++, reflecting the dependence of the NMDA channel on both sending and receiving neural activity. This function was extracted directly from the detailed biophysical Urakubo et al. (2008) model, by fitting a piecewise linear function to the synaptic weight change behavior that emerges from it as a function of a wide range of sending and receiving spiking patterns.

Learning is organized around the following timing, based on an internally generated alpha-frequency (10 Hz, 100 msec period) cycle of expectation followed by outcome, supported by neocortical circuitry in the deep layers and the thalamus, as hypothesized in the DeepLeabra extension to standard Leabra (a timing sketch in code follows this list):

  • A Trial lasts 100 msec (10 Hz, alpha frequency), and comprises one sequence of expectation -- outcome learning, organized into 4 quarters.
      • Biologically, the deep neocortical layers (layers 5, 6) and the thalamus have a natural oscillatory rhythm at the alpha frequency. Specific dynamics in these layers organize the cycle of expectation vs. outcome within the alpha cycle.
  • A Quarter lasts 25 msec (40 Hz, gamma frequency) -- the first 3 quarters (75 msec) form the expectation / minus phase, and the final quarter forms the outcome / plus phase.
      • Biologically, the superficial neocortical layers (layers 2, 3) have a gamma-frequency oscillation, supporting the quarter-level organization.
  • A Cycle represents 1 msec of processing, in which each neuron updates its membrane potential etc. according to the above equations.
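
In code, this timing is just two nested loops over quarters and cycles (a schematic sketch; what happens at quarter boundaries depends on the particular model):

    // Trial / quarter / cycle timing sketch.
    const int cycles_per_quarter = 25;  // 25 msec at 1 msec per cycle (gamma)
    const int quarters_per_trial = 4;   // 100 msec total (alpha)

    void Trial() {
      for (int q = 0; q < quarters_per_trial; ++q) {
        bool plus_phase = (q == 3);  // final quarter = outcome / plus phase
        for (int c = 0; c < cycles_per_quarter; ++c) {
          // Cycle(): net input, inhibition, v_m, activation (see above)
        }
        if (plus_phase) {
          // learning and the once-per-trial avg_l update occur at the
          // end of the plus phase (i.e., the end of the trial)
        }
      }
    }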

Learning is based on running-averages of activation variables, described first:

  • Running averages, computed continuously every cycle -- note the compounding form (see LeabraUnitSpec.cpp for code; these updates are sketched in code after this list):
      • avg_ss += dt.integ * ss_dt * (act_nd - avg_ss)
          • super-short time scale running average, ss_dt = 1/2 default -- this was introduced to smooth out the discrete spiking signal, but is also useful for rate code
      • avg_s += dt.integ * act_avg.s_dt * (avg_ss - avg_s)
          • short time scale running average, s_dt = 1/2 default -- this represents the "plus phase" or actual outcome signal, in comparison to avg_m
      • avg_m += dt.integ * act_avg.m_dt * (avg_s - avg_m)
          • medium time-scale running average, m_dt = 1/10 default -- this represents the "minus phase" or expectation signal, in comparison to avg_s
      • avg_l += (avg_m > 0.2) ? dt * (avg_l.max - avg_l) : dt * (avg_l.min - avg_l)
          • long-term running average -- this is computed just once per learning trial, not every cycle like the ones above -- max = 1.5, min = .1, dt = .1 by default
          • this is a basic exponential approach to max / min depending on whether the unit is active or not -- having a clear max / min is important for the learning modulation shown next -- x ? y : z is C syntax for: if x is true, then y, else z
      • avg_s_eff = m_in_s * avg_m + (1 - m_in_s) * avg_s
          • mix in some of the medium-term factor into the short-term factor -- this is important for ensuring that when a neuron turns off in the plus phase (short term), enough trace of the earlier minus-phase activation remains to drive it into the LTD weight-decrease region -- m_in_s = .1 default.
          • this is now done at the unit level -- previously it was done at the connection level, which is much less efficient!
  • Optional: dynamic modulation of the amount of Hebbian learning, based on the avg_l value and the level of error in a given layer -- these factors make a small (few percent) but reliable difference in overall performance across various challenging tasks -- they can readily be omitted in favor of a fixed avg_l_lrn factor of around 0.0004 (with 0 for target layers -- it doesn't make sense to have any Hebbian learning at output layers):
      • avg_l_lrn = avg_l.lrn_min + (avg_l - avg_l.min) * ((avg_l.lrn_max - avg_l.lrn_min) / (avg_l.max - avg_l.min))
          • learning strength factor for how much to learn based on the avg_l floating threshold -- this is dynamically modulated by the strength of avg_l itself, and this turns out to be critical -- the amount of this learning increases as units are more consistently active all the time (i.e., "hog" units). avg_l.lrn_min = 0.005, avg_l.lrn_max = 0.05. Note that this depends on having a clear max to avg_l, which is an advantage of the exponential bounding form above.
      • avg_l_lrn *= MAX(1 - cos_diff_avg, 0.01)
          • also modulate by the time-averaged cosine (normalized dot product) between the minus and plus phase activation states in the given receiving layer (cos_diff_avg, time constant 100) -- if error signals are small in a given layer, then Hebbian learning should also be relatively weak so that it doesn't overpower it -- and conversely, layers with higher levels of error signals can handle (and benefit from) more Hebbian learning. The MAX(.., 0.01) factor ensures that there is a minimum level of .01 Hebbian (multiplying the previously-computed factor above). The .01 * .05 factors give an upper-level value of .0005 to use for a fixed constant avg_l_lrn value -- just slightly less than this (.0004) seems to work best if not using these adaptive factors.
  • Learning equation (see LeabraConSpec.h for code; sketched in code after this list) -- most of these are intermediate variables used in computing the final dwt value:
      • srs = ru->avg_s_eff * su->avg_s_eff
          • short-term sender-receiver co-product -- this is the intracellular calcium from NMDA and other channels
      • srm = ru->avg_m * su->avg_m
          • medium-term sender-receiver co-product -- this drives the dynamic threshold for error-driven learning
      • dwt += lrate * (m_lrn * XCAL(srs, srm) + ru->avg_l_lrn * XCAL(srs, ru->avg_l))
          • the weight change is the sum of two factors: error-driven, based on the medium-term threshold (srm), and BCM Hebbian, based on the long-term threshold of the recv unit (ru->avg_l)
          • in earlier versions, the two factors were combined into a single threshold value, using normalized weighting factors -- this was more elegant, but by separating the two, we allow the Hebbian component to use the full range of the XCAL function (as compared to the relatively small avg_l_lrn factor applied inside the threshold computation). By multiplying by avg_l_lrn outside the XCAL equation, we get the desired contrast-enhancement property of the XCAL function, where values close to the threshold are pushed either higher (above threshold) or lower (below threshold) most strongly, and values further away are less strongly impacted.
          • m_lrn is a constant and is typically 1.0 when error-driven learning is employed (but can be set to 0 for a completely Hebbian model).
      • XCAL is the "check mark" linearized BCM-style learning function (see figure) that was derived from the Urakubo et al. (2008) STDP model, as described in more detail in the CCN textbook: http://ccnbook.colorado.edu
          • XCAL(x, th) = (x < d_thr) ? 0 : (x > th * d_rev) ? (x - th) : (-x * ((1-d_rev)/d_rev))
          • d_thr = 0.0001, d_rev = 0.1 defaults
          • x ? y : z is C syntax for: if x is true, then y, else z
  • Weight update equation (see LeabraConSpec.h for code; sketched in code after this list -- see below for the alternative version using fast weights):
      • The fwt value here is the linear, non-contrast enhanced version of the weight value, while wt is the sigmoidal contrast-enhanced version, which is used for sending netinput to other neurons. One can compute fwt from wt and vice versa, but numerical errors can accumulate in going back and forth more than necessary, and it is generally faster to just store these two weight values (they are also needed for the slow vs. fast weights version shown below).
      • dwt *= (dwt > 0) ? (1-fwt) : fwt
          • soft weight bounding -- weight increases exponentially decelerate as they approach the upper bound of 1, and weight decreases as they approach the lower bound of 0; based on the linear, non-contrast enhanced fwt weights.
      • fwt += dwt
          • increment the linear weights with the bounded dwt term
      • wt = SIG(fwt)
          • the new weight value is the sigmoidal contrast-enhanced version of the fast weight
          • SIG(w) = 1 / (1 + (off * (1-w)/w)^gain)
      • dwt = 0
          • reset weight changes now that they have been applied.
  • Optional: Slow vs. Fast Weights version of the weight update equation (sketched in code after this list) -- this is not widely used (as of yet), but it enables rapid yet transient learning to coexist with slower, more enduring learning at each synapse, which can have important behavioral implications:
      • dwt *= (dwt > 0) ? (1-fwt) : fwt
          • soft weight bounding, as above, but based on the fast weights (in linear, non-contrast enhanced form), so that the fast weights can drive learning saturation if in effect.
      • fwt += dwt
          • fast weights learn according to the standard learning mechanism -- to make them faster, turn up the standard learning rate -- it is important not to change the basic learning rule here by doing something else with the fast weights
      • eff_wt = swt_pct * swt + (1 - swt_pct) * fwt
          • the effective weight value is a blend of the slow and fast weights -- swt_pct can be .5 or .8 typically (eff_wt is not stored -- just a tmp variable)
      • nwt = SIG(eff_wt)
          • the new weight value is the sigmoidal contrast-enhanced version of the effective weight (nwt is not stored -- just a tmp variable)
          • SIG(w) = 1 / (1 + (off * (1-w)/w)^gain)
      • wt += wt_dt * (nwt - wt)
          • the weight moves toward the new weight value with a time constant -- reflects the slowed rise time seen in early LTP dynamics
      • swt += slow_dt * (fwt - swt)
          • the slow weight moves slowly toward the fast weights; slow_dt = 1/100 typically, but can be as slow as 1/3000
      • dwt = 0
          • reset weight changes now that they have been applied.
  • Optional: Weight Balance (sketched in code after this list) -- this option attempts to maintain more balanced weights across units, to prevent some units from hogging the representational space, by changing the rates of weight increase and decrease in the soft weight bounding function, as a function of the average receiving weights:
      • dwt *= (dwt > 0) ? wb_inc * (1-fwt) : wb_dec * fwt
          • wb_inc = weight increase modulator, and wb_dec = weight decrease modulator (when these are both 1, this is the same as the standard rule, and this is their default value)
      • wt_avg = <wt>
          • average of all the receiving weights -- computed per projection (corresponding to a dendritic branch, perhaps)
      • if (wt_avg > hi_thr) then wbi = gain * (wt_avg - hi_thr); wb_inc = 1 - wbi; wb_dec = 1 + wbi
          • If the average weights are higher than a high threshold (hi_thr = .4 default), then the increase factor wb_inc is reduced, and the decrease factor wb_dec is increased, by a factor wbi that is determined by how far above the threshold the average is. Thus, the higher the weights get, the less quickly they can increase, and the more quickly they decrease, pushing them back into balance.
      • if (wt_avg < lo_thr) then wbd = gain * (wt_avg - lo_thr); wb_inc = 1 - wbd; wb_dec = 1 + wbd
          • This is the symmetric version for the case when the weight average is below a low threshold (lo_thr = .2), and the weight balance factors go in the opposite direction (wbd is negative), causing weight increases to be favored over decreases.
          • The hi_thr and lo_thr parameters are specified in terms of a target weight average value trg = .3, with a threshold thr = .1 around that target value; these defaults produce the default .4 and .2 hi and lo thresholds, respectively.
      • A key feature of this mechanism is that it does not change the sign of any weight changes, and it does not cause weights to change that are otherwise not changing due to the learning rule. This is not true of an alternative mechanism that has been used in various models, which normalizes the total weight value by subtracting the average. Overall, this weight balance mechanism is important for larger networks on harder tasks, where hogging can be a significant problem.
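
To make the learning equations concrete, here is a sketch of the running-average cascade and the once-per-trial avg_l / avg_l_lrn updates referenced above (simplified C++ using the constants given in the list; not the optimized emergent code):

    // Running-average cascade sketch (per-cycle), plus trial-level avg_l.
    #include <algorithm>

    struct AvgPars {
      float ss_dt = 0.5f, s_dt = 0.5f, m_dt = 0.1f;   // cascade rate constants
      float l_dt = 0.1f, l_max = 1.5f, l_min = 0.1f;  // avg_l dynamics
      float lrn_min = 0.005f, lrn_max = 0.05f;        // avg_l_lrn range
      float m_in_s = 0.1f;                            // avg_m fraction in avg_s_eff
    };

    struct UnitAvgs {
      float avg_ss = 0, avg_s = 0, avg_m = 0, avg_l = 0.1f;
      float avg_s_eff = 0, avg_l_lrn = 0;
    };

    // called every cycle with the unit's non-depressed activation
    void CycleAvgs(UnitAvgs& u, float act_nd, const AvgPars& p, float dt_integ = 1.0f) {
      u.avg_ss += dt_integ * p.ss_dt * (act_nd - u.avg_ss);
      u.avg_s  += dt_integ * p.s_dt  * (u.avg_ss - u.avg_s);
      u.avg_m  += dt_integ * p.m_dt  * (u.avg_s  - u.avg_m);
      u.avg_s_eff = p.m_in_s * u.avg_m + (1.0f - p.m_in_s) * u.avg_s;
    }

    // called once per trial; cos_diff_avg is the layer's time-averaged
    // minus-vs-plus phase cosine (pass 0 to leave the factor unmodulated)
    void TrialAvgs(UnitAvgs& u, const AvgPars& p, float cos_diff_avg) {
      u.avg_l += (u.avg_m > 0.2f) ? p.l_dt * (p.l_max - u.avg_l)
                                  : p.l_dt * (p.l_min - u.avg_l);
      u.avg_l_lrn = p.lrn_min + (u.avg_l - p.l_min) *
                    ((p.lrn_max - p.lrn_min) / (p.l_max - p.l_min));
      u.avg_l_lrn *= std::max(1.0f - cos_diff_avg, 0.01f);  // optional modulation
    }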
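
The XCAL function and the resulting dwt computation follow directly (continuing the sketch above, reusing UnitAvgs):

    // XCAL "check mark" function and the dwt computation sketch.
    const float d_thr = 0.0001f, d_rev = 0.1f;

    float XCAL(float x, float th) {
      if (x < d_thr) return 0.0f;            // below calcium threshold: no change
      if (x > th * d_rev) return x - th;     // linear: LTD below th, LTP above
      return -x * ((1.0f - d_rev) / d_rev);  // below reversal point: LTD scales back to 0
    }

    // ru = receiving unit, su = sending unit; returns the weight change increment
    float Dwt(const UnitAvgs& ru, const UnitAvgs& su, float lrate, float m_lrn = 1.0f) {
      float srs = ru.avg_s_eff * su.avg_s_eff;  // short-term co-product (~NMDA Ca++)
      float srm = ru.avg_m * su.avg_m;          // medium-term co-product (threshold)
      return lrate * (m_lrn * XCAL(srs, srm)               // error-driven component
                    + ru.avg_l_lrn * XCAL(srs, ru.avg_l)); // BCM / Hebbian component
    }

In use, each synapse accumulates dwt += Dwt(ru, su, lrate) at the end of the trial, and the weight update below is then applied.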
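
The standard weight update with soft bounding and sigmoidal contrast enhancement can then be sketched as follows (off and gain are the contrast-enhancement parameters; the values shown here are assumptions, not confirmed defaults):

    // Weight update sketch: soft bounding on the linear fwt, then sigmoidal
    // contrast enhancement to produce the effective wt.
    #include <cmath>

    const float sig_off = 1.0f, sig_gain = 6.0f;  // assumed values

    float SIG(float w) {  // contrast enhancement of a linear weight in (0,1)
      return 1.0f / (1.0f + std::pow(sig_off * (1.0f - w) / w, sig_gain));
    }

    struct Syn { float wt = 0, dwt = 0, fwt = 0.5f, swt = 0.5f; };

    void ApplyDwt(Syn& s) {
      s.dwt *= (s.dwt > 0.0f) ? (1.0f - s.fwt) : s.fwt;  // soft weight bounding
      s.fwt += s.dwt;     // learn on the linear weight
      s.wt = SIG(s.fwt);  // contrast-enhanced effective weight
      s.dwt = 0.0f;       // changes applied; reset
    }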
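
Finally, the optional fast/slow-weight and weight-balance variants modify this update as follows (reusing Syn and SIG from the previous sketch; swt_pct and slow_dt follow the text, while wt_dt and the weight-balance gain are assumed values):

    // Optional variants sketch: fast vs. slow weights, and weight balance.
    const float swt_pct = 0.8f;   // slow-weight fraction (.5 or .8 typically)
    const float wt_dt = 0.1f;     // rise-time constant for wt (assumed value)
    const float slow_dt = 0.01f;  // 1/100 typically, down to 1/3000

    void ApplyDwtFastSlow(Syn& s) {
      s.dwt *= (s.dwt > 0.0f) ? (1.0f - s.fwt) : s.fwt;  // bound on fast weights
      s.fwt += s.dwt;
      float eff_wt = swt_pct * s.swt + (1.0f - swt_pct) * s.fwt;  // blend (tmp)
      float nwt = SIG(eff_wt);             // contrast-enhanced target (tmp)
      s.wt  += wt_dt * (nwt - s.wt);       // slowed rise, as in early LTP
      s.swt += slow_dt * (s.fwt - s.swt);  // slow weights track fast weights
      s.dwt = 0.0f;
    }

    // Weight balance: compute increase / decrease rate modulators from the
    // average receiving weight of a projection.
    void WtBalFactors(float wt_avg, float& wb_inc, float& wb_dec,
                      float trg = 0.3f, float thr = 0.1f, float gain = 1.0f) {
      wb_inc = wb_dec = 1.0f;  // defaults: no modulation
      float hi_thr = trg + thr, lo_thr = trg - thr;  // .4 and .2 defaults
      if (wt_avg > hi_thr) {
        float wbi = gain * (wt_avg - hi_thr);
        wb_inc = 1.0f - wbi; wb_dec = 1.0f + wbi;  // damp increases, boost decreases
      } else if (wt_avg < lo_thr) {
        float wbd = gain * (wt_avg - lo_thr);      // negative
        wb_inc = 1.0f - wbd; wb_dec = 1.0f + wbd;  // favor increases
      }
    }
    // with balance, the bounding line becomes:
    //   s.dwt *= (s.dwt > 0) ? wb_inc * (1 - s.fwt) : wb_dec * s.fwt;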

Specific Leabra Object Information

The following provide more details for the main class objects used in Leabra.

Network structure

Specs

As discussed in Specs, the specs contain all the parameters and algorithm-specific code, while the above objects contain the dynamic state information.

Special Algorithms

See LeabraWizard for special functions for configuring several of these special architectural features.

Simple Recurrent Network Context Layer

Scalar, Graded Value Representation

  • ScalarValLayerSpec -- encodes and decodes scalar, real-numbered values based on a coarse coded distributed representation (e.g., a Gaussian bump) across multiple units. This provides a very efficient and effective way of representing scalar values -- individual Leabra units do not do a very good job of that, as they have a strong binary bias.
  • TwoDValLayerSpec -- two-dimensional version of scalar val
  • MotorForceLayerSpec -- represents motor output and input forces in a distributed manner across unit groups representing position and velocity.

Temporal Differences and General Da (dopamine) Modulation

Temporal differences (TD) is widely used as a model of midbrain dopaminergic firing. Also included are Leabra units that respond to simulated dopaminergic modulation (DaMod).

See Leabra TD for details.

PVLV -- Pavlovian Conditioning

Simulates behavioral and neural data on Pavlovian conditioning and the midbrain dopaminergic neurons that fire in proportion to unexpected rewards (an alternative to TD). It is described in these papers: O'Reilly, Frank, Hazy & Watz (2007) and Hazy, Frank & O'Reilly (2010). The current version (described here) is as described in the 2010 paper. A PVLV model can be made through the LeabraWizard -- under the Networks menu.

See Leabra PVLV for full details.

PBWM -- Prefrontal Cortex Basal Ganglia Working Memory

Uses PVLV to train PFC working memory updating system, based on the biology of the prefrontal cortex and basal ganglia. For complete details, see O'Reilly and Frank, 2006.

Described in Leabra PBWM.

Other Misc Classes

  • MarkerConSpec -- a "null" connection that doesn't do anything, but serves as a marker for special computations (e.g., the temporal derivative computation instead of standard net input).
  • LeabraLinUnitSpec -- linear activation function
  • LeabraNegBiasSpec -- negative bias learning (for refractory neurons)
  • TrialSynDepConSpec, TrialSynDepCon -- synaptic depression on a trial-wise time scale
  • FastWtConSpec, FastWtCon -- transient fast weights in addition to standard slowly adapting weights
  • ActAvgHebbConSpec -- hebbian learning includes proportion of time-averaged activation (as in the "trace" rule)

Parameter Setting Tips and Tricks

See Leabra Params for detailed discussion about adjusting parameters.

Software Version Update Issues