Uses PVLV to train PFC working memory updating system, based on the biology of the prefrontal cortex and basal ganglia. See O'Reilly and Frank, 2006 for the original paper on this idea. A new paper on the latest version of this framework, Version 3, now included in emergent, is forthcoming.
- First, there are multiple separate stripes (groups of units) in the PFC and Basal Ganglia (Matrix) layers. Each stripe can be independently updated, such that this system can remember several different things at the same time, each with a different "updating policy" of when memories are updated and maintained. The active maintenance of the memory is in PFC, and the updating signals (and updating policy more generally) come from the Matrix units (a subset of basal ganglia units).
- PVLV provides reinforcement learning signals to train up the dynamic gating system in the basal ganglia.
- Matrix is the dynamic gating system representing the matrix units of the basal ganglia. There are separate "Go" units that all compete amongst each other for which stripe(s) will get a gating signal on this trial. The "NoGo" units separately compete to provide an override signal to selectively sculpt the Go pathway firing -- NoGo acts to inhibit corresponding stripes in the Go pathway. Go firing causes updating of the PFC layers, meaning that the superficial layer activation in PFCs can flow into the deep layers (PFCd).
- There are three possible types of PFC stripes -- input, maintenance, and output.
  - Gating in Input (PFCx_in) supports a decision about what sensory information to process further in the PFC, via PFCd_in representations. The PFCd_in activity is not persistent. IMPORTANT: This is currently NOT supported or tested, although the code is all there.
  - Gating in Maintenance (PFCx_mnt) causes the PFCd_mnt units to open persistent maintenance currents that support maintenance of the PFCd_mnt representations over extended periods of trials.
  - Gating in Output (PFCx_out) causes activity from PFCd_mnt -> PFCs_out to then flow into PFCd_out, which in turn leads to generation of a motor response or other use of this information in the rest of the brain.
- SNrThal represents the substantia nigra pars reticulata (SNr) and the associated area of the thalamus, which produce a competition among the Go units within a given stripe. If there is more overall Go activity in a given stripe, then the associated SNrThal unit gets activated, and it drives updating in PFC. The kwta.k parameter here is critical for determining how many stripes can gate at a time. Note that typically at least one stripe will gate per trial, but the type of stripe (input, maintenance, output) can vary as they all compete amongst each other.
Converting from Older Versions of PBWM
You have to use the Wizard Remove PBWM function to remove the old version of the layers/specs, and then the Wizard PBWM function to create the new version -- the differences in structure are too big to convert in any other way.
Overview
The following overview of PBWM V3 is available prior to the publication of the full model. It focuses in large part on differences from the previous version, V2, for those who were familiar with that version. V2 itself was never published, and it actually differed in several ways from the original V1 that was published in O'Reilly and Frank, 2006. Research is ever a moving target.
At a computational level, V3 embraces the Simple Recurrent Network (SRN) paradigm for dealing with time, originally invented by Elman and Jordan in the 1980s. The key advantage of an SRN is that it neatly separates the maintenance of prior information (in a maintained context layer) from the integration of this prior information with new input, which occurs in a standard hidden layer. After every trial, the current hidden layer activities are copied to the context layer, and the hidden layer uses this memory to contextualize its processing of new inputs.
There is a fundamental tradeoff between maintaining the old vs. integrating the new, which creates tension in everything from politics to working memory. Some approaches just stick a parameter on this problem and attempt to dial in a specific point on the tradeoff (e.g., Grossberg's ART model), whereas the SRN framework allows learning to adjudicate how much, and in exactly what way, old information integrates with new information, resulting in much more "intelligent" and context-specific temporal integration. The fully labile hidden layer is shaped by error-driven learning to determine how to integrate new inputs with prior context, and there are modifiable synaptic weights that allow a high-dimensional, content-dependent integration of this prior context -- it is not just a simple unit-wise decision of how much to retain vs. update to new state. This allows temporal integration to take place in high-dimensional coarse-coded distributed representation space, not in low-dimensional unit-by-unit space. Furthermore, the retention of prior information in the context represents a perfect maintenance of the immediate past, so no information is necessarily lost, as it would be if there was a simple tradeoff parameter.
The main limitation of the SRN has been that it has a hard time maintaining information over multiple trials, especially when there is no error-driven learning pressure to maintain over an intervening gap. Using a BG-based gating system to decide when to update the context representations avoids this problem, by allowing some stripes to maintain their frozen context over multiple trials, in a way that is determined by success-driven reinforcement learning in the BG. Thus, in summary, V3 represents at a computational level a "back-to-the-future" move of leveraging the power of error-driven representational learning in the SRN, combined with the basic PBWM idea of selective gating mediated by the BG, to enable longer-term maintenance of some information while rapidly updating other information.
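To make the contrast between the plain SRN and the gated version concrete, here is a minimal Python sketch. It assumes each layer is a list of stripes and each stripe a list of unit activations; `srn_update` and `gated_update` are hypothetical helpers for illustration, not emergent functions. The only difference is the per-stripe Go flag that decides whether a stripe copies hidden into context or keeps its frozen context.

```python
from typing import List, Sequence

Stripe = List[float]


def srn_update(hidden: Sequence[Stripe]) -> List[Stripe]:
    """Classic SRN: after every trial the context is an exact copy of the hidden layer."""
    return [list(stripe) for stripe in hidden]


def gated_update(hidden: Sequence[Stripe], context: Sequence[Stripe],
                 go: Sequence[bool]) -> List[Stripe]:
    """Gated SRN (hypothetical helper): stripe i copies hidden -> context only when
    go[i] is True; ungated stripes keep their frozen context from earlier trials."""
    return [list(h) if g else list(c) for h, c, g in zip(hidden, context, go)]


# Two stripes, two trials: stripe 0 gates only on trial 1, stripe 1 gates on both.
context = [[0.0, 0.0], [0.0, 0.0]]
context = gated_update([[1.0, 0.0], [1.0, 0.0]], context, go=[True, True])
context = gated_update([[0.0, 1.0], [0.0, 1.0]], context, go=[False, True])
print(context)  # [[1.0, 0.0], [0.0, 1.0]]: stripe 0 still holds trial-1 info
```

Stripe 0 carries the trial-1 pattern across the second trial, which a plain SRN copy would have overwritten.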
By contrast, in prior PBWM versions the PFC was subject to the strong old vs. new tradeoff within a single PFC layer: a given set of units was either maintaining old information or labile and reflecting new information, but not both. This required complex double-gating dynamics to toggle off the old information, enable new information to come in, and then be maintained. Also, it did not provide a very clear signal to other layers about what information was old vs. new.
At the biological level, the SRN is likely an extreme caricature of what is actually going on in the brain, but it is useful to retain it for at least the time being -- the bifurcation of hidden and context layers has some important advantages for the reinforcement learning in the BG as well, so there are multiple reasons to think that something like this must be happening, even if not quite as starkly.
Biologically, the superficial layers in the PFC (PFCs) play the role of the labile hidden layer, while the deep layers (PFCd) play the role of the maintained context layer. The PFCs layers integrate new input from posterior cortical projections, together with context from the maintained context information in PFCd. The PFCd layers copy activation from PFCs only when they receive a Go gating signal from the BG -- otherwise they maintain whatever information they were previously maintaining. There is also a maintenance decay parameter to allow for the PFCd context to decay slowly over time.
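The PFCs/PFCd relationship for a single stripe can be sketched as follows. This is a simplified illustration, assuming one value per unit and a multiplicative form for the slow maintenance decay; the parameter name `maint_decay` is hypothetical and may not match the actual spec field.

```python
from typing import List, Sequence


def update_deep(pfcs: Sequence[float], pfcd: Sequence[float], go: bool,
                maint_decay: float = 0.02) -> List[float]:
    """End-of-trial update for one PFC stripe (simplified sketch).

    On Go, PFCd copies the current PFCs activity. Otherwise PFCd keeps its
    previous contents, shrunk by a small multiplicative decay. The name
    maint_decay and the multiplicative form are assumptions for illustration.
    """
    if go:
        return list(pfcs)
    return [a * (1.0 - maint_decay) for a in pfcd]


deep = update_deep(pfcs=[0.9, 0.1], pfcd=[0.0, 0.0], go=True)  # gated: copy PFCs
for _ in range(3):
    # ungated: new PFCs input is ignored, the old pattern is maintained and slowly decays
    deep = update_deep(pfcs=[0.0, 0.8], pfcd=deep, go=False)
print(deep)  # roughly [0.847, 0.094]
```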
PFC areas are divided into 3 main functional categories, as seen in electrophysiological recordings:
- INPUT (PFCs_in, PFCd_in) -- encodes new sensory inputs -- gating here represents a form of selective attention for only relevant inputs (as determined by BG gating) -- deep layer activity is automatically reset after the end of the trial (transient). Other layers listening to the PFCd_in layer will get a filtered view of the full range of sensory inputs.
- MAINT (PFCs_mnt, PFCd_mnt) -- maintains information over multiple trials -- gating toggles on the deep layer activity, which is then maintained over multiple trials until a new update occurs, or the maintenance dies away from decay over time.
- OUTPUT (PFCs_out, PFCd_out) -- receives from MAINT deep layers (PFCd_mnt), and provides selective output attention for determining when and what to activate and send to other layers, via gated PFCd_out activity.
Despite these functional differences, the BG gating signal has the same effects in all cases: during the gating trial, deep activations update cycle-by-cycle to follow the dynamics of the superficial layer -- this may be important for learning. For INPUT and OUTPUT layers, this deep layer activity is then cleared at the end of the trial. For MAINT, it persists. That's all there is to it.
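The point that gating has the same effect for all three stripe types, with only the end-of-trial handling differing, can be illustrated with a small single-stripe simulation. `StripeType` and `run_trial` are hypothetical names, and the trial is reduced to a list of PFCs activation snapshots, one per settling cycle.

```python
from enum import Enum
from typing import List, Sequence


class StripeType(Enum):
    INPUT = "in"
    MAINT = "mnt"
    OUTPUT = "out"


def run_trial(kind: StripeType, pfcs_cycles: Sequence[Sequence[float]],
              pfcd: Sequence[float], go: bool) -> List[float]:
    """Simulate one trial for one stripe and return the PFCd state carried into
    the next trial (hypothetical helper, simplified to one value per unit)."""
    deep = list(pfcd)
    if go:
        for pfcs in pfcs_cycles:
            deep = list(pfcs)  # during the gating trial, deep tracks superficial cycle by cycle
    if kind is StripeType.MAINT:
        return deep            # MAINT: deep activity persists
    return [0.0] * len(deep)   # INPUT / OUTPUT: deep activity is cleared at trial end


cycles = [[0.2, 0.0], [0.5, 0.1], [0.6, 0.1]]
print(run_trial(StripeType.MAINT, cycles, [0.0, 0.0], go=True))   # [0.6, 0.1]
print(run_trial(StripeType.OUTPUT, cycles, [0.0, 0.0], go=True))  # [0.0, 0.0]
```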
One important difference between V3 and V2 is that in V2 the OUTPUT PFC was in one-to-one unit-wise correspondence to the MAINT PFC (much as superficial and deep are in V3, and indeed that was the functional role assignment to these layers in V2), whereas, consistent with the biology, the OUTPUT PFC stripes are just another set of PFC stripes in V3, with no special relationship to any corresponding MAINT stripes. This means that some pretraining is required to establish the coordinated encoding and decoding of information in the MAINT and OUTPUT layers.
It also means that the controversial out_go_clear mechanism from V2 is no longer tenable -- it is not obvious which MAINT layer you would clear when a given OUTPUT layer fires Go. In practice, it seems to work fine to just not clear the maintenance at all. More realistic models should include a hierarchy of PFC areas where the higher-level area clears activity in the lower-level area based on completion of a sub-task -- this is an important topic for ongoing research. As elaborated below, out_go_clear is much less important in V3 for several reasons, including the basic robustness of the SRN dynamic compared to the double-gating behavior in V2 -- the PFCs layers always represent an integration of new and old information, so clearing the old is less urgent. Also, there is much less dependency on clearing in terms of the BG gating biases, although this is still present to some extent.
The BG dynamics are reorganized in important ways in V3, which avoid the main problems encountered in V2, while capturing the biases introduced in V2 in more natural, architectural ways. V3 provides an exciting new opportunity to fully reconceptualize the role of the NoGo (indirect) pathway -- it plays a very different role than in all previous PBWM models, and our understanding of this role is still developing. The major gating action is now much more focused on the Go (direct) pathway, which we describe first.
Based on the notion of a "matrisome complex" in the biological data, the BG gating for all three of the different PFC types is intermixed, and mutually competing. Furthermore, the fundamental dynamic is that all the Go pathway units are competing amongst themselves for which stripe gets to Go, and some stripe always fires Go on every trial. This is critical for avoiding the "all NoGo" problem that plagued V2, where none of the stripes fired a Go, and then no learning could take place, and a vicious cycle of bad performance ensued. Put another way: in V2, the fundamental competition was Go vs. NoGo within a stripe, whereas in V3 the fundamental competition is between all the Go's across all the stripes. In V2, the cross-stripe competition was all down in the SNr, whereas in V3 it is back up in the representational learning units of the Striatum, allowing competition to shape learning directly. This is a major advance with numerous computational benefits.
The basic maintenance-output gating dynamic in V3 with this Go-level competition operates as follows: when new relevant information comes in, MAINT gating wins the competition, suppressing output gating, and the information is maintained. When it is time to respond, then OUTPUT gating wins, suppressing maintenance gating, and the previously maintained information is output. This mutual-exclusivity between MAINT and OUTPUT was hard-coded into V2, but emerges much more naturally in V3.
The only downside of the "someone always gates" dynamic in V3 comes during a maintenance period with irrelevant information coming in: this means that the irrelevant information will be gated in some way or another. Thus, models will typically need more stripes to accommodate the need to process irrelevant information. This also makes important psychological predictions in terms of the effects of distracting information.
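The cross-stripe Go competition, including its "someone always gates" downside, can be sketched with a hard top-k selection over one pool of stripes. This is a deliberate simplification of the combined Matrix and SNrThal kWTA dynamics; `gate_winners` is a hypothetical helper and the Go drive values are made up for illustration.

```python
from typing import List, Sequence


def gate_winners(go_drive: Sequence[float], k: int = 1) -> List[int]:
    """Return the indices of the k stripes with the strongest summed Go drive.

    All stripes (INPUT, MAINT, OUTPUT) compete in one pool, so at least one
    stripe gates on every trial, even when every Go drive is weak.
    """
    ranked = sorted(range(len(go_drive)), key=lambda i: go_drive[i], reverse=True)
    return sorted(ranked[:k])


stripes = ["mnt0", "mnt1", "out0", "out1"]
new_info = [0.7, 0.4, 0.2, 0.1]     # relevant input arrives: a MAINT stripe wins
respond = [0.3, 0.2, 0.8, 0.1]      # time to respond: an OUTPUT stripe wins
distract = [0.02, 0.05, 0.01, 0.0]  # irrelevant input: something still gates
for drive in (new_info, respond, distract):
    print([stripes[i] for i in gate_winners(drive)])
```

MAINT and OUTPUT never gate together here with k=1, which mirrors how their mutual exclusivity emerges from the shared competition rather than being hard-coded.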
BG Gating Biases
The major innovation in V2 was the introduction of biases that encouraged BG gating policies that were generally sensible, right from the start. This leads to dramatic improvements in reinforcement learning (RL), by cutting down significantly on the exploration of non-productive regions of the high-dimensional problem space. In V3 these biases emerge much more naturally as a result of the reorganized competition and patterns of connectivity into the BG.
- MAINT and OUTPUT gating are mutually exclusive: this is encouraged by the mutual competition of all gating types within the common Go layer.
- Prefer empty stripes for gating: this is the one clear function of the NoGo pathway currently -- NoGo receives from corresponding deep PFC layer -- when this deep layer is maintaining, then there is a NoGo bias, which then provides some inhibition on corresponding Go pathway units. This is a weak bias, which can be controlled through the use of the nogo_inhib parameter on the MatrixLayerSpec, which determines how strongly the NoGo inhibits the Go. Learning works fine with nogo_inhib=0, but performance is best with a value of .2 (according to current limited tests).
- Favor OUTPUT gating when PVr detects PV (reward) likely on this trial: there is a projection from PVr "1" value unit to OUTPUT Go stripes, and from PVr ".5" value unit to MAINT Go stripes. This corresponds to mnt_rew_nogo, out_norew_nogo, and out_rew_go biases from V2. The mutual competition in BG naturally consolidates these different biases. Biologically, it is likely that this bias is mediated through projections from OFC to BG as well. The strength of this PVr bias can be determined by setting the wt_scale.rel parameter on this projection from the PVr.
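The two architectural biases above can be combined into a per-stripe effective Go drive, sketched below. This assumes that NoGo inhibition acts subtractively (scaled by nogo_inhib) and that the PVr input adds a bias scaled like a wt_scale.rel value; `matrix_go_drive` and `pvr_rel` are hypothetical names, and the real Matrix layer computation is more involved.

```python
from typing import List, Sequence


def matrix_go_drive(base_go: Sequence[float], nogo: Sequence[float],
                    kinds: Sequence[str], pvr_one: float, pvr_half: float,
                    nogo_inhib: float = 0.2, pvr_rel: float = 0.3) -> List[float]:
    """Effective Go drive per stripe with the two V3 biases (simplified sketch).

    base_go:  Go netinput from sensory / PFC inputs
    nogo:     NoGo activity per stripe (high when its deep PFC layer is maintaining)
    kinds:    "in", "mnt" or "out" for each stripe
    pvr_one:  activity of the PVr "1" unit (projects to OUTPUT Go)
    pvr_half: activity of the PVr ".5" unit (projects to MAINT Go)
    Subtractive NoGo inhibition and an additive PVr bias are assumptions.
    """
    drive = []
    for g, n, kind in zip(base_go, nogo, kinds):
        if kind == "out":
            pvr = pvr_one
        elif kind == "mnt":
            pvr = pvr_half
        else:
            pvr = 0.0  # no PVr projection assumed for INPUT stripes
        drive.append(g + pvr_rel * pvr - nogo_inhib * n)
    return drive


kinds = ["mnt", "mnt", "out"]
nogo = [1.0, 0.0, 0.0]  # stripe 0 is already maintaining something
no_reward = matrix_go_drive([0.5, 0.5, 0.5], nogo, kinds, pvr_one=0.0, pvr_half=1.0)
reward = matrix_go_drive([0.5, 0.5, 0.5], nogo, kinds, pvr_one=1.0, pvr_half=0.0)
print(no_reward)  # the empty MAINT stripe (index 1) has the strongest drive
print(reward)     # the OUTPUT stripe (index 2) has the strongest drive
```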
The main job of the SNrThal is to provide the final KWTA competition to determine which stripes Go. Typically k is set to a small number like 2, and KWTA_AVG_INHIB is used with a relatively high inhib.kwta_pt (.8) to give flexibility in the actual number that fire, with strong competition.
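The effect of KWTA_AVG_INHIB with a high inhib.kwta_pt can be illustrated with a simplified average-based kWTA: the inhibition threshold sits a fraction pt of the way from the average of the remaining units up to the average of the top k. As a stand-in, this sketch thresholds per-stripe net input directly rather than per-unit inhibitory conductances, so it shows the qualitative behavior only. `kwta_avg_inhib` is a hypothetical helper; k=2 and pt=.8 come from the text above.

```python
from typing import List, Sequence


def kwta_avg_inhib(net: Sequence[float], k: int, pt: float) -> List[int]:
    """Simplified average-based kWTA over per-stripe SNrThal net input.

    The threshold sits a fraction pt of the way from the mean of the remaining
    units up to the mean of the top k; stripes above it are allowed to fire.
    """
    ranked = sorted(net, reverse=True)
    top, rest = ranked[:k], ranked[k:]
    top_avg = sum(top) / len(top)
    rest_avg = sum(rest) / len(rest) if rest else 0.0
    threshold = rest_avg + pt * (top_avg - rest_avg)
    return [i for i, x in enumerate(net) if x > threshold]


print(kwta_avg_inhib([0.9, 0.85, 0.3, 0.2, 0.1], k=2, pt=0.8))   # [0, 1]
print(kwta_avg_inhib([0.9, 0.5, 0.3, 0.2, 0.1], k=2, pt=0.8))    # [0]
print(kwta_avg_inhib([0.8, 0.78, 0.77, 0.1, 0.1], k=2, pt=0.8))  # [0, 1, 2]
```

The three cases show the flexibility described above: with k=2, anywhere from one to three stripes may fire depending on how clearly the strongest Go signals stand out.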
Biologically, we think the ability of the SNr to mediate such a broad competition is due in part to inputs from the STN, which provide background excitation of the SNr units; only the stripes with the strongest Go pathway activity are then able to break through this with strong inhibition.
Relative to V2, as noted above, the job of the SNrThal is now much simpler because much of the competition is taking place within the Matrix directly, including critically the influence of NoGo. Although there is a nogo_gain parameter on the SNrThal, it is set to 0 by default, and tests show that it works much better for this NoGo influence to occur entirely in the Matrix.
There remains some question as to the scope of the SNrThal competition in larger-scale models with multiple matrisomes -- is there competition across matrisomes or not?
One of the major advantages of the SRN segregation comes in the clarity of the signal that the PFC deep layers provide to the LV layer in PVLV. The role of PVLV in the PBWM model remains as before: the LV layer learns to associate maintained PFC activation states with primary reward values (at the time when such rewards are applied). Then, when the BG gating drives PFC maintenance on subsequent trials, the LV evaluates that gating action (after the gating has taken place -- it is always trial-and-error) in terms of the reward value of the PFC activation state that is produced. In V3, LV receives only from the PFCd_mnt layer, and this provides an unambiguous representation of what is actually being maintained. By contrast, in V2, the single PFC layer that projected to LV would send undifferentiated ungated and gated activity to LV, which then would have to somehow sort out which was what.
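The LV role described above can be sketched with a linear delta rule that associates PFCd_mnt patterns with primary reward, and then values a later gating action by the PFCd_mnt state it produces. This is a simplified stand-in, assuming a single linear readout and a hypothetical learning rate; the actual LV layer in PVLV is a scalar-value layer with its own learning details.

```python
from typing import List, Sequence


def lv_value(w: Sequence[float], pfcd_mnt: Sequence[float]) -> float:
    """LV's reward expectation for a maintained PFCd_mnt pattern (linear readout)."""
    return sum(wi * xi for wi, xi in zip(w, pfcd_mnt))


def lv_learn(w: Sequence[float], pfcd_mnt: Sequence[float], pv: float,
             lrate: float = 0.1) -> List[float]:
    """Delta-rule update, applied only on trials where a primary reward (PV) occurs."""
    err = pv - lv_value(w, pfcd_mnt)
    return [wi + lrate * err * xi for wi, xi in zip(w, pfcd_mnt)]


state_a, state_b = [1.0, 0.0], [0.0, 1.0]  # two possible maintained patterns
w = [0.0, 0.0]
for _ in range(50):
    w = lv_learn(w, state_a, pv=1.0)  # maintaining A leads to reward
    w = lv_learn(w, state_b, pv=0.0)  # maintaining B does not
# A later gating action that produces state A is now valued highly, B is not.
print(round(lv_value(w, state_a), 3), round(lv_value(w, state_b), 3))
```

Because LV only sees PFCd_mnt, the value it assigns reflects exactly what was gated into maintenance, which is the clarity advantage over V2 described above.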
An important topic for future investigation is whether this new clearer signal could alter the PVLV temporal derivative dynamics in any useful way, or any of the other dynamics in PVLV relative to training PBWM.
Critical Dynamics for Parameterizing a New Model
Start with Clear Maint (Should Not Be Necessary with a Well-Balanced Model)
For both the loop and SIR models, it was necessary to first clear out the maintenance layers at obvious boundaries, to get the model properly debugged. But when it was running smoothly, it was then possible to remove this "crutch" and performance was hardly affected at all. So if you're having difficulty with the model at the start, try this approach.
The default da_gain value for Matrix learning is .1 -- this value can be optimized for different models -- the SIR model does best with da_gain = .5, and loop is better at .1. The general intuition is that the harder the task, and thus the greater amount of negative feedback over a more extended period of time, the lower the da_gain value needs to be, to prevent the weights from just getting "hammered", to the point of losing the initial biases completely, for example.
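The "hammered" intuition can be checked with simple arithmetic. The sketch below uses a schematic three-factor rule (learning rate x da_gain x dopamine x sender x receiver), which is an assumption for illustration and not the actual Matrix learning rule; `matrix_dw` and `after_negative_feedback` are hypothetical helpers, and the learning rate of .1 and starting weight of .5 are made up.

```python
def matrix_dw(da: float, send: float, recv: float, da_gain: float,
              lrate: float = 0.1) -> float:
    """Schematic dopamine-modulated weight change (hypothetical three-factor rule)."""
    return lrate * da_gain * da * send * recv


def after_negative_feedback(w0: float, n_trials: int, da_gain: float) -> float:
    """Weight after n_trials of negative dopamine (da = -1), clipped to [0, 1]."""
    w = w0
    for _ in range(n_trials):
        w = min(1.0, max(0.0, w + matrix_dw(-1.0, 1.0, 1.0, da_gain)))
    return w


# A hard task delivers 20 trials of negative feedback to an initially biased weight of .5.
print(after_negative_feedback(0.5, 20, da_gain=0.5))  # 0.0: the initial bias is wiped out
print(after_negative_feedback(0.5, 20, da_gain=0.1))  # about 0.3: the bias partly survives
```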
Balancing Relative Net Inputs
Typically the PFCs_mnt layer is always active in a model, whereas the PFCs_out layer activity depends on successful gating, and is often sparser than the mnt layer. Thus, it is often useful to adjust the connections from these PFC layers to the Matrix_Go layer, to enable the output gating to have priority when relevant information is available to be output gated. It is likely that output gating is prepotent in this way.
Also, it is important for the PFC layers to have a reasonable balance of inputs from various sources, so that the kwta dynamics, etc., operate in a sensible way. See Leabra Netin Scaling for important general information about this netin balancing process.
The following are useful guidelines obtained across multiple models for the relative netinput proportions for different projections into each of the main PFCs layers and the Matrix_Go layer. There are various conspecs targeted at these different projections (as shown), and one can adjust the wt_scale.rel parameters to achieve these relative balances. Turn on the compute_rel_netin flags at the Trial and Epoch levels in your programs, and add the relevant monitors to your Epoch monitor (see the default LeabraEpochMonitorPBWM which has this already built in -- available in the standard program library).
- PFCs_mnt -- From:
  - PFCd_mnt = .3 -- ToPFCSelf
  - Input = .5 -- ToPFCTopo
  - Output = .2 -- ToPFCFmOutput
- PFCs_out -- From:
  - PFCd_out = .1 -- ToPFCSelf
  - PFCd_mnt = .7 -- ToPFCTopo
  - Output = .2 -- ToPFCFmOutput
- Matrix_Go -- From:
  - PVr = .3 -- MatrixFmPvr
  - PFCs_mnt = .02 -- MatrixConsTopoWeak -- Note: probably doesn't need to be quite this weak.
  - PFCs_out = .3 -- MatrixConsTopo -- Note: may need to use MatrixConsTopoStrong to get stronger
  - Input = .4 -- MatrixConsTopo
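Once compute_rel_netin reports the actual per-projection net input, comparing it against these guidelines is a matter of normalizing to proportions. The sketch below also suggests new wt_scale.rel values by rescaling each one by target / measured proportion. That heuristic is an assumption: it holds exactly only if each projection's net input scales linearly with its rel value and input statistics do not change, so in practice re-measure and iterate. `rel_proportions` and `suggest_rel` are hypothetical helpers, and the measured numbers are made up.

```python
from typing import Dict


def rel_proportions(netin: Dict[str, float]) -> Dict[str, float]:
    """Fraction of a layer's total net input contributed by each projection."""
    total = sum(netin.values())
    return {name: v / total for name, v in netin.items()}


def suggest_rel(rel: Dict[str, float], measured: Dict[str, float],
                target: Dict[str, float]) -> Dict[str, float]:
    """Rescale each wt_scale.rel by target / measured proportion.

    Assumes each projection's net input scales linearly with its rel value and
    that input statistics stay the same; in practice, re-measure and iterate.
    """
    props = rel_proportions(measured)
    return {name: rel[name] * target[name] / props[name] for name in rel}


# Hypothetical measurements for PFCs_mnt, checked against the .3 / .5 / .2 guideline.
measured = {"PFCd_mnt": 0.6, "Input": 0.3, "Output": 0.1}
target = {"PFCd_mnt": 0.3, "Input": 0.5, "Output": 0.2}
rel = {"PFCd_mnt": 1.0, "Input": 1.0, "Output": 1.0}
print(suggest_rel(rel, measured, target))  # roughly PFCd_mnt 0.5, Input 1.67, Output 2.0
```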