|Author||Randall C. O'Reilly|
|First Published||Nov 5 2016|
|Tags||Prefrontal Cortex, Basal Ganglia, Working Memory, Gating, Active Maintenance, Reinforcement Learning, Dopamine|
|Description||Illustrates the dynamic gating of information into PFC active maintenance, by the basal ganglia (BG). It uses a simple Store-Ignore-Recall (SIR) task, where the BG system learns via phasic dopamine signals and trial-and-error exploration, discovering what needs to be stored, ignored, and recalled as a function of reinforcement of correct behavior.|
|Updated||5 November 2016, 14 January 2017, 8 September 2017, 17 January 2018|
|Versions||8.0.0, 8.0.2, 8.0.3, 8.0.4, 8.0.6, 8.0.7|
|Emergent Versions||8.0.1, 8.0.4, 8.2.0, 8.5.1|
|Other Files||File:sir trained.wts.gz|
This simulation illustrates the dynamic gating of information into PFC active maintenance, by the basal ganglia (BG). It uses a simple Store-Ignore-Recall (SIR) task, where the BG system learns via phasic dopamine signals and trial-and-error exploration, discovering what needs to be stored, ignored, and recalled as a function of reinforcement of correct behavior, and learned reinforcement of useful working memory representations.
The SIR task basically requires the network to Recall (when the R unit is active) the letter (A-D) that was present when a Store (S) input was active earlier. Ignore (I) trials also have a letter input, but, as you might guess, these are to be ignored. Trials are randomly generated, and there can be a random number of Ignore trials between a Store and Recall trial, so the model must learn to maintain the stored information in robust working memory representations in the PFC, until the next Recall trial, with variable numbers of intervening and unpredictable distractors between task-relevant events.
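The trial structure described above can be sketched in a few lines of Python. This is only an illustrative generator, not the code used by the simulation: the function name `generate_sir_trials`, the tuple layout, and the equal probabilities are all hypothetical. It assumes that Store and Ignore trials expect the presented letter as output, that Recall trials present no letter, and that a Store cannot follow a Store without an intervening Recall (and vice versa), as described later in this documentation.

```python
import random

LETTERS = "ABCD"

def generate_sir_trials(n_trials, seed=None):
    """Return a list of (command, letter, target) tuples for an SIR-like task.

    Hypothetical sketch: the probabilities and data layout are assumptions.
    """
    rng = random.Random(seed)
    trials = []
    stored = None  # letter waiting to be recalled, if any
    for _ in range(n_trials):
        # Store is only allowed when nothing is pending; Recall only when something is.
        options = ["I", "S"] if stored is None else ["I", "R"]
        command = rng.choice(options)
        if command == "R":
            # Assumption: no letter is shown on Recall; the target is the stored letter.
            trials.append(("R", None, stored))
            stored = None
        else:
            letter = rng.choice(LETTERS)
            if command == "S":
                stored = letter
            # Assumption: on Store and Ignore trials the target is the current letter.
            trials.append((command, letter, letter))
    return trials

if __name__ == "__main__":
    for trial in generate_sir_trials(8, seed=1):
        print(trial)
```

Because any number of Ignore trials can fall between a Store and its Recall, the correct Recall target cannot be computed from the current input alone, which is what forces the model to use maintained information.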
Upon opening the project as instructed above, you will notice that the network is configured with the input and output information at the top of the network instead of the usual convention of having the input at the bottom -- this is because all of the basal ganglia mechanisms associated with the gating system are located in an anatomically appropriate "subcortical" location below the cortical layers associated with the rest of the model.
The main processing of information in the model follows the usual path from Input to Hidden to Output. However, to make appropriate responses based on the information that came on earlier trials, the Hidden layer needs access to the information maintained in the PFC (prefrontal cortex) layer. The PFC will maintain information in an active state until it receives a gating signal from the basal ganglia gating system, at which point it will update to encode (and subsequently maintain) information from the current trial. In this simple model, the PFC acts just like a copy of the sensory input information, by virtue of having direct one-to-one projections from the Input layer. This makes it easy to see directly what the PFC is maintaining -- the model also functions very well if the PFC representations are distributed and learned, as is required for more complex tasks. Although only one PFC "stripe" is theoretically needed for this specific task (but see the end of this documentation for link to more challenging tasks), the system works much better by having a competition between multiple stripes, each of which attempts to learn a different gating strategy, searching the space of possible solutions in parallel instead of only serially -- hence, this model has four PFC maintenance stripes that each can encode the full set of inputs. Each such stripe corresponds to a hypercolumn in the PFC biology.
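The maintain-until-gated behavior of a single PFC stripe can be illustrated with a very small sketch. The class `PFCStripe` and its `step` method are hypothetical names used only for illustration; the actual model implements this with Leabra unit dynamics and thalamocortical loops rather than a simple variable assignment.

```python
class PFCStripe:
    """Toy stand-in for one PFC maintenance stripe (hypothetical, for illustration)."""

    def __init__(self):
        self.maintained = None  # nothing is maintained initially

    def step(self, current_input, gate):
        """Update to the current input only when a gating signal arrives."""
        if gate:
            self.maintained = current_input
        # Without a gating signal, the previous contents are simply kept.
        return self.maintained

stripe = PFCStripe()
inputs_and_gates = [("A", True), ("B", False), ("C", False), ("D", True)]
history = [stripe.step(x, g) for x, g in inputs_and_gates]
print(history)
```

In this sketch the stripe holds "A" across the two ungated inputs and switches to "D" only when the gate fires again, mirroring how a Store item must survive intervening Ignore trials.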
Within each hypercolumn/stripe, we simulate the differential contributions of the superficial cortical layers (2 and 3) versus the deep layers (5 and 6) -- the superficial are labeled as PFCmnt and the deep as PFCmnt_deep in the model. The superficial layers receive broad cortical inputs from sensory areas (i.e., Input in the model) and from the deep layers within their own hypercolumn, while the deep layers have more localized connectivity (just receiving from the corresponding superficial layers in the model). Furthermore, the deep layers participate in thalamocortical loops, and have other properties that enable them to more robustly maintain information through active firing over time. Therefore, these deep layers are the primary locus of robust active maintenance in the model, while the superficial layers reflect more of a balance between other (e.g., sensory) cortical inputs and the robust maintenance activation from the deep layers. The deep layers also ultimately project to subcortical outputs, and other cortical areas, so we drive the output of the model through these deep layers into the Hidden layer.
As discussed in the Executive Chapter, electrophysiological recordings of PFC neurons typically show three broad categories of neural responses (see Figure 10.3 in chapter 10, from Sommer & Wurtz (2000)): neurons that respond phasically to sensory inputs; other neurons that respond with sustained active maintenance; and a set that responds at the time when a motor response or other kind of cognitive action is required. The PFCmnt neurons can capture the first two categories -- it is possible to configure the PFCmnt units to have different temporal patterns of responses to inputs, including phasic, ramping, and sustained. However, the third category of neurons requires a separate BG-gating action to drive an appropriate (and appropriately timed) motor action, and thus we have a separate population of output gating stripes in the model, called PFCout (superficial) and PFCout_deep (deep). It is these PFCout_deep neurons that project to the posterior cortical Hidden and Output layers of the model, and drive overt responding. For simplicity, we have configured a topographic one-to-one mapping between corresponding PFCmnt and PFCout stripes -- so the model must learn to gate the appropriate PFCout stripe that corresponds to the PFCmnt stripe containing the information relevant to driving the correct response.
In summary, correct performance of the task in this model requires BG gating of Store information into one of the PFCmnt stripes, and then not gating any further Ignore information into that same stripe, and finally appropriate gating of the corresponding PFCout stripe on the Recall trial. This sequence of gating actions must be learned strictly through trial-and-error exploration, shaped by dopamine-based reinforcement learning based on the PVLV dopamine system located in the bottom-left area of the model (see the PVLV Model for details). The key point is that the PVLV system can learn the predicted reward value of cortical states and use errors in predictions to trigger dopamine bursts and dips that train striatal gating policies.
To review the functions of the other layers in the PBWM framework (see PBWM for details):
- Matrix: this is the dynamic gating system representing the matrix units within the dorsal striatum of the basal ganglia. The bottom layer contains the "Go" (direct pathway) units, while the top layer contains the "NoGo" (indirect pathway) units. As in the earlier BG Model, the Go units, expressing more D1 receptors, increase their weights from dopamine bursts, and decrease weights from dopamine dips, and vice-versa for the NoGo units with more D2 receptors. As is more consistent with the BG biology than earlier versions of this model, most of the competition to select the final gating action happens in the GPe and GPi (with the hyperdirect pathway to the subthalamic nucleus also playing a critical role, but not included in this more abstracted model), with only a relatively weak level of competition within the Matrix layers. Note that we have combined the maintenance and output gating stripes all in the same Matrix layer -- this allows these stripes to all compete with each other here, and more importantly in the subsequent GPi and GPe stripes -- this competitive interaction is critical for allowing the system to learn to properly coordinate when it is appropriate to update/store new information for maintenance vs. when it is important to select from currently stored representations via output gating.
- GPeNoGo: provides a first round of competition between all the NoGo stripes, which critically prevents the model from driving NoGo to all of the stripes at once. Indeed, there is physiological and anatomical evidence for NoGo unit collateral inhibition onto other NoGo units. Without this NoGo-level competition, models frequently ended up in a state where all stripes were inhibited by NoGo, and when nothing happens, nothing can be learned, so the model essentially fails at that point!
- GPiThal: Has a strong competition for selecting which stripe gets to gate, based on projections from the MatrixGo units, and the NoGo influence from GPeNoGo, which can effectively veto a few of the possible stripes to prevent gating. As discussed in the BG Model, here we have combined the functions of the GPi (and SNr) and the Thalamus into a single abstracted layer, which has the excitatory kinds of outputs that we would expect from the thalamus, but also implements the stripe-level competition mediated by the GPi/SNr. If there is more overall Go than NoGo activity, then the GPiThal unit gets activated, which then effectively establishes an excitatory loop through the corresponding deep layers of the PFC, with which the thalamus neurons are bidirectionally interconnected.
- ExtRew, RewTarg, PosPV: The PosPV layer provides positive primary value input to the PVLV system, and in this model we drive it from the ExtRew layer, which looks at the Output layer and, in conjunction with the RewTarg input, evaluates whether the network made the correct response, and drives either a 0 (error, no reward) or 1 (correct, reward) activation, as labeled. The RewTarg input tells the ExtRew system on which trials to evaluate the Output layer activations -- specifically on trials in which Reward is available (here, Recall trials). While the PosPV layer corresponds to the lateral hypothalamus for food or liquid rewards, these other layers are just "hacks" to drive this PosPV input so that it is contingent upon network behavior -- in other models we often just use Program code to achieve this function.
- VTAp: this is the final dopamine unit activation from the PVLV model, reflecting reward prediction errors. When outcomes are better (worse) than expected or states are predictive of reward (no reward), these units will increase (decrease) activity. For convenience, tonic (baseline) states are represented here with zero values, so that phasic deviations above and below this value are observable as positive or negative activations. (In the real system negative activations are not possible, but negative prediction errors are observed as a pause in dopamine unit activity, such that firing rate drops from baseline tonic levels). Biologically the SNc actually projects dopamine to the dorsal striatum, while the VTA projects to the ventral striatum, but there is no functional difference in this level of model.
- VSPatchPosD1: This represents the pathway through the ventral striatum in PVLV that learns to expect PosPV primary value rewards, and then drives shunting of the VTA phasic dopamine response for expected rewards, and, via the LHbRMTg lateral habenular pathway, dips in VTA activation when expected rewards are not received.
- In this model, Matrix learning is driven exclusively by dopamine firing at the time of rewards (i.e., on Recall trials), and it uses a synaptic-tag-based trace mechanism to reinforce/punish all prior gating actions that led up to this dopaminergic outcome. Specifically, when a given Matrix unit fires for a gated action (we assume it receives the final gating output from the GPi / Thalamus either via thalamic or PFC projections -- this is critical for proper credit assignment in learning), we hypothesize that structural changes in the synapses that received concurrent excitatory input from cortex establish a synaptic tag. Extensive research has shown that these synaptic tags, based on actin fiber networks in the synapse, can persist for up to 90 minutes, and when a subsequent strong learning event occurs, the tagged synapses are also strongly potentiated (Redondo & Morris, 2011; Rudy, 2015; Bosch & Hayashi, 2012). This form of trace-based learning is very effective computationally, because it does not require any other mechanisms to enable learning about the reward implications of earlier gating events. (In earlier versions of the PBWM model, we relied on CS (conditioned stimulus) based phasic dopamine to reinforce gating, but this scheme requires that the PFC maintained activations function as a kind of internal CS signal, and that the amygdala learn to decode these PFC activation states to determine if a useful item had been gated into memory. Compared to the trace-based mechanism, this CS-dopamine approach is much more complex and error-prone. Nevertheless, there is nothing in the current model that prevents it from also contributing to learning. However, in the present version of the model, we have not focused on getting this CS-based dopamine signal working properly -- there are a couple of critical issues that we are addressing in newer versions of the PVLV model that should allow it to function better.)
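The trace-based learning idea described in the last item above can be sketched as follows. This is a deliberately simplified illustration and not the learning rule implemented in the simulator: the function `apply_trace_learning`, the dictionary keyed by (pathway, control input), the unit-sized tags, and the learning rate are all hypothetical. It only captures two points from the text: synapses active during a gating action get tagged, and the dopamine signal at the time of reward then changes all tagged weights, with opposite sign for Go (D1) and NoGo (D2) units.

```python
def apply_trace_learning(weights, gating_events, dopamine, lrate=0.1):
    """Tag synapses at gating time, then apply dopamine at outcome time.

    weights: dict mapping (pathway, ctrl_input) -> weight, pathway in {"Go", "NoGo"}
    gating_events: list of (pathway, ctrl_input) pairs that fired since the last Recall
    dopamine: signed phasic dopamine value at the Recall outcome (burst > 0, dip < 0)
    """
    # Synaptic tags start at zero and accumulate for each unit that fired.
    trace = {key: 0.0 for key in weights}
    for key in gating_events:
        trace[key] += 1.0

    new_weights = dict(weights)
    for (pathway, ctrl_input), tag in trace.items():
        # Go (D1) weights increase with bursts; NoGo (D2) weights do the opposite.
        sign = 1.0 if pathway == "Go" else -1.0
        new_weights[(pathway, ctrl_input)] += lrate * sign * dopamine * tag
    return new_weights

weights = {(p, c): 0.5 for p in ("Go", "NoGo") for c in "SIR"}
events = [("Go", "S"), ("Go", "R")]  # gated on the Store trial and on the Recall trial

rewarded = apply_trace_learning(weights, events, dopamine=1.0)
punished = apply_trace_learning(weights, events, dopamine=-1.0)
print(rewarded[("Go", "S")], punished[("Go", "S")])
```

Note that the Store gating event is credited even though the dopamine signal only arrives later on the Recall trial; untagged weights are left unchanged.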
| ⇒ To explore the model's connectivity, click on r.wt and on various units within the layers of the network. |
SIR Task Learning
Now, let's step through some trials to see how the task works.
| ⇒ Switch back to viewing activations (act). Do , in the . |
The task commands (Store, Ignore, Recall) are chosen completely at random (subject to the constraint that you can't store until after a recall, and you can't recall until after a store) so you could get either an ignore or a store input. You should see either the S or I task control input, plus one of the stimuli (A-D) chosen at random. The target output response should also be active, as we're looking at the plus phase information (stepping by trials).
Notice that if the corresponding GPiThal unit is active, the PFC stripe will have just been updated to maintain this current input information.
| ⇒ Hit again. |
You should now see a new input pattern. The GPiThal gating signal triggers the associated PFC stripe to update its representations to reflect this new input. But if the GPiThal unit is not active (due to more overall NoGo activity), PFC will maintain its previously stored information. Often one stripe will update while the other one doesn't; the model has to learn how to manage its updating so that it can translate the PFC representations into appropriate responses during recall trials.
| ⇒ Keep hitting and noticing the pattern of updating and maintenance of information in PFCmnt, and output gating in PFCout, and how this is driven by the activation of the GPiThal unit (which in turn is driven by the Matrix Go vs. NoGo units, which in turn are being modulated by dopamine from the PVLV system to learn how to better control maintenance in the PFC!). |
When you see an R (recall) trial, look at the VTAp (dopamine) unit at the bottom layer. If the network is somehow able to correctly recall (or guess!), then this unit will have a positive (yellow) activation, indicating better-than-expected performance. Most likely, it instead will be teal blue and inverted, indicating a negative dopamine signal from worse-than-expected performance (producing the wrong response). This is the reinforcement training signal that controls the learning of the Matrix units, so that they can learn when information in PFC is predictive of reward (in which case that information should be updated in future trials), or whether having some information in PFC is not rewarding (in which case that information should not be updated and stored in future trials). It is the same learning mechanism that has been extensively investigated (and validated empirically) as a fundamental rule for learning to select actions in corticostriatal circuits, applied here to working memory.
| ⇒ You can continue to tab. and observe the dynamics of the network. When your mind is sufficiently boggled by the complexity of this model, then go ahead and hit , and switch to the |
You will see two different values being plotted as the network learns:
- cnt_err (black line): shows the overall number of errors per epoch (one epoch is 100 trials in this case), which quickly drops to around 10-15, which is basically the number of recall trials (the others are quickly learned as they do not require any active maintenance).
- R_da (green line): shows dopamine for Recall trials (when the network's recall performance is directly rewarded or punished). As you can see, this value tends to converge to a mean of zero, because R_da reflects the difference from expectation, and the system quickly adapts its expectations based on how it is actually doing. The main signals to notice here are when the network suddenly starts doing better than on the previous epoch (cnt_err drops) -- this should be associated with a peak in R_da, whereas a sudden increase in errors (worse performance) results in a dip in R_da. As noted above, these R_da signals are training up the Matrix gating actions since the last Recall trial.
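The reason R_da hovers around zero can be illustrated with a simple delta-rule sketch. This is a generic Rescorla-Wagner-style update used as a stand-in, not the PVLV mechanism itself; the function name `simulate_r_da` and the learning rate are hypothetical.

```python
def simulate_r_da(rewards, lrate=0.2):
    """Return the dopamine-like prediction error for each reward in sequence."""
    expectation = 0.0
    da_values = []
    for reward in rewards:
        da = reward - expectation      # difference from expectation
        da_values.append(da)
        expectation += lrate * da      # expectation adapts toward actual outcomes
    return da_values

# 30 rewarded Recall outcomes followed by 5 unrewarded ones (made-up sequence).
da_values = simulate_r_da([1.0] * 30 + [0.0] * 5)
print(round(da_values[0], 3), round(da_values[29], 3), round(da_values[30], 3))
```

Under these assumptions the first reward produces a large positive signal, the signal shrinks toward zero as the reward becomes expected, and the first unexpected omission produces a large dip, which matches the qualitative pattern of peaks and dips described for R_da.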
The network can take roughly 5-50 epochs to train (it will stop when cnt_err reaches 0 five times in a row).
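The stopping criterion can be expressed as a small check over the per-epoch error counts. The function name `epochs_to_criterion` and the example error sequence are hypothetical; only the rule of five consecutive zero-error epochs comes from the description above.

```python
def epochs_to_criterion(cnt_err_per_epoch, n_in_a_row=5):
    """Return the 1-based epoch at which training would stop, or None if never."""
    run = 0
    for epoch, cnt_err in enumerate(cnt_err_per_epoch, start=1):
        # Any epoch with errors resets the count of consecutive perfect epochs.
        run = run + 1 if cnt_err == 0 else 0
        if run == n_in_a_row:
            return epoch
    return None

stop_epoch = epochs_to_criterion([12, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0])
print(stop_epoch)
```

In this made-up sequence the two perfect epochs before the relapse at epoch 5 do not count toward the final run, so the criterion is only met at epoch 10.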
| ⇒ Once it has trained to this criterion, you can switch back to viewing the network, and Step through trials to see that it is indeed performing correctly. Pay particular attention to the GPiThal activation and what the PFC is maintaining and outputting as a result -- you should see Go firing on Store trials for one of the stripes, and NoGo on Ignore trials for that same stripe. The other PFCmnt stripe may gate for Ignore trials -- it can afford to do so given the capacity of this network relative to the number of items that needs to be stored -- but typically the model will not do output gating in PFCout for these. |
|Question 10.7: Report the patterns of R_da dopamine firing in relation to the cnt_err performance of the model, and explain how this makes sense in terms of how the network learns.|
Now we will explore how the Matrix gating is driven in terms of learned synaptic weights. Note that we have split out the SIR control inputs into a separate CtrlInput layer that projects to the Matrix layers -- this control information is all that the Matrix layer requires. It can also learn with the irrelevant A-D inputs, but just takes a bit longer.
| ⇒ Click on s.wt in the panel, and then click on the individual SIR units in the CtrlInput layer to show the learned sending weights from these units to the Matrix. |
|Question 10.8: Explain how these weights from S,I,R inputs to the various Matrix stripes make sense in terms of how the network actually solved the task, including where the Store information was maintained, and where it was output, and why the Ignore trials did not disturb the stored information.|
Note that for this simple task, the number of items that needs to be maintained at any one time is just one, which is why the network still gates Ignore trials (it just learns not to output gate them). If you're feeling curious you can use the Wizard in the software to change the number of PBWM stripes to 1, and there you should see that the model can still learn this task but is now pressured to do so by ignoring I trials at the level of input gating. However, by taking away the parallel learning abilities of the model, it can take longer to learn.
If you want to experience the full power of the PBWM learning framework, you can check out the sir52 v50 model, which takes the SIR task to the next level with two independent streams of maintained information. Here, the network has to store and maintain multiple items and selectively recall each of them depending on other cues, which is a very demanding task that networks without selective gating capabilities cannot achieve. This version more strongly stresses the selective maintenance gating aspect of the model (and indeed this problem motivated the need for a BG in the first place).
| ⇒ You may now close the project (use the window manager close button on the project window or menu item) and then open a new one, or just quit emergent entirely by doing menu option or clicking the close button on the root window. |