CECN1 PFC Maint Updt

From Computational Cognitive Neuroscience Wiki

  • The project file: pfc_maint_updt.proj (click and Save As to download, then open in Emergent -- NOTE: requires version 4.13 or higher)


Project Documentation

(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)

IMPORTANT: This model differs significantly in its implementational details (it uses a different gating mechanism) from the original one described in the textbook. The question at the end is also different, but it gets at the same overall issues.

  • GENERAL USAGE NOTE: To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can just always return to this document by clicking on the ProjectDocs tab at the top of the middle panel.

Important: this model is changed significantly from the one in the textbook. It uses a much more powerful version of the dynamic gating mechanism modulated by dopamine, based on the model described in O'Reilly & Frank (2006). The task is generally the same, however, and the key conceptual points are likewise the same.

The network is configured with the input and output information at the top of the network instead of the usual convention of having the input at the bottom -- this is because all of the basal ganglia mechanisms associated with the gating system are located in an anatomically appropriate location below the cortical layers associated with the rest of the model.

The main processing of information in the model follows the usual path from Input to Hidden to Output. However, to make appropriate responses based on the information that came on earlier trials, the Hidden layer needs access to the information maintained in the PFC (prefrontal cortex) layer. The PFC will maintain information in an active state until it receives a gating signal from the basal ganglia gating system, at which point it will update to encode (and subsequently maintain) information from the current trial. In this simple model, the PFC acts just like a copy of the sensory input information, by virtue of having direct one-to-one projections from the Input layer. This makes it easy to see directly what the PFC is maintaining -- the model also functions very well if the PFC representations are distributed and learned, as is required for more complex tasks.

Now for a brief overview of the basal ganglia gating system. For complete details about these layers, see the O'Reilly and Frank (2006) paper: O'Reilly, R.C. & Frank, M.J. (2006). Making Working Memory Work: A Computational Model of Learning in the Frontal Cortex and Basal Ganglia. Neural Computation, 18, 283-328.

  • Matrix: this is the dynamic gating system representing the matrix units within the dorsal striatum of the basal ganglia. Every even-index unit within a stripe represents "Go", while the odd-index units represent "NoGo." If overall more Go units fire, this will cause updating of the PFC, but if more NoGo units fire, this will prevent updating and cause the PFC to maintain its existing memory representation.
  • SNrThal: represents the output of the basal ganglia system, abstracting across substantia nigra pars reticulata (SNr), globus pallidus, and Thalamus, which implement the gating signal contingent on relative Go-NoGo activity in the Matrix. If there is more overall Go activity, then the SNrThal unit gets activated, providing bottom-up excitation and driving updating in PFC.
  • PV* and LV* and friends at the very bottom layer of the network: these represent the dopaminergic system, which provides reinforcement learning signals to train up the dynamic gating system in the basal ganglia. The PV layers represent primary values of reward (i.e., actual externally-delivered reward values), while the LV layers represent learned ("anticipated") values -- together, they account for Pavlovian conditioning phenomena and associated dopaminergic firing data. They represent an alternative to the TD reinforcement learning model described in Chapter 6.
  • To explore the model's connectivity, click on r.wt and on various units within the layers of the network.
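The gating logic described in the Matrix and SNrThal items above can be pictured with a small sketch. This is not the simulator's implementation: the function names, the use of summed activations, and the single-stripe representation are assumptions made for illustration. Only the even-index Go / odd-index NoGo convention and the update-versus-maintain rule come from the description above.

```python
from typing import List, Optional


def snrthal_active(matrix_acts: List[float]) -> bool:
    """Return True when overall Go activity exceeds overall NoGo activity.

    Uses the convention from the text: even-index units are Go,
    odd-index units are NoGo. Summing activations is an assumption.
    """
    go = sum(matrix_acts[0::2])
    nogo = sum(matrix_acts[1::2])
    return go > nogo


def pfc_step(pfc: Optional[str], current_input: str,
             matrix_acts: List[float]) -> Optional[str]:
    """Update PFC to the current input when gated; otherwise maintain it."""
    if snrthal_active(matrix_acts):
        return current_input  # Go wins: PFC updates
    return pfc                # NoGo wins: PFC keeps its existing memory
```

For example, `pfc_step("A", "B", [0.9, 0.1])` returns `"B"` (more Go, so PFC updates), while `pfc_step("A", "B", [0.1, 0.9])` returns `"A"` (more NoGo, so PFC maintains).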

Now, let's step through some trials to see how the task works.

  • Switch back to viewing activations (act). Do Init, Step in the control panel. Then press F5 or do View/Refresh on the main menu, which will display the names of the units.

Unlike the model described in the textbook, the task commands (Store, Ignore, Recall) are chosen completely at random, subject to the constraint that you can't store until after a recall, and you can't recall until after a store. On this first trial you could therefore get either an ignore or a store input. You should see either the S or I task control input, plus one of the stimuli (A-D) chosen at random. The target output response should also be active, as we're looking at the plus phase information (stepping by trials).
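The trial-ordering constraint can be made concrete with a short generator. This is a sketch under stated assumptions, not the project's actual environment code: the name `sir_trials`, the tuple layout, the equal choice probabilities, and the choice to present no stimulus on Recall trials are all hypothetical. What comes from the text is the constraint itself and the random choice of stimuli A-D.

```python
import random
from typing import Iterator, Optional, Tuple

STIMULI = ["A", "B", "C", "D"]


def sir_trials(n: int, seed: int = 0) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (command, stimulus, target) for n random Store/Ignore/Recall trials.

    Store is only allowed when nothing is currently stored, and Recall
    only when something is, matching the constraint in the text.
    """
    rng = random.Random(seed)
    stored: Optional[str] = None
    for _ in range(n):
        # Ignore is always allowed; the other option depends on memory state.
        cmd = rng.choice(["S", "I"]) if stored is None else rng.choice(["I", "R"])
        if cmd == "R":
            # Assumption: no stimulus on Recall; the target is the stored item.
            yield ("R", None, stored)
            stored = None
        else:
            stim = rng.choice(STIMULI)
            if cmd == "S":
                stored = stim
            # Assumption: on Store and Ignore the target is the current stimulus.
            yield (cmd, stim, stim)
```

Only the Recall target depends on something that is no longer in the input, which is why those are the trials that need active maintenance in PFC.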

Notice that if the SNrThal unit is active, the PFC layer has just been updated to maintain this current input information.

  • Hit Step again.

You should now see a new input pattern. If the SNrThal gating signal is active again, then the PFC will again update its representations to reflect this new input. But if the SNrThal unit is not active (due to more overall NoGo activity), PFC will maintain its previously stored information.

  • Keep hitting Step and noticing the pattern of updating and maintenance of information in PFC, and how this is driven by the activation of the SNrThal unit (which in turn is driven by the Matrix Go vs. NoGo units, which in turn are being modulated by dopamine from the PVLV system to learn how to better control maintenance in the PFC!).

When you see an R (recall) trial, look at the DA unit in the back of the bottom layer. If the network is somehow able to correctly recall (or guess!), then this unit will have a positive (yellow) activation, indicating better-than-expected performance. Most likely, it will instead be teal blue and inverted, indicating a negative dopamine signal from worse-than-expected performance (producing the wrong response). This is the reinforcement training signal that controls the learning of the Matrix units, so that they can learn when information in PFC is predictive of reward (in which case that information should be updated in future trials), and when having some information in PFC is not rewarding (in which case that information should not be updated and stored in future trials).
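One simple way to picture how this signal shapes gating is a dopamine-signed weight change. The sketch below is a deliberate simplification and not the learning rule used in the simulator: the function name, the scalar weights, and the learning rate are assumptions. It only illustrates the direction of the effect, with positive dopamine favoring Go (updating) and negative dopamine favoring NoGo (not updating).

```python
from typing import Tuple


def update_gating_weights(w_go: float, w_nogo: float, da: float,
                          lr: float = 0.1) -> Tuple[float, float]:
    """Return new (Go, NoGo) weights for one active input after a DA signal.

    Positive da strengthens Go and weakens NoGo; negative da does the
    reverse. Hypothetical rule for illustration only.
    """
    return w_go + lr * da, w_nogo - lr * da


# Repeated positive dopamine after gating a Store input tilts it toward Go.
w_go, w_nogo = 0.5, 0.5
for _ in range(3):
    w_go, w_nogo = update_gating_weights(w_go, w_nogo, da=1.0)
```

After the loop `w_go` is larger than `w_nogo`, so the same input would be more likely to trigger updating on later trials.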

  • You can continue to Step and observe the dynamics of the network. When your mind is sufficiently boggled by the complexity of this model, then go ahead and hit Run, and switch to the .T3Tab.EpochOutputData tab.

You will see various different values being plotted as the network learns:

  • cnt_err (black line): shows the overall number of errors per epoch (one epoch is 100 trials in this case), which quickly drops to around 10-15, which is basically the number of recall trials (the others are quickly learned as they do not require any active maintenance).
  • S_da (red line): shows the amount of dopamine delivered on Store trials, on average. This should decrease at the start (the PVLV system has a novelty bias that provides early initial dopamine) but then start to increase as the network starts to get the recall trials correct, tracking the cnt_err performance (i.e., when error goes down, S_da goes up, indicating correct store trials that were reinforced). This dopamine signal delivered to the Store trials is the key for allowing the Matrix units to learn that storing information leads to subsequent rewards. It results from the PVLV system recognizing that, if the basal ganglia system does fire Go and update the Store information into the PFC, this pattern of activation in PFC has been associated with reward in the past, and thus some dopamine should be delivered to reinforce the updating of that store information.
  • I_da (blue line): shows dopamine for Ignore trials, which just declines and stays around zero, because the network does not get any reliable reward associated with encoding ignore information into the PFC.
  • R_da (green line): shows dopamine for Recall trials (when the network's recall performance is directly rewarded or punished). As you can see, this value always tends to converge to a mean of zero, but with very large fluctuations on either side. This is because it reflects the difference from expectation, and the system quickly adapts its expectations based on how it is actually doing. Thus, the main signals to notice here are when the network suddenly starts doing better than on the previous epoch (cnt_err drops) -- this should be associated with a peak in R_da, whereas a sudden increase in errors (worse performance) results in a dip in R_da.
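The behavior of R_da, a signal that reflects the difference from an expectation that itself keeps adapting, can be illustrated with a minimal delta-rule sketch. This is not the PVLV algorithm: the function name, the single scalar expectation, and the learning rate are assumptions chosen to show the qualitative pattern only.

```python
from typing import List, Sequence


def da_trace(rewards: Sequence[float], lr: float = 0.3,
             expected: float = 0.0) -> List[float]:
    """Return the dopamine-like signal (reward minus expectation) per step.

    After each step the expectation moves toward the reward just received,
    so a steady reward level drives the signal back toward zero.
    """
    das = []
    for r in rewards:
        da = r - expected
        das.append(da)
        expected += lr * da  # adapt expectation toward actual outcome
    return das


# Steady performance, then a sudden improvement, then steady again.
das = da_trace([0.2] * 20 + [0.8] * 20)
```

In `das`, the values shrink toward zero while reward is steady, jump to a positive peak at the step where reward improves, and then decay toward zero again. A sudden drop in reward produces a negative dip in the same way.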

The network can take roughly 30-100 epochs or so to train (it will stop when cnt_err gets to 0).

  • Once it has trained to this criterion, you can switch back to viewing the network, and Step through trials to see that it is indeed performing correctly. Pay particular attention to the SNrThal activation and what the PFC is maintaining as a result.

NOTE: the following question is different than the one in the textbook. Your instructor may have you skip this question (or not) -- be sure to check.


Question 9.12 (a) Report the relative levels of DA unit firing to the S, I and R trials at the end of learning. (b) Explain why these differences arise, and how they contribute to helping the network solve the task. Now, turn the network display back on in the SIRnet tab, and step through a trial using the Control Panel. (c) Report whether there is more Go or NoGo unit activity in the Striatum for a couple of different Store and Ignore trials. You should see that the SNrThal (BG gating output) unit is active only when there is more Go than NoGo activity. Explain how the differences in Go/NoGo activity patterns arise.


Now we will see that the Matrix (striatum), which controls updating, and the PVLV reinforcement learning system, contribute differently to solving the overall problem.

Click on s.wt in the SIRnet panel, and then click on individual S and I units in the Input layer to show the learned sending weights from these units to other layers in the network. You should see greater weights from the Input Store unit to Matrix Go than to NoGo units, confirming that the Matrix learned Go to update the Store stimulus. The opposite should be true for the Ignore stimulus (you might have to change the color scale displaying the weight values in the SIRnet tab for this to be obvious). Next, click on the S and I units in the PFC layer. Here the difference between Go and NoGo weights may be less clear. This is because it is useful to update store signals when they are in the Input, but the network should not necessarily update when the Store signals are maintained in PFC (e.g., if the Input pattern is currently Ignore). Thus in this simple WM task, the Input pattern is most relevant for dictating when the Matrix should update or not.

Next, we examine whether this distinction between Input and PFC weights applies to the evaluation of reward value by the reinforcement learning system. Click again on the S and I units in the Input and PFC layers, while still viewing s.wt, and observe the weights into the LVe layer at the bottom right of the network. This layer corresponds to a part of the amygdala, and reflects the learned value (LV) of reward that the reinforcement learning system attributes to its inputs (note that the RL system can evaluate the reward value of external inputs from the environment, and can also evaluate the value of its internal PFC states). Stronger weights to the units in the LVe layer labeled "1" reflect that this layer assigns a high reward value to the sending unit. Similarly, stronger weights to the units labeled "0" reflect that this layer assigns a low reward value to the sending unit.


Question 9.12 cont'd: (d) Report the sending (s.wt) weights from the S and I units in both the Input and PFC layers to the LVe layer. Are there qualitative differences between the learned reward value in the LVe layer of the Store input unit compared to that of the Store PFC unit? How about the Ignore weights from Input and PFC? Based on the lecture notes and what you know about how this system learns, why do you think this might be?


  • When you are done with this simulation, you can either close this project in preparation for loading the next project, or you can quit completely from the simulator.