CCNBook/Executive/PBWM

This page provides additional details about the PBWM-specific components of a standard PBWM model. IMPORTANT: this is not fully updated to the most current version of PBWM. See the SIR Model for a description of the latest version. Information here is likely relevant but may not be fully applicable.

Output gating

Output gating can be highly beneficial in selecting which maintained information is relevant at a given time for controlling behavior, but restricting access to maintained information can also be detrimental. In some cases, it is useful to have a sustained top-down bias or contextualization of behavior, based on maintained information, that persists over multiple trials. Also, requiring output gating to be functional before maintained information can be accessed introduces an additional hoop that learning must jump through before correct performance, and thus rewards, can be obtained. For these reasons, we typically make connections to posterior cortical areas from both the PFC_mnt (maintenance PFC stripes) and PFC_out (output gated PFC stripes), and allow the natural learning processes to adapt the synaptic weights from each of these areas to sort out which information is the most useful. Over the course of learning, as output gating improves, the system will naturally come to rely more on the cleaner PFC_out signals, but areas that require consistent top-down biasing may retain a significant influence from the PFC_mnt stripes.

Clearing of Maintained Information after Output Gating

A fundamental puzzle for the PFC active maintenance model is: how does the system know when to clear out information that is no longer needed? There is no obvious signal indicating when a given piece of information is no longer relevant, so it is difficult to learn when to clear out previously maintained information. The PBWM system only knows that some information is associated with positive reward and other information is not; it does not know when something that was useful is no longer so. There are various possible solutions to this problem, but the one we adopt by default is that output gating automatically clears the corresponding PFC_mnt stripe. This is sensible because, in most cases, once a given piece of maintained information has been used, it no longer needs to be maintained. Of course, this is not always true, but the system can also learn to re-store the information after it output gates it, if it turns out to be useful again later. In most cases we have examined, the benefits of clearing out maintained information turn out to be greater than the costs of this "use it and lose it" policy. The most obvious benefits come from the gating biases that we discuss in the next section.
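
As an illustration only, the default "use it and lose it" policy can be sketched with a toy stripe object. This is not the PBWM implementation: the `Stripe` class, its method names, and the `clear_after_output` flag are hypothetical, and maintained content is reduced to a single label rather than a pattern of neural activity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stripe:
    """Toy PFC stripe (hypothetical): a PFC_mnt slot and a PFC_out slot."""
    mnt: Optional[str] = None  # item held in PFC_mnt, or None if the stripe is empty
    out: Optional[str] = None  # item most recently exposed through PFC_out

    def maintenance_gate(self, item: str) -> None:
        """Maintenance Go: store a new item, overwriting whatever was held."""
        self.mnt = item

    def output_gate(self, clear_after_output: bool = True) -> Optional[str]:
        """Output Go: expose the maintained item, then clear PFC_mnt by default."""
        self.out = self.mnt
        if clear_after_output:
            self.mnt = None  # "use it and lose it"
        return self.out


stripe = Stripe()
stripe.maintenance_gate("A")
used = stripe.output_gate()
print(used, stripe.mnt)       # A None
stripe.maintenance_gate("A")  # the item can be re-stored if it is needed again
```

Under this policy an empty `mnt` slot doubles as a status signal: whether a stripe is currently maintaining anything is known without any extra bookkeeping.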

At a biological level, the clearing of PFC maintenance is evident in most neural recording studies (e.g., note in ??? that the delay neurons turn off just after the saccade). The most likely neural mechanism that drives this clearing is a wave of corticocortical excitation from the output gating PFC neurons back onto the maintenance PFC neurons, which causes them to fire synchronously, and also triggers a large wave of GABA-mediated inhibition, including more long-lasting GABA-B inhibition (Hazy et al., in prep; Rigas & Castro-Alamancos, 2007; Gutkin, Laing, Colby et al., 2001). Somewhat counterintuitively, synchronous neural firing can interrupt the reverberant excitatory loops: if all the neurons fire at once and then go refractory, there is nobody left to carry the torch!

Other possible mechanisms that could result in clearing include the following:

  • Higher-level PFC areas that play a role in controlling the maintenance and updating of information in lower-level PFC areas could potentially provide relevant clearing signals -- if there is a "master plan" for an overall sequence of PFC maintenance and updating, it would contain the knowledge of when information is no longer relevant.
  • Fixed upper limits on maintenance, or active decay -- these are less flexible than an active, dynamically gated mechanism, and would impose strong a priori constraints on how long information could be maintained. The electrophysiological data strongly suggest a more active mechanism that terminates maintenance at behaviorally relevant points, rather than some fixed maintenance interval.

Gating Biases

Because the PBWM system depends so critically on performing correctly before it can then learn to reinforce this correct behavior, introducing strategic biases in the maintenance and output gating behavior can significantly improve learning performance. These biases depend on a few key status variables, such as whether a primary reward is expected to occur on the current trial or not (which is computed directly by the PVr layer in the PVLV system), and whether a given stripe is maintaining information currently or not (hence the importance of the clearing mechanism described above). The biases are:

  • out_rew_go: If a stripe is currently maintaining something, and a primary reward signal is expected on the current trial (i.e., this is a time when some kind of response is generally required, which will be evaluated for reward), then there is a bias for the output gating to fire Go. This encourages output gating of maintained information, and helps the output gating system bootstrap itself. It is a relatively weak bias, so it can easily be overruled by subsequent learning.
  • out_empty_nogo: This is the converse of out_rew_go -- if a stripe is empty (not maintaining), then bias NoGo rather than Go. This is a relatively strong bias, because there is no real case in which output gating would need to occur in the absence of maintenance.
  • out_norew_nogo: This is another converse of out_rew_go -- if no primary reward signal is expected (norew), then bias NoGo instead of Go for output gating. This just helps the system fire output gating at the right time. It is also a weak bias that can be overcome through learning.
  • mnt_rew_nogo: Biases the maintenance gating system to fire NoGo when a primary reward signal is expected (rew) -- in general it is not useful to update with new information when a behavioral response is required; output gating should be operating at that time instead.
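
The four biases above can be summarized as a function of the two status variables. The following is a minimal sketch, not the model's code: the function name, the dictionary keys, and the `WEAK` and `STRONG` magnitudes are all assumptions, since the text gives only relative strengths (and none at all for mnt_rew_nogo, which is treated as weak here purely for illustration).

```python
# Hypothetical magnitudes: the text only says "relatively weak" or "relatively strong".
WEAK = 0.1
STRONG = 1.0


def gating_biases(is_maintaining: bool, reward_expected: bool) -> dict:
    """Return additive Go/NoGo biases for one stripe, given its status variables."""
    bias = {"out_go": 0.0, "out_nogo": 0.0, "mnt_nogo": 0.0}
    if not is_maintaining:
        bias["out_nogo"] += STRONG  # out_empty_nogo: nothing to output gate
    if reward_expected:
        if is_maintaining:
            bias["out_go"] += WEAK  # out_rew_go: help output gating bootstrap
        bias["mnt_nogo"] += WEAK    # mnt_rew_nogo (strength is an assumption)
    else:
        bias["out_nogo"] += WEAK    # out_norew_nogo: not the time to respond
    return bias


# A stripe holding something on a trial where primary reward is expected:
print(gating_biases(is_maintaining=True, reward_expected=True))
```

In this sketch the biases are simply added to whatever Go and NoGo activity the learned weights produce, so the weak ones remain easy for learning to overrule.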

Further Details

Here are some other more technical details about the PBWM system that can be useful in understanding how it functions:

  • The updating of PFC representations can interact with the phase-based Leabra learning mechanisms, so the timing of this updating is important. Specifically, all of the updating of PFC representations takes place during the first half of the minus phase of activation settling, based on the BG gating signals that are activated during that time window (the two halves of the minus phase are indicated by the two - - symbols followed by the + plus phase in ???). The repercussions of these PFC updates can then play out in the second half of the minus phase, so that by the time the plus phase comes around, any changes that happen then are due to actual error signals (outcomes that differ from expectations; see the Learning Chapter).
  • kWTA inhibitory competition in the SNrThal layer limits the number of Go gating decisions to just a fraction of the total number of stripes, with only the strongest net Go stripes winning out. This competitive funneling effect captures the important constraint inherent in the BG gating system of doing only a few things at a time so as to avoid mutual interference. This advantage may be best appreciated by thinking about the motor system where it is important not to try to walk forward and backward at the same time. In any event, this funneling effect is well documented in the empirical literature, particularly with regard to the motor basal ganglia.
  • There are stripe-specific versions of the LV system, associated with the striosomes or patch regions of the striatum, which project directly to the SNc in a topographic fashion, and thus can drive stripe-specific dopamine signals. The Patch and SNc layers in the model capture this dynamic.
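
The competitive funneling in the SNrThal layer can be illustrated with a hard top-k selection over net Go activity. This is a deliberately simplified stand-in for kWTA inhibition, which in Leabra works by setting a layer-wide inhibition level rather than by sorting; the function name, the input values, and the requirement that net Go be positive are assumptions made for the example.

```python
def select_go_stripes(go, nogo, k):
    """Return indices of at most k stripes with the strongest positive net Go."""
    net = [g - n for g, n in zip(go, nogo)]
    # Only stripes where Go outweighs NoGo compete at all.
    candidates = [i for i, v in enumerate(net) if v > 0]
    # Strongest net Go first; keep the top k and report them in stripe order.
    candidates.sort(key=lambda i: net[i], reverse=True)
    return sorted(candidates[:k])


go = [0.9, 0.2, 0.7, 0.4]    # hypothetical Go activity per stripe
nogo = [0.1, 0.5, 0.2, 0.3]  # hypothetical NoGo activity per stripe
print(select_go_stripes(go, nogo, k=2))  # [0, 2]
```

Here stripe 3 has a positive net Go but still loses the competition, which is the sense in which only a fraction of the stripes gate at any one time.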

TODO: add equations etc