CECN1 Wt Priming

From Computational Cognitive Neuroscience Wiki

Weight Based (Long Term) Priming

  • The project file: wt_priming.proj (click and Save As to download, then open in Emergent -- NOTE: requires version 4.13 or higher)

Back to CECN1 Projects

Project Documentation

(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)

IMPORTANT: This model is significantly different in its implementational details (names of stats, way that priming is tested) from the original one described in the textbook. The question at the end is also different, having only 2 parts, but getting at the same overall issues.

  • GENERAL USAGE NOTE: To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can just always return to this document by clicking on the ProjectDocs tab at the top of the middle panel.

This exploration starts with section 9.2 in the text, and demonstrates how small weight changes can produce significant behavioral priming, causing the network to favor one output pattern over another.

Notice that the network has a standard three layer structure, with the input presented to the bottom and output produced at the top (see also figure 9.1 in text).

You will see a grid view of an environment data table with 6 events shown (also see figure 9.2 in text). For each named event, the left-most pattern represents the input and the right represents the output. As you should be able to tell, the first set of 3 events and the second set of 3 events have the same set of 3 input patterns, but different output patterns. The names of the events reflect this, so that the first event has input pattern number '0', with the first corresponding output pattern (labeled 'a'), so it is named 0_a. The fourth event has this same input pattern, but the second corresponding output pattern (labeled 'b'), so it is named 0_b. The environment actually contains a total of 13 different input patterns, for a total of 26 input-output combinations (events).
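The naming scheme can be made concrete with a short Python sketch. It assumes the inputs are numbered 0 through 12 and that all 'a' events are listed before the 'b' events, as in the grid view; the variable names are illustrative and are not taken from the project file.

```python
# Build the event names for a one-to-many environment:
# every input pattern is paired with two alternative outputs, 'a' and 'b'.
N_INPUTS = 13               # input patterns, assumed to be numbered 0..12
OUTPUT_LABELS = ["a", "b"]  # the two alternative outputs per input

# Assumed ordering: all 'a' events first, then all 'b' events.
event_names = [f"{i}_{label}" for label in OUTPUT_LABELS for i in range(N_INPUTS)]

print(len(event_names))   # 26
print(event_names[:3])    # ['0_a', '1_a', '2_a']
print(event_names[13])    # 0_b
```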

Network Training

Let's train the network and see what happens. First, we will use the standard combination of Hebbian and error-driven learning as developed in chapters 4-6.

  • Press Train: Init, Run in the master .PanelTab.ControlPanel to start training -- you can see the network being trained on the patterns. Click on the .T3Tab.EpochOutputData tab in the far right frame to display a graph of training progress.

The graph shows two statistics of training. As with the Reber grammar network from chapter 6 (which also had two possible outputs per input), we cannot simply use a standard summed squared error (SSE) measure -- after all, which target would you use when the network can validly produce either one? Instead, we use a closest event statistic to find which event among those in the training environment has the closest (most similar) target output pattern to the output pattern the network actually produced (i.e., in the minus phase). This statistic gives us four results, only two of which are useful for this training phase, but the others will be useful for testing so we describe them all here: (a) the distance (min_dist) from the closest event (thresholded by the usual .5), which will be 0 if the output exactly matches one of the events in the environment; (b) the name of the closest event (closest_name), which does not appear in the graph because it is not a numerical value, but it will appear on our testing data table used later; (c) a name_err measure that is 0 if this closest event has the same name as that of the event currently being presented to the network (i.e., it is the "correct" output), and 1 otherwise (think of it as a binary distance, or error, measure computed on the names); and (d) a both_err value that is like name_err except it is computed with respect to both possible output patterns (i.e., if the current output is closest to either of the two possible output patterns, it counts as a 0, and otherwise a 1).
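A minimal Python sketch of the four closest-event results follows. It is not the Emergent implementation: it assumes the distance is a sum of squared differences in which per-unit differences no larger than the .5 threshold count as zero, and it assumes that the two valid outputs for an input can be recognized by the shared number before the underscore in the event name. The function names and the tiny three-event environment are made up for illustration.

```python
def thresholded_sse(output, target, tol=0.5):
    """Sum of squared differences; per-unit differences within tol count as 0."""
    total = 0.0
    for o, t in zip(output, target):
        diff = o - t
        if abs(diff) > tol:
            total += diff * diff
    return total


def closest_event_stats(output, events, current_name, tol=0.5):
    """Return min_dist, closest_name, name_err and both_err for one trial.

    events maps an event name such as '0_a' to its target output pattern.
    """
    closest_name, min_dist = min(
        ((name, thresholded_sse(output, target, tol)) for name, target in events.items()),
        key=lambda pair: pair[1],
    )
    # name_err: did the network produce the output of the event being presented?
    name_err = 0 if closest_name == current_name else 1
    # both_err: did it produce either valid output for the current input?
    same_input = closest_name.split("_")[0] == current_name.split("_")[0]
    both_err = 0 if same_input else 1
    return {
        "min_dist": min_dist,
        "closest_name": closest_name,
        "name_err": name_err,
        "both_err": both_err,
    }


# Hypothetical 6-unit output patterns, for illustration only.
events = {
    "0_a": [1, 1, 0, 0, 0, 0],
    "0_b": [0, 0, 1, 1, 0, 0],
    "1_a": [0, 0, 0, 0, 1, 1],
}
# Event 0_a is presented, but the network settles on something close to 0_b.
stats = closest_event_stats([0.1, 0.2, 0.9, 0.8, 0.0, 0.0], events, "0_a")
print(stats)
```

In this example the output counts as a name error (the 'b' pattern was produced while 0_a was presented) but not as a both_err, because 0_b is still a valid output for input 0.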

The sum of the closest event distances over the epoch of training is plotted in black in the graph (sum_min_dist), and the sum of the both_err values in red (sum_both_err). As the network starts producing outputs that are closest to the correct output patterns, sum_both_err will go down, and sum_min_dist will go down as these outputs exactly match valid outputs in the environment.

As something of an aside, it should be noted that the ability to learn this one-to-many mapping task depends critically on the presence of the kWTA inhibition in the network -- standard backpropagation networks will only learn to produce a blend of both output patterns instead of learning to produce one output or the other (cf. Movellan & McClelland, 1993). Inhibition helps by forcing the network to choose one output or the other, because both cannot be active at the same time under the inhibitory constraints. We have also facilitated this one-to-many mapping by adding a small amount of noise to the membrane potentials of the units during processing, which provides some randomness in the selection of which output to produce. Finally, Hebbian learning also appears to be important here, because the network learns the task better with Hebbian learning than in a purely error-driven manner. Hebbian learning can help to produce more distinctive representations of the two output cases by virtue of different correlations that exist in these two cases. O'Reilly & Hoeffner (2000) provide a more systematic exploration of the contributions of these different mechanisms in this priming task.

Weight Priming Test

Having trained the network with the appropriate semantic background knowledge, we are now ready to assess its performance on the priming task. Priming is tested first for the 'a' outputs, and then for the 'b' -- we describe the 'a' tests for concreteness here, but the 'b' test is identical except the 'b' outputs are used. The three phases of the test are:

  1. Pre-test: The network is first tested without any learning to determine its baseline likelihood of producing the 'a' response.
  2. Priming Training: The network is trained to produce the 'a' response, by presenting each of the 'a' output patterns as target training values for one single epoch of training, using the same learning rate and other parameters as in the original training (i.e., the standard .01 learning rate).
  3. Post-test: The network is then tested again without learning to determine its likelihood of producing 'a'.
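The control flow of the three phases can be sketched in Python. The ToyNetwork class below is a deliberately crude, hypothetical stand-in (one scalar bias per input, no noise, no Leabra dynamics), used only to show the order of operations; its biases are made-up values and nothing here reflects how the project's programs are actually written.

```python
class ToyNetwork:
    """Hypothetical stand-in for the trained network: one scalar bias per input.

    A positive bias means the toy model answers 'a' for that input, otherwise 'b'.
    """

    def __init__(self, biases):
        self.biases = list(biases)

    def respond(self, input_idx):
        return "a" if self.biases[input_idx] > 0 else "b"

    def train_epoch(self, target, lrate):
        # Nudge every bias one learning-rate step toward the target response.
        step = lrate if target == "a" else -lrate
        self.biases = [b + step for b in self.biases]


def sum_name_err(net, target):
    """Count the inputs for which the response is not the target output."""
    return sum(1 for i in range(len(net.biases)) if net.respond(i) != target)


def weight_priming_test(net, target, lrate=0.01):
    pre = sum_name_err(net, target)    # 1. pre-test: no learning
    net.train_epoch(target, lrate)     # 2. priming training: one single epoch
    post = sum_name_err(net, target)   # 3. post-test: no learning
    return pre, post


# Made-up biases for 13 inputs, for illustration only.
toy_biases = [-0.05, -0.03, -0.02, -0.008, -0.006, -0.004, -0.002,
              0.002, 0.004, 0.006, 0.02, 0.03, 0.05]
pre, post = weight_priming_test(ToyNetwork(toy_biases), "a")
print(pre, post)  # 7 3
```

Even in this toy version, a single small step is enough to flip the responses for inputs whose bias was only slightly on the 'b' side, which is the intuition behind the behavioral measure described next.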

The difference between the amount of 'a' responding in the post-test relative to the pre-test baseline is our behavioral measure of priming. An increased production of 'a' responses means that a single exposure to the 'a' outputs during Priming Training was sufficient to alter the network's overall behavior.

  • There is a special program that performs the above phases of testing, first on the 'a' patterns and then on the 'b' ones. In the control panel, run this by doing WtPrimeTest: Init, and then Step, while looking at the .T3Tab.WtPrimeNetwork display.

You should see the first input pattern ('0_a') being presented to the network, with no activations clamped in the Output layer (i.e., this is the minus or testing phase, where the network comes up with its own response). There is no plus phase in this initial test. The grid view above the network displays the closest event statistics for this pattern. The closest_name column should show either 0_a or 0_b. If it shows 0_a, then name_err should be 0, but if it shows 0_b, then name_err is 1. Generally, both_err should be 0 (i.e., it produces either 0_a or 0_b, but not 12_a or something else like that), and the min_dist may either be 0 or there may be a small error in the exact output pattern on some trials.

  • Keep doing WtPrimeTest: Step for a few more patterns, until you've seen both a name_err 0 and 1 case. You can then do WtPrimeTest: Run to just run the rest of the procedure as described above. Then click on the .T3Tab.TestOutputData tab to view the overall results.

Let's look first at the top grid view, EpochTestOutputData, which shows the pre-test (baseline) and post-test (after priming) summary results for the 'a' priming test (first two rows) and the 'b' priming test (second two rows). The main column of interest is sum_name_err, which measures the extent to which the network produced the target outputs (lower numbers = more target outputs). As noted above, the priming measure is the difference between the pre-test and post-test values. For example, if the first row ('a' pre-test) is 9 and the second ('a' post-test) is 4, then the network produced 5 more 'a' outputs after the priming experience than before. In general, the post-test value should be lower than the pre-test value.
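The arithmetic behind the priming measure is simple enough to write out. The Python sketch below uses the example values from the text (9 and 4); the function name priming_effect is an illustrative choice, not a name from the project.

```python
def priming_effect(pre_sum_name_err, post_sum_name_err):
    """Number of additional target outputs produced after priming.

    sum_name_err counts non-target responses, so a drop from pre-test
    to post-test means more target responses.
    """
    return pre_sum_name_err - post_sum_name_err


a_pre, a_post = 9, 4          # example values from the text
print(priming_effect(a_pre, a_post))  # 5
```

A positive value indicates priming toward the target outputs; a value of 0 would mean the single priming epoch did not change how often the target was produced.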

The bottom grid view shows all the individual testing trials, so you can see exactly what the network produced. The priming training trials are not shown.


Question 9.1 (a) Report the overall priming results you got for this run, then do a couple of additional WtPrimeTest: Run passes and report those results as well. (b) Explain why this behavior occurs, and relate it to the priming results for humans described earlier.


  • When you are done with this simulation, you can either close this project in preparation for loading the next project, or you can quit completely from the simulator.