CECN1 Pattern Associator
Pattern Associator (Task Learning)
- The project file: pat_assoc.proj (click and Save As to download, then open in Emergent)
- pat_assoc screenshots -- for incorporation in PowerPoint slides, etc.
Back to CECN1 Projects
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
- GENERAL USAGE NOTE: To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can always return to this document by clicking on the ProjectDocs tab at the top of the middle panel.
Exploration of Hebbian Task Learning (Section 5.2 in Text)
This exploration is based on a very simple form of task learning, where a set of 4 input units project to 2 output units. The "task" is specified in terms of the relationships between patterns of activation over the input units, and the corresponding desired or target values of the output units. This type of network is often called a pattern associator because the objective is to associate patterns of activity on the input with those on the output.
You should see the network in the
PatAssocNet tab in the far right 3d view frame. Note that there are 2 output units receiving inputs from 4 input units through a set of feedforward weights (see also Figure 5.1 in the text).
- Click the .T3Tab.EasyEnv tab at the top of the 3D view panel (far right) to view the events in the environment.
As you can see, the input-output relationships to be learned in this "task" are simply that the leftmost two input units should make the left output unit active, while the rightmost units should make the right output unit active. This can be thought of as categorizing the first two inputs as "left" with the left output unit, and the next two as "right" with the right output unit. NOTE: Figure 5.2 in the text is configured differently than .T3Tab.EasyEnv, but they are the same patterns. This is true for figures 5.3 and 5.7 as well.
This is a relatively easy task to learn because the left output unit just has to develop strong weights to the leftmost input units and ignore the ones to the right, while the right output unit does the opposite. Note that we are using kWTA inhibition within the output layer, with a k parameter of 1.
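The kWTA constraint mentioned above can be sketched in a highly simplified form. This is an illustrative Python sketch only, not Emergent's actual implementation (which computes an inhibitory current between the k-th and k+1-th most active units rather than zeroing activations outright):

```python
import numpy as np

def kwta(net_input, k=1):
    """Simplified k-winners-take-all: the k units with the highest net
    input stay active; all other units are inhibited to zero."""
    act = np.zeros_like(net_input)
    winners = np.argsort(net_input)[-k:]   # indices of the top-k units
    act[winners] = net_input[winners]
    return act

# With k=1 over the 2 output units, only the more strongly driven
# output unit can be active on any given trial:
print(kwta(np.array([0.8, 0.3]), k=1))   # only the left output stays active
```

With k=1 this reduces to winner-take-all between the two output units, which is why each test trial produces a single clearly active output.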
The network is trained on this task by simply clamping both the input and output units to their corresponding values from the events in the environment, and performing CPCA Hebbian learning on the resulting activations.
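The CPCA Hebbian update applied to these clamped activations can be sketched as follows. This is a minimal illustration with a hypothetical event pattern (not the actual EasyEnv events), and it omits the contrast enhancement and renormalization corrections used in the real rule:

```python
import numpy as np

def cpca_update(w, x, y, lrate=0.1):
    # CPCA Hebbian rule: dw_ij = lrate * y_j * (x_i - w_ij).
    # Each weight moves toward the input value whenever its receiving
    # unit is active, so over many trials w_ij -> P(x_i | y_j).
    return w + lrate * y[:, None] * (x[None, :] - w)

# Hypothetical clamped event: left two inputs on, left output on.
w = np.full((2, 4), 0.5)               # 2 outputs x 4 inputs, mid-range start
x = np.array([1.0, 1.0, 0.0, 0.0])     # clamped input pattern
y = np.array([1.0, 0.0])               # clamped (target) output pattern
for _ in range(100):
    w = cpca_update(w, x, y)
# w[0] approaches the input pattern [1, 1, 0, 0]; w[1] is untouched
# because the right output unit was never active.
```

Note how the weight change depends only on the coactivity of input and output, not on any measure of task performance; this is the property that the hard task below exploits.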
- First, press the .T3Tab.PatAssocNet tab to reactivate it, then press the master .PanelTab.ControlPanel tab in the middle panel. Now press the Init button there (bottom), then Step 4 times while you watch the network. NOTE: For this exploration, you should always answer "Yes" to "Initialize Network Weights?" after pressing Init.
You should see all 4 events from the environment presented in a random order.
- Now press Step 4 more times.
You will see the activations in the output units are different this time. This is because it was the testing phase, which is run after every epoch of training. During this testing phase, all 4 events are presented to the network, except this time the output units are not clamped to the correct answer, but are instead updated solely according to their current weights from the input units (which are clamped as before). Thus, the testing phase records the current actual performance of the network on this task, when it is not being "coached" (that is why it's a test).
- Now click on the .T3Tab.TrialOutputGrid tab in the far right 3D view panel (followed by the Refresh button in the top right of the middle .PanelTab.TrialOutputGrid panel if the data is not displayed).
The results of the test run you just performed are displayed. Each row represents one of the four events, with the input pattern and the actual output activations shown on the right. The sse column reports the summed squared error (SSE), which is simply the summed squared difference between the actual output activation during testing (o_k) and the target value (t_k) that was clamped during training:
- SSE = \sum_k (t_k - o_k)^2
where the sum is over the 2 output units. We are actually computing the thresholded SSE, where absolute differences of less than 0.5 are treated as zero, so the unit just has to get the activation on the correct side of 0.5 to get zero error. We thus treat the units as representing underlying binary quantities (i.e., whether the pattern that the unit detects is present or not), with the graded activation value expressing something like the likelihood of the underlying binary hypothesis being true. All of our tasks specify binary input/output patterns.
With only a single training epoch, the output unit is likely making some errors.
- Click on the .T3Tab.TrialOutputGrid tab in the far right panel if it's not already active, and then the master .PanelTab.ControlPanel tab in the middle panel. Press the Run button while you watch the grid in the right frame.
You will see the grid view update after each trial, showing the pattern of outputs and the individual SSE (sse) error for each event.
- Next, click on the .T3Tab.EpochOutputGraph tab in the far right panel (and the Refresh button in the middle frame if necessary).
Now you will see a summary plot across epochs of the sum of the thresholded SSE measure across all the events in the epoch. This shows what is often referred to as the learning curve for the network, and it should have rapidly gone to zero, indicating that the network has learned the task. Training will stop automatically after the network has exhibited 5 correct epochs in a row (just to make sure it has really learned the problem), or it stops after 30 epochs if it fails to learn.
Let's see what the network has learned.
- Click the .T3Tab.PatAssocNet tab in the (far right) panel to display the network. Press the TestStep button in the master .PanelTab.ControlPanel 4 times.
This will step through each of the training patterns and update the .T3Tab.TrialOutputGrid. Click on it and hit
Refresh as before to display the results. You should see that the network has learned this easy task, turning on the left output for the first two patterns, and the right one for the next two. Now, let's take a look at the weights for the output unit to see exactly how this happened.
- Click on the .T3Tab.PatAssocNet tab in the right frame and then on the r.wt value in the middle panel that also gets displayed. (You may have to scroll down --- it is near the bottom of the list of values.) Now click on the red arrow ("Select") tool in the right panel and select the left output unit in the network.
You should see that, as expected, the weights from the left 2 units are strong (near 1), and those from the right 2 units are weak (near 0). The complementary pattern should hold for the right output unit.
Question 5.1 Explain why this pattern of strong and weak weights resulted from the CPCA Hebbian learning algorithm.
The Hard Task
Now, let's try a more difficult task.
- Set env_type on the master .PanelTab.ControlPanel tab to HARD and press Apply. Click the .T3Tab.HardEnv tab at the top of the far right panel to view the events in the environment.
In this harder environment (note: figure 5.3 in text is in a different format), there is overlap among the input patterns for cases where the left output should be on, and where it should be off (and the right output on). This overlap makes the task hard because the unit has to somehow figure out what the most distinguishing or task relevant input units are, and set its weights accordingly.
This task reveals a problem with Hebbian learning. It is only concerned with the correlation (conditional probability) between the output and input units, so it cannot learn to be sensitive to which inputs are more task relevant than others (unless this happens to be the same as the input-output correlations, as in the easy task). This hard task has a complicated pattern of overlap among the different input patterns. For the two cases where the left output should be on, the middle two input units are very strongly correlated with the output activity (conditional probability P(x_i|y_j) = 1), while the outside two inputs are only half-correlated (P(x_i|y_j) = .5). The two cases where the left output should be off (and the right one on) overlap considerably with those where it should be on, with the last event containing both of the highly correlated inputs. Thus, if the network just pays attention to correlations, it will tend to respond incorrectly to this last case.
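This conditional-probability structure can be verified with a small calculation. The patterns below are a hypothetical reconstruction chosen only to match the overlap structure described above (the actual HardEnv events may differ in detail):

```python
import numpy as np

# Hypothetical HARD patterns: two events where the left output is on,
# and two where the right output is on, with the last event containing
# the "double dose" of both middle (highly correlated) inputs.
left_events  = np.array([[1, 1, 1, 0],
                         [0, 1, 1, 1]])
right_events = np.array([[1, 0, 0, 1],
                         [0, 1, 1, 0]])   # both middle inputs active

# Conditional probability P(x_i = 1 | left output = 1) is just the
# fraction of left-output events in which each input is active:
p_given_left = left_events.mean(axis=0)
print(p_given_left)   # middle units: 1.0, outer units: 0.5
```

Weights driven purely by these conditional probabilities will be strongest from the middle two inputs to the left output, which is exactly what makes the last event come out wrong.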
Let's see what happens when we run the network on this task.
- After making sure you are viewing the r.wt receiving weights of the left output unit in the .PanelTab.PatAssocNet tab in the middle panel, press the Init ("Yes" to "Initialize Network Weights?") and then Run buttons in the master .PanelTab.ControlPanel, which runs the network with a new set of random starting weights.
You should see from these weights that the network has learned that the middle two units are highly correlated with the left output unit, as we expected.
- Return to viewing the act variable (.PanelTab.PatAssocNet) and then do TestStep 4 times (master .PanelTab.ControlPanel).
You should see that the network is not getting the right answers. (Hint: You can also look at the .T3Tab.TrialOutputGrid to see all events at once.) Different runs will produce slightly different results, but the first two events should generally turn the left output unit on (correctly), the third event should tend to (correctly) turn on the right unit, but the fourth event should tend to (incorrectly) turn on the left unit because of the strength of the weights from the middle two input units to the left output unit.
The weights to the right output unit (r.wt in the .T3Tab.PatAssocNet tab of the middle panel) show that it has strongly represented its correlation with the second input unit, which explains the pattern of output responses. This weight to the right output unit can have a net stronger effect than those to the left output unit from the two middle inputs because of the different overall activity levels in the different input patterns --- this difference in alpha affects the renormalization correction for the CPCA Hebbian learning rule as described earlier in the text (note that even if this renormalization is set to a constant across the different events, the network still fails to learn). For the fourth event, however, the "double dose" of strong weights from the two middle units favors the left unit, leading to a consistent error.
- Do several more Runs on this HARD task. You can try increasing the max_epochs parameter to 50, or even 100, in the master .PanelTab.ControlPanel if you wish.
Question 5.2 (a) Does the network ever solve the task? (b) Report the final
sse at the end of training for each run.
- Experiment with the parameters that control the contrast enhancement of the CPCA Hebbian learning rule (wt_sig.off) to see if they are playing an important role in the network's behavior.
You should see that changes to these parameters do not lead to any substantial improvements. Hebbian learning does not seem to be able to solve tasks where the correlations do not provide the appropriate weight values. It seems unlikely that there will generally be a coincidence between correlational structure and the task solution. Thus, we must conclude that Hebbian learning is of limited use for task learning. In contrast, we will see in the next section that an algorithm specifically designed for task learning can learn this task without much difficulty.
- To continue on to the next simulation, you can leave this project open because we will use it again. Or, if you wish to stop now and come back to it later, quit by selecting File->Close Project in the main project window.
Exploration of Delta Rule Task Learning (Section 5.5 in Text)
- Reset the parameters to their default values using the Defaults button in the master .PanelTab.ControlPanel ("Yes" to "Initialize Network Weights?").
- Set the learn_rule value in the master .PanelTab.ControlPanel to DELTA and click Apply, and then, while watching the Learning Parameters fields, click SetLearnRule.
This will switch weight updating from the default CPCA Hebbian rule explored previously to the delta rule. The effects of this switch can be seen in the
Learning Parameters group, which shows the learning rate for the weights (
lrate, always .01) and for the bias weights (
bias_lrate, which is 0 for Hebbian learning because it has no way of training the bias weights, and is equal to
lrate for delta rule), and the proportion of Hebbian learning (
hebb, 1 or 0 --- we will see in the next chapter that intermediate values of this parameter can be used as well). IMPORTANT: Note that it is
SetLearnRule that actually changes the
Learning Parameters values.
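The delta rule update that this switch enables can be sketched as follows. This is an illustrative Python sketch (not the simulator's code), and the input pattern and activation values are hypothetical:

```python
import numpy as np

def delta_update(w, bias, x, o, t, lrate=0.01):
    # Delta rule: dw_ij = lrate * (t_j - o_j) * x_i.
    # Bias weights are trained the same way with an implicit input
    # of 1, using bias_lrate == lrate as in the Learning Parameters.
    err = t - o
    return w + lrate * np.outer(err, x), bias + lrate * err

# Hypothetical single trial: an undifferentiated minus-phase output,
# and a plus-phase target that clamps the left output (targets clamp
# to .95, not 1.0).
w, bias = np.zeros((2, 4)), np.zeros(2)
x = np.array([1.0, 1.0, 0.0, 0.0])
o = np.array([0.5, 0.5])       # minus-phase (actual) output
t = np.array([0.95, 0.0])      # plus-phase (target) output
w, bias = delta_update(w, bias, x, o, t)
# Once o matches t, err goes to 0 and the weights stop changing.
```

The key contrast with CPCA Hebb is the error term (t - o): learning is driven by the mismatch between target and actual output, not by input-output correlations, and it shuts off once performance is correct.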
Before training the network, we will explore how the minus-plus activation phases work in the simulator.
- Make sure that you are monitoring activations in the network by selecting act in the .T3Tab.PatAssocNet middle panel if it is not already highlighted. Also make sure the Display! checkbox is checked. Next, change the stepping level from LeabraTrial in the master .PanelTab.ControlPanel.
This will increase the resolution of the stepping so that each press of the
Step button will perform only the settling (iterative activation updating) process associated with one phase of processing at a time.
- Next, hit the Step button.
You will see in the network the actual activation produced in response to the input pattern (also known as the expectation or response, or minus phase activation).
- Now, hit Step again.
You will see the target (also known as the outcome, or instruction, or plus phase) activation. Learning occurs after this second, plus phase of activation. You can recognize targets, like all external inputs, because their activations are exactly .95 or 0 -- note that we are clamping activations to .95 (not 1.0) because units cannot easily produce activations above .95 with typical net input values due to the saturating nonlinearity of the rate code activation function. You can also switch to viewing the
targ value (2 above
act in the .PanelTab.PatAssocNet tab in the middle panel), which will show you the target inputs prior to the activation clamping. In addition, the minus phase activation is always viewable as
act_m and the plus phase as act_p.
Now, let's monitor the weights.
- Click on r.wt to monitor receiving weights. (Remember that you may have to scroll down the list of values since it's near the end.) Then click on the red arrow ("Select") tool in the top right corner of the .T3Tab.PatAssocNet and select the left output unit. Click Run in the master .PanelTab.ControlPanel to complete the training on this .T3Tab.EasyEnv task.
The network has no trouble learning this task -- you can click on the .T3Tab.EpochOutputGraph tab (with Refresh if necessary) to confirm. However, if you perform multiple Runs, you should notice that the final weight values are quite variable relative to the Hebbian case (you can always switch the LearnRule back to HEBB in the master control panel to compare the two learning algorithms). In particular, you might note that there is a much less clear-cut differentiation between the first two units vs. the last two in the DELTA rule case.
This variability in the weights reflects a critical weakness of error-driven learning -- it's lazy. Basically, once the output unit is performing the task correctly, learning effectively stops, with whatever weight values happened to do the trick. In contrast, Hebbian learning keeps adapting the weights to reflect the conditional probabilities, which, in this task, results in roughly the same final weight values regardless of what the initial random weights were. We will return to this issue in Chapter 6, when we discuss the benefits of using a combination of Hebbian and error-driven learning.
Now for the real test.
- Set env_type to HARD in the master .PanelTab.ControlPanel and also change the max_epochs parameter to 50. Then, press Run. Click on the .T3Tab.EpochOutputGraph tab in the far right frame to watch the learning curve.
You should see that the network learns this task without much difficulty (although it sometimes needs > 30 epochs). Thus, because the delta rule performs learning as a function of how well the network is actually doing, it can adapt the weights specifically to solve the task.
Question 5.3 (a) Compare and contrast in a qualitative manner the nature of the weights learned by the delta rule on this
HARD task with those learned by the Hebbian rule (e.g., note where the largest weights tend to be) -- be sure to do multiple runs to get a general sense of what tends to be learned. (b) Using your answer to the first part, explain why the delta rule weights solve the problem, but the Hebbian ones do not (don't forget to include the bias weights
bias.wt in your analysis of the delta rule case).
After this experience, you may think that the delta rule is all-powerful, but we can temper this enthusiasm and motivate the next section.
- Set env_type to IMPOSSIBLE. Then, click on the .T3Tab.ImpossibleEnv tab in the far right panel. (Note that figure 5.7 in the text has a different layout.)
Notice that each input unit in this environment is active equally often when the output is active as when it is inactive. That is, there is complete overlap among the patterns that activate the different output units. These kinds of problems are called ambiguous cue problems, or nonlinear discrimination problems (Sutherland & Rudy, 1989; O'Reilly & Rudy, 2000). This kind of problem might prove difficult, because every input unit will end up being equivocal about what the output should do. Nevertheless, the input patterns are not all the same -- people could learn to solve this task fairly trivially by just paying attention to the overall patterns of activation. Let's see if the network can do this.
- Press Run on the master .PanelTab.ControlPanel. Activate the .T3Tab.EpochOutputGraph tab again to watch the learning curve.
Do it again. And again. Any luck? If you wish, you can increase max_epochs to 100, or even 150, and see if it learns.
Because the delta rule cannot learn what appears to be a relatively simple task, we conclude that something more powerful is necessary. Unfortunately, that is not the conclusion that Minsky & Papert (1969) reached in their highly influential book, Perceptrons. Instead, they concluded that neural networks were hopelessly inadequate because they could not solve problems like the one we just explored (specifically, they focused on the exclusive-or (XOR) task)! This conclusion played a large role in the waning of the early interest in neural network models of the 1960s. Interestingly, we will see that only a few more applications of the chain rule are necessary to remedy the problem, but this fact took a while to be appreciated by most people (roughly fifteen years, in fact).
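This failure can be demonstrated directly with a minimal single-layer delta rule sketch on XOR (illustrative Python, not the Leabra implementation): no line through the input space separates the two classes, so no single layer of weights can get every pattern on the correct side of 0.5, no matter how long it trains.

```python
import numpy as np

# XOR: each input is active equally often with the output on as off.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)   # XOR targets

w = np.zeros(2)
b = 0.0
for _ in range(1000):
    for x, t in zip(X, T):
        o = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # sigmoid output unit
        w += 0.1 * (t - o) * x                  # delta rule weight update
        b += 0.1 * (t - o)                      # bias trained the same way

# Count patterns on the wrong side of 0.5 after training. Because XOR
# is not linearly separable, at least one pattern is always wrong.
wrong = sum(int(((w @ x + b) > 0) != bool(t)) for x, t in zip(X, T))
```

Adding a hidden layer (and a few more applications of the chain rule, as the text notes) is what finally removes this limitation.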
- To continue on to the next simulation, close this project first by selecting File->Close Project. It's probably better not to save upon closing, so you can be sure the exercises will work when reopened. Or, if you wish to stop now, simply quit after closing the project.