CECN1 Pattern Associator

From Computational Cognitive Neuroscience Wiki

Pattern Associator (Task Learning)

Back to CECN1 Projects

Project Documentation

(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)

  • GENERAL USAGE NOTE: To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can just always return to this document by clicking on the ProjectDocs tab at the top of the middle panel.

Exploration of Hebbian Task Learning (Section 5.2 in Text)

This exploration is based on a very simple form of task learning, where a set of 4 input units project to 2 output units. The "task" is specified in terms of the relationships between patterns of activation over the input units, and the corresponding desired or target values of the output units. This type of network is often called a pattern associator because the objective is to associate patterns of activity on the input with those on the output.

You should see the network in the PatAssocNet tab in the far right 3d view frame. Note that there are 2 output units receiving inputs from 4 input units through a set of feedforward weights (see also Figure 5.1 in the text).

  • Click the .T3Tab.EasyEnv tab at the top of the 3D view panel (far right) to view the events in the environment.

As you can see, the input-output relationships to be learned in this "task" are simply that the leftmost two input units should make the left output unit active, while the rightmost units should make the right output unit active. This can be thought of as categorizing the first two inputs as "left" with the left output unit, and the next two as "right" with the right output unit. NOTE: Figure 5.2 in the text is laid out differently from .T3Tab.EasyEnv, but it shows the same patterns. The same is true of Figures 5.3 and 5.7.

This is a relatively easy task to learn because the left output unit just has to develop strong weights to the leftmost input units and ignore the ones to the right, while the right output unit does the opposite. Note that we are using kWTA inhibition within the output layer, with a k parameter of 1.
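With k = 1, kWTA inhibition effectively lets only the single most strongly driven output unit become active. A minimal Python sketch conveys the idea (this is a simplification, not the simulator's actual kWTA code, which sets a graded inhibition level between the k-th and k+1-th most driven units):

```python
def kwta(net_inputs, k=1):
    """Simplified winner(s)-take-all stand-in for kWTA inhibition:
    only the k most strongly driven units remain active.  The real
    kWTA computation yields graded rather than all-or-none
    activations, but the competitive effect is the same."""
    ranked = sorted(range(len(net_inputs)),
                    key=lambda i: net_inputs[i], reverse=True)
    winners = set(ranked[:k])
    return [1.0 if i in winners else 0.0 for i in range(len(net_inputs))]

print(kwta([0.8, 0.3]))  # -> [1.0, 0.0]: the left unit wins
```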

The network is trained on this task by simply clamping both the input and output units to their corresponding values from the events in the environment, and performing CPCA Hebbian learning on the resulting activations.
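The CPCA Hebbian rule moves each weight toward the current input activation in proportion to the receiving unit's activation: \Delta w_{ij} = \epsilon y_j (x_i - w_{ij}), so each weight converges on the conditional probability P(x_i | y_j). A minimal Python sketch illustrates this (this is not the simulator's code; the event patterns below are hypothetical stand-ins in the spirit of the easy task, and the sketch omits the contrast-enhancement step that pushes the learned weights toward 0 and 1):

```python
import random

def cpca_update(w, x, y, lrate=0.01):
    """One CPCA Hebbian update for a single receiving unit.
    w: weights from each input; x: input activations; y: (clamped)
    activation of the receiving unit.  Each weight moves toward its
    input activation in proportion to y, so it converges on the
    conditional probability P(x_i = 1 | y active)."""
    return [w_i + lrate * y * (x_i - w_i) for w_i, x_i in zip(w, x)]

random.seed(1)
w = [random.uniform(0.4, 0.6) for _ in range(4)]
# Hypothetical easy-style events: one input on per event, left output
# (y) clamped on only for the first two inputs.
events = [([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1),
          ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0)]
for epoch in range(200):
    for x, y in events:
        w = cpca_update(w, x, y, lrate=0.1)

# Weights approach P(x_i = 1 | y = 1): roughly [0.5, 0.5, 0.0, 0.0]
print([round(w_i, 2) for w_i in w])
```

In the simulator, the contrast-enhancement sigmoid (wt_sig.gain, wt_sig.off) then sharpens these conditional-probability weights toward the near-0 and near-1 values you will observe below.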

  • First, press the .T3Tab.PatAssocNet tab to reactivate it, then press the master .PanelTab.ControlPanel tab in the middle panel. Now press the Init button there (bottom), then Step 4 times while you watch the network. NOTE: For this exploration, you should always answer "Yes" to "Initialize Network Weights?" after Init.

You should see all 4 events from the environment presented in a random order.

  • Now press TestStep 4 times.

You will see that the activations in the output units are different this time. This is because this was the testing phase, which is run after every epoch of training. During this testing phase, all 4 events are presented to the network, except this time the output units are not clamped to the correct answer, but are instead updated solely according to their current weights from the input units (which are clamped as before). Thus, the testing phase records the current actual performance of the network on this task, when it is not being "coached" (that is why it's a test).

The results of the test run you just ran are displayed. Each row represents one of the four events, with the input pattern and the actual output activations shown on the right. The sse column reports the summed squared error (SSE), which is simply the summed difference between the actual output activation during testing (o_k) and the target value (t_k) that was clamped during training:

  • SSE = \sum_k (t_k - o_k)^2

where the sum is over the 2 output units. We are actually computing the thresholded SSE, where absolute differences of less than 0.5 are treated as zero, so the unit just has to get the activation on the correct side of 0.5 to get zero error. We thus treat the units as representing underlying binary quantities (i.e., whether the pattern that the unit detects is present or not), with the graded activation value expressing something like the likelihood of the underlying binary hypothesis being true. All of our tasks specify binary input/output patterns.
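The thresholded SSE computation just described can be sketched in a few lines of Python (this is an illustration, not the simulator's code):

```python
def thresholded_sse(targets, outputs, thresh=0.5):
    """Thresholded summed squared error: absolute differences
    smaller than thresh count as zero, so a unit only has to get its
    activation on the correct side of 0.5 to contribute no error."""
    total = 0.0
    for t, o in zip(targets, outputs):
        diff = abs(t - o)
        if diff < thresh:
            diff = 0.0
        total += diff ** 2
    return total

# Both outputs on the correct side of 0.5 -> zero error:
print(thresholded_sse([1.0, 0.0], [0.7, 0.2]))  # -> 0.0
# Both on the wrong side -> 0.7**2 + 0.6**2 = 0.85:
print(thresholded_sse([1.0, 0.0], [0.3, 0.6]))  # -> 0.85
```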

With only a single training epoch, the output unit is likely making some errors.

  • Click on the .T3Tab.TrialOutputGrid tab in the far right panel if it's not already active, and then the master .PanelTab.ControlPanel tab in the middle panel. Press the Init and Run buttons while you watch the grid in the right frame.

You will see the grid view update after each trial, showing the pattern of outputs and the individual SSE (sse) errors.

  • Next, click on the .T3Tab.EpochOutputGraph tab in the far right panel (and the Refresh button in the middle frame if necessary).

Now you will see a summary plot across epochs of the sum of the thresholded SSE measure across all the events in the epoch. This shows what is often referred to as the learning curve for the network, and it should have rapidly gone to zero, indicating that the network has learned the task. Training will stop automatically after the network has exhibited 5 correct epochs in a row (just to make sure it has really learned the problem), or it stops after 30 epochs if it fails to learn.

Let's see what the network has learned.

This will step through each of the training patterns and update the .T3Tab.TrialOutputGrid. Click on it and hit Refresh as before to display the results. You should see that the network has learned this easy task, turning on the left output for the first two patterns, and the right one for the next two. Now, let's take a look at the weights for the output unit to see exactly how this happened.

  • Click on the .T3Tab.PatAssocNet tab in the right frame, and then on the r.wt value in the middle panel that is also displayed. (You may have to scroll down --- it is near the bottom of the list of values.) Now click on the red arrow ("Select") tool in the right panel and select the left output unit in the network.

You should see that, as expected, the weights from the left 2 units are strong (near 1), and those from the right 2 units are weak (near 0). The complementary pattern should hold for the right output unit.

Question 5.1 Explain why this pattern of strong and weak weights resulted from the CPCA Hebbian learning algorithm.

The Hard Task

Now, let's try a more difficult task.

  • Set env_type on the master .PanelTab.ControlPanel tab to HARD, and Apply. Click the .T3Tab.HardEnv tab at the top of the far right panel to view the events in the HARD environment.

In this harder environment (note: figure 5.3 in text is in a different format), there is overlap among the input patterns for cases where the left output should be on, and where it should be off (and the right output on). This overlap makes the task hard because the unit has to somehow figure out what the most distinguishing or task relevant input units are, and set its weights accordingly.

This task reveals a problem with Hebbian learning. It is only concerned with the correlation (conditional probability) between the output and input units, so it cannot learn to be sensitive to which inputs are more task relevant than others (unless this happens to be the same as the input-output correlations, as in the easy task). This hard task has a complicated pattern of overlap among the different input patterns. For the two cases where the left output should be on, the middle two input units are very strongly correlated with the output activity (conditional probability P(x_i|y_j) = 1), while the outside two inputs are only half-correlated (P(x_i|y_j) = .5). The two cases where the left output should be off (and the right one on) overlap considerably with those where it should be on, with the last event containing both of the highly correlated inputs. Thus, if the network just pays attention to correlations, it will tend to respond incorrectly to this last case.
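You can verify this conditional-probability structure directly. The patterns below are hypothetical stand-ins chosen only to reproduce the statistics just described --- the actual HardEnv events may be laid out differently:

```python
def cond_prob(events, unit):
    """P(input unit active | left output active), estimated over a
    list of (input_pattern, left_output_target) events."""
    on = [x for x, y in events if y == 1]
    return sum(x[unit] for x in on) / len(on)

# Hypothetical HARD-style events: the middle two inputs are on in
# both left-output cases, the outer two in only one each, and the
# last (right-output) event contains both middle inputs.
events = [([1, 1, 1, 0], 1), ([0, 1, 1, 1], 1),   # left output on
          ([1, 1, 0, 0], 0), ([0, 1, 1, 0], 0)]   # right output on
print([cond_prob(events, i) for i in range(4)])  # -> [0.5, 1.0, 1.0, 0.5]
```

Because CPCA weights converge on exactly these conditional probabilities, the left output unit ends up with its strongest weights from the middle two inputs --- which is precisely what makes it misfire on the last event.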

Let's see what happens when we run the network on this task.

  • After making sure you are viewing the r.wt receiving weights of the left output unit in the .PanelTab.PatAssocNet tab in the middle panel, press the Init ("Yes" to "Initialize Network Weights?") and then Run buttons in the master .PanelTab.ControlPanel, which runs the network with a new set of random starting weights.

You should see from these weights that the network has learned that the middle two units are highly correlated with the left output unit, as we expected.

You should see that the network is not getting the right answers. (Hint: You can also look at the .T3Tab.TrialOutputGrid to see all events at once.) Different runs will produce slightly different results, but the first two events should generally turn the left output unit on (correctly), the third event should tend to (correctly) turn on the right unit, but the fourth event should tend to (incorrectly) turn on the left unit because of the strength of the weights from the middle two input units to the left output unit.

The weights to the right output unit (r.wt in the .T3Tab.PatAssocNet tab of the middle panel) show that it has strongly represented its correlation with the second input unit, which explains the pattern of output responses. This weight to the right output unit can have a net stronger effect than those to the left output unit from the two middle inputs because of the different overall activity levels in the different input patterns --- this difference in activity level (alpha) affects the renormalization correction for the CPCA Hebbian learning rule as described earlier in the text (note that even if this renormalization is set to a constant across the different events, the network still fails to learn). For the fourth event, however, the "double dose" of the strong weights from the two middle units favors the left unit, leading to a consistent error.

  • Do several more Runs on this HARD task. You can try increasing the max_epochs parameter to 50, or even 100, in the master .PanelTab.ControlPanel if you wish.

Question 5.2 (a) Does the network ever solve the task? (b) Report the final sse at the end of training for each run.

  • Experiment with the parameters that control the contrast enhancement of the CPCA Hebbian learning rule (wt_sig.gain and wt_sig.off), to see if these are playing an important role in the network's behavior.

You should see that changes to these parameters do not lead to any substantial improvements. Hebbian learning does not seem to be able to solve tasks where the correlations do not provide the appropriate weight values. It seems unlikely that there will generally be a coincidence between correlational structure and the task solution. Thus, we must conclude that Hebbian learning is of limited use for task learning. In contrast, we will see in the next section that an algorithm specifically designed for task learning can learn this task without much difficulty.

  • To continue on to the next simulation, you can leave this project open because we will use it again. Or, if you wish to stop now, and come back to it later, quit by selecting File->CloseProject in the main project window and then File->Quit in the .viewers[0](root) window.

Exploration of Delta Rule Task Learning (Section 5.5 in Text)

  • Reset the parameters to their default values using the Defaults button in the master .PanelTab.ControlPanel ("Yes" to "Initialize Network Weights").
  • Select DELTA instead of HEBB for the learn_rule value in the master .PanelTab.ControlPanel and click Apply and then, while watching the Learning Parameters fields, click SetLearnRule.

This will switch weight updating from the default CPCA Hebbian rule explored previously to the delta rule. The effects of this switch can be seen in the Learning Parameters group, which shows the learning rate for the weights (lrate, always .01) and for the bias weights (bias_lrate, which is 0 for Hebbian learning because it has no way of training the bias weights, and is equal to lrate for delta rule), and the proportion of Hebbian learning (hebb, 1 or 0 --- we will see in the next chapter that intermediate values of this parameter can be used as well). IMPORTANT: Note that it is SetLearnRule that actually changes the Learning Parameters values.
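In equation form, the delta rule is \Delta w_{ij} = \epsilon (t_j - o_j) x_i, with the bias weight trained by the same error term. A minimal Python sketch of a single update (an illustration, not the simulator's code):

```python
def delta_update(w, b, x, t, o, lrate=0.01):
    """One delta-rule update for a single output unit.
    w: weights from each input; b: bias weight; x: input activations;
    t: target (plus-phase) activation; o: actual (minus-phase) output.
    Unlike CPCA Hebbian learning, the change is driven by the error
    (t - o), so learning stops once the output is correct; the bias
    weight is trained by the same error (bias_lrate = lrate)."""
    err = t - o
    w = [w_i + lrate * err * x_i for w_i, x_i in zip(w, x)]
    b = b + lrate * err
    return w, b

# Output was 0 but target is 1: weights from active inputs (and the
# bias) increase; weights from inactive inputs are unchanged.
w, b = delta_update([0.0, 0.0], 0.0, [1, 0], t=1, o=0, lrate=0.1)
print(w, b)  # -> [0.1, 0.0] 0.1
```

Note how an error of zero produces no weight change at all --- this is the "laziness" of error-driven learning discussed below.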

Before training the network, we will explore how the minus-plus activation phases work in the simulator.

  • Make sure that you are monitoring activations in the network by selecting act in the .T3Tab.PatAssocNet middle panel if it is not already highlighted. Also make sure the Display! checkbox is checked. Next, set step_prog to LeabraSettle instead of LeabraTrial in the master .PanelTab.ControlPanel.

This will increase the resolution of the stepping so that each press of the Step button will perform only the settling (iterative activation updating) process associated with one phase of processing at a time.

  • Next hit the Step button.

You will see in the network the actual activation produced in response to the input pattern (also known as the expectation or response, or minus phase activation).

  • Now, hit Step again.

You will see the target (also known as the outcome, or instruction, or plus phase) activation. Learning occurs after this second, plus phase of activation. You can recognize targets, like all external inputs, because their activations are exactly .95 or 0 -- note that we are clamping activations to .95 (not 1.0) because units cannot easily produce activations above .95 with typical net input values due to the saturating nonlinearity of the rate code activation function. You can also switch to viewing the targ value (2 above act in the .PanelTab.PatAssocNet tab in the middle panel), which will show you the target inputs prior to the activation clamping. In addition, the minus phase activation is always viewable as act_m and the plus phase as act_p.

Now, let's monitor the weights.

  • Click on r.wt to monitor receiving weights. (Remember that you may have to scroll down the list of values since it's near the end.) Then click on the Red Arrow tool in the top right corner of the .T3Tab.PatAssocNet and select the left output unit. Click Init and Run in the master .PanelTab.ControlPanel to complete the training on this .T3Tab.EasyEnv task.

The network has no trouble learning this task -- you can click on the .T3Tab.EpochOutputGraph tab (with Refresh if necessary) to confirm. However, if you perform multiple Runs, you should be able to notice that the final weight values are quite variable relative to the Hebbian case (you can always switch learn_rule back to HEBB in the master control panel to compare between the two learning algorithms). In particular, you might note that there is a much less clear-cut differentiation between the first two units vs. the last two in the DELTA rule case.

This variability in the weights reflects a critical weakness of error-driven learning -- it's lazy. Basically, once the output unit is performing the task correctly, learning effectively stops, with whatever weight values happened to do the trick. In contrast, Hebbian learning keeps adapting the weights to reflect the conditional probabilities, which, in this task, results in roughly the same final weight values regardless of what the initial random weights were. We will return to this issue later in Chapter 6, when we discuss the benefits of using a combination of Hebbian and error-driven learning.

Now for the real test.

You should see that the network learns this task without much difficulty (although it sometimes needs > 30 epochs). Thus, because the delta rule performs learning as a function of how well the network is actually doing, it can adapt the weights specifically to solve the task.

Question 5.3 (a) Compare and contrast in a qualitative manner the nature of the weights learned by the delta rule on this HARD task with those learned by the Hebbian rule (e.g., note where the largest weights tend to be) -- be sure to do multiple runs to get a general sense of what tends to be learned. (b) Using your answer to the first part, explain why the delta rule weights solve the problem, but the Hebbian ones do not (don't forget to include the bias weights bias.wt in your analysis of the delta rule case).

After this experience, you may think that the delta rule is all powerful, but we can temper this enthusiasm and motivate the next section.

  • Set env_type to IMPOSSIBLE. Then, click on the .T3Tab.ImpossibleEnv tab in the far right panel. (Note that figure 5.7 in the text has a different layout.)

Notice that each input unit in this environment is active equally often when the output is active as when it is inactive. That is, there is complete overlap among the patterns that activate the different output units. These kinds of problems are called ambiguous cue problems, or nonlinear discrimination problems (Sutherland & Rudy, 1989; O'Reilly & Rudy, 2000). This kind of problem might prove difficult, because every input unit will end up being equivocal about what the output should do. Nevertheless, the input patterns are not all the same -- people could learn to solve this task fairly trivially by just paying attention to the overall patterns of activation. Let's see if the network can do this.
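To see why a single layer of error-driven weights struggles with this kind of problem, we can train a delta-rule unit on XOR itself. The ImpossibleEnv patterns differ from XOR, but they share the crucial property that no weighting of the individual inputs separates the cases (the sketch below uses a simple threshold output rather than the simulator's rate-code activation function):

```python
import random

def train_delta_xor(epochs=1000, lrate=0.1, seed=0):
    """Train a single threshold output unit with the delta rule on
    XOR, the classic linearly inseparable task.  Returns the number
    of misclassified patterns after training."""
    random.seed(seed)
    w = [random.uniform(-0.5, 0.5) for _ in range(2)]
    b = 0.0
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            o = 1.0 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0.0
            err = t - o
            w = [wi + lrate * err * xi for wi, xi in zip(w, x)]
            b += lrate * err
    return sum(1 for x, t in data
               if (1.0 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0.0) != t)

# No linear threshold of the two inputs can get all 4 patterns
# right, so at least one error always remains, however long we train.
print(train_delta_xor())
```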

Do it again. And again. Any luck? If you wish, you can increase max_epochs to 100, or even 150, and see if it learns.

Because the delta rule cannot learn what appears to be a relatively simple task, we conclude that something more powerful is necessary. Unfortunately, that is not the conclusion that Minsky & Papert (1969) reached in their highly influential book, Perceptrons. Instead, they concluded that neural networks were hopelessly inadequate because they could not solve problems like the one we just explored (specifically, they focused on the exclusive-or (XOR) task)! This conclusion played a large role in the waning of the early interest in neural network models of the 1960s. Interestingly, we will see that only a few more applications of the chain rule are necessary to remedy the problem, but this fact took a while to be appreciated by most people (roughly fifteen years, in fact).

  • To continue on to the next simulation, close this project first by selecting File->Close Project. It's probably better not to save upon closing, so you can be sure the exercises will work when reopened. Or, if you wish to stop now, quit by selecting File->Quit in the .viewers[0](root) window.