CECN1 Model And Task
From Computational Cognitive Neuroscience Wiki
Combined Model and Task Learning
- The project file: model_and_task.proj (click and Save As to download, then open in Emergent -- NOTE: requires version 4.13 or higher)
Project Documentation
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
- GENERAL USAGE NOTE: To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can always return to this document by clicking on the ProjectDocs tab at the top of the middle panel.
This exploration starts with Section 6.3.1 in the text - Exploration of Generalization. We will explore some of the ideas regarding the importance of Hebbian learning and inhibitory competition for generalization performance. We will start by exploring one simple example that uses the oriented lines environment explored previously in the model learning chapter.
Notice that the network now has an output layer -- each of the ten output units corresponds with a horizontal or vertical line in one of the five different positions (figure 6.3 in the text).
The task to be learned by this network is quite simple -- activate the appropriate output units for the combination of lines present in the input layer. This task provides a particularly clear demonstration of the generalization benefits of adding Hebbian learning to otherwise purely error-driven learning. However, because the task is so simple it does not provide a very good demonstration of the weaknesses of pure Hebbian learning, which is actually capable of learning this task most of the time. The next section includes demonstrations of the limitations of Hebbian learning.
As in earlier projects, the master .PanelTab.ControlPanel contains three key parameters in the Learn Parameters section. The first two subfields control the learning rate of the network weights (lrate) and the bias weights (bias_spec lrate). The latter is 0 for pure Hebbian learning because it has no way of training the bias weights, and is equal to regular (network weights) lrate for error-driven learning. The third subfield is the parameter hebb, which determines the relative weighting of Hebbian learning compared to error-driven learning (equation 6.1 in text). This is the main parameter we will investigate to compare purely Hebbian (model) learning (hebb=1), purely error-driven (task) learning (hebb=0), and their combination (hebb between 0 and 1). Because we have to turn off the learning in the bias weights when doing pure Hebbian learning, we will use the learn_rule field to select the learning rule, which sets all of the parameters appropriately. Let's begin with pure error-driven (task) learning.
- Set learn_rule to PURE_ERR, and Apply. Now, while watching the Learn Parameters, click on the SetLearnRule button at the bottom. You will see the learning parameters change to reflect the change in learning rule.
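The role of the hebb parameter can be sketched in a few lines of Python. This is a minimal illustration, assuming the weight change is a simple weighted mix of a Hebbian term and an error-driven term, scaled by lrate (the form the text gives as equation 6.1). The function name mixed_weight_change, the example contributions, and the lrate value are hypothetical and are not part of the simulator.

```python
def mixed_weight_change(dw_hebb, dw_err, hebb, lrate):
    """Blend a Hebbian and an error-driven weight change.

    hebb = 1 gives pure Hebbian (model) learning, hebb = 0 gives pure
    error-driven (task) learning, and values in between mix the two.
    """
    if not 0.0 <= hebb <= 1.0:
        raise ValueError("hebb must lie between 0 and 1")
    return lrate * (hebb * dw_hebb + (1.0 - hebb) * dw_err)


# Hypothetical per-weight contributions, chosen only for illustration.
dw_hebb, dw_err = 0.5, -0.2
for hebb in (0.0, 0.05, 1.0):
    print(hebb, mixed_weight_change(dw_hebb, dw_err, hebb, lrate=0.01))
```

With a small hebb value the error-driven term still dominates the update, so the Hebbian term acts as a bias on the weights rather than as the main learning signal.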
Let's see how the network is trained.
- Press Init -> Step in the .PanelTab.ControlPanel.
This is the minus phase of processing for the first event, showing two lines presented in the input, and undoubtedly the wrong output units activated. Now let's see the plus phase.
- Press Step again.
The output units should now reflect the 2 lines present in the input. The lower output row represents the vertical lines; the upper row the horizontal. Accordingly, position 0 (bottom/left) represents the leftmost vertical line and so on.
- You can continue to Step through more trials.
The network is only being trained on 35 out of the 45 total patterns, with the remaining 10 reserved for testing generalization. Because each of the individual lines is presented during training, the network should be able to recognize them in the novel combinations of the testing set. In other words, the network should be able to generalize to the testing items by processing the novel patterns in terms of novel combinations of existing hidden representations, which have appropriate associations to the output units.
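To make the 35/10 split concrete, here is a rough Python sketch of how such an environment could be built. It assumes a 5x5 input grid, numbers the vertical lines 0-4 and the horizontal lines 5-9, and picks the held-out pairs at random while requiring every line to appear in training. The grid size, the numbering of the horizontal lines, the helper names make_event and split_events, and the random selection are all assumptions for illustration; the project's actual testing set may be chosen differently.

```python
import random
from itertools import combinations

GRID = 5            # assumed 5x5 input grid
N_LINES = 2 * GRID  # lines 0-4 vertical, 5-9 horizontal (assumed numbering)


def make_event(pair):
    """Return (input_grid, target) for an event containing the given lines."""
    grid = [[0] * GRID for _ in range(GRID)]
    for line in pair:
        for k in range(GRID):
            if line < GRID:
                grid[k][line] = 1          # vertical line in column `line`
            else:
                grid[line - GRID][k] = 1   # horizontal line in row `line - GRID`
    target = [1 if unit in pair else 0 for unit in range(N_LINES)]
    return grid, target


def split_events(n_test=10, seed=0):
    """Hold out n_test line pairs, keeping every line in the training set."""
    pairs = list(combinations(range(N_LINES), 2))  # 45 two-line combinations
    rng = random.Random(seed)
    while True:
        rng.shuffle(pairs)
        test, train = pairs[:n_test], pairs[n_test:]
        # Reshuffle if some line never appears in a training pair.
        if {line for pair in train for line in pair} == set(range(N_LINES)):
            return train, test


train_pairs, test_pairs = split_events()
print(len(train_pairs), len(test_pairs))  # 35 10
```

The coverage check in split_events mirrors the point made above: generalization is only possible in principle because each individual line is seen somewhere in training.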
- After you get tired of Stepping, click Run instead. To speed up processing, turn off the Display toggle in the top left corner of the .PanelTab.Network tab in the middle frame. Now watch the EpochOutputData Graph in the top right area of the right frame to monitor the learning curve.
As the network trains, the graph is updated every epoch with the training error statistic (black line), and every 5 epochs with two important test statistics (figure 6.4 in text).
Instead of using raw SSE for the training error statistic, we will often use a count of the number of events for which there is any error at all (again using the .5 threshold on each unit), so that each unit has to have its activation on the right side of .5 for the event not to be counted in this measure. This is plotted in the black line in the graph, and the simulator labels it as cnt_err.
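The thresholded count described above can be written out as a short Python sketch. The function names and the example activations are hypothetical; only the .5 threshold and the "any unit wrong counts the whole event" rule come from the description.

```python
def event_has_error(acts, targets, threshold=0.5):
    """True if any unit's activation is on the wrong side of the threshold."""
    return any((a > threshold) != (t > threshold) for a, t in zip(acts, targets))


def count_error_events(events, threshold=0.5):
    """Count events with at least one thresholded unit error."""
    return sum(event_has_error(a, t, threshold) for a, t in events)


# Hypothetical output activations paired with their targets.
events = [
    ([0.9, 0.1, 0.2], [1, 0, 0]),  # every unit on the correct side
    ([0.4, 0.1, 0.2], [1, 0, 0]),  # first unit should be above .5
    ([0.9, 0.7, 0.2], [1, 0, 0]),  # second unit should be below .5
]
print(count_error_events(events))  # 2
```

Unlike raw SSE, this measure ignores how far off an activation is: an event either counts as an error or it does not.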
One of the test statistics, plotted in blue, measures the generalization performance of the network (gen_cnt). The blue line plots this generalization performance in terms of the number of testing events that the network gets wrong (out of the 10 testing items), so the smaller this value, the better the generalization performance. This network appears to be quite bad at generalizing, with 9 of the 10 novel testing patterns having errors.
The other test statistic, plotted in red (unq_pats), is the same unique pattern statistic used before in self_org.proj (section 4.8.1), which measures the extent to which the hidden units represent the lines distinctly (from 0, meaning no lines distinctly represented, to 10, meaning all lines distinctly represented). This statistic shows that the hidden units do not represent all of the lines distinctly, though it does not seem nearly as bad as either the generalization error or the weights that we consider next.
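One way to think about a statistic of this kind is sketched below in Python. It assumes that each individual line is presented alone as a probe, that hidden activations are thresholded at .5, and that a line counts as distinctly represented when its thresholded hidden pattern is non-empty and shared with no other line. These are assumptions made for illustration; the simulator's exact computation of unq_pats is not specified here and may differ.

```python
from collections import Counter


def unique_pattern_count(hidden_acts_per_line, threshold=0.5):
    """Count lines whose thresholded hidden pattern is active and unshared."""
    patterns = [tuple(a > threshold for a in acts) for acts in hidden_acts_per_line]
    counts = Counter(patterns)
    return sum(1 for p in patterns if any(p) and counts[p] == 1)


# Hypothetical hidden activations for three probe lines (four hidden units).
probes = [
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],  # same thresholded pattern as the first line
    [0.0, 0.0, 0.7, 0.0],
]
print(unique_pattern_count(probes))  # 1
```

Under this definition, ten probe lines with ten different non-empty hidden patterns would score 10, and ten lines that all drive the same hidden units would score 0.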
- Turn your attention now to the EpochOutputData Grid in the top left of the .T3Tab.Network tab of the far right frame. This displays the weight values as a nested grid of the Input layer weights to each of the Hidden units.
You should see that the weights look relatively random (also figure 6.5 in the text) and clearly do not reflect the linear structure of the underlying environment. To see how much these weights change over learning from the truly random initial weights, we can run again while watching the EpochOutputData Grid, which is updated every 5 epochs as before.
- Press Run again.
The generalization error measure, the hidden unit weights, and the unique pattern statistic all provide converging evidence for a coherent story about why generalization is poor in a purely error-driven network. As we said, generalization here depends on being able to recombine representations that systematically encode the individual line elements independent of their specific training contexts. In contrast, error-driven weights are generally underconstrained by learning tasks, and thus reflect a large contribution from the initial random values rather than the kind of systematicity needed for good generalization. This lack of constraint prevents the units from systematically carving up the input/output mapping into separable subsets that can be independently combined for the novel testing items; instead, each unit participates haphazardly in many different aspects of the mapping. Thus, the poor generalization arises from the effects of the partially random weights on the attractor dynamics of the network, which prevent it from combining novel line patterns in a systematic fashion.
To determine how representative this particular result is, we can run a batch of 5 training runs.
- Press the Batch: Init, Run buttons on the .PanelTab.ControlPanel. Let the model run through the full batch of all 5 training runs, which may take several minutes.
The results will be recorded in two tables displayed in the .T3Tab.BatchOutputData tab in the far right frame. The top grid view shows the final results after each training run, while the bottom one shows summary statistics from the 5 training runs once they are done.
Question 6.1 Report the summary statistics from the BatchOutputData table for your batch run. Does this indicate that your earlier observations were generally applicable?
Given the explanation above about the network's poor generalization, it should be clear why both Hebbian learning and kWTA inhibitory competition can improve generalization performance. At the most general level, they constitute additional biases that place important constraints on learning and the development of representations. More specifically, Hebbian learning constrains the weights to represent the correlational structure of the inputs to a given unit, producing systematic weight patterns (e.g., cleanly separated clusters of strong correlations; chapter 4).
Inhibitory competition helps in two ways. First, it encourages individual units to specialize on representing a subset of items, thus parceling up the task in a much cleaner and more systematic way than would occur in an otherwise unconstrained network. Second, as discussed in chapter 3 (section 3.6.3), inhibitory competition restricts the settling dynamics of the network, greatly constraining the number of states that the network can settle into, and thus eliminating a large proportion of the attractors that can hijack generalization.
We cannot easily test the effects of kWTA inhibition in our simulation framework, because removing inhibitory competition necessitates other problematic compensatory manipulations such as the use of positive/negative valued weights; however, clear advantages for inhibitory competition in generalization have been demonstrated elsewhere (O'Reilly, 2001).
We can explore the role of Hebbian learning, however. Let's see if we can improve the network's generalization performance by adding some Hebbian learning.
- Set learn_rule on the master .PanelTab.ControlPanel to HEBB_AND_ERR, Apply, and then click SetLearnRule.
You should now see the lmix.hebb value change to .05.
- Do Train: Init, Run in the master .PanelTab.ControlPanel and then keep your eye on the EpochOutputData Graph and EpochOutputData Grid in the .T3Tab.Network tab (right frame). Note especially how, in the latter, the weights now appear to represent the individual lines much more explicitly. After this, do a Batch: Init -> Run to collect more data.
Question 6.2 (a) How did this .05 of additional Hebbian learning change the results compared to purely error-driven learning? (b) Report the results from the .T3Tab.BatchOutputData table (far right frame) for the batch of 5 training runs. (c) Explain these results in terms of the weight patterns, the unique pattern statistic, and the general effects of Hebbian learning in representing the correlational structure of the input.
You should have seen a substantial improvement from adding the Hebbian learning. Now, let's see how well pure Hebbian learning does on this task.
- Set learn_rule to PURE_HEBB, Apply, and SetLearnRule again.
This changes lmix.hebb to 1, and sets bias_spec lrate to 0.
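The three learn_rule settings used in this exploration can be summarized as a small lookup, sketched here in Python. The hebb values and the zero bias learning rate for pure Hebbian learning come from the documentation above; the bias learning rate for HEBB_AND_ERR is an assumption (taken to match lrate because error-driven learning is still active), and the function name learn_rule_params, the dictionary keys, and the example lrate are hypothetical.

```python
def learn_rule_params(learn_rule, lrate):
    """Return the learning parameters implied by a learn_rule setting."""
    if learn_rule == "PURE_ERR":
        return {"lrate": lrate, "bias_lrate": lrate, "hebb": 0.0}
    if learn_rule == "HEBB_AND_ERR":
        # Assumption: bias weights keep learning, since error-driven
        # learning is still present in the mix.
        return {"lrate": lrate, "bias_lrate": lrate, "hebb": 0.05}
    if learn_rule == "PURE_HEBB":
        # Hebbian learning has no way of training the bias weights.
        return {"lrate": lrate, "bias_lrate": 0.0, "hebb": 1.0}
    raise ValueError(f"unknown learn_rule: {learn_rule}")


for rule in ("PURE_ERR", "HEBB_AND_ERR", "PURE_HEBB"):
    print(rule, learn_rule_params(rule, lrate=0.01))  # lrate is a placeholder
```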
- Do a Run.
You will probably notice that the network learns quite rapidly. The network will frequently get perfect performance on the task itself, and on the generalization test. However, every 5 or so runs, the network fails to learn perfectly and, due to the inability of Hebbian learning to correct such errors, never gets better.
- To find such a case more rapidly, you can press Stop when the network has already learned perfectly (so you don't have to wait until it finishes the prescribed number of epochs), and then press Run again.
Because this task has such obvious correlational structure that is well suited to the Hebbian learning algorithm, it is clear why Hebbian learning helps. However, even here, the network is more reliable if error-driven learning is also used. We will see in the next task that Hebbian learning helps even when the correlational structure is not particularly obvious, but that pure Hebbian learning is completely incapable of learning that task. These lessons apply across a range of different generalization tests, including ones that rely on more holistic similarity-based generalization, as compared to the compositional (feature-based) domain explored here (O'Reilly, 2001; O'Reilly, 1996b).
- To leave this project, click File->Close Project. To continue on to the next simulation, select a new project with File->Open Project... in the .viewers[0](root) - root window. Or, if you wish to stop now, quit by selecting File->Quit.
