CECN1 Family Trees
From Computational Cognitive Neuroscience Wiki
Family Trees: Learning in Deep Networks
- The project file: family_trees.proj (click and Save As to download, then open in Emergent -- NOTE: requires version 4.13)
- Additional files for pre-trained weights and epoch data (optional):
Project Documentation
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
- GENERAL USAGE NOTE: To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can always return to this document by clicking on the ProjectDocs tab at the top of the middle panel.
Now, let's explore the case of learning in a deep network using the same family trees task as O'Reilly (1996b) and Hinton (1986). The structure of the environment is shown in Figure 6.7 in the text. The network is trained to produce the correct name in response to questions like "Rob is married to whom?" These questions are presented by activating one of 24 name units in an agent input layer (e.g., "Rob"), in conjunction with one of 12 units in a relation input layer (e.g., "Married"), and training the network to produce the correct unit activation over the patient output layer.
First, notice that the network (displayed in the family_trees tab in the far right panel) has Agent and Relation input layers, and a Patient output layer, all at the bottom of the network. These layers have localist representations of the 24 different people and 12 different relationships, which means that there is no overlap, and thus no overt similarity, in these input patterns between any of the people. Thus, the Agent_Code, Relation_Code, and Patient_Code hidden layers provide a means for the network to re-represent these localist representations as richer distributed patterns that should facilitate the learning of the mapping by emphasizing relevant distinctions and deemphasizing irrelevant ones. The central Hidden layer is responsible for performing the mapping between these recoded representations to produce the correct answers.
- Press the .T3Tab.ft_1ufg_train tab in the right 3d view panel to display the first ten (out of 100) training events in the viewer window, which should help you understand how the task is presented to the network. (If you wish, you can click on the red arrow to activate selection mode on your cursor and "grab" the purple task bar on the right and scroll down to see the other events.) The names of the events in the first column are in the following format: Agent.Relation.Patient, and can be interpreted along the following lines: "Christo's wife is who?" "Penny."
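The event format and the localist input patterns described above can be illustrated with a minimal Python sketch. It is not part of the Emergent project: the name lists are hypothetical and deliberately shortened (the real layers have 24 people and 12 relations), and the exact event string is assumed from the Agent.Relation.Patient format.

```python
# Hypothetical, shortened name lists for illustration only; the real
# project uses 24 people and 12 relations.
PEOPLE = ["Christo", "Penny", "Rob", "Maria"]
RELATIONS = ["Wife", "Husband", "Married"]


def one_hot(index, size):
    """Localist pattern: exactly one active unit, all others zero."""
    pattern = [0.0] * size
    pattern[index] = 1.0
    return pattern


def encode_event(name):
    """Split an 'Agent.Relation.Patient' event name and encode each part."""
    agent, relation, patient = name.split(".")
    return {
        "Agent": one_hot(PEOPLE.index(agent), len(PEOPLE)),
        "Relation": one_hot(RELATIONS.index(relation), len(RELATIONS)),
        "Patient": one_hot(PEOPLE.index(patient), len(PEOPLE)),
    }


def overlap(a, b):
    """Dot product of two patterns; zero means no shared active units."""
    return sum(x * y for x, y in zip(a, b))


event = encode_event("Christo.Wife.Penny")
# Two different people never share an active unit in a localist code.
agent_patient_overlap = overlap(event["Agent"], event["Patient"])
```

Because any two distinct people have zero overlap here, the input patterns alone carry no similarity structure; that structure has to be learned in the hidden layers.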
Now, let's see how this works with the network itself.
- Press the .T3Tab.family_trees tab to display the network again.
- Press the Init (and "Yes" to the dialog box query: "Initialize Network Weights?") and Step buttons after activating the master .PanelTab.ControlPanel tab in the middle panel. This will run the network through one LeabraSettle process.
The activations in the network display reflect the minus phase state for the first training event (selected at random from the list of all training events). NOTE: If you ever find that the network does not display activations, click on the family_trees tab in the middle panel and make sure the Display checkbox is checked.
- Press Step again in the .PanelTab.ControlPanel to see the plus phase activations for the same event. Note that the Patient layer is now displaying the correct answer, and that all the hidden layers change their activation patterns to reflect this additional information.
The default network is using a combination of Hebbian and GeneRec error-driven learning, with the amount of Hebbian learning set to .01 as reflected by the lmix.hebb parameter in the ControlPanel. (The error-driven component is automatically calculated to 1 - hebb). Let's see how long it takes this network to learn the task.
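As a rough illustration of how the lmix.hebb mixing works, the following Python sketch combines the two learning terms for a single connection. It assumes the textbook forms of the two rules (the CHL version of GeneRec for the error-driven term and the CPCA rule for the Hebbian term) and leaves out details such as soft weight bounding; the function names are hypothetical and this is not Emergent's actual code.

```python
def err_dwt(x_minus, y_minus, x_plus, y_plus):
    """Error-driven term (CHL form of GeneRec): plus-phase coproduct of
    sender x and receiver y, minus the minus-phase coproduct."""
    return x_plus * y_plus - x_minus * y_minus


def hebb_dwt(x_plus, y_plus, w):
    """Hebbian term (CPCA form): when the receiver is active, move the
    weight toward the sender's activation."""
    return y_plus * (x_plus - w)


def mixed_dwt(x_minus, y_minus, x_plus, y_plus, w, hebb=0.01, lrate=0.01):
    """Weight change with a proportion `hebb` of Hebbian learning and
    `1 - hebb` of error-driven learning (defaults taken from the text)."""
    err = err_dwt(x_minus, y_minus, x_plus, y_plus)
    heb = hebb_dwt(x_plus, y_plus, w)
    return lrate * (hebb * heb + (1.0 - hebb) * err)


# Receiver was too weakly active in the minus phase (0.2 vs. 0.9 target).
example = mixed_dwt(1.0, 0.2, 1.0, 0.9, w=0.5)
# Setting hebb to 0 leaves only the error-driven term.
pure_err = mixed_dwt(1.0, 0.2, 1.0, 0.9, w=0.5, hebb=0.0)
```

With hebb at .01 the error-driven term dominates the weight change, so the Hebbian term acts as a small bias rather than the main learning signal.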
- Let's view a graph to monitor training. Press the .T3Tab.EpochOutputData tab in the right 3d view panel to display the graph. You will notice that this will also change the active tab in the middle panel so you will have to click the .PanelTab.ControlPanel tab to bring it back up. Press the Init button at the bottom (and answer "Yes" to the dialog box query: "Initialize Network Weights?"). Then, press Run to allow the network to train on all the events.
As the network trains, the graph displays the error count statistic for training (cnt_err). You can also display the average number of network settling cycles (avg_cycles) by activating the EpochOutputData tab in the middle Panels frame and toggling On the Y2: avg_cycles flag. It is probably best to leave this off before you continue.
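To make the error count statistic concrete, here is a small Python sketch of how such a count could be computed for one epoch. It assumes that cnt_err is the number of events in the epoch on which the output was wrong, and that an output unit counts as wrong when it differs from its target by more than a tolerance; the 0.5 tolerance and the function names are assumptions for illustration, not values read from the project.

```python
def event_has_error(output, target, tolerance=0.5):
    """True if any output unit is further than `tolerance` from its target."""
    return any(abs(o - t) > tolerance for o, t in zip(output, target))


def count_errors(epoch_results, tolerance=0.5):
    """Number of events in one epoch with at least one wrong output unit.

    `epoch_results` is a list of (output, target) pairs, one per event.
    """
    return sum(
        1 for output, target in epoch_results
        if event_has_error(output, target, tolerance)
    )


# Made-up minus-phase outputs and targets for three events.
epoch_results = [
    ([0.9, 0.1, 0.0], [1.0, 0.0, 0.0]),  # correct within tolerance
    ([0.2, 0.7, 0.1], [1.0, 0.0, 0.0]),  # wrong unit active
    ([0.1, 0.1, 0.8], [0.0, 0.0, 1.0]),  # correct within tolerance
]
errors_this_epoch = count_errors(epoch_results)
```

Under this reading, the curve in the graph reaches zero once every training event is answered correctly.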
Your network should train in around 40-50 epochs using the initial default parameters. This may take a few minutes. You can either watch and wait, or you can instead load the results from a fully trained network.
- If you want to wait for your network to train to completion, after training is done expand the networks->family_trees branch (near bottom) in the left tree browser. Right click on family_trees to display the context menu and scroll down to Object->SaveAs and save the trained network with some personalized descriptive name (e.g., mytrainednetwork.net) -- we will be able to come back and use it later. This saves the network with its trained weights.
- If you instead prefer to load the results from a fully trained network, press Stop in the master .PanelTab.ControlPanel. To display a trained network's learning curve in the EpochOutputData tab in the right panel, expand the data->OutputData subgroup branch in the left tree panel and click on EpochOutputData to display the table of data in the middle panel. Then select Object->Load_Data, revealing a dialog window. Click the reset_first button on, to remove any existing data in the table, but don't enter a file name -- just click Ok, and a file browser will appear showing several file names of pre-trained data. Highlight the family_trees.hebb_and_err.epc.dat file and click the Open button to load the data into the EpochOutputData data table. Click on the EpochOutputData tab in the right view frame if it is not already active; the learning curve should automatically display the trained data set.
This particular network took 41 epochs to learn. Because epochs are counted from 0, the EpochOutputData data table goes from 0 to 40.
The 41 epochs it took for the default network to learn the problem is actually relatively rapid learning for a deep network like this one. For example, Figure 6.9 shows a comparison of a typical learning curve in Leabra versus the fastest standard feedforward backpropagation network (BP; for a refresher see Box 5.2: The Backpropagation Algorithm in Chapter 5), which took about 77 epochs to learn and required a very large learning rate of .39 compared to the standard .01 for the Leabra network (O'Reilly, 1996b).
The Roles of Hebbian Vs. Error-Driven Learning
We are not so interested in raw learning speed for its own sake, but more in the facilitation of learning in deep networks from the additional biases or constraints imposed by combining model and task learning. Figure 6.9 shows that these additional constraints facilitate learning in deep networks; the purely task-driven BP network learns relatively slowly, whereas the Leabra network, with both Hebbian learning and inhibitory competition, learns relatively quickly. In this exploration (as in the previous one), we will manipulate the contribution of Hebbian learning. To do this, we can run a network without Hebbian learning and compare the learning times.
- If you want to wait while the network trains, go back to the master .PanelTab.ControlPanel and set the LearnRule value to PURE_ERR, and Apply. Then click SetLearnRule and you will see the lmix.hebb change to 0. Now click Init then Run to train the network using the PURE_ERR values. This should take around 70-80 epochs or so.
Again, if you do not want to wait for the network to train, you can just load the results from a fully trained network.
- In this case, press Stop in the .PanelTab.ControlPanel (if you started the network), then do Object->Load_Data on the EpochOutputData data table again, and choose family_trees.pure_err.epc.dat (ensure that reset first is checked on).
Hebbian learning clearly facilitates learning in deep networks, as demonstrated by the network taking longer to learn without it (70 epochs in this case compared to 41; repeated runs of the networks with different starting weights substantiate this effect). Further, kWTA activation constraints play an important facilitatory role in learning as well. The benefits of kWTA activation constraints are somewhat obscured in comparing the purely error-driven Leabra network with the backpropagation (BP) network shown in Figure 6.9, because of the very high learning rate used for finding the best performance of the BP network. The benefits of kWTA activation constraints are particularly clear in comparing the purely error-driven Leabra network to a bidirectionally-connected error-driven (GeneRec) network that does not have the kWTA activation constraints, which takes around 300 or more epochs to learn at its fastest (O'Reilly, 1996b).
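The epoch counts quoted so far can be put side by side with a few lines of Python. The numbers are the ones given in the text (single example runs, with "about 77" and "around 300 or more" taken at face value), so the ratios are only a rough guide.

```python
# Epochs to learn the task, as quoted in the text.
epochs_to_learn = {
    "Leabra HEBB_AND_ERR": 41,
    "Leabra PURE_ERR": 70,
    "Backpropagation (BP)": 77,     # "about 77", with learning rate .39
    "GeneRec without kWTA": 300,    # "around 300 or more"
}

baseline = epochs_to_learn["Leabra HEBB_AND_ERR"]

# How many times longer each network took than the default network.
slowdown = {name: epochs / baseline for name, epochs in epochs_to_learn.items()}

for name, ratio in slowdown.items():
    print(f"{name}: {epochs_to_learn[name]} epochs, {ratio:.2f}x the default")
```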
Now, let's see what pure Hebbian learning can do in this task.
- Select PURE_HEBB for the LearnRule value in the .PanelTab.ControlPanel as before (remembering to Apply and SetLearnRule). Now re-run the network (Init, then Run).
You can Stop the network after 10 epochs or so because this network isn't going to improve at all. You can see this by loading the EpochOutputData data file for a network trained for 100 epochs:
- As before, do Object->Load_Data on the EpochOutputData data table, make sure reset first is clicked, and then Ok to display the list of files. Select family_trees_pure_hebb.epc.dat.
As before, this will load the data in the EpochOutputData data table, and display it graphically in the .T3Tab.EpochOutputData view in the right view panel.
Although Hebbian model learning is useful for helping error-driven learning, the graph shows that it is simply not capable of learning tasks like this on its own.
We next compare all three cases with each other.
- Load the family_trees_all.epc.dat data table file into EpochOutputData per the instructions above.
The graph view of this has the three runs displayed on top of each other, similar to Figure 6.10 in the text. You can identify the curves based on what epoch they end on: 100 = PURE_HEBB (all yellow, at the top; no learning); 70 = PURE_ERR (middle curve); and 41 = HEBB_AND_ERR (bottom left corner).
Question 6.3 (a) What do you notice about the general shape of the standard backpropagation (BP) learning curve (SSE over epochs) in Figure 6.9 compared to that of the PURE_ERR Leabra network you just ran? Pay special attention to the first 30 or so epochs of learning. (b) Given that one of the primary differences between these two cases is that the PURE_ERR network has inhibitory competition via the kWTA function, whereas BP does not, speculate about the possible importance of this competition for learning based on these results (also note that the BP network has a much larger learning rate, .39 vs .01). (c) Now, compare the PURE_ERR case with the original HEBB_AND_ERR case (i.e., where do the cnt_err learning curves start to diverge, and how is this different from the BP case)? (d) What does this suggest about the role of Hebbian learning? (Hint: Error signals get smaller as the network has learned more.)
Cluster Plot Analysis
To get a sense of how learning has shaped the transformations performed by this network to emphasize relevant similarities, we can do a cluster plot of the hidden unit activity patterns over all the inputs. Let's do a comparison between the initial clusters and those after learning for the default network.
- First, press Init on the master .PanelTab.ControlPanel to reinitialize the weights. Then, press Cluster Init and Cluster Run to generate a cluster plot.
After a bit (it tests all 100 patterns, so be patient), a cluster plot window will appear. We will compare this cluster plot to one for the trained network.
- Go to the left browser panel and expand the networks tree, then click on family_trees and select Object->Load Weights from the menu (can also do right click and use the context menu), and then select family_trees_hebb_and_err.wts (or the network that you saved). Then press Cluster Run.
Your results should look something like Figure 6.11 in the text. There are many ways in which people who appear together can be justifiably related, so you may think there is some sensibility to the initial plot. However, the final plot has a much more sensible structure in terms of the overall nationality difference coming out as the two largest clusters, and individuals within a given generation tending to be grouped together within these overall clusters. The network is able to solve the task by transforming the patterns in this way.
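For readers curious about what a cluster plot computes, the following Python sketch groups activity patterns by repeatedly merging the two closest clusters. It is an illustration under stated assumptions, not Emergent's implementation: Euclidean distance and average linkage are assumed, the function names are hypothetical, and the four hidden patterns are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two activity patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, patterns):
    """Mean distance over all pairs of members from the two clusters."""
    dists = [
        euclidean(patterns[i], patterns[j])
        for i in cluster_a for j in cluster_b
    ]
    return sum(dists) / len(dists)


def agglomerate(patterns):
    """Merge the closest pair of clusters until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [(name,) for name in patterns]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], patterns)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Made-up hidden-layer patterns for four inputs from two groups (A and B).
hidden = {
    "A1": [0.9, 0.1, 0.0],
    "A2": [0.8, 0.2, 0.0],
    "B1": [0.0, 0.1, 0.9],
    "B2": [0.1, 0.0, 0.8],
}
merges = agglomerate(hidden)
```

In this toy case the members of each group merge first and the two groups join last, which is the kind of structure the trained network's plot shows for nationality and generation.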
- To leave this project, click File->Close Project. To continue on to the next simulation, open a new project with File->Open Project... in the .viewers[0](root) - root window. Or, if you wish to stop now, quit by selecting File->Quit.
