CECN1 Past Tense
Overregularization of the English Past Tense
- The project file: pt.proj (click and Save As to download, then open in Emergent -- NOTE: requires version 4.13 or higher)
- Additional file for pretrained weights (required):
- Optional files:
- pt.epc.dat -- epoch log from training
- pt.trl.dat.gz -- trial log from training -- for analyzing results (summary of analysis already included in saved project)
Project Documentation
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
- To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can always return by clicking on the ProjectDocs tab at the top of this middle panel.
This project is large and takes a long time to train, so we can just load pretrained weights to start with.
- Do LoadWeights in the .PanelTab.ControlPanel.
This network has already been through the U-shaped overregularization period and is generally producing correct outputs for all the words.
To start, we will just observe the network as it produces phonological outputs from semantic patterns.
- Do Step and answer No to the prompt about initializing the weights (if you accidentally answer Yes, just hit LoadWeights again).
You can see the semantic input being presented. As the network settles, it produces an activation pattern over the phonological output, which is interpreted for you by the display above the Phonology output layer (the ph_out column, which should match the input indicated in the trial name column).
Looking at the network, you should notice 4 active units next to each other in the back of the semantic input. These 4 units indicate which inflection is to be produced. When you step to the next word, you will see different inflections being produced.
- Step through several different words to get a sense of what the inputs and outputs look like and how the network performs. It should only occasionally make a mistake, and very rarely produce an overregularization of an irregular past tense word (we'll analyze these more systematically later).
Now that we can see that the network has learned the task, we can analyze some of the connectivity to determine how it is working.
Because we are most interested in the past-tense mapping, we will focus on that first. To find out which hidden units are most selective for the past-tense inflectional semantics, we will look at the sending weights from the past-tense inflectional semantics units to the hidden layer. We'll use the weight projection visualization technique.
- Turn on wt lines in the .T3Tab.PastTenseNet network view control, make sure s.wt is clicked next to it and that Semantics is selected as the wt prjn layer, and then select the wt_prjn variable to view in the network. Click on the red arrow, and then select the unit in the 2nd to last row, 5 units over from the left hand side. This is the semantic input for the past tense inflection. You should see several units in the Hidden layer have orange level weights (somewhat strong) from this unit. Select the unit in the very front row, a bit to the right of the center of the hidden layer.
You should now see that this unit has very specific patterns of connectivity with the last column of units in the Phonology output layer, which is where the regular past-tense inflection is produced. If you click on the .T3Tab.PhonemePatterns tab, you'll see that this pattern is exactly the "D" (-ed) inflection as shown in the right-most table of patterns. Thus, this unit is encoding the regularity of a past-tense inflectional semantics mapping to the regular past-tense inflection "-ed".
Therefore, it is very clear that this unit plays an important role in producing the regular past tense inflection. Presumably, other units that compete with this unit get more activated by the irregular words, and thus are able to suppress the regular inflection. It is the sensitive dynamics of this kind of competition as the network settles that contribute to a relatively strong U-shaped overregularization curve, as we will see.
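The weight-tracing steps just described (from an inflectional semantics unit to its most strongly connected hidden units, and from a hidden unit on to the phonology layer) can be sketched outside the simulator. This is only an illustration: the function name strongest_receivers and the tiny weight matrices are made up for the example and are not values from pt.proj.

```python
def strongest_receivers(weights, sender, top_k=3):
    """Return indices of the top_k receiving units with the largest
    weight from the given sending unit.

    weights[r][s] is the weight from sending unit s to receiving unit r.
    """
    column = [(row[sender], r) for r, row in enumerate(weights)]
    column.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in column[:top_k]]


# Toy weights (hypothetical values, NOT taken from the trained network).
# 3 hidden units receiving from 2 semantic units.
sem_to_hid = [
    [0.1, 0.8],
    [0.2, 0.1],
    [0.3, 0.7],
]
# 2 phonology units receiving from 3 hidden units.
hid_to_phon = [
    [0.1, 0.2, 0.1],
    [0.9, 0.1, 0.6],
]

past_tense_unit = 1  # hypothetical index of the past-tense semantics unit
top_hidden = strongest_receivers(sem_to_hid, past_tense_unit, top_k=2)
top_phon = strongest_receivers(hid_to_phon, top_hidden[0], top_k=1)
print(top_hidden, top_phon)
```

In the simulator the same two hops are done by clicking: first on the semantic unit with s.wt selected, then on one of the strongly weighted hidden units.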
- Analyze the production of the progressive "-ing" inflection, using the same technique we just used for the past tense inflection (the "-ing" inflectional semantics start at the third unit from the left in the last row, and the inflectional phonological pattern is shown by the G in the right-most table in .T3Tab.PhonemePatterns).
Question 10.9 (a) Do the most strongly connected units from the -ing semantics code for the appropriate inflectional phonological pattern? (b) Describe the steps you took to reach this answer.
Overregularization
Although it is interesting to see something about how the network has learned this task, the most relevant empirical data is in the time-course of its overregularizations during training. The program called PastTenseAnalysis (under programs in the left browser) performs an error-scoring of the phonological output from the network for each trial as it learned, looking for overregularizations of irregular past tense words, in addition to a number of other patterns of output that are not discussed here.
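As a rough illustration of what this kind of error scoring involves, the sketch below classifies a single response to a past-tense trial. It is not the PastTenseAnalysis program itself: the function name, the string encoding of phonology, and the assumption that the regular inflection is a single trailing "D" appended to the stem are all simplifications made for the example.

```python
def score_past_tense(output, stem, correct_past, is_irregular,
                     regular_suffix="D"):
    """Classify one past-tense response.

    Assumes (for illustration only) that phonological forms are plain
    strings and that the regular past tense is stem + regular_suffix.
    """
    if output == correct_past:
        return "correct"
    if is_irregular and output == stem + regular_suffix:
        return "overregularization"
    return "other"


# Hypothetical trials for the irregular verb "go".
print(score_past_tense("went", "go", "went", True))
print(score_past_tense("goD", "go", "went", True))
print(score_past_tense("gat", "go", "went", True))
```

The real analysis also tracks other output patterns, which would replace the single "other" category here.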
- Click on the .T3Tab.OverregGraph tab to see the graph of the results, which looks very similar to Figure 10.20 in the text. This figure in the text shows the plot of overregularizations for both a Leabra network and a standard feedforward backpropagation network (Bp) run on the same task. Note that, following convention, overregularization is plotted as 1 minus the proportion of overregularization errors, which gives the characteristic U-shape that is evident in the graph. The plots also show the proportion of phonologically valid responses that the network made for past-tense verbs, which is important for evaluating when overregularizations occur relative to any period of early correct responding.
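The plotting convention can be made concrete with a short sketch. It assumes that the proportion is taken over the irregular past-tense trials in each epoch; the documentation does not state the denominator, so treat that choice, the function name, and the counts as hypothetical.

```python
def overreg_curve(overreg_counts, irregular_trials):
    """Return 1 - (overregularizations / irregular trials) per epoch.

    The denominator is an assumption made for this example. Epochs
    with no irregular trials are plotted as 1.0 (no errors possible).
    """
    curve = []
    for n_overreg, n_trials in zip(overreg_counts, irregular_trials):
        if n_trials == 0:
            curve.append(1.0)
        else:
            curve.append(1.0 - n_overreg / n_trials)
    return curve


# Made-up counts: no errors early, a burst, then a low residual rate.
curve = overreg_curve([0, 8, 2], [32, 32, 32])
print(curve)
```

Because the plotted value drops only when overregularizations appear, an early error-free period followed by a burst of errors and a later recovery traces out the U shape.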
There are two critical U-shaped curve features that are evident in the comparison between the Leabra and Bp networks. First, the Leabra network achieves a substantial level (around 50%) of responding prior to the onset of overregularization. Thus, the network demonstrates an early-correct period of irregular verb production, where irregular verbs are being produced without overregularization errors. This is the critical aspect of the empirical data that previous models have failed to capture without questionable manipulations of the training corpus or other parameters. These other models look more like the Bp network in the figure, with overregularization beginning quite early relative to the level of responding. Furthermore, increasing the learning rate in Bp uniformly advances both responding and overregularization, preserving the same basic relationship.
The second critical feature is the overall level of overregularization and its tendency to continue on at a low, sporadic rate over an extended period of time, which is an important characteristic of the human data. The Leabra network shows this characteristic, but the Bp network exhibits a rapidly resolving overregularization period, consistent with the purely gradient-descent nature of Bp. The extended overregularizations in Leabra can be attributed to a dynamic competition between the regular and irregular mappings that is played out on each settling trial, and is affected by small weight changes that effectively prime the regular mapping (O'Reilly & Hoeffner, 2000). Thus, during a protracted period of learning, the Leabra network is dynamically balanced on the "edge" between the regular and irregular mappings, and can easily shift between them, producing the characteristic low rate of sporadic overregularizations.
To provide a more representative quantitative assessment of these two critical features, O'Reilly & Hoeffner (2000) ran 25 random networks through the early correct period and recorded the number of valid responses to past-tense irregulars prior to the first and second overregularization (Figure 10.21 in the textbook). Again, a standard backpropagation (Bp) network was compared with Leabra, with three different levels of Hebbian learning: none (H0), .001 (as explored above), and .005. The results confirm that Leabra exhibits a significantly more substantial early correct period compared to Bp, and further show that Hebbian learning has little effect either way.
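The measure used in this comparison can be sketched as a simple count over a chronological list of scored responses to irregular past-tense words. The label names and the choice to report cumulative counts from the start of training are assumptions made for this example, not details confirmed by the source.

```python
def valid_before_overregs(labels, n_overregs=2):
    """Count valid responses seen before each of the first n_overregs
    overregularizations.

    labels is a chronological list of "valid", "overreg", or "invalid"
    (hypothetical label names). Counts are cumulative from the start.
    """
    counts = []
    valid = 0
    for label in labels:
        if label == "overreg":
            counts.append(valid)
            if len(counts) == n_overregs:
                break
        elif label == "valid":
            valid += 1
    return counts


# Made-up trial sequence for one network.
trials = ["invalid", "valid", "valid", "overreg", "valid", "overreg", "valid"]
print(valid_before_overregs(trials))
```

A larger first count corresponds to a more substantial early correct period.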
O'Reilly & Hoeffner (2000) also ran 5 of each type of network for a full 300 epoch period and counted the total number of overregularizations produced (Figure 10.22). This also confirms our previous single-network results -- the interactivity and inhibitory competition in Leabra facilitate a dynamic priming-like overregularization phenomenon that is absent in the feedforward backpropagation network (Bp). Again, Hebbian learning does not increase the effect, and here we find that with a larger Hebbian level (.005), overregularization is actually decreased.
The null or detrimental effects of Hebbian learning in this large-scale model are somewhat inconsistent with smaller-scale models of the dynamic competition and priming account of overregularization as explored by O'Reilly & Hoeffner (2000). Nevertheless, it is clear that the major effect here is in the activation dynamics that facilitate the competition between the irregular and regular mappings, and although Hebbian learning does not facilitate the effects, it does facilitate our ability to interpret the network's weights as we did above. The reduction in overregularizations with larger amounts of Hebbian learning (.005) can be attributed to its tendency to differentiate (cluster separately; see the model learning chapter of the text) the irregular and regular mappings such that overregularization becomes less likely.
- To look at the detailed listing of overregularizations produced by this network over training, click on the .T3Tab.OverregListing tab and scroll through. One might be tempted to consider these overregularizations "cute" in the sense that they sound just like what a child would do!
