CECN1 Sentence Gestalt
From Computational Cognitive Neuroscience Wiki
The Sentence Gestalt Model
- The project file: sg.proj (click and Save As to download, then open in Emergent). NOTE: requires version 4.13 or higher for testing.
- Additional file for pretrained weights (required):
Project Documentation
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
- To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can always return by clicking on the ProjectDocs tab at the top of this middle panel.
This network learns to encode both the syntax and the semantics of sentences in an integrated "gestalt" hidden layer. The sentences have a simple agent-verb-patient structure with an optional prepositional or adverb modifier phrase at the end, and can be in either the active or the passive form (80% active, 20% passive). There are ambiguous terms that need to be resolved via context, showing a key interaction between syntax and semantics.
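To make the sentence structure concrete, here is a minimal sketch of a generator that produces agent-verb-patient sentences in the 80/20 active/passive mix described above. The `EVENTS` list, the function names, and the omission of the optional modifier phrase are all assumptions made for illustration; the actual training environment in the project is larger and is not reproduced here.

```python
import random

# Hypothetical mini-environment: (agent, active verb, passive participle, patient).
# These tuples are illustrative only, not the project's real event set.
EVENTS = [
    ("busdriver", "kissed", "kissed", "teacher"),
    ("schoolgirl", "ate", "eaten", "soup"),
    ("busdriver", "spread", "spread", "jelly"),
]
P_ACTIVE = 0.8  # from the text: 80% active, 20% passive


def make_sentence(rng):
    """Return (form, sentence) for one randomly chosen event."""
    agent, active, participle, patient = rng.choice(EVENTS)
    if rng.random() < P_ACTIVE:
        return "active", f"The {agent} {active} the {patient}."
    return "passive", f"The {patient} was {participle} by the {agent}."


def sample_corpus(n, seed=0):
    """Draw n sentences with a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    return [make_sentence(rng) for _ in range(n)]


if __name__ == "__main__":
    for form, sentence in sample_corpus(5):
        print(form, sentence)
```

Because passives are sampled only one time in five, any learner trained on such a corpus sees far fewer examples of the passive construction, which is relevant to the testing results discussed later.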
- First press LoadWeights on the overall .PanelTab.ControlPanel to load pre-trained weights (network takes several hours to train). Then, you can start by poking around the network and exploring the connectivity using the r.wt view, and then return to viewing act.
Note that the input/output units are all labeled according to the word or role they represent -- these are used primarily for decoding the network output and are not particularly easy to read in the network without zooming in.
Training
The network is trained by asking questions about all current and previous information presented to the network. The minus phase of every trial is the current word input, plus a question posed in the Role input layer about what a particular semantic role is for the current sentence. For example, for the first word in an active sentence, which is always the "agent" of the sentence, the Role input will be agent, and the network just needs to produce on the Filler output layer the same thing that is present in the Input. However, on the next trial, a new word (the verb) is presented, but the network must still remember the agent from the prior trial. This is what the Gestalt_Context layer provides. You can see all of this in the testing we'll perform next. If you want to see the training process in more detail, you can click open the programs/LeabraAll_Std subgroup/LeabraTrain program and do Init and Step through a few trials (do not re-init the weights, or if you do, be sure to do LoadWeights again prior to testing).
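The questioning scheme can be sketched as follows: at each word, the network is probed about the role of that word and of every word that came before it. This is only an illustration of how such trials could be enumerated, not the project's actual program code. The dictionary keys and the role name "action" are assumptions; "agent" and "patient" come from the text.

```python
def make_trials(sentence):
    """Enumerate (input, role question, target filler) trials.

    sentence: list of (word, role) pairs in presentation order.
    At step t the current word is the input, and one question is
    asked for each role presented so far (steps 0..t).
    """
    trials = []
    for t, (word, _role) in enumerate(sentence):
        for probe_word, probe_role in sentence[: t + 1]:
            trials.append(
                {"input": word, "role": probe_role, "filler": probe_word}
            )
    return trials


if __name__ == "__main__":
    # Role labels here are illustrative; "action" is an assumed name.
    example = [("busdriver", "agent"), ("kissed", "action"), ("teacher", "patient")]
    for trial in make_trials(example):
        print(trial)
```

In this sketch, the second trial already asks about the agent while the input is the verb, which is exactly the case where the answer is no longer present in the input and must be carried by the context.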
Testing
Now, let's evaluate the trained network by exploring its performance on a set of specially selected test sentences, shown in Table 10.15 in the text and listed here:
- Role assignment:
- Active semantic: The schoolgirl stirred the Kool-Aid with a spoon.
- Active syntactic: The busdriver gave the rose to the teacher.
- Passive semantic: The jelly was spread by the busdriver with the knife.
- Passive syntactic: The teacher was kissed by the busdriver.
- Active control: The busdriver kissed the teacher.
- Word ambiguity:
- The busdriver threw the ball in the park.
- The teacher threw the ball in the living room.
- Concept instantiation:
- The teacher kissed someone (male).
- Role elaboration:
- The schoolgirl ate crackers (with finger).
- The schoolgirl ate (soup).
- Online update:
- The child ate soup with daintiness.
- control: The pitcher ate soup with daintiness.
- Conflict:
- The adult drank iced tea in the kitchen (living room).
The test sentences are designed to illustrate different aspects of the sentence comprehension task, as noted in the table. First, a set of role assignment tasks provide either semantic or purely syntactic cues to assign the main roles in the sentence. The semantic cues depend on the fact that only animate nouns can be agents, whereas inanimate nouns can only be patients. Although animacy is not explicitly provided in the input, the training environment enforces this constraint. Thus, when a sentence starts off with "The jelly...," we can tell that because jelly is inanimate, it must be the patient of a passive sentence, and not the agent of an active one. However, if the sentence begins with "The busdriver...," we do not know if the busdriver is an agent or a patient, and we thus have to wait to see if the syntactic cue of the word "was" appears next.
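The cue logic in the paragraph above can be written out as a small hand-coded rule. This is not how the network computes its answer (the network has no explicit animacy feature and learns the constraint from the training environment); it is only a sketch of the information available at each point in the sentence. The `ANIMATE` set and the function name are assumptions for illustration.

```python
# Hypothetical animacy list; the network itself is never given this feature.
ANIMATE = {"busdriver", "teacher", "schoolgirl", "pitcher"}


def first_noun_role(words):
    """Role of the first noun, given the content words seen so far.

    words: content words in order, with articles removed,
    e.g. ["teacher", "was", "kissed"].
    """
    noun = words[0]
    if noun not in ANIMATE:
        return "patient"  # semantic cue: inanimate nouns cannot be agents
    if len(words) < 2:
        return "undetermined"  # animate noun: must wait for the next word
    # syntactic cue: "was" right after the noun signals a passive sentence
    return "patient" if words[1] == "was" else "agent"


if __name__ == "__main__":
    for prefix in (["jelly"], ["busdriver"], ["teacher", "was"], ["busdriver", "kissed"]):
        print(prefix, first_noun_role(prefix))
```

Note that a rule like this returns "undetermined" for an animate first noun, whereas a trained network has to commit to some answer at every step and will tend to favor the more frequent active reading.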
- Do Test: Init and Step in the overall control panel.
The first word of the active semantic role assignment sentence (schoolgirl) is presented, and the network correctly answers that schoolgirl is the agent of the sentence. Note that there is no plus-phase and no training during this testing, so everything depends on the integration of the input words.
- Continue to Step through to the final word in this Active semantic sentence (spoon). You can click on the .T3Tab.TrialTestOutputData tab to see a display of each item in the sentence and the network's response in the output_name column. You should observe that the network is able to identify correctly the roles of all of the words presented. Because in this sentence the roles of the words are constrained by their semantics, this success demonstrates that the network is sensitive to these semantic constraints and can use them in parsing.
- Now Step through the next sentence (Active syntactic).
This sentence has two animate nouns (busdriver and teacher), so the network must use the syntactic word order cues to infer that the busdriver is the agent, while using the "gave to" syntactic construction to recognize that the teacher is the recipient. Observe that at the final word in the sentence, the network has correctly identified all the words.
In the next sentence, the passive construction is used, but this should be obvious from the semantic cue that jelly cannot be an agent.
- Step through Passive semantic and observe that the network correctly parses this sentence.
In the final role assignment case, the sentence is passive and there are only syntactic constraints available to identify whether the teacher is the agent or the patient. This is the most difficult construction that the network faces, and it does not appear to get it right -- it gets confused about the busdriver toward the end of the sentence, replacing him with pitcherpers and schoolgirl.
- Step through this Passive syntactic sentence.
Further testing has shown that the network sometimes gets this sentence right, but often makes errors. This can apparently be attributed to the lower frequency of passive sentences, as you can see from the next sentence, which is a "control condition" of the higher frequency active form of the previous sentence, with which the network has no difficulties.
- Step through this Active control sentence.
The next two sentences test the network's ability to resolve ambiguous words, in this case throw and ball, based on the surrounding semantic context. During training, the network learns that busdrivers throw baseballs, whereas teachers throw parties. Thus, the network should produce the appropriate interpretation of these ambiguous sentences.
- Step through these next two sentences (Ambiguity1 and 2) to verify that this is the case.
Note that the network makes a mistake here by replacing teacher with the other agent that also throws parties, the schoolgirl. Thus, the network's context memory is not perfect, but it tends to make semantically appropriate errors, just as people do.
The next test sentence probes the ability of the network to instantiate an ambiguous term (e.g., someone) with a more concrete concept. Because the teacher only kisses males (the pitcher or the busdriver), the network should be able to instantiate the ambiguous someone with either of these two males.
- As you Step through this sentence, observe that someone is instantiated with pitcherpers. Note that the fact that the trial_name says pitcherpers does not mean that this is ever input to the network -- because this is purely a test, no filler information is ever presented to the network.
A similar phenomenon can be found in the role elaboration test questions. Here, the network is able to answer questions about aspects of an event that were not actually stated in the input. For example, the network can infer that the schoolgirl would eat crackers with her fingers.
- Step through the next sentence (Role elaboration1).
You should see that the very last question regarding the instrument role is answered correctly with fingers, even though fingers was never presented in the input. The next sentence takes this one step further and has the network infer what the schoolgirl tends to eat (crackers).
- Go ahead and Step through this one (Role elaboration2).
Question 10.13 In chapter 9, we discussed a mechanism for using partial cues to retrieve an original stored memory. (a) Explain the network's role elaboration performance in terms of this mechanism. (b) Based on what you know about the rates of learning of different brain areas, speculate about differences in where in the brain role elaboration might take place based on how familiar the information in question is.
The next test sentence is intended to evaluate the online updating of information in a case where subsequent information further constrains an initially vague word. In this case, the sentence starts with the word child, and the original weights for the PDP++ version of the network vacillated back and forth about which child it answered the agent question with. When the network received the adverb daintiness, this uniquely identified the schoolgirl, which it then reported as the agent of the sentence (even though it did not appear to fully encode the daintiness input, producing pleasure instead). In the trained weights for this model, the network strongly encoded pitcher as the child, and even decided that he should be eating steak (perhaps as a carry-over from the fact that the other male, the busdriver, has a strong preference for steak). This reinforcement of the male agent representation made the model impervious to the daintiness input.
- Step through the Online Update sentence.
To verify that daintiness is having an effect on this result, we can run the next control condition where the pitcher is specified as the agent of the sentence -- the PDP++ network clearly switched from saying pitcherpers to saying schoolgirl after receiving the daintiness input. Again, this model reinforces the male representation and outputs steak and is unaffected by the daintiness input.
- Step through the Online control sentence.
The final test sentence illustrates how the network deals with conflicting information. In this case, the training environment always specifies that iced tea is drunk in the living room, but the input sentence says it was drunk in the kitchen.
- Step through this Conflict sentence.
Notice that the network responds with pitcher (container) for the location, despite getting kitchen in the input. This also causes it to think that the agent is the teacher, for some reason. The main point is that when kitchen is input, the network responds with something more consistent with its prior knowledge (iced tea is stirred in the container). This may provide a useful demonstration of how prior knowledge biases sentence comprehension, as has been shown in the classic "war of the ghosts" experiment (Bartlett, 1932) and many others.
Nature of Representations
Having seen that the network behaves reasonably (if not perfectly), we can explore the nature of its internal representations to get a sense of how it works.
- Press ProbeClust on the overall control panel.
After a short delay, the cluster plot for the unambiguous nouns will show up (Figure 10.30), followed by the cluster plot for a set of probe sentences (shown in Table 10.16 in the text; see the data/InputData/ProbeSentences data table if you want). These sentences systematically vary the agents, patients, and verbs to reveal how these are represented.
- Click the .T3Tab.NounEncodeCluster first.
This cluster plot, which resembles Figure 10.30 in the text (but is slightly different due to different weights), shows very sensible similarity relationships among the encoding representations of the inputs.
- Click the .T3Tab.ProbeSentCluster tab to see the sentence probe cluster plot.
This cluster plot (similar to Figure 10.31 in the text) clearly shows that the sentences are first clustered together according to verb, and then by patient, and then by agent within that. Furthermore, across the different patients, there appears to be the same similarity structure for the agents. Thus, we can see that the gestalt representation encodes information in a systematic fashion, as we would expect from the network's behavior.
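One way to check a "verb first, then patient, then agent" ordering numerically is to compare the average distance between pairs of sentence vectors that differ in exactly one slot. The sketch below does this on hand-made toy vectors, not on the network's actual gestalt activations; the vocabulary, the slot weights in `toy_vector`, and all function names are assumptions chosen so that the expected ordering is visible.

```python
import math
from itertools import combinations, product

# Illustrative labels only; not every combination is a sensible sentence.
VERBS = ["ate", "drank"]
PATIENTS = ["soup", "iced tea"]
AGENTS = ["busdriver", "teacher"]
SLOTS = ("verb", "patient", "agent")


def toy_vector(verb, patient, agent):
    """Hand-made code: one weighted one-hot block per slot (weights assumed)."""
    def block(index, size, weight):
        return [weight if j == index else 0.0 for j in range(size)]
    return (block(VERBS.index(verb), len(VERBS), 3.0)
            + block(PATIENTS.index(patient), len(PATIENTS), 2.0)
            + block(AGENTS.index(agent), len(AGENTS), 1.0))


def mean_distance_by_slot(items):
    """Mean Euclidean distance over pairs differing in exactly one slot.

    items: list of ((verb, patient, agent), vector).
    """
    sums = {s: 0.0 for s in SLOTS}
    counts = {s: 0 for s in SLOTS}
    for (la, va), (lb, vb) in combinations(items, 2):
        diff = [s for s, x, y in zip(SLOTS, la, lb) if x != y]
        if len(diff) == 1:
            sums[diff[0]] += math.dist(va, vb)
            counts[diff[0]] += 1
    return {s: sums[s] / counts[s] for s in SLOTS if counts[s]}


if __name__ == "__main__":
    items = [(lab, toy_vector(*lab)) for lab in product(VERBS, PATIENTS, AGENTS)]
    print(mean_distance_by_slot(items))
```

If the same function were applied to recorded gestalt layer activations for the probe sentences, a larger mean distance for the verb slot than for the patient slot, and for the patient slot than for the agent slot, would be consistent with the clustering described above.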
