CECN1 Localist vs Distributed
Localist vs. Distributed Representations
- The project file: loc_dist.proj (click and Save As to download, then open in Emergent)
Project Documentation
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
In this project, we will explore the difference between localist and distributed representations.
- To start, it is usually a good idea to do Object/Edit Dialog in the menu just above this text, which will open this documentation in a separate window that you can more easily come back to. Alternatively, you can always return by clicking on the ProjectDocs tab at the top of this middle panel.
This project looks very similar to the previous one (transform) -- it is in fact a superset of it, having both the previous localist network, and a new distributed one. To begin, we will first replicate the results we obtained before using the localist network.
- In the .PanelTab.ControlPanel, do Init and Run.
You should see the responses of the same localist network just as before.
Now let's examine the distributed network, which is also visible in the .T3Tab.Digit_Network view on the right. This network contains only 5 hidden units. Let's explore this network by examining the weights into these hidden units.
- At the bottom of the .PanelTab.Digit_Network netview control panel, there are tabs for each of the display elements present in this one view (Localist Network, TrialOutputData Grid, Distributed Network) -- select the Distributed Network tab to get the control panel for that item, then select r.wt for the variable value to view, and then click on each of the units in the hidden layer. Click back on act when done.
You will notice that these units are configured to detect parts or features of digit images, not entire digits as in the localist network. Thus, you can imagine that these units will be active whenever one of these features is present in the input.
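The contrast between a whole-digit (localist) detector and a feature detector can be sketched in a few lines of Python. This is only a toy illustration: the 5x7 digit shapes, the weight patterns, and the thresholds below are assumptions made for the sketch, not the values stored in the project, and the helper names (to_vector, net_input, responders) are hypothetical.

```python
# Illustrative 5x7 images drawn as strings; "#" marks an active pixel.
# These shapes and the detector weights are assumptions, not the project's values.
DIGITS = {
    "0": [".###.", "#...#", "#...#", "#...#", "#...#", "#...#", ".###."],
    "3": [".###.", "....#", "....#", ".###.", "....#", "....#", ".###."],
    "8": [".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###."],
}


def to_vector(rows):
    """Flatten a list of row strings into a list of 0/1 pixel values."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


def net_input(weights, image):
    """Weighted sum of the input pixels (a simple dot product)."""
    return sum(w * x for w, x in zip(weights, image))


def responders(weights, threshold):
    """Names of the digits whose net input reaches the threshold."""
    return [name for name, rows in DIGITS.items()
            if net_input(weights, to_vector(rows)) >= threshold]


# Localist-style unit: its weights copy the entire "8" image (17 pixels).
localist_w = to_vector(DIGITS["8"])

# Feature-style unit: its weights cover only a middle horizontal bar (3 pixels).
feature_w = to_vector([".....", ".....", ".....", ".###.", ".....", ".....", "....."])

print(responders(localist_w, threshold=17))  # only the full "8" matches
print(responders(feature_w, threshold=3))    # every digit containing the bar matches
```

In this toy setup the whole-digit unit responds to a single image, while the feature unit responds to both "3" and "8", which is the sense in which a feature detector participates in the representation of several digits.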
- Set the network to Distributed_Network instead of Localist_Network, and hit Apply. Do Run or single Step through the items for this network.
Now verify for yourself in the Grid View that the firing patterns of the hidden units make sense given the features present in the different digits. The only case that is somewhat strange is the third hidden unit firing for the digit "0" -- it fires because the left and right sides match the weight pattern. There is an important lesson here -- just because you might have visually encoded the third hidden unit as the "middle horizontal line" detector, it can actually serve multiple roles. This is just a simple case of the kind of complexity that surrounds the attempt to describe the content of what is being detected by neurons. Imagine if there were 5,000 weights, with a much more complicated pattern of values, and you can start to get a feel for how complicated a neuron's responses can be.
Now, let's see what a cluster plot of the hidden unit representations tells us about the properties of the transformation performed by this distributed network.
- Select Noisy_Digits for the input_data (Apply), and Run. Then, set cluster data src to TrialOutputData and do Cluster Init and Cluster Run.
This should produce a cluster plot like that shown in Figure 3.13. Although there are a couple of obvious differences between this plot and the one for the localist network shown in Figure 3.8b, it should be clear that the distributed network is also generally emphasizing the distinctions between different digits while deemphasizing (collapsing) distinctions among noisy versions of the same digit.
One difference in the distributed network is that it sometimes collapsed noisy versions of different digits together (a 2 with the 5's, and a 0 with the 4's), even though in most cases the different versions of the same digit were properly collapsed together. It also did not always collapse all of the different images of a digit together, sometimes only getting 2 out of 3. The reason for these problems is that the noisy versions actually shared more features with a different digit representation. In most cases where we use distributed representations, we use a learning mechanism to discover the individual feature detectors, and learning will usually do a better job than our simple hand-set weight values at emphasizing and deemphasizing the appropriate distinctions as a function of the task we train it on.
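Both the collapsing of noisy versions and the failure case described above can be sketched with a toy model. Everything here is an assumption made for illustration: the five feature templates, the 0.75 activation threshold, the digit shapes, and the hand-picked "noise" are hypothetical and do not reproduce the project's weights or its Noisy_Digits data. Distances are simple normalized Hamming distances, not the metric used by the cluster plot.

```python
# Toy illustration: the feature templates, images, and 0.75 threshold are
# assumptions made for this sketch, not values from the project.
FEATURES = [
    {(0, c) for c in range(1, 4)},  # top bar
    {(3, c) for c in range(1, 4)},  # middle bar
    {(6, c) for c in range(1, 4)},  # bottom bar
    {(r, 0) for r in range(1, 6)},  # left side
    {(r, 4) for r in range(1, 6)},  # right side
]
N_INPUT = 5 * 7  # pixels in a 5x7 image


def hidden(image, threshold=0.75):
    """A unit is on when enough of its template pixels are on in the image."""
    return tuple(int(len(image & w) / len(w) >= threshold) for w in FEATURES)


def input_distance(a, b):
    """Fraction of input pixels that differ between two images."""
    return len(a ^ b) / N_INPUT


def hidden_distance(a, b):
    """Fraction of hidden units that differ between two images."""
    ha, hb = hidden(a), hidden(b)
    return sum(x != y for x, y in zip(ha, hb)) / len(ha)


# Images are sets of (row, col) pixels that are on.
zero = FEATURES[0] | FEATURES[2] | FEATURES[3] | FEATURES[4]
eight = zero | FEATURES[1]
noisy_zero = zero - {(2, 0)}                # one left-side pixel missing
degraded_eight = eight - {(3, 1), (3, 2)}   # most of the middle bar missing

for label, a, b in [
    ("0 vs noisy 0", zero, noisy_zero),
    ("0 vs 8", zero, eight),
    ("degraded 8 vs 0", degraded_eight, zero),
    ("degraded 8 vs 8", degraded_eight, eight),
]:
    print(f"{label}: input {input_distance(a, b):.3f}, hidden {hidden_distance(a, b):.3f}")
```

In this sketch the noisy "0" maps onto exactly the same hidden pattern as the clean "0", while "0" and "8" are proportionally further apart in the hidden layer than in the input. The degraded "8", however, has lost the one feature that separates it from "0", so it collapses onto the wrong digit, mirroring the kind of error seen in the cluster plot.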
Another aspect of the distributed cluster plot is that the digit categories are not equally separate elements of a single cluster group, as with the localist representation. Thus, there is some residual similarity structure between different digits reflected in the hidden units, though less than in the input images, as one can tell because of the "flatter" cluster structure (i.e., the clusters are less deeply nested within each other, suggesting more overall equality in similarity differences). Of course, this residual similarity might be a good thing in some situations, as long as a clear distinction between different digits is made. Again, we typically rely on learning to ensure that the representations capture the appropriate transformations.
Another test we can perform is to test this network on the letter input stimuli.
- Set input data to Letters and Run. Then do a cluster plot on the resulting hidden units.
Although this network clearly does a better job of distinguishing between the different letters than the localist network, it still collapses many letters into the same hidden representation. Thus, we have evidence that these distributed feature detectors are appropriate for representing distinctions among digits, but not letters.
Question 3.5 The distributed network achieves a useful representation of the digits using half as many hidden units as the localist network (and this number of hidden units is 1/7th the number of input units, greatly compressing the input representation) -- explain how this efficiency is achieved.
