CECN1 Hebbian Correlation
From Computational Cognitive Neuroscience Wiki
Hebbian Correlational Model Learning
- The project file: hebb_correl.proj (click and Save As to download, then open in Emergent)
Project Documentation
(note: this is a literal copy from the simulation documentation -- it contains links that will not work within the wiki)
Our approach toward model learning is based on correlations in the environment. These correlations are important, because in general it seems that the world is inhabited by things with relatively stable features (e.g., a tree with branches, mammals with legs, an individual's face with eyes, nose, and mouth, and so on), and these features will be manifest as reliable correlations in the patterns of activity in our sensory inputs.
Figure 4.4 in the textbook shows a simple example of the correlations between the individual pixels (picture elements) that make up the image of a line. These pixels will all be active together when the line is present in the input, producing a positive correlation in their activities. This correlation will be reliable (present across many different input images) to the extent that there is something reliable in the world that tends to produce such lines (e.g., edges of objects). Further, the parsimony of our model can be enhanced if only the strongest (most reliable) features or components of the correlational structure are extracted. We will see in the next section how Hebbian learning will cause units to represent the strongest correlations in the environment.
Initial Observation of Correlational Learning
Before delving into a more detailed analysis of Hebbian learning in the next section, we will first explore a simplified example of the case shown in figure 4.4 in a simulation (this corresponds to section 4.3.1 in the textbook). In this exploration, we will see how a single unit (using a Hebbian learning mechanism that will be explained in detail below) learns to represent the correlations present between the pixels in a line.
You will see a network with a 5x5 input layer and a single receiving hidden unit (figure 4.5 in the textbook), in addition to the usual other windows. To make things as simple as possible, we will just present a single right-leaning diagonal line and see what effect the Hebbian learning has on this hidden unit's weights. Thus, the environment will have these units 100 percent correlated with each other and with the firing of the hidden unit, and this extremely strong correlation should be encoded by the effects of the Hebbian learning mechanism on the weights.
First, let's look at the initial weights of this hidden unit.
- Select r.wt in the .PanelTab.HebbCorrelNet netview control panel and then click on the hidden unit using the red arrow button.
You should see a uniform set of .5 weight values, which provide a "blank page" starting point for observing the effects of subsequent learning.
- Then, click back on act, and then do Init and Run in the .PanelTab.ControlPanel (click on Yes to Initialize the Weights in the dialog that appears).
You will just see the activation of the right-leaning diagonal line.
- Then, click back on r.wt.
You will see that the unit's weights have learned to represent this line in the environment.
- Click on Run again.
You can now see the entire progression from the initial weights to the line representation while looking at the weights.
The simple point of this exploration is that Hebbian learning will tend to cause units to represent stable things in the environment. This is a particularly simple case where there was just a single thing in the environment, but it serves to convey the point.
Thorough Exploration of Hebbian Model Learning
This section (corresponding to section 4.6 in the textbook) provides a more detailed exploration of this Hebbian learning model. We see how a single unit learns in response to different patterns of correlation between its activity and a set of input patterns. This exploration will illustrate how conditionalizing the activity of the receiving unit can shape the resulting weights to emphasize a feature present in only a subset of input patterns. However, we will find that we need to introduce some additional factors in the learning rule to make this emphasis really effective. These factors will be even more important for the self-organizing case that is explored in a subsequent section.
As before, we will want to watch the weights of the hidden unit as it learns.
- Select r.wt as the variable to view in the network window, and click on the hidden unit. Now, click on the .T3Tab.Input_Patterns tab in the 3D view window to see the input patterns.
You should see that the OneLineEnv_Spec set of input patterns (on the left of the display) has 2 events, one having a right-leaning diagonal line, and the other having a left-leaning one. These are the two sets of correlations that exist in this simple environment.
To keep things simple in this simulation, we will manipulate the percentage of time that the receiving unit is active in conjunction with each of these events to alter the conditional probabilities that drive learning in the CPCA algorithm. Thus, we are only simulating those events that happen when the receiver is active -- when the receiver is not active, no learning occurs, so we can just ignore all these other events for the present purposes. As a result, what we think of as conditional probabilities actually appear in the simulation as just plain unconditional probabilities -- we are ignoring everything outside the conditional (where the unit is inactive). In later simulations, we will explore the more realistic case of multiple receiving units that are activated by different events, and we will see how more plausible ways of conditional probability learning can arise through self-organizing learning.
The frequency column in the input data shows the probabilities or normalized frequencies associated with each event.
You should see that the Right event has frequency 1, and the Left event has frequency 0, indicating that the receiving unit will be active all of the time in conjunction with the right-leaning diagonal line, and none of the time with the left-leaning one (this was the default for the initial exploration from before).
Again, these absolute probabilities of presenting these lines actually correspond to conditional probabilities, because we are ignoring all the other possible cases where the receiving unit is inactive -- we are implicitly conditioning the entire simulation on the receiving unit being active (so that it is indeed always active for every input pattern).
The parameter p_right in the .PanelTab.ControlPanel determines the frequencies of the events in the environment, with the Right event being set to p_right and Left to 1-p_right.
- Set p_right to .7, and hit Apply and then the Init button -- you will see the frequency values updated to .7 and .3. Then, go ahead and switch back to viewing the .T3Tab.HebbCorrelNet view before continuing.
Keep in mind as we do these exercises that this single receiving unit will ordinarily just be one of multiple such receiving units looking at the same input patterns. Thus, we want this unit to specialize on representing one of the correlated features in the environment (i.e., 1 of the 2 lines in this case). We can manipulate this specialization by making the conditional probabilities weighted more toward one event over the other.
- Now, press the Run button in the control panel.
This will run the network through 25 sets (epochs) of 100 randomly ordered event presentations, with 70 of these presentations being the Right event, and 30 being the Left event (given a p_right value of .7). The CPCA Hebbian learning rule (equation 4.12 in the textbook) is applied after each event presentation and the weights updated accordingly. You will see the display of the weights in the network window being updated after each trial.
Another way to look at the development of the weights over learning is to use a graph log.
- Select the .T3Tab.EpochOutputData tab in the right 3D view panel, to reveal a graph log, and then do Run again.
The graph log (figure 4.11 in the textbook) displays the value of one of the weights from a unit in the right-leaning diagonal line (wt_right, in black), and from a unit in the left-leaning diagonal line (wt_left, in red). You should notice that as learning proceeds, the weights from the units active in the Right event will hover right around .7 (with the exception of the central unit, which is present in both events and will have a weight of around 1), while the weights for the Left event will hover around .3. Thus, as expected, the CPCA learning rule causes the weights to reflect the conditional probability that the input unit is active given that the receiver was active. Experiment with different values of p_right, and verify that this holds for all sorts of different probabilities.
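The convergence of the weights onto the conditional probabilities can also be checked outside the simulator. The following is a minimal sketch, not the Emergent implementation: it assumes the basic CPCA update has the form dw = lrate * y * (x - w), with the receiver activity y fixed at 1. The function name run_cpca and the pixel coordinates chosen for each diagonal are hypothetical and only serve the illustration.

```python
import random

def run_cpca(p_right=0.7, lrate=0.005, epochs=25, events_per_epoch=100, seed=0):
    """Train one always-active receiving unit on two diagonal lines.

    Assumes the CPCA update dw = lrate * y * (x - w) with y = 1.
    """
    rng = random.Random(seed)
    size = 5
    # Pixel coordinates (row, col) of each line; which diagonal is called
    # "right" is an arbitrary choice made for this sketch.
    right = {(size - 1 - i, i) for i in range(size)}
    left = {(i, i) for i in range(size)}
    n_right = round(p_right * events_per_epoch)
    events = [right] * n_right + [left] * (events_per_epoch - n_right)

    # All weights start at .5, like the initial r.wt display.
    w = {(r, c): 0.5 for r in range(size) for c in range(size)}
    y = 1.0  # the receiver is active for every event
    for _ in range(epochs):
        rng.shuffle(events)  # randomly ordered presentations within an epoch
        for line in events:
            for pixel in w:
                x = 1.0 if pixel in line else 0.0
                w[pixel] += lrate * y * (x - w[pixel])
    return w

weights = run_cpca()
wt_right = weights[(4, 0)]   # pixel that belongs only to the right line
wt_left = weights[(0, 0)]    # pixel that belongs only to the left line
wt_center = weights[(2, 2)]  # pixel shared by both lines
print(f"wt_right={wt_right:.3f} wt_left={wt_left:.3f} wt_center={wt_center:.3f}")
```

Under these assumptions, wt_right and wt_left should fluctuate around p_right and 1-p_right, while the shared central pixel approaches 1. Calling run_cpca with other p_right values mirrors the experiment suggested above.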
Learning Rate
The parameter lrate in the .PanelTab.ControlPanel, which corresponds to epsilon in the CPCA learning rule (equation 4.12), determines how rapidly the weights are updated after each event.
- Change lrate to .1 and Run.
Question 4.1 (a) How does this change in the learning rate affect the general character of the weight updates as displayed in the network window? (b) Explain why this happens. (c) Explain the relevance (if any) this might have for the importance of integrating over multiple experiences (events) in learning.
- Set the lrate parameter back to .005.
Selectivity
When you explored different values of p_right previously, you were effectively manipulating how selective the receiving unit was for one type of event over another. Thus, you were taking advantage of the conditional aspect of CPCA Hebbian learning by effectively conditionalizing its representation of the input environment. As we stated earlier, instead of manipulating the frequency with which the two events occurred in the environment, you should think of this as manipulating the frequency with which the receiving unit was co-active with these events, because the receiving unit is always active for these inputs.
Now we want to compare the conditionalizing aspect of CPCA with the unconditional PCA algorithm. Let's assume that each event (Right and Left) has an equal probability of appearing in the environment.
- Set p_right to .5, Apply, and Run.
This will simulate the effects of a standard (unconditional) form of PCA, where the receiving unit is effectively always on for the entire environment (unlike CPCA which can have the receiving unit active when only one of the lines is present in the environment).
Question 4.2 (a) What result does p_right=.5 lead to for the weights? (b) Does this weight pattern suggest the existence of two separate diagonal line features existing in the environment? Explain your answer. (c) How does this compare with the "blob" solution for the natural scene images as discussed above and shown in figure 4.8 in the textbook?
Question 4.3 (a) How would you set p_right to simulate the hidden unit controlled in such a way as to come on only when there is a right-leaning diagonal line in the input, and never for the left one? (b) What result does this lead to for the weights? (c) Explain why this result might be more informative than the case explored in the previous question. (d) How would you extend the architecture and training of the network to represent this environment of two diagonal lines in a fully satisfactory way? Explain your answer.
The simple environment we have been using so far is not very realistic, because it assumes a one-to-one mapping between input patterns and the categories of features that we would typically want to represent.
- Switch the input_data from OneLineEnv to ThreeLineEnv in the .PanelTab.ControlPanel (and Apply). Then, click on .T3Tab.Input_Patterns in the right 3D view panel to view these input patterns.
Notice that there are now three different versions of both the left and right diagonal lines, with upper and lower diagonals in addition to the original two center diagonal lines. In this environment, p_right is spread among all three types of right lines, which are conceived of as mere subtypes of the more general category of right lines (and likewise for 1-p_right and the left lines). This is reflected in the frequency column as you can see.
- Go ahead and Run this environment, while looking back at the network view r.wt weights.
You should see that the right lines all have weights of around .2333 and the left lines have weights around .1. Although this is the correct result for representing the conditional probabilities, this result illustrates a couple of problems with the CPCA learning algorithm. First, when units represent categories of features instead of single instances, the weights end up being diluted because the receiving unit is active for several different input patterns, so the conditional probabilities for each individual pattern can be relatively small. Second, this dilution can be compounded by a receiving unit that has somewhat less than perfect selectivity for one category of features (right) over others (left), resulting in relatively small differences in weight magnitude (e.g., .2333 for right versus .1 for left). This is a real problem because units are generally not very selective during the crucial early phases of learning for reasons that will become clear later.
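The diluted values quoted above follow directly from splitting each category's probability evenly among its three subtypes. The short sketch below works through that arithmetic; the helper name diluted_weight is hypothetical, and the calculation applies only to pixels that belong to a single line (pixels shared between lines would accumulate the probabilities of every line they belong to).

```python
def diluted_weight(p_category, n_subtypes):
    """Conditional probability of one subtype when the category's
    probability is spread evenly over n_subtypes input patterns."""
    return p_category / n_subtypes

p_right = 0.7
w_right_line = diluted_weight(p_right, 3)     # each of the three right lines
w_left_line = diluted_weight(1 - p_right, 3)  # each of the three left lines
print(round(w_right_line, 4), round(w_left_line, 4))  # prints: 0.2333 0.1
```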
Thus, in some sense, the CPCA algorithm is too faithful to the actual conditional probabilities, and does not do enough to emphasize the selectivity of the receiving unit. Also, these small overall weight values reduce the dynamic range of the weights, and end up being inconsistent with the weight values produced by the task learning algorithm described in chapter 5. The next section in the textbook shows how we can deal with these limitations of the basic CPCA rule. After that, we will revisit this simulation.
Renormalization and Contrast Enhancement in CPCA
This section corresponds to section 4.7.3 in the textbook, where we explore the effects of renormalization and contrast enhancement on Hebbian learning.
We are first going to explore the renormalization of the weights by taking into account the expected activity level over the input layer, alpha. Because most of our line stimuli have 5 units active, and there are 25 units in the input layer, this alpha value is set to .2. Let's explore this issue using an environment where the features have zero correlation with the receiving unit, and see how the renormalization results in weight values of .5 for this case.
- Set input_data in the control panel to FiveHorizLines (Apply). Click on .T3Tab.Input_Patterns to see these patterns.
You should see that the environment contains 5 horizontal lines, each of which is presented with equal probability (i.e., 1/5 or .2). Thus, these line features represent the zero correlation case, because they each co-occur with the receiving unit with the same probability as the expected activity level over the input layer (.2). In other words, you would expect this same level of co-occurrence if you simply activated input units at random such that the overall activity level on the input was at .2.
- Click on r.wt in the .T3Tab.HebbCorrelNet panel (and select the hidden unit), and then Run the network.
You will see that because of the very literal behavior of the unmodified CPCA algorithm in reflecting the conditional probabilities, the weights are all around .2 at the end of learning. Thus, if we were interpreting these weights in terms of the standard meaning of conditional probabilities (i.e., where .5 represents zero correlation), we would conclude that the input units are anticorrelated with the receiving unit. However, we know that this is not correct given the sparse activity levels in the input.
- Now, set savg_cor.cor (which is the q_m parameter in equation 4.20 in the textbook) in the .PanelTab.ControlPanel to a value of 1 instead of 0.
This means that we will now be applying the full correction for the average activity level in the sending (input) layer.
- Run the network again.
You should observe that the weights now hover around .5, which is the correct value for expressing the lack of correlation.
Although the ability to fully correct for sparse sending activations is useful, one does not always want to do this. In particular, if we have any prior expectation about how many individual input patterns should be represented by a given hidden unit, then we can set savg_cor.cor appropriately so that the .5 level corresponds roughly to this prior expectation. For example, if we know that the units should have relatively selective representations (e.g., one or two input features per unit), then we might want to set savg_cor.cor to .5 or even less, because the full correction for the input layer alpha will result in larger weights for features that are relatively weakly correlated compared to this expected level of selectivity. If units are expected to represent a number of input features, then a value of savg_cor.cor closer to 1 is more appropriate. We will revisit this issue.
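The effect of savg_cor.cor on the FiveHorizLines environment can be sketched in a few lines. This is an illustration under stated assumptions rather than the simulator's code: it assumes the renormalized rule has the form dw = lrate * [y x (m - w) + y (1 - x)(0 - w)], with m = .5 / alpha_m and alpha_m = .5 - q_m (.5 - alpha), where q_m plays the role of savg_cor.cor. The function names are hypothetical.

```python
import random

def renorm_max(alpha, cor):
    """Maximum weight m implied by the sending-activity correction
    (assumed form: alpha_m = .5 - cor * (.5 - alpha), m = .5 / alpha_m)."""
    alpha_m = 0.5 - cor * (0.5 - alpha)
    return 0.5 / alpha_m

def run_five_horiz_lines(cor, alpha=0.2, lrate=0.005, trials=2500, seed=0):
    """Present one of five horizontal lines per trial to an always-active
    receiver and apply the assumed renormalized CPCA update."""
    rng = random.Random(seed)
    m = renorm_max(alpha, cor)
    w = [[0.5] * 5 for _ in range(5)]  # w[row][col], all starting at .5
    for _ in range(trials):
        active_row = rng.randrange(5)  # each line has probability .2
        for r in range(5):
            for c in range(5):
                if r == active_row:   # x = 1: move toward m
                    w[r][c] += lrate * (m - w[r][c])
                else:                 # x = 0: move toward 0
                    w[r][c] += lrate * (0.0 - w[r][c])
    return w

def mean_weight(w):
    """Average weight over the 5x5 input layer."""
    return sum(sum(row) for row in w) / 25.0

for cor in (0.0, 1.0):
    avg = mean_weight(run_five_horiz_lines(cor))
    print(f"savg_cor.cor={cor}: mean weight = {avg:.3f}")
```

With cor set to 0 the mean weight settles near .2, and with cor set to 1 it stays near .5, matching the two runs described above. Intermediate values of cor give intermediate results under the same assumed rule.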
Now, let's explore the contrast enhancement sigmoid function of the effective weights. The parameters wt_sig.gain and .off in the control panel control the gain and offset of the sigmoid function. First, we will plot the shape of the contrast enhancement function for different values of these parameters.
- First set wt_sig.gain to 6 instead of 1. Click the Graph Wt Sig Fun button in the control panel and keep the graph_data value in the popup dialog at NULL; this will bring up a new graph view in a new tab in the 3D view area.
You should see a sigmoidal function (the shape of the resulting effective weights function) plotted. The horizontal axis, Lin Wt, represents the raw linear weight value, and the vertical axis, Eff Wt, represents the contrast enhanced effective weight value. The increase in wt_sig.gain results in substantial contrast enhancement.
- Try setting wt_sig.gain to various different values, and then clicking the Graph Wt Sig Fun button (on subsequent runs, you can select LeabraConSpec_0_WtSigFun for the graph_data value and it will just update the existing graph instead of making a new one). This should give you a good sense of its effect on the shape of this function.
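The shape of the contrast enhancement function can also be tabulated directly. The sketch below assumes the sigmoid has the form eff = 1 / (1 + (off * (1 - w) / w)^gain), which reduces to the identity when gain and off are both 1. The exact expression used by the simulator is given in the textbook, so treat this form, and the function name effective_weight, as assumptions.

```python
def effective_weight(lin_wt, gain=6.0, off=1.0):
    """Contrast-enhanced effective weight for a linear weight in [0, 1].

    Assumed form: 1 / (1 + (off * (1 - w) / w) ** gain).
    """
    if lin_wt <= 0.0:
        return 0.0
    if lin_wt >= 1.0:
        return 1.0
    return 1.0 / (1.0 + (off * (1.0 - lin_wt) / lin_wt) ** gain)

# Tabulate Eff Wt against Lin Wt for a linear and a high-gain setting.
for gain in (1.0, 6.0):
    row = [round(effective_weight(w / 10, gain=gain), 3) for w in range(1, 10)]
    print(f"gain={gain}: {row}")
```

With gain at 1 the effective weight equals the linear weight, while larger gains push values below the midpoint toward 0 and values above it toward 1. Under this assumed form, an off value above 1 moves the midpoint above .5.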
We next see the effects of wt_sig.gain on learning.
- First, hit the Defaults button in the .PanelTab.ControlPanel, then change savg_cor.cor to 1, set the input_data to ThreeLineEnv, and Run.
This run provides a baseline for comparison. You should see a somewhat bloblike representation in the weights of the network, where the right lines are a bit stronger than the left lines, but not dramatically so.
- Now increase wt_sig.gain from 1 to 6, and Run again.
You should very clearly see that only the right lines are represented, and with relatively strong weights. Thus, the contrast enhancement allows the network to represent the reality of the distinct underlying left and right categories of features even when it is imperfectly selective (.7) to these features. This effect will be especially important for self-organizing learning, as we will see in the next project.
Now, let's use the wt_sig.off parameter to encourage the network to pay attention to only the strongest of correlations in the input.
- Leaving wt_sig.gain at 6, change wt_sig.off to 1.25, and do Graph Wt Sig Fun to see how this affects the effective weight function. You may have to go back and forth between 1 and 1.25 a couple of times to be able to see the difference -- it is more subtle than the gain parameter.
- With wt_sig.off set to 1.25, Run the network.
Question 4.4 (a) How does this change the results compared to the case where wt_sig.off is 1? (b) Explain why this occurs. (c) Find a value of wt_sig.off that makes the non-central (non-overlapping) units of the right lines (i.e., the 4 units in the lower left corner and the 4 units in the upper right corner) have weights around .1 or less. (d) Do the resulting weights accurately reflect the correlations present in any single input pattern? Explain your answer. (e) Can you imagine why this representation might be useful in some cases?
An alternative way to accomplish some of the effects of the wt_sig.off parameter is to set the savg_cor.cor parameter to a value of less than 1. As described above, this will make the units more selective because weak correlations will not be renormalized to as high a weight value.
- Set wt_sig.off back to 1, and set savg_cor.cor to .7.
Question 4.5 (a) What effect does this have on the learned weight values? (b) How does this compare with the wt_sig.off parameter you found in the previous question?
This last question shows that because the contrast enhancement from wt_sig.gain magnifies differences around .5 (with wt_sig.off=1), the savg_cor.cor parameter can have a big effect by changing the amount of correlated activity necessary to achieve this .5 value. A lower savg_cor.cor will result in smaller weight values for more weakly correlated inputs -- when the wt_sig.gain parameter is large, then these smaller values get pushed down toward zero, causing the unit to essentially ignore these inputs. Thus, these interactions between contrast enhancement and renormalization can play an important role in determining what the unit tends to detect.
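The interaction described here can be sketched by chaining the two corrections. The code below is an approximation under stated assumptions: it takes the equilibrium linear weight of the renormalized rule to be P(x|y) * .5 / alpha_m with alpha_m = .5 - cor * (.5 - alpha), and it assumes the contrast enhancement sigmoid has the form 1 / (1 + (off * (1 - w) / w)^gain). Both function names are hypothetical, and the probabilities apply to pixels that belong to only one line in the three-line environment with p_right at .7.

```python
def equilibrium_lin_wt(p_x_given_y, alpha=0.2, cor=1.0):
    """Expected linear weight under the assumed renormalized CPCA rule."""
    alpha_m = 0.5 - cor * (0.5 - alpha)
    return p_x_given_y * 0.5 / alpha_m

def effective_weight(lin_wt, gain=6.0, off=1.0):
    """Assumed contrast enhancement sigmoid over the linear weight."""
    if lin_wt <= 0.0:
        return 0.0
    if lin_wt >= 1.0:
        return 1.0
    return 1.0 / (1.0 + (off * (1.0 - lin_wt) / lin_wt) ** gain)

# Conditional probabilities for a pixel in exactly one line.
pixels = {"right line": 0.7 / 3, "left line": 0.3 / 3}

for cor in (1.0, 0.7):
    for name, p in pixels.items():
        lin = equilibrium_lin_wt(p, cor=cor)
        eff = effective_weight(lin, gain=6.0, off=1.0)
        print(f"cor={cor} {name}: linear={lin:.3f} effective={eff:.3f}")
```

Under these assumptions, lowering cor from 1 to .7 moves the right-line linear weight from above .5 to below it, so the high gain drives the effective weight toward 0 instead of toward 1. Actual simulation values will fluctuate around these expectations.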
These simulations demonstrate how the correction factors of renormalization and contrast enhancement can increase the effectiveness of the CPCA algorithm. These correction factors represent quantitative adjustments to the CPCA algorithm to address its limitations of dynamic range and selectivity, while preserving the basic computation performed by the algorithm to stay true to its biological and computational motivations.
