Network

Overview

The network is the main object for a neural network simulation, containing Layers of Units which are connected through Projections. It also contains all the Specs used by these objects, which specify the parameters and equations used.

In addition to encapsulating and containing the structure of the network, the Network object also encapsulates all the functions of a network (note this is different from PDP++, where the Process objects did a lot of that). Thus, every form of processing that the network can support should have a corresponding function defined on the Network object. This includes commonly used statistics and other types of processing that are not strictly the network algorithm per se, but are so commonly used that it makes sense to support them directly on the network.

Weight File Format

Emergent uses an XML-style weight file format, which produces much more robust loading of weight files after the network structure has changed, compared to the PDP++ format.

How to Save and Load Weights

From the Tree Viewer, click on Networks, then your network. Its associated properties should appear in the Panel View on a pinkish background. There are three menus immediately above them: Object, SelectEdit, and Actions. The Object menu has Load Weights and Save Weights commands near the bottom. Alternatively, you can run the SaveWeights program from the LeabraAll_Std subgroup in the Programs group in the Tree View.

Note: Layers are looked up by name and thus if you change the layer name, a weight file will no longer load correctly. Because these files are text files, you can just look for that layer name in the file and change it there as well. Or, better yet, load the weights with the old layer name, then change the name, then save the weight file again.
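The manual rename described in the note above can be sketched as a small text substitution. This is only an illustration: it assumes that a layer name appears in the layer header tag (as in <Lay Hidden_0>) and in the sending-layer reference of a connection group tag (as in <Cg 0 Fm:Input>), which are the tags described in the example section of this page. The function name rename_layer_in_weights is hypothetical and is not part of Emergent.

```python
import re

def rename_layer_in_weights(text: str, old: str, new: str) -> str:
    """Rename a layer inside weight-file text.

    Assumes the layer name occurs in "<Lay NAME>" header tags and in
    "Fm:NAME" sending-layer references; other occurrences are left alone.
    """
    # Layer header tag, e.g. "<Lay Hidden_0>"
    text = re.sub(r"<Lay\s+" + re.escape(old) + r"\s*>",
                  lambda m: f"<Lay {new}>", text)
    # Sending-layer reference, e.g. "<Cg 0 Fm:Input>". The lookahead keeps
    # longer names that merely start with `old` (e.g. "Input_2") untouched.
    text = re.sub(r"Fm:" + re.escape(old) + r"(?=[\s>])",
                  lambda m: f"Fm:{new}", text)
    return text
```

Loading the weights under the old name, renaming the layer in the project, and saving again remains the safer route, since it does not depend on any assumption about the file layout.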

Example of an Emergent Weight File

Below is a schematic of the initial weights from a 4-layer feedforward backpropagation network. The input layer has 4 units, the first hidden layer has 3 units, the second hidden layer has 2 units, and the output layer has 1 unit. The units are numbered in bold blue text. There is a bias weight value for each unit, shown in the box to the left of the unit number, and there is a weight associated with each connection, shown in a list to the left of the units. For example, to the left of unit hidden_0_0, there is a bulleted list of four numbers labeled with their array positions 0-3. The first value in this list, "-0.2027", is the weight for the connection between input unit 0 (input_0) and hidden unit 0_0 (hidden_0_0). The second value in the list, "0.0484", is the weight for the connection between input_1 and hidden_0_0.


Figure 1: Graphic representation of information stored in an Emergent Weight File

Image:Wt_rnd_4-3-2-1_0000.wts.jpg

  • Note: The network weights and bias weight values in Emergent are single-precision real numbers. To save space, the values have been rounded to 4 decimal places for display in this section of the documentation.


Figure 2 below shows a section of the actual weights file from which the information in the schematic was taken. The network weight and bias weight entries for layer hidden_0, unit 0 (hidden_0_0) are shown here. The data from the weights file is shown on the left in bold black text. Annotations have been provided to label the XML tags.


Figure 2: Weight and Activation data for unit Hidden_0_0


Image:Wt_rnd_4-3-2-1_0000_hidden_0_0.wts.jpg


The section begins with the notation for the layer, which in this case is Hidden_0. The entry for unit hidden_0_0 begins with the "<UgUn 0 >" XML tag, which identifies the unit at array position "0". The first entry under <UgUn 0 > gives the bias weight value for unit hidden_0_0 and is labeled with the XML tag <Un>. In this case, the bias weight value for hidden_0_0 is "-0.3014". The bias weight is followed by the weights for the connections between the layer "Input" and unit "hidden_0_0". The notation for this is <Cg 0 Fm:Input>, meaning "connections to 'Unit_0' from layer 'Input'". The next tag, <Cn 4>, indicates that there are 4 connections between the layer "Input" and unit "hidden_0_0", one for each unit in the layer "Input". The <Cn> tag has 4 elements labeled 0, 1, 2, 3. The value "-0.2027" is given next to the label "0", indicating that "-0.2027" is the weight for the connection between input_0 and hidden_0_0. Likewise, the value "0.0484" next to the label "1" is the weight for the connection between input_1 and hidden_0_0. The "unit group/unit" block concludes with the XML closing tags.


The entire weight file, from which the information in the figures above was extracted, is available by following the link below. The data for unit hidden_0_0, given in figure 2, begins on the 28th line of the weights file and starts with the XML tag <Lay Hidden_0>.

Example of Emergent Weight File

Distributed Memory Computation in the Network

The Network supports parallel processing across connections, where different distributed memory (dmem) processes compute different subsets of connections and then share their results. For example, if 4 processors were working on a single network, each would have connections for approximately 1/4 of the units in the network. When the net input to the network is computed, each process computes this on its subset of connections, and then shares the results with all the other processes.
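The compute-then-share pattern described above can be illustrated with a single-process simulation. This is a sketch only: the round-robin assignment of units to processes is an assumption (the actual allocation scheme is not described here), the sharing step is simulated by writing into one shared list rather than by real inter-process communication, and the names split_units and net_input_dmem are hypothetical.

```python
def split_units(n_units, n_procs):
    """Assign receiving-unit indices to processes round-robin (assumed scheme)."""
    return [list(range(p, n_units, n_procs)) for p in range(n_procs)]

def net_input_dmem(weights, sender_acts, n_procs):
    """Simulate connection-wise dmem computation of net input.

    weights[i][j] is the weight from sending unit j to receiving unit i.
    """
    n_units = len(weights)
    shared = [0.0] * n_units
    for my_units in split_units(n_units, n_procs):
        # Each process computes net input only for the units it owns ...
        partial = {i: sum(w * a for w, a in zip(weights[i], sender_acts))
                   for i in my_units}
        # ... then shares its results so every process holds the full vector.
        for i, net in partial.items():
            shared[i] = net
    return shared
```

Because each unit's net input is computed by exactly one process, the shared result does not depend on how many processes take part; only the amount of work per process changes.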

Given the relatively large amount of communication required for synchronizing net inputs and other variables at each cycle of network computation, this is efficient only for relatively large networks (e.g., above 250 units per layer for 4 layers). In benchmarks on a Pentium 4 Xeon cluster system connected with a fast Myrinet fiber-optic switched network connection, networks of 500 units per layer for 4 layers achieved better than 2x speedup by splitting across 2 processors, presumably by making the split network fit within processor cache whereas the entire one did not. This did not scale that well for more than 2 processors, suggesting that cache is the biggest factor for this form of dmem processing.

In all dmem cases, each processor maintains its own copy of the entire simulation project, and each performs largely the same set of functions to remain identical throughout the computation process. Processing only diverges at carefully controlled points, and the results of this divergent processing are then shared across all processors so they can re-synchronize with each other. Therefore, 99.99% of the code runs exactly the same under dmem as it does under a single process, making the code extensions required to support this form of parallel processing minimal.

The main parameter for controlling dmem processing is the dmem_nprocs field, which determines how many of the available processors are allocated to processing network connections. Any processors left over after the network allocation are allocated to event-wise distributed memory computation (see the next section for information on this). The other parameter is dmem_sync_level, which is set automatically by most algorithms based on the type of synchronization that they require (feedforward networks generally require layer-level synchronization, while recurrent, interactive networks require network-level synchronization).

Distributed Memory Computation Across Trials

The Epoch Program supports distributed memory (dmem) computation by farming out trials of processing individual input patterns across different distributed memory processors. For example, if you had 4 such processors available, and an input data table of 16 events, each processor could process 4 of these events, resulting in a theoretical speedup of 4x. This will happen automatically if you start a dmem simulation with more than Network.dmem_nprocs processors -- see below for details.

In all dmem cases (see the previous section for Network-level dmem), each processor maintains its own copy of the entire simulation project, and each performs largely the same set of functions to remain identical throughout the computation process. Processing only diverges at carefully controlled points, and the results of this divergent processing are then shared across all processors so they can re-synchronize with each other. Therefore, 99.99% of the code runs exactly the same under dmem as it does under a single process, making the code extensions required to support this form of dmem minimal.

If learning is taking place, the weight changes produced by each of these different sets of events must be integrated back together. This means that weights must be updated in SMALL_BATCH or BATCH mode when using dmem (this parameter is set on the Network object).

Trial-level distributed memory computation can be combined with network-wise dmem. The Network level dmem_nprocs parameter determines how many of the available processors are allocated to the network. If there are multiples of these numbers of processors left over, they are allocated to the Trial-level dmem computation. For example, if there were 8 processors available, and each network was allocated 2 processors, then there would be 4 sets of networks available for dmem processing of trials. Groups of two processors representing a complete network would work together on a given set of events.
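The allocation arithmetic in the example above can be sketched as follows. The grouping of consecutive processor ranks, the interleaved assignment of events to groups, and the function names are all assumptions made for illustration; only the counts (8 processors, 2 per network, 4 network groups) come from the text.

```python
def allocate_processors(total_procs, dmem_nprocs):
    """Group processor ranks into network-level groups of size dmem_nprocs.

    Ranks that do not fill a complete group are left out (an assumption).
    """
    n_groups = total_procs // dmem_nprocs
    return [list(range(g * dmem_nprocs, (g + 1) * dmem_nprocs))
            for g in range(n_groups)]

def assign_events(n_events, n_groups):
    """Deal events out to the network groups in interleaved order (assumed)."""
    return [list(range(g, n_events, n_groups)) for g in range(n_groups)]

groups = allocate_processors(total_procs=8, dmem_nprocs=2)
events = assign_events(n_events=16, n_groups=len(groups))
```

With 8 processors and dmem_nprocs = 2, this yields 4 groups of 2 processors, and a 16-event data table gives each group 4 events to process.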

If Network.wt_update is set to BATCH, then weights are synchronized across processors at the end of each epoch. Results should be identical to those produced by running on a single-processor system under BATCH mode.
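The reason BATCH results match the single-processor case is that the accumulated weight change is a sum over events, and a sum can be split into per-processor partial sums. The following toy sketch shows this for one weight; the interleaved event split and the function names are hypothetical, and the values are chosen to be exactly representable so that floating-point summation order plays no role.

```python
def batch_update_serial(weight, deltas):
    """Apply the summed per-event weight changes at the end of the epoch."""
    return weight + sum(deltas)

def batch_update_dmem(weight, deltas, n_procs):
    """Each process sums the changes for its own events; sums are then combined."""
    local_sums = [sum(deltas[p::n_procs]) for p in range(n_procs)]
    return weight + sum(local_sums)

# One weight change per event for a single connection (illustrative values).
deltas = [0.5, -0.25, 0.125, 0.25, -0.5, 0.75, 0.0, 0.125]
serial = batch_update_serial(1.0, deltas)
parallel = batch_update_dmem(1.0, deltas, n_procs=4)
```

Here serial and parallel are both 2.0.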

If Network.wt_update is SMALL_BATCH, then the small_batch_n parameter is divided by the number of dmem processors at work to determine how frequently to share weight changes among processors. If small_batch_n is an exact multiple of the number of dmem processors processing events, then results will be identical to those obtained on a single processor. Otherwise, the effective batch_n value will be different. For example, if there are 4 dmem processors, then a value of batch_n = 4 means that weight changes are applied after each processor processes one event. However, batch_n = 6 cannot be processed in this way: changes will occur as though batch_n = 4. Similarly, batch_n = 1 actually means batch_n = 4. If batch_n = 8, then weight changes are applied after every 2 sets of dmem event processing steps, etc.
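The examples above are consistent with rounding down to a whole number of event-processing steps per processor, with a minimum of one step. The rule below is inferred from those examples rather than taken from the Emergent source, and the function name is hypothetical.

```python
def effective_batch_n(small_batch_n, n_procs):
    """Return the effective batch size under trial-level dmem.

    Each sync happens after a whole number of per-processor steps, and every
    step processes n_procs events in parallel (rule inferred from the text).
    """
    steps_per_sync = max(1, small_batch_n // n_procs)
    return steps_per_sync * n_procs

# The four cases from the text, with 4 dmem processors: 4, 6, 1, 8.
results = [effective_batch_n(n, 4) for n in (4, 6, 1, 8)]
```

For these inputs, results is [4, 4, 4, 8], matching the behavior described for batch_n = 4, 6, 1, and 8.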

Note that wt_update cannot be ONLINE in dmem mode, and will be set to SMALL_BATCH automatically by default.

Note that the event-wise model may not be that sensible under dmem if there is any state information carried between events in a sequence (e.g., an SRN context layer or any other form of active memory), as is often the case when using sequences, because this state information is NOT shared between processes within a sequence (it cannot be -- events are processed in parallel, not in sequence).

Reference Information
