Constraint Satisfaction

Introduction

Constraint satisfaction is an emergent computational property of neural networks with recurrent connectivity: activation states mutually influence one another, and settling over time leads the network to states that satisfy both the constraints built into its weights and those imposed through external inputs.

Constraint satisfaction can solve complicated computational problems in which the interdependencies among possible solutions and the high dimensionality of the state space make exhaustive search and other techniques computationally intractable. The extent to which a network is satisfying its constraints can be measured by a global energy or "goodness" function. These energy functions have been used to prove the stability of the networks' equilibrium states and to derive learning rules.
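As a concrete illustration, the following sketch (not the actual emergent code; the function and variable names are hypothetical) computes a Hopfield-style energy for a network state, where lower energy corresponds to better constraint satisfaction:

  #include <cstddef>
  #include <vector>

  // Hopfield energy for units s_i with symmetric weights w_ij and biases b_i:
  //   E = -1/2 * sum_ij w_ij * s_i * s_j - sum_i b_i * s_i
  // The bias terms represent the constraints imposed by external inputs.
  double HopfieldEnergy(const std::vector<double>& s,
                        const std::vector<std::vector<double>>& w,
                        const std::vector<double>& b) {
    double e = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) {
      for (std::size_t j = 0; j < s.size(); ++j)
        e -= 0.5 * w[i][j] * s[i] * s[j];  // pairwise constraints in the weights
      e -= b[i] * s[i];                    // external input constraints
    }
    return e;
  }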

In the software, a collection of constraint satisfaction style algorithms has been implemented under a common framework. These algorithms include the binary and continuous Hopfield style networks (Hopfield, 1982, 1984), the closely related Boltzmann Machine networks (Ackley, Hinton, and Sejnowski, 1985), the interactive activation and competition (IAC) algorithm (McClelland and Rumelhart, 1981), and GRAIN networks (Movellan and McClelland, 1994).

In addition to recurrent activation propagation and settling over time, these algorithms highlight the important role that noise can play in avoiding sub-optimal activation states. Work with the GRAIN algorithm extends the role of noise by showing that the network can learn to settle into different distributions of activation states in a probabilistic manner. Thus, one can teach a network to go to one state roughly 70 percent of the time and another state roughly 30 percent of the time. These distributions of possible target states can be specified using a probability environment, which is described in a subsequent section.
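To make the role of noise concrete, here is a minimal sketch (assumed names, not the emergent API) of the stochastic binary unit used in Boltzmann-style networks: the unit turns on with a probability given by the logistic function of its net input, so a higher temperature injects more noise and makes settling more exploratory:

  #include <cmath>
  #include <random>

  // Sample a binary activation given the net input and temperature T.
  //   p(act = 1) = 1 / (1 + exp(-net / T))
  // As T -> 0 the unit becomes deterministic; larger T adds more noise.
  int SampleBinaryUnit(double net, double T, std::mt19937& rng) {
    double p = 1.0 / (1.0 + std::exp(-net / T));
    std::bernoulli_distribution coin(p);
    return coin(rng) ? 1 : 0;
  }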

Learning in these networks takes place through a more local form of learning rule than error backpropagation. This learning rule, originally developed for the Boltzmann machine, has been shown to work in a wide variety of activation frameworks, including deterministic networks. It can be described as a "contrastive Hebbian learning" (CHL) function, since it involves the subtraction of two simple Hebbian terms computed while the network is in two different "phases" of settling.

The two phases of settling required for learning are known as the minus (negative) and plus (positive) phases. The minus phase is the state of the network when only inputs are presented; the plus phase is the state of the network when both inputs and desired outputs are presented. The CHL function states that the weights should be updated in proportion to the difference between the coproducts of the activations in the plus and minus phases:

  cn->dwt = lrate * (ru->act_p * su->act_p - ru->act_m * su->act_m);

where ru is the receiving unit and su is the sending unit of connection cn; act_m is the activation in the minus phase, and act_p is the activation in the plus phase.
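Spelled out as a loop, the update might look like the following sketch; the Unit and Con structures here are simplified, hypothetical stand-ins for the actual emergent classes:

  struct Unit { double act_m, act_p; };      // minus- and plus-phase activations

  struct Con { double wt, dwt; Unit* su; };  // connection from sending unit su

  // Contrastive Hebbian learning: the Hebbian coproduct in the plus phase
  // minus the Hebbian coproduct in the minus phase, scaled by the learning rate.
  void CHL_Learn(Unit* ru, Con* cons, int n_cons, double lrate) {
    for (int i = 0; i < n_cons; ++i) {
      Unit* su = cons[i].su;
      cons[i].dwt = lrate * (ru->act_p * su->act_p - ru->act_m * su->act_m);
      cons[i].wt += cons[i].dwt;
    }
  }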

It turns out that in order to learn distributions of activation states, one needs to collect many samples of activation states in a stochastic network and update the weights with the expected values of the coproducts of the activations, but the general idea is the same. This learning rule can be shown to minimize the cross-entropy between the distributions of activations in the minus and plus phases, which is the basis of the Boltzmann machine derivation of the learning rule.
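In standard notation, the stochastic (Boltzmann machine) form of the update replaces the single coproducts with expectations over sampled activation states:

  \Delta w_{ij} = \epsilon \left( \langle s_i s_j \rangle^{+} - \langle s_i s_j \rangle^{-} \right)

where \langle \cdot \rangle^{+} and \langle \cdot \rangle^{-} denote expected values over states sampled in the plus and minus phases, and \epsilon is the learning rate.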

The emergent implementation allows you to perform learning both in stochastic mode and with deterministic networks, using the same basic code. There is also support for annealing and sharpening schedules, which adapt the noise and gain parameters (respectively) over the settling trajectory. Using these schedules can result in better avoidance of sub-optimal activation states.
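As an illustration of how such schedules behave (the function names here are hypothetical, not the emergent API), noise can be annealed downward while gain is sharpened upward over the course of settling:

  #include <cmath>

  // Annealing: noise variance decays from var_0 over settling cycles,
  // so early settling explores broadly and late settling commits.
  double AnnealedNoiseVar(double var_0, double decay, int cycle) {
    return var_0 * std::exp(-decay * cycle);
  }

  // Sharpening: the activation-function gain rises over settling (up to
  // gain_max), gradually pushing units toward more committed values.
  double SharpenedGain(double gain_0, double gain_max, double rate, int cycle) {
    double g = gain_0 + rate * cycle;
    return g < gain_max ? g : gain_max;
  }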

Reference Implementation Details

There are Cs-specific versions of all the standard network classes: