Gradient Descent Training Algorithm for RNN
Gelenbe's RNN training algorithm [47] proceeds as follows. Recall that $w^+(i,j)$ and $w^-(i,j)$ represent the rates at which neuron $i$ sends respectively excitation or inhibition spikes to neuron $j$. Let $n$ denote the total number of neurons in the network. The adjustable network parameters are the two matrices $\mathbf{W}^+ = \{w^+(i,j)\}$ and $\mathbf{W}^- = \{w^-(i,j)\}$, both of which have $n^2$ elements. These parameters should be determined by the training algorithm from the training examples (a set of input-output pairs). The set of inputs is denoted by $X = \{x_1, \ldots, x_K\}$. Each of the inputs $x_k$, $k = 1, \ldots, K$, consists of a set of excitation-inhibition pairs $x_k = (\Lambda_k, \lambda_k)$ representing the signal flow entering each neuron from outside the network. Thus, for each set of inputs, we have $\Lambda_k = (\Lambda_k(1), \ldots, \Lambda_k(n))$ and $\lambda_k = (\lambda_k(1), \ldots, \lambda_k(n))$. The set of outputs is denoted by $Y = \{y_1, \ldots, y_K\}$, where $y_k = (y_{1k}, \ldots, y_{nk})$; the elements $y_{ik}$ represent the desired output of each neuron $i$ for the training pair $k$.
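To make these quantities concrete, the following is a minimal setup sketch in Python/NumPy; the names `W_plus`, `W_minus`, `Lambda`, `lam` and `y`, as well as the sizes and random initialization, are illustrative assumptions rather than part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                 # number of neurons
K = 3                 # number of training pairs

# Adjustable parameters: W_plus[i, j] = w+(i, j), W_minus[i, j] = w-(i, j);
# both are n x n matrices of non-negative transition rates.
W_plus  = rng.uniform(0.0, 1.0, size=(n, n))
W_minus = rng.uniform(0.0, 1.0, size=(n, n))
np.fill_diagonal(W_plus, 0.0)   # no self-loops
np.fill_diagonal(W_minus, 0.0)

# Inputs x_k = (Lambda_k, lambda_k): external excitation and inhibition
# rates entering each neuron, one row per training pair.
Lambda = rng.uniform(0.0, 1.0, size=(K, n))   # Lambda[k, i]
lam    = rng.uniform(0.0, 0.5, size=(K, n))   # lambda[k, i]

# Desired outputs y[k, i], one per neuron and training pair.
y = rng.uniform(0.0, 1.0, size=(K, n))
```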
The training technique must adjust the parameters in order to minimize a cost function $E_k$, given by

$$E_k = \frac{1}{2} \sum_{i=1}^{n} a_i \left( q_{ik} - y_{ik} \right)^2, \qquad a_i \geq 0, \tag{10.1}$$

where $q_{ik}$ and $y_{ik}$ are the neuron's actual and desired outputs for the input-output pair $k$, respectively. The constant $a_i$ is set to zero for the neurons that are not connected to the outputs of the network. At each successive input-output pair $(x_k, y_k)$, the adjustable network parameters $w^+(u,v)$ and $w^-(u,v)$, where $u, v = 1, \ldots, n$, need to be updated. The algorithm used by Gelenbe is gradient descent. Observe that the values in the matrices $\mathbf{W}^+$ and $\mathbf{W}^-$ must not be negative (since their elements are transition rates).
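In the NumPy sketch above, the cost 10.1 is a one-liner; here `a` is assumed to encode the constants $a_i$ as a vector with entry 1 for output neurons and 0 elsewhere.

```python
def cost(q_k, y_k, a):
    """Cost E_k of Eqn. 10.1: (1/2) * sum_i a_i * (q_ik - y_ik)^2."""
    return 0.5 * np.sum(a * (q_k - y_k) ** 2)
```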
The rule for the weight update is as follows:

$$w_k(u,v) = w_{k-1}(u,v) - \eta \sum_{i=1}^{n} a_i \left( q_{ik} - y_{ik} \right) \left[ \frac{\partial q_i}{\partial w(u,v)} \right]_k. \tag{10.2}$$

Here, $\eta > 0$ (it is called the learning rate), and $q_{ik}$ is the output of neuron $i$ calculated from the input $x_k$ and from the equations 10.3-10.5 by setting $w(u,v) = w_{k-1}(u,v)$, where $w(u,v)$ can be either $w^+(u,v)$ or $w^-(u,v)$.
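A direct transcription of the update 10.2 might look as follows. It assumes derivative arrays `dq_dWp[u, v, i]` $= \partial q_i / \partial w^+(u,v)$ and likewise `dq_dWm` for $w^-$, computed as shown further below from Eqn. 10.9; these array names are illustrative.

```python
def update_weights(W_plus, W_minus, q_k, y_k, a, dq_dWp, dq_dWm, eta):
    """One gradient-descent step of Eqn. 10.2 for every pair (u, v)."""
    err = a * (q_k - y_k)                    # a_i * (q_ik - y_ik), shape (n,)
    # For each (u, v), sum over i of err_i * dq_i/dw(u, v).
    grad_p = np.tensordot(dq_dWp, err, axes=([2], [0]))   # shape (n, n)
    grad_m = np.tensordot(dq_dWm, err, axes=([2], [0]))
    return W_plus - eta * grad_p, W_minus - eta * grad_m
```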
Recall that

$$q_i = \frac{\lambda^+(i)}{r(i) + \lambda^-(i)}, \tag{10.3}$$

where

$$\lambda^+(i) = \sum_{j=1}^{n} q_j w^+(j,i) + \Lambda(i) \tag{10.4}$$

and

$$\lambda^-(i) = \sum_{j=1}^{n} q_j w^-(j,i) + \lambda(i), \tag{10.5}$$

with $r(i) = \sum_{j=1}^{n} \left[ w^+(i,j) + w^-(i,j) \right]$ the firing rate of neuron $i$.
Using Eqn. 10.3 to compute $\partial q_i / \partial w(u,v)$, we have:

$$\frac{\partial q_i}{\partial w^+(u,v)} = \sum_{j=1}^{n} \frac{\partial q_j}{\partial w^+(u,v)} \, \frac{w^+(j,i) - w^-(j,i)\, q_i}{D(i)} - \mathbf{1}[u=i]\, \frac{q_i}{D(i)} + \mathbf{1}[v=i]\, \frac{q_u}{D(i)}, \tag{10.6}$$

$$\frac{\partial q_i}{\partial w^-(u,v)} = \sum_{j=1}^{n} \frac{\partial q_j}{\partial w^-(u,v)} \, \frac{w^+(j,i) - w^-(j,i)\, q_i}{D(i)} - \mathbf{1}[u=i]\, \frac{q_i}{D(i)} - \mathbf{1}[v=i]\, \frac{q_u q_i}{D(i)}, \tag{10.7}$$

where $D(i) = r(i) + \lambda^-(i)$ and $\mathbf{1}[X]$ equals 1 if the condition $X$ holds and 0 otherwise.
Let $\mathbf{q} = (q_1, \ldots, q_n)$, and define the $n \times n$ matrix:

$$\mathbf{W} = \left\{ W(i,j) = \frac{w^+(i,j) - w^-(i,j)\, q_j}{D(j)} \right\}, \qquad i, j = 1, \ldots, n.$$

We can now write

$$\frac{\partial \mathbf{q}}{\partial w^+(u,v)} = \frac{\partial \mathbf{q}}{\partial w^+(u,v)}\, \mathbf{W} + \gamma^+(u,v)\, q_u, \qquad \frac{\partial \mathbf{q}}{\partial w^-(u,v)} = \frac{\partial \mathbf{q}}{\partial w^-(u,v)}\, \mathbf{W} + \gamma^-(u,v)\, q_u, \tag{10.8}$$

where the elements of the $n$-vectors $\gamma^+(u,v)$, $\gamma^-(u,v)$ are given by:

$$\gamma_i^+(u,v) = \begin{cases} -1/D(i) & \text{if } u = i,\ v \neq i, \\ +1/D(i) & \text{if } u \neq i,\ v = i, \\ 0 & \text{otherwise,} \end{cases} \qquad \gamma_i^-(u,v) = \begin{cases} -(1+q_i)/D(i) & \text{if } u = i,\ v = i, \\ -1/D(i) & \text{if } u = i,\ v \neq i, \\ -q_i/D(i) & \text{if } u \neq i,\ v = i, \\ 0 & \text{otherwise.} \end{cases}$$
It can be observed that Eqn. 10.8 can be rewritten as follows:

$$\frac{\partial \mathbf{q}}{\partial w^+(u,v)} = \gamma^+(u,v)\, q_u \left[ \mathbf{I} - \mathbf{W} \right]^{-1}, \qquad \frac{\partial \mathbf{q}}{\partial w^-(u,v)} = \gamma^-(u,v)\, q_u \left[ \mathbf{I} - \mathbf{W} \right]^{-1}, \tag{10.9}$$

where $\mathbf{I}$ is the $n \times n$ identity matrix.
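In code, Eqn. 10.9 amounts to one matrix inversion shared by all pairs $(u,v)$, followed by $2n^2$ row-vector products. A sketch, under the same illustrative naming as before:

```python
def solve_dq(W_plus, W_minus, q, lam_k):
    """Solve Eqn. 10.9 for the derivatives dq/dw+(u,v) and dq/dw-(u,v).

    Returns arrays dq_dWp, dq_dWm of shape (n, n, n), indexed so that
    dq_dWp[u, v, i] = dq_i / dw+(u, v).
    """
    n = len(q)
    r = W_plus.sum(axis=1) + W_minus.sum(axis=1)
    D = r + q @ W_minus + lam_k                   # D(i) = r(i) + lambda-(i)

    # Matrix W of Eqn. 10.8: W[i, j] = [w+(i,j) - w-(i,j) q_j] / D(j).
    W = (W_plus - W_minus * q[None, :]) / D[None, :]
    inv = np.linalg.inv(np.eye(n) - W)            # [I - W]^{-1}

    dq_dWp = np.zeros((n, n, n))
    dq_dWm = np.zeros((n, n, n))
    for u in range(n):
        for v in range(n):
            gp = np.zeros(n)                      # gamma+(u, v)
            gm = np.zeros(n)                      # gamma-(u, v)
            gp[u] -= 1.0 / D[u]
            gp[v] += 1.0 / D[v]                   # the two terms cancel if u == v
            gm[u] -= 1.0 / D[u]
            gm[v] -= q[v] / D[v]
            # Row-vector form of Eqn. 10.9: dq = gamma * q_u * [I - W]^{-1}.
            dq_dWp[u, v] = q[u] * (gp @ inv)
            dq_dWm[u, v] = q[u] * (gm @ inv)
    return dq_dWp, dq_dWm
```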
The training algorithm can be started by first initializing the two matrices $\mathbf{W}_0^+$ and $\mathbf{W}_0^-$ randomly. Then, we have to choose the value of $\eta$. For each successive value of $k$, starting from $k = 1$, one must proceed as follows:

1. Set the input values to $x_k = (\Lambda_k, \lambda_k)$.
2. Solve the system of nonlinear equations 10.3-10.5.
3. Solve the system of linear equations 10.9 with the results of step 2.
4. Using Eqn. 10.2 and the results of steps 2 and 3, update the matrices $\mathbf{W}_k^+$ and $\mathbf{W}_k^-$. It should be noted that the values in the two matrices $\mathbf{W}^+$ and $\mathbf{W}^-$ should not be negative, as stated before. Thus, at any step of the algorithm, if the iteration yields a negative value for a term, we consider two possibilities:

   (a) Set the negative value to zero and stop the iteration for this term at this step $k$. In the next step $k+1$, iterate on this term with the same rule, starting from its current zero value.

   (b) Go back to the previous value of the term and iterate with a smaller value of $\eta$.

After the training phase, the network can be used in normal operation; the only computations needed to obtain the outputs are those given by equations 10.3-10.5.
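Putting the illustrative pieces above together, one possible rendering of the whole training loop is the following sketch; it handles the non-negativity constraint with option (a), i.e. negative entries are clipped to zero after each update.

```python
def train(W_plus, W_minus, Lambda, lam, y, a, eta=0.1, epochs=50):
    """Gradient-descent training over the K input-output pairs."""
    for _ in range(epochs):
        for k in range(len(y)):
            # Steps 1-2: set the inputs, solve the nonlinear system 10.3-10.5.
            q = solve_q(W_plus, W_minus, Lambda[k], lam[k])
            # Step 3: solve the linear systems 10.9.
            dq_dWp, dq_dWm = solve_dq(W_plus, W_minus, q, lam[k])
            # Step 4: update the matrices via Eqn. 10.2 ...
            W_plus, W_minus = update_weights(W_plus, W_minus, q, y[k], a,
                                             dq_dWp, dq_dWm, eta)
            # ... keeping the rates non-negative (option (a): clip to zero).
            np.maximum(W_plus, 0.0, out=W_plus)
            np.maximum(W_minus, 0.0, out=W_minus)
    return W_plus, W_minus
```

In normal operation, only `solve_q` is needed: the trained matrices together with a new input $(\Lambda, \lambda)$ yield the network outputs $q$.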