Analytical Formulation of the Algorithm

Following Gelenbe's notation, let us write

$\begin{displaymath} N_i=\lambda^+_i+\sum_j \varrho_jw^+_{i,j}, \end{displaymath}$ (10.10)

$\begin{displaymath} D_i=r_i+\lambda^-_i+\sum_j \varrho_jw^-_{i,j}, \end{displaymath}$ (10.11)

$\begin{displaymath} \varrho_i=N_i/D_i, \end{displaymath}$ (10.12)

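where $i\in\{1,\ldots, n\}$ is the neuron index, $r_i$ the firing rate of neuron $i$, $w^+_{i,j}$ and $w^-_{i,j}$ the excitatory and inhibitory weights, $\lambda^+_i$ and $\lambda^-_i$ the rates of the external excitatory and inhibitory signals for neuron $i$, and $\varrho_i$ the output of neuron $i$. Define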

$\begin{displaymath} E=\sum_{k=1}^K E^{(k)}, \end{displaymath}$ (10.13)

recalling that the error on the $k$-th of the $K$ training examples is

$\begin{displaymath} E^{(k)}= \textstyle \frac{1}{2} \displaystyle \sum_{i=1}^n a_i(\varrho_i^{(k)}-y_i^{(k)})^2, \quad a_i \geq 0. \end{displaymath}$ (10.14)
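To make Eqs. 10.10 to 10.14 concrete, here is a minimal sketch that computes the outputs $\varrho_i$ and the cost $E$. Because $\varrho_j$ appears on both sides of Eq. 10.12, the outputs solve a nonlinear system; the sketch uses plain fixed-point iteration, which is our assumption (it converges for the small stable example below but is not guaranteed in general). The function names and the example rates are hypothetical.

```python
def rnn_steady_state(lam_plus, lam_minus, r, w_plus, w_minus,
                     tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for rho_i = N_i / D_i (Eqs. 10.10-10.12).

    w_plus[i][j] and w_minus[i][j] multiply rho_j in the equations of
    neuron i, following the indexing written in Eqs. 10.10-10.11.
    """
    n = len(lam_plus)
    rho = [0.0] * n
    for _ in range(max_iter):
        new_rho = []
        for i in range(n):
            N_i = lam_plus[i] + sum(rho[j] * w_plus[i][j] for j in range(n))
            D_i = r[i] + lam_minus[i] + sum(rho[j] * w_minus[i][j] for j in range(n))
            new_rho.append(N_i / D_i)
        if max(abs(a - b) for a, b in zip(new_rho, rho)) < tol:
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")


def example_error(rho_out, y, a):
    """E^(k) of Eq. 10.14 for one training example."""
    return 0.5 * sum(a_i * (rho_i - y_i) ** 2
                     for a_i, rho_i, y_i in zip(a, rho_out, y))


def total_error(rho_outs, ys, a):
    """E of Eq. 10.13: sum of E^(k) over the K training examples."""
    return sum(example_error(o, y, a) for o, y in zip(rho_outs, ys))


# Hypothetical two-neuron network.
rho = rnn_steady_state([0.5, 0.2], [0.1, 0.1], [1.0, 1.0],
                       [[0.0, 0.3], [0.4, 0.0]], [[0.0, 0.2], [0.1, 0.0]])
```

Setting $a_i=0$ removes neuron $i$ from the cost, so only the neurons with $a_i>0$ contribute to $E$.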

The mathematical formulation of the LM method applied to the RNN is as follows.

We define a generic vector $p=(p_1, \ldots, p_M)$ of $M$ elements, containing the adjustable parameters $w^+_{i,j}$ and $w^-_{i,j}$; $p^{(k)}$ is the parameter vector at step $k$ of the training process.
We also define the gradient vector $g=(g_1, \ldots, g_M)$, where $g_l=\partial E/ \partial p_l$ for $l=1, \ldots, M$, and denote by $g^{(k)}$ the gradient vector at point $p^{(k)}$.
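The text does not fix an ordering of the weights inside $p$. As a hypothetical convention for illustration, the sketch below flattens all of $w^+$ and then all of $w^-$ row by row, and maps an index $m$ back to the case split of Eq. 10.20 ($p_m=w^+_{u,v}$ or $p_m=w^-_{u,v}$). It assumes every entry of an $n\times n$ network is adjustable, so $M=2n^2$; connections that do not exist would simply be left out of $p$. Indices are 0-based and the function names are ours.

```python
def pack(w_plus, w_minus):
    """Flatten w_plus, then w_minus (row-major), into p = (p_1, ..., p_M)."""
    return [w for row in w_plus for w in row] + [w for row in w_minus for w in row]


def unpack(p, n):
    """Inverse of pack for an n x n network: returns (w_plus, w_minus)."""
    half = n * n

    def to_matrix(flat):
        return [list(flat[i * n:(i + 1) * n]) for i in range(n)]

    return to_matrix(p[:half]), to_matrix(p[half:2 * half])


def param_role(m, n):
    """Map a 0-based index m of p to ('+', u, v) or ('-', u, v).

    This mirrors the two cases of Eq. 10.20: p_m = w^+_{u,v} or w^-_{u,v}.
    """
    half = n * n
    sign = "+" if m < half else "-"
    u, v = divmod(m % half, n)
    return sign, u, v
```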
The weight update based on Newton's method is

$\begin{displaymath} p^{(k+1)}=p^{(k)} +s^{(k)}, \end{displaymath}$ (10.15)

where $s^{(k)}$ is the Newton direction, obtained by solving the system

$\begin{displaymath} \mathbf{H}^{(k)}s^{(k)} = -g^{(k)}, \end{displaymath}$ (10.16)

with $\mathbf{H}^{(k)}$ the Hessian matrix at step $k$. For LM, the Hessian matrix $\mathbf{H}^{(k)}$ is approximated by

$\begin{displaymath} \mathbf{H}^{(k)}=\mathbf{J}^{{(k)}^T}\mathbf{J}^{(k)}+\mu^{(k)}\mathbf{I}. \end{displaymath}$ (10.17)

Here, $\mu^{(k)}>0$ and $\mathbf{J}^{(k)}$ is the Jacobian matrix at step $k$, given by

$\begin{displaymath} \mathbf{J}=\left[ \begin{array}{ccc} \frac{\partial e_1}{\partial p_1} & \cdots & \frac{\partial e_1}{\partial p_M} \\ \vdots & \ddots & \vdots \\ \frac{\partial e_N}{\partial p_1} & \cdots & \frac{\partial e_N}{\partial p_M} \\ \end{array} \right], \end{displaymath}$ (10.18)

where $e_i=y_i-\varrho_i$ is the prediction error of neuron $i$, $\varrho_i$ is the output of neuron $i$ at the output layer, $y_i$ is the desired output of neuron $i$ at the output layer, and $N$ is the number of outputs multiplied by the number $K$ of training examples. Each element of matrix $\mathbf{J}$ is computed using the following equations:

$\begin{displaymath} \mathbf{J}_{l,m}=\partial e_l / \partial p_m =- \partial \varrho_l / \partial p_m, \quad\mbox{ where } l=1, \ldots, N \mbox{ and } m=1, \ldots, M, \end{displaymath}$ (10.19)

$\begin{displaymath} \partial \mathbf{\varrho} / \partial p_m= \left\{ \begi... ...{-1} & \mbox{if} & p_m=w^-_{u,v},\\ \end{array}. \right. \end{displaymath}$ (1020)
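Equation 10.20 gives the derivatives analytically. A common way to check an implementation of $\mathbf{J}$ is to compare it against central finite differences; the sketch below swaps in that finite-difference technique, which is not the method of Eq. 10.20. Here `residuals` is a hypothetical callable returning the stacked error vector $e$ (all outputs of all $K$ examples in a fixed order, hence $N$ entries), and the toy residual is not an RNN.

```python
def fd_jacobian(residuals, p, h=1e-6):
    """Central-difference approximation of J[l][m] = d e_l / d p_m."""
    N, M = len(residuals(p)), len(p)
    J = [[0.0] * M for _ in range(N)]
    for m in range(M):
        up = list(p)
        up[m] += h
        dn = list(p)
        dn[m] -= h
        e_up, e_dn = residuals(up), residuals(dn)
        for l in range(N):
            J[l][m] = (e_up[l] - e_dn[l]) / (2.0 * h)
    return J


# Hypothetical toy residuals e = y - rho(p) with y = (1, 2).
def toy_residuals(p):
    return [1.0 - p[0] ** 2, 2.0 - p[0] * p[1]]


# Analytical Jacobian of the toy residuals: [[-2*p0, 0], [-p1, -p0]].
J_num = fd_jacobian(toy_residuals, [1.0, 2.0])
```

Since $e_l=y_l-\varrho_l$ with $y_l$ fixed, each column of `J_num` is minus the corresponding derivative of the outputs, which is the sign in Eq. 10.19.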

From Eq. 10.16, we obtain:

$\begin{displaymath} s^{(k)}=-[\mathbf{H}^{(k)}]^{-1} g^{(k)}. \end{displaymath}$ (10.21)
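Equation 10.21 is written with an explicit inverse, but numerically one would normally solve the linear system of Eq. 10.16 directly, which is cheaper and more stable than forming $[\mathbf{H}^{(k)}]^{-1}$. A minimal sketch, using Gaussian elimination with partial pivoting; the names `solve`, `newton_step`, `grad`, `hess` and the quadratic test function are hypothetical. On a quadratic, a single Newton step from any starting point lands on the minimizer $A^{-1}b$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def newton_step(p, grad, hess):
    """Return p + s with H s = -g (Eqs. 10.15-10.16), without forming H^{-1}."""
    s = solve(hess(p), [-g for g in grad(p)])
    return [pi + si for pi, si in zip(p, s)]


# Hypothetical quadratic E(p) = 1/2 p^T A p - b^T p, so g = A p - b and H = A.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def grad(p):
    return [sum(A[i][j] * p[j] for j in range(2)) - b[i] for i in range(2)]


def hess(p):
    return A


p1 = newton_step([5.0, -3.0], grad, hess)
```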

Equations 10.17 and 10.21 give:

$\begin{displaymath} s^{(k)}=-[\mathbf{J}^{{(k)}^T}\mathbf{J}^{(k)}+\mu^{(k)}\mathbf{I}]^{-1} g^{(k)}. \end{displaymath}$ (10.22)
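Equation 10.22 shows how $\mu^{(k)}$ blends two methods: when $\mathbf{J}$ has full column rank and $\mu^{(k)}\to 0$, the step tends to the Gauss-Newton step, while for large $\mu^{(k)}$ it tends to $-g^{(k)}/\mu^{(k)}$, a short gradient-descent step. Since $\mathbf{J}^T\mathbf{J}$ is positive semidefinite, adding $\mu\mathbf{I}$ with $\mu>0$ makes the matrix positive definite, so a Cholesky factorization can be used to solve for $s^{(k)}$. The sketch below takes $g=\mathbf{J}^Te$ from Eq. 10.24; the function names and the example matrix are hypothetical.

```python
import math


def cholesky_solve(H, b):
    """Solve H x = b for symmetric positive definite H, using H = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):                      # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def lm_direction(J, e, mu):
    """s = -(J^T J + mu I)^{-1} J^T e, i.e. Eq. 10.22 with g = J^T e."""
    N, M = len(J), len(J[0])
    g = [sum(J[l][m] * e[l] for l in range(N)) for m in range(M)]
    H = [[sum(J[l][a] * J[l][c] for l in range(N)) for c in range(M)]
         for a in range(M)]
    for a in range(M):
        H[a][a] += mu                       # Eq. 10.17
    return cholesky_solve(H, [-gm for gm in g])


J = [[1.0, 2.0], [3.0, 4.0]]
e = [1.0, 1.0]
s_small = lm_direction(J, e, 1e-10)  # close to the Gauss-Newton step -J^{-1} e
s_large = lm_direction(J, e, 1e6)    # close to -g / mu, with g = J^T e = (4, 6)
```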

Combining Equations 10.15 and 10.22, we obtain:

$\begin{displaymath} p^{(k+1)}=p^{(k)}-[\mathbf{J}^{{(k)}^T}\mathbf{J}^{(k)}+\mu^{(k)}\mathbf{I}]^{-1} g^{(k)}. \end{displaymath}$ (10.23)
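Putting Eqs. 10.15, 10.17, 10.22 and 10.24 together, a complete training loop can be sketched as follows. This section does not say how $\mu^{(k)}$ changes between steps; the rule used here (divide $\mu$ by 10 after a step that lowers the cost, otherwise multiply it by 10 and retry) is a common choice and purely our assumption. To keep the example small, the residual model is a hypothetical two-parameter exponential fit standing in for an RNN; for an RNN, `residuals` would return the stacked errors $e_l=y_l-\varrho_l$ and `jacobian` would implement Eqs. 10.19 and 10.20.

```python
import math


def cholesky_solve(H, b):
    """Solve H x = b for symmetric positive definite H, using H = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def lm_step(p, J, e, mu):
    """One update of Eq. 10.23, computing g = J^T e as in Eq. 10.24."""
    N, M = len(J), len(p)
    g = [sum(J[l][m] * e[l] for l in range(N)) for m in range(M)]
    H = [[sum(J[l][a] * J[l][c] for l in range(N)) for c in range(M)]
         for a in range(M)]
    for a in range(M):
        H[a][a] += mu                              # Eq. 10.17
    s = cholesky_solve(H, [-gm for gm in g])       # Eq. 10.22
    return [pm + sm for pm, sm in zip(p, s)]       # Eq. 10.15


def levenberg_marquardt(residuals, jacobian, p, mu=1e-2, factor=10.0,
                        max_iter=200, tol=1e-12):
    """Iterate Eq. 10.23; the mu update rule is an assumed heuristic."""
    cost = 0.5 * sum(r * r for r in residuals(p))
    for _ in range(max_iter):
        p_new = lm_step(p, jacobian(p), residuals(p), mu)
        new_cost = 0.5 * sum(r * r for r in residuals(p_new))
        if new_cost < cost:
            # Accept the step and move toward Gauss-Newton (smaller mu).
            small = max(abs(a - b) for a, b in zip(p_new, p)) < tol
            p, cost, mu = p_new, new_cost, mu / factor
            if small:
                break
        else:
            # Reject the step and move toward gradient descent (larger mu).
            mu *= factor
    return p


# Hypothetical toy problem: fit y = p0 * exp(p1 * x) to exact data (p0=2, p1=0.5).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]


def residuals(p):
    """e_l = y_l - model_l, mirroring e_i = y_i - rho_i."""
    return [y - p[0] * math.exp(p[1] * x) for x, y in zip(xs, ys)]


def jacobian(p):
    """J[l][m] = d e_l / d p_m for the toy model."""
    return [[-math.exp(p[1] * x), -p[0] * x * math.exp(p[1] * x)] for x in xs]


p_fit = levenberg_marquardt(residuals, jacobian, [1.0, 0.2])
```

Rejected steps leave $p$ unchanged, so the cost never increases from one accepted iterate to the next.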

The gradient vector $g$ is computed as

$\begin{displaymath} g=\mathbf{J}^T e, \end{displaymath}$ (10.24)

where $e=(e_1, \ldots, e_N)$ is the error vector.
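Equation 10.24 follows from differentiating the sum of squared errors: with $E=\frac{1}{2}\sum_l e_l^2$ (all $a_i=1$), $\partial E/\partial p_m=\sum_l e_l\,\partial e_l/\partial p_m=(\mathbf{J}^Te)_m$. With general weights $a_i$, the same argument gives $\mathbf{J}^T\mathrm{diag}(a)\,e$. The sketch below computes $g$ this way and checks it against a central finite-difference gradient on a hypothetical toy residual; the function names are ours.

```python
def gradient_from_jacobian(J, e, a=None):
    """g = J^T e (Eq. 10.24); with weights a, g = J^T diag(a) e."""
    N, M = len(e), len(J[0])
    w = [1.0] * N if a is None else a
    return [sum(J[l][m] * w[l] * e[l] for l in range(N)) for m in range(M)]


def fd_gradient(cost, p, h=1e-6):
    """Central-difference gradient of a scalar cost, used only as a check."""
    g = []
    for m in range(len(p)):
        up = list(p)
        up[m] += h
        dn = list(p)
        dn[m] -= h
        g.append((cost(up) - cost(dn)) / (2.0 * h))
    return g


# Hypothetical toy residuals e(p) and their analytical Jacobian.
def residuals(p):
    return [1.0 - p[0] ** 2, 2.0 - p[0] * p[1]]


def jacobian(p):
    return [[-2.0 * p[0], 0.0], [-p[1], -p[0]]]


def cost(p):
    return 0.5 * sum(r * r for r in residuals(p))


p = [0.5, 1.0]
g = gradient_from_jacobian(jacobian(p), residuals(p))
g_check = fd_gradient(cost, p)
```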

Samir Mohamed 2003-01-08