has changed. Algorithm 1 shows the pseudocode of RPROP+. If the sign of the partial derivative has not changed in the last iteration, the step size ∆_ij of weight w_ij increases. The step size is limited by the maximum value ∆_max. If the sign of the gradient has changed in the last iteration, step size ∆_ij^(t) is decreased (again limited, this time by the minimum value ∆_min) and the last weight update is reverted. Last, the current gradient is reset to 0 to enforce the last condition (Line 12), which conducts a weight update with the new (reduced) step size.
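To make the three cases concrete, the following sketch (Python/NumPy; the names grad_prev, step, and dw_prev are ours, not from Algorithm 1) applies one RPROP+ iteration to an array of weights, with the η and ∆ bounds that are later used in Section 4 as defaults:

```python
import numpy as np

def rprop_plus_update(w, grad, grad_prev, step, dw_prev,
                      eta_minus=0.5, eta_plus=1.2,
                      step_min=1e-6, step_max=50.0):
    """One RPROP+ iteration over an array of weights (a sketch, not the paper's Algorithm 1)."""
    step, grad = step.copy(), grad.copy()
    dw = np.zeros_like(w)
    sign_change = grad * grad_prev              # >0: same sign, <0: sign flipped, 0: third case

    same = sign_change > 0                      # sign unchanged: enlarge the step size
    step[same] = np.minimum(step[same] * eta_plus, step_max)

    flipped = sign_change < 0                   # sign flipped: shrink the step size ...
    step[flipped] = np.maximum(step[flipped] * eta_minus, step_min)
    dw[flipped] = -dw_prev[flipped]             # ... and revert the last weight update
    grad[flipped] = 0.0                         # forces the third case in the next iteration

    regular = ~flipped                          # sign unchanged or zero product: take a step
    dw[regular] = -np.sign(grad[regular]) * step[regular]
    return w + dw, grad, step, dw
```

The returned gradient (with zeroed entries where the sign flipped), step sizes, and weight changes are passed as grad_prev, step, and dw_prev of the next call.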
Figure 1 illustrates the working principle of RPROP+. An increase of the step size in case the sign of the partial derivative has not changed is reasonable
to accelerate the walk into the direction of the optimum (left two solid arrows). In case the optimum is missed and the sign of the partial derivative has changed (dotted arrow), the following gradient descent step is performed from the previous position with a decreased step size (w^(t+4)).

Figure 1: Illustration of gradient descent with RPROP+.
3.4 iRPROP+
A further method we test is the improved resilient propagation with backtracking (iRPROP+) (Igel and Hüsken, 2003), which is an extension of RPROP+. The difference to RPROP+ is that the weight update is only reverted if it led to an increased error, i.e., if E^(t) > E^(t−1). In Algorithm 1, Line 9 must be replaced by

IF E^(t) > E^(t−1) THEN w_ij^(t+1) := w_ij^(t) − ∆w_ij^(t)
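In terms of the RPROP+ sketch above, only the sign-flip branch changes: a minimal variant (again with our own naming; the scalar errors E_t and E_prev are assumed to be tracked by the caller) could look as follows:

```python
import numpy as np

def irprop_plus_update(w, grad, grad_prev, step, dw_prev, E_t, E_prev,
                       eta_minus=0.5, eta_plus=1.2, step_min=1e-6, step_max=50.0):
    """iRPROP+ sketch: identical to rprop_plus_update except for the revert condition."""
    step, grad = step.copy(), grad.copy()
    dw = np.zeros_like(w)
    sign_change = grad * grad_prev

    same = sign_change > 0
    step[same] = np.minimum(step[same] * eta_plus, step_max)

    flipped = sign_change < 0
    step[flipped] = np.maximum(step[flipped] * eta_minus, step_min)
    if E_t > E_prev:                            # the only difference to RPROP+:
        dw[flipped] = -dw_prev[flipped]         # revert only if the error has increased
    grad[flipped] = 0.0                         # as before, skip the sign check next time

    regular = ~flipped
    dw[regular] = -np.sign(grad[regular]) * step[regular]
    return w + dw, grad, step, dw
```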
The variants are experimentally compared in the next
section.
4 EXPERIMENTAL ANALYSIS
In this section, we compare standard backpropagation, backpropagation with momentum, RPROP+, and iRPROP+ experimentally. To this end, the four methods are run for 2000 iterations on test data sets from turbines at Casper, Las Vegas, Reno, and Tehachapi for a prediction horizon of λ = 3 steps (30 minutes). We use every 5th pattern of the wind time series data of the year 2004. The resulting data set consists of 10512 patterns, of which 85% are used for training and 15% are randomly drawn for the validation set. Each training process is repeated three times. The topologies of the neural networks depend on the number of employed neighboring turbines, which determines the dimensionality of the patterns x_i:
• Casper: 33 input neurons (10 neighboring turbines, 1 target turbine, 3 time steps), 34 hidden neurons
• Cheyenne: 33 input neurons (10 neighboring turbines, 1 target turbine, 3 time steps), 34 hidden neurons
• Las Vegas: 30 input neurons (9 neighboring turbines, 1 target turbine, 3 time steps), 31 hidden neurons
• Reno: 30 input neurons (9 neighboring turbines, 1 target turbine, 3 time steps), 31 hidden neurons
• Tehachapi: 21 input neurons (6 neighboring turbines, 1 target turbine, 3 time steps), 22 hidden neurons
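The listed topologies follow one pattern: the number of input neurons equals (neighboring turbines + target turbine) × time steps, and the hidden layer has one neuron more than the input layer. A small sketch makes this explicit (the dictionary and helper function are purely illustrative, not part of the original setup):

```python
# Input size = (neighboring turbines + 1 target turbine) x time steps;
# the hidden layer is one neuron larger than the input layer.
TURBINE_SETUP = {    # site -> number of neighboring turbines (from the list above)
    "Casper": 10, "Cheyenne": 10, "Las Vegas": 9, "Reno": 9, "Tehachapi": 6,
}
TIME_STEPS = 3

def topology(site):
    n_input = (TURBINE_SETUP[site] + 1) * TIME_STEPS
    return n_input, n_input + 1      # (input neurons, hidden neurons)

print(topology("Tehachapi"))         # -> (21, 22)
```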
For the classical backpropagation variants, the following parameters are chosen: ρ = 3 · 10^−7 and α = 1 · 10^−8 for BPMom. For RPROP, the following parameters are chosen: ∆_min = 1 · 10^−6, ∆_max = 50, η^− = 0.5, and η^+ = 1.2. Table 1 shows the experimental results. The reported values are the validation errors in terms of MSE. RPROP+ and iRPROP+ clearly outperform the two classical backpropagation variants.
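For reference, the data handling and the parameter choices described above can be condensed into a short sketch; the helper make_split and the fixed random seed are our own additions and not taken from the experimental protocol:

```python
import numpy as np

rng = np.random.default_rng(0)       # seed is our choice, not reported in the paper

# Hyperparameters as stated above; alpha is only used by BPMom.
BP_PARAMS    = {"rho": 3e-7, "alpha": 1e-8}
RPROP_PARAMS = {"step_min": 1e-6, "step_max": 50.0, "eta_minus": 0.5, "eta_plus": 1.2}

def make_split(patterns, targets):
    """Every 5th pattern of the 2004 series; 85% training, 15% random validation."""
    X, y = patterns[::5], targets[::5]          # about 10512 patterns per site
    idx = rng.permutation(len(X))
    n_val = int(0.15 * len(X))
    return X[idx[n_val:]], y[idx[n_val:]], X[idx[:n_val]], y[idx[:n_val]]
```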
In the following, we analyze and compare the learning curves of BP and RPROP. Figure 2 shows the development of the validation error in terms of MSE in the course of backpropagation and iRPROP+ training for the Tehachapi data sets. The plots show that RPROP+ achieves a significantly faster training error reduction than backpropagation. Figure 3 offers a closer look at the learning curves (in terms of validation error). Each of the three backpropagation runs shows a smooth, approximately linear development. iRPROP+-based training reduces the errors faster, but also suffers from slight deteriorations during the learning process. However, the situation changes at later stages of