has changed. Algorithm 1 shows the pseudocode of
RPROP+. If the sign of the partial derivative has
not changed in the last iteration, the step size $\Delta_{ij}$ of
weight $w_{ij}$ increases. The step size is limited by the
maximum value $\Delta_{\max}$. If the sign of the gradient has
changed in the last iteration, the step size $\Delta_{ij}$ is decreased
(again limited by the minimum value $\Delta_{\min}$) and the
last weight update is reverted. Last, the current gradient
is reset to 0 to enforce the last condition (Line
12), which conducts a weight update with the new (reduced)
step size.
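The update rule described above can be sketched per weight as follows. This is a minimal Python sketch under stated assumptions, not a transcription of Algorithm 1: the function name `rprop_plus_step`, the flat-list representation of the weights, and the default `delta_min=1e-6` are illustrative choices. The toy error function in the usage part is likewise hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def rprop_plus_step(w, grad, delta, prev_grad, prev_dw,
                    eta_plus=1.2, eta_minus=0.5,
                    delta_min=1e-6, delta_max=50.0):
    """One RPROP+ iteration over flat lists of weights.

    Returns the new (w, delta, prev_grad, prev_dw) lists.
    delta_min=1e-6 is an assumed default, not a value taken from the text.
    """
    new_w, new_delta, new_grad, new_dw = [], [], [], []
    for w_i, g_i, d_i, pg_i, pdw_i in zip(w, grad, delta, prev_grad, prev_dw):
        product = g_i * pg_i
        if product > 0:
            # Same sign as before: enlarge the step, capped at delta_max.
            d_i = min(d_i * eta_plus, delta_max)
            dw_i = -sign(g_i) * d_i
        elif product < 0:
            # Sign flipped, so the optimum was missed: shrink the step,
            # revert the last update, and zero the stored gradient so that
            # the next iteration falls through to the final branch.
            d_i = max(d_i * eta_minus, delta_min)
            dw_i = -pdw_i
            g_i = 0.0
        else:
            # No sign information (first step or right after a revert):
            # plain step with the current step size.
            dw_i = -sign(g_i) * d_i
        new_w.append(w_i + dw_i)
        new_delta.append(d_i)
        new_grad.append(g_i)
        new_dw.append(dw_i)
    return new_w, new_delta, new_grad, new_dw


# Usage on a toy problem: minimize E(w) = (w - 2)^2 with gradient 2 * (w - 2).
w, delta, prev_grad, prev_dw = [0.0], [0.1], [0.0], [0.0]
for _ in range(200):
    grad = [2.0 * (w[0] - 2.0)]
    w, delta, prev_grad, prev_dw = rprop_plus_step(
        w, grad, delta, prev_grad, prev_dw)
```

Only the sign of the gradient enters the update, so the magnitude of the partial derivative has no influence on the step length.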
Figure 1: Illustration of gradient descent with RPROP+.
Figure 1 illustrates the working principle of
RPROP+. Increasing the step size when the sign
of the partial derivative has not changed is reasonable
to accelerate the walk in the direction of the optimum
(left two solid arrows). In case the optimum
is missed and the sign of the partial derivative has
changed (dotted arrow), the following gradient descent
step is performed from the previous position
with a decreased step size (w
3.4 iRPROP+
A further method we test is the improved resilient
propagation with backtracking (iRPROP+) (Igel and
Hüsken, 2003), which is an extension of RPROP+.
The difference to RPROP+ is that the weight update
is only reverted if it led to an increased error, i.e., if
$E^{(t)} > E^{(t-1)}$. In Algorithm 1, Line 9 must be replaced by
if $E^{(t)} > E^{(t-1)}$ then $w_{ij}^{(t+1)} := w_{ij}^{(t)} - \Delta w_{ij}^{(t-1)}$
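The conditional backtracking of iRPROP+ can be sketched in the same style. Again this is a hedged illustration rather than the authors' implementation: `irprop_plus_step`, the arguments `err` and `prev_err` (current and previous error), and the default `delta_min=1e-6` are assumed names and values. When the sign flips but the error has not increased, this sketch leaves the weight unchanged for that iteration.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def irprop_plus_step(w, grad, delta, prev_grad, prev_dw, err, prev_err,
                     eta_plus=1.2, eta_minus=0.5,
                     delta_min=1e-6, delta_max=50.0):
    """One iRPROP+ iteration over flat lists of weights.

    err and prev_err are the errors of the current and previous iteration.
    delta_min=1e-6 is an assumed default, not a value taken from the text.
    """
    error_increased = err > prev_err
    new_w, new_delta, new_grad, new_dw = [], [], [], []
    for w_i, g_i, d_i, pg_i, pdw_i in zip(w, grad, delta, prev_grad, prev_dw):
        product = g_i * pg_i
        if product > 0:
            d_i = min(d_i * eta_plus, delta_max)
            dw_i = -sign(g_i) * d_i
        elif product < 0:
            d_i = max(d_i * eta_minus, delta_min)
            # Difference to RPROP+: revert the last update only if the
            # error actually got worse; otherwise keep the weight.
            dw_i = -pdw_i if error_increased else 0.0
            g_i = 0.0
        else:
            dw_i = -sign(g_i) * d_i
        new_w.append(w_i + dw_i)
        new_delta.append(d_i)
        new_grad.append(g_i)
        new_dw.append(dw_i)
    return new_w, new_delta, new_grad, new_dw


# Usage: the same sign flip, once with a lower and once with a higher error.
state = ([0.88], [-1.0], [0.12], [2.0], [-0.12])
kept = irprop_plus_step(*state, err=0.5, prev_err=1.0)
reverted = irprop_plus_step(*state, err=1.5, prev_err=1.0)
```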
The variants are experimentally compared in the next section.
In this section, we compare standard backpropagation,
backpropagation with momentum, RPROP+,
and iRPROP+ experimentally. For this purpose, the
four methods are run for 2000 iterations on test data
sets of turbines from Casper, Las Vegas, Reno, and
Tehachapi for a prediction horizon of λ = 3 steps (30
minutes). We use every 5th pattern of the wind time
series data of the year 2004. The resulting data set consists
of 10512 patterns, of which 85% are used for
training and 15% are randomly drawn for the validation
set. Each training process is repeated three times.
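The subsampling and the random split described above can be sketched as follows. The helper name `subsample_and_split`, the fixed seed, the rounding of the 85% share, and the raw series length of 52560 records (a length that yields 10512 patterns when every 5th one is kept) are assumptions made for illustration; the text does not state them.

```python
import random


def subsample_and_split(patterns, step=5, train_fraction=0.85, seed=0):
    """Keep every `step`-th pattern, then split randomly into two sets."""
    kept = patterns[::step]
    indices = list(range(len(kept)))
    random.Random(seed).shuffle(indices)
    # Rounding down the training share is an assumption of this sketch.
    n_train = int(train_fraction * len(kept))
    train = [kept[i] for i in sorted(indices[:n_train])]
    validation = [kept[i] for i in sorted(indices[n_train:])]
    return train, validation


# Hypothetical stand-in for the wind time series: 52560 dummy records.
raw = list(range(52560))
train, validation = subsample_and_split(raw)
```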
The topologies of the neural networks depend on the
number of employed neighboring turbines, which determines
the dimensionality of patterns x:
• Casper: 33 input neurons (10 neighboring turbines,
1 target turbine, 3 time steps), 34 hidden neurons
• Cheyenne: 33 input neurons (10 neighboring turbines,
1 target turbine, 3 time steps), 34 hidden neurons
• Las Vegas: 30 input neurons (9 neighboring turbines,
1 target turbine, 3 time steps), 31 hidden neurons
• Reno: 30 input neurons (9 neighboring turbines,
1 target turbine, 3 time steps), 31 hidden neurons
• Tehachapi: 21 input neurons (6 neighboring turbines,
1 target turbine, 3 time steps), 22 hidden neurons
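The layer sizes in this list follow a simple pattern, which the short sketch below reproduces. The function name `topology` is hypothetical, and the rule "hidden neurons = input neurons + 1" is only read off the listed numbers; the text does not state it as a design rule.

```python
def topology(n_neighbors, n_time_steps=3):
    """Return (input neurons, hidden neurons) for one target turbine."""
    # Each time step contributes one value per neighbor plus the target.
    n_inputs = (n_neighbors + 1) * n_time_steps
    # Observed in the list above, assumed here: one more hidden than input.
    n_hidden = n_inputs + 1
    return n_inputs, n_hidden


neighbors = {"Casper": 10, "Cheyenne": 10, "Las Vegas": 9,
             "Reno": 9, "Tehachapi": 6}
sizes = {park: topology(n) for park, n in neighbors.items()}
```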
For the classical backpropagation variants, the following
parameters are chosen: ρ = 3 · 10
α = 1 · 10
for BPMom. For RPROP, the following
parameters are chosen: $\Delta_{\min}$ = 1 · 10
, $\Delta_{\max}$ = 50, $\eta^{-}$ = 0.5, and $\eta^{+}$ = 1.2. Table 1 shows
the experimental results. The reported values are validation
errors in terms of MSE. RPROP+ and iRPROP+
clearly outperform the two classical backpropagation
variants.
In the following, we analyze and compare the
learning curves of BP and RPROP. Figure 2 shows
the development of the validation error in terms of MSE in
the course of backpropagation and iRPROP+ training
for the Tehachapi data set. The plots show that
RPROP+ achieves a significantly faster training error
reduction than backpropagation. Figure 3 offers a closer
look at the learning curves (in terms of validation error).
All three runs of backpropagation show a
smooth, approximately linear development. iRPROP+-based
training reduces the errors faster, but also suffers
from slight deteriorations during the learning process.
However, the situation changes at later stages of