of input variable: price indices x_M, direct discount x_D, and baseline BL. The price indices are defined as PI(i, t) = P_prom(i, t) / P_reg(i, t), where P_reg(i, t) and P_prom(i, t) are the regular and promotional price, respectively, of product i at week t. Hence, the price index gives the relative variation between the promotional price and the regular price, and its value is 1 whenever both are equal.
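The computation of the price index is straightforward; the following sketch (with hypothetical example prices, not taken from the paper's data) shows the definition in code:

```python
import numpy as np

def price_index(p_prom, p_reg):
    """Price index PI(i, t) = P_prom(i, t) / P_reg(i, t).

    Equals 1 when the promotional price matches the regular price,
    and falls below 1 during a promotion.
    """
    p_prom = np.asarray(p_prom, dtype=float)
    p_reg = np.asarray(p_reg, dtype=float)
    return p_prom / p_reg

# Illustrative values: no promotion in week 1, a discounted price in week 2
pi = price_index([1.20, 0.90], [1.20, 1.20])
```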
The input pattern was given by the concatenation of the three kinds of input variables, x = [x_M, x_D, BL]. The output of each model is given by the sold units for that particular product. Hence, the promotional model can be expressed as y(i, t) = f(x_M(i, t), x_D(i, t), BL(t)), where y(i, t) is the number of sold units for the i-th product during week t; x_M(i, t) = [PI_1(i, t), ..., PI_{n_m}(i, t)] is a vector with the price indices of product i during week t, with n_m = 6; x_D(i, t) is the direct discount dichotomous variable for product i during week t; and BL(t) is the baseline variable at week t. The input metric variables were the same for all the models.
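The assembly of the input pattern can be sketched as a simple concatenation; the function below is a hypothetical illustration of the structure x = [x_M, x_D, BL], with example values that are not taken from the paper's data:

```python
import numpy as np

N_M = 6  # number of price-index inputs, n_m = 6

def build_input(pi_vec, x_d, bl):
    """Concatenate the input pattern x = [x_M, x_D, BL].

    pi_vec : length-6 vector of price indices PI_1..PI_6 at week t
    x_d    : dichotomous direct-discount indicator (0 or 1)
    bl     : baseline value at week t
    """
    pi_vec = np.asarray(pi_vec, dtype=float)
    assert pi_vec.shape == (N_M,)
    return np.concatenate([pi_vec, [float(x_d)], [float(bl)]])

# Illustrative week: one product discounted, discount flag on, baseline 520 units
x = build_input([1.0, 0.75, 1.0, 1.0, 0.9, 1.0], 1, 520.0)
```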
We considered two different design criteria for the MLP-based promotional models. On the one hand, an MLP for estimation problems can have multiple outputs, which in principle could benefit from the joint cross-information among models. However, there is no guarantee that a multiple output architecture will work better than a separate single-output MLP for each product. On the other hand, another design criterion is the choice of activation function in the hidden layer nodes, two widely used forms being the linear and the logistic sigmoid activation. Given that there is no theoretical result favoring either, this function has to be chosen for each data mining model.
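Both design choices can be illustrated with scikit-learn's MLPRegressor (an assumption here, since the paper does not name its implementation), which supports 'logistic' and 'identity' hidden activations and accepts a multi-column target for a multiple output network. The data below are synthetic, purely for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.0, size=(80, 8))   # 6 price indices + discount flag + baseline
# Synthetic sales for 6 products, a noisy linear map of the inputs
Y = X @ rng.normal(size=(8, 6)) + rng.normal(scale=0.1, size=(80, 6))

# Multiple output MLP: one network predicts all 6 products jointly
mlp_joint = MLPRegressor(hidden_layer_sizes=(17,), activation='logistic',
                         max_iter=2000, random_state=0).fit(X, Y)

# Single output MLPs: one independent network per product
mlp_ind = [MLPRegressor(hidden_layer_sizes=(8,), activation='logistic',
                        max_iter=2000, random_state=0).fit(X, Y[:, j])
           for j in range(Y.shape[1])]

pred_joint = mlp_joint.predict(X)                        # shape (80, 6)
pred_ind = np.column_stack([m.predict(X) for m in mlp_ind])
```

Switching `activation='logistic'` to `activation='identity'` gives the linear hidden-layer variant discussed above.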
For this purpose, free parameters were tuned in the MLP for the single output set of models using LOO. Then, a paired bootstrap test was used to check which architecture can be considered more convenient, in terms of the previously used merit figures. Table 1 shows the comparison of the number of neurons in the hidden layer (n_0) for both architectures. Note that n_0 shows a relevant variation across the different products
Table 1: Free parameter tuning in terms of n_0 for MLP with multiple and with single output, for Milk Data Base.

                              n_0 (MLP)   n_0 (MLP_ind)
    Model 1 (Asturiana)           17           15
    Model 2 (Ato)                 17            8
    Model 3 (House brand)         17            1
    Model 4 (Pascual Calcio)      17            6
    Model 5 (Pascual Clasica)     17            8
    Model 6 (Puleva Calcio)       17           14
Table 2: Single vs multiple output MLP for Milk products, using MAE merit figure. See text for details.

              MLP_ind            MLP                MLP_ind vs MLP
    Model 1   357.3 || 359.1     320.7 || 321.6      66.4 [-193.9, 577.1]
              [266.2, 463.2]     [248.5, 401.5]      37.0 [-50.8, 130.3]
                                                     65.4 [-192.1, 569.9]
    Model 2   222.6 || 222.3     199.6 || 200.4      26.2 [-114.1, 192.1]
              [180.8, 267.2]     [157.1, 247.9]      28.6 [-17.9, 72.9]
                                                     50.2 [-69.4, 217.6]
    Model 3   135.8 || 136.3     152.5 || 152.5     -66.9 [-144.6, 49.3]
              [105.5, 167.3]     [119.2, 188.6]     -14.4 [-36.0, 7.8]
                                                    -57.5 [-124.6, 62.0]
    Model 4    59.6 || 59.7       72.3 || 72.4       -7.8 [-49.2, 18.7]
              [44.2, 77.4]       [58.5, 88.3]       -12.8 [-27.2, 1.4]
                                                    -13.4 [-54.5, 7.0]
    Model 5   305.2 || 310.8     198.7 || 197.8     258.8 [63.9, 528.1]
              [227.9, 397.5]     [148.5, 250.7]     103.3 [21.0, 186.2]
                                                    267.3 [74.9, 544.1]
    Model 6   226.7 || 226.7     125.6 || 125.0     304.5 [98.2, 447.1]
              [167.3, 293.2]     [97.8, 155.3]      100.7 [37.8, 163.1]
                                                    304.5 [108.6, 473.2]
with single output, and also that n_0 is noticeably larger for the multiple output MLP architecture.
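A minimal sketch of LOO-based tuning of the hidden layer size, using scikit-learn and synthetic data (both are assumptions; the paper does not specify its tooling):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import LeaveOneOut

def loo_mae(X, y, n_hidden):
    """Leave-one-out MAE for an MLP with n_hidden sigmoid hidden units."""
    errs = []
    for train, test in LeaveOneOut().split(X):
        mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                           activation='logistic',
                           max_iter=300, random_state=0)
        mlp.fit(X[train], y[train])
        errs.append(abs(mlp.predict(X[test])[0] - y[test][0]))
    return float(np.mean(errs))

# Select n_0 as the hidden-layer size with the lowest LOO-MAE
rng = np.random.default_rng(1)
X = rng.uniform(size=(12, 8))
y = X.sum(axis=1) + rng.normal(scale=0.05, size=12)
best_n = min(range(1, 4), key=lambda n: loo_mae(X, y, n))
```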
Table 2 shows the MAE and the comparison between both schemes, with the sigmoid activation function, for all the products in the database. The individual single-output model for each product is denoted as MLP_ind, whereas the multiple output architecture is denoted as MLP. Each cell in the second and third columns contains the empirically estimated actual risk (i.e., averaged from the LOO estimation of MAE for each case), together with the bootstrap estimate of the averaged MAE, namely, the mean (upper line, right) and the 95% CI of this sample mean. The apparently better of the two models, in terms of empirical LOO-MAE, is highlighted in bold. The comparison between both models is represented in the last column, showing the average and the 95% CI for ΔMAE, ΔCI, and ΔCI_sup, in the first, second, and third line of the cell, respectively. In this column, bold highlights the CIs that yield significant differences according to the paired bootstrap test, i.e., those statistics for the differential merit figure whose estimated difference does not overlap the zero level.
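The paired bootstrap test on ΔMAE can be sketched as follows: resampling keeps the per-sample errors of both models paired, and a 95% CI that does not overlap zero marks a significant difference. This is a simplified sketch of the ΔMAE statistic only (the paper also tests ΔCI and ΔCI_sup), on synthetic errors:

```python
import numpy as np

def paired_bootstrap_dmae(err_a, err_b, n_boot=2000, seed=0):
    """Paired bootstrap for the MAE difference between two models.

    err_a, err_b : absolute per-sample errors of models A and B on the
                   same samples (paired). Returns the mean over resamples
                   of ΔMAE = MAE(A) − MAE(B) and its 95% CI.
    """
    err_a = np.asarray(err_a, dtype=float)
    err_b = np.asarray(err_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(err_a)
    idx = rng.integers(0, n, size=(n_boot, n))   # resample index pairs jointly
    dmae = err_a[idx].mean(axis=1) - err_b[idx].mean(axis=1)
    return dmae.mean(), np.percentile(dmae, [2.5, 97.5])

# Synthetic case where model A is clearly worse: the CI should exclude zero
rng = np.random.default_rng(42)
e_b = np.abs(rng.normal(scale=1.0, size=100))
e_a = e_b + 1.0
mean_d, (lo, hi) = paired_bootstrap_dmae(e_a, e_b)
```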
It can be observed that, for Models 3 and 4, the performance is apparently better when using individual architectures, whereas Models 1, 2, 5, and 6 are better under the joint architecture. However, significant differences are present only in Models 5 and 6, both in terms of averaged and scatter MAE, hence the most advantageous option is the multiple output architecture. No significant differences are sustained by the paired bootstrap test for Models 1, 2, 3, and 4. In general terms, we can conclude that, for this Milk Data Base, it is better to con-
DEAL EFFECT CURVE AND PROMOTIONAL MODELS - Using Machine Learning and Bootstrap Resampling Test