Predicting How Much a Consumer Is Willing to Pay for a Bottle of Wine:
Dealing With Data Imbalance
Hugo Alonso 1,2 (https://orcid.org/0000-0002-1599-5392) and Teresa Candeias 1 (https://orcid.org/0000-0002-3371-9869)
1 Universidade Lusófona – Centro Universitário do Porto, Rua Augusto Rosa, n.º 24, 4000-098 Porto, Portugal
2 Universidade de Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
Keywords:
Wine, Classification, Data Imbalance, Re-Sampling, Learning Methods, Predictive Models.
Abstract:
The wine industry has become increasingly important worldwide and is one of the most significant industries
in Portugal. In a previous paper, the problem of predicting how much a Portuguese consumer is willing to pay
for a bottle of wine was considered for the first time ever. The problem was treated as a multi-class ordinal
classification task. Although we achieved good prediction results, globally speaking, it was difficult to identify
rare cases of consumers who are interested in paying for more expensive wines. We found that this was a direct
consequence of data imbalance. Therefore, here, we present a first attempt to deal with this issue, based on
the use of re-sampling strategies to balance the training data, namely random under-sampling, random over-
sampling with replacement and the synthetic minority over-sampling technique. We consider several learning
methods and develop various predictive models. A comparative study is carried out and its results highlight
the importance of a careful choice of the re-sampling strategy and the learning method in order to get the best
possible prediction results.
1 INTRODUCTION
The wine market has become more demanding with the growing number of new global players and changing consumer behavior. Given the heterogeneity of wine markets, several studies have suggested the use of segmentation methodologies to understand wine consumer behavior (Bruwer et al., 2002; Thach and Olsen, 2006; Kolyesnikova et al., 2008; Koksal, 2021; Payini et al., 2022). With thousands of wine brands, styles and regions, consumers are frequently confused when purchasing wine. According to (Rouzet and Seguin, 2004), in order to match wine consumers' preferences with wine characteristics, segmentation divides markets into groups that can be reached with different marketing instruments. Usually, the marketing segmentation variables are geographic, demographic, psychographic and behavioral (Kotler and Keller, 2006). Segmentation based on lifestyle has also been applied in the US, with the purpose of highlighting motivations and occasions of consumption (Thach and Olsen, 2005).
Overall, an effective marketing strategy is required
and, in this context, understanding wine consumers’
needs and buying habits plays an important role in
market segmentation.
In a previous paper (Alonso and Candeias, 2022), the problem of predicting how much a Portuguese consumer is willing to pay for a bottle of wine was considered for the first time ever. More precisely, given information about an individual, such as his/her age and income, we were interested in predicting how much he/she is willing to spend on a bottle: less than EUR 2.99; between EUR 3 and 4.99; between EUR 5 and 9.99; EUR 10 or more. Since these intervals can be viewed as ordered classes, the prediction problem was treated as a multi-class ordinal classification task. Using several types of predictive models and learning methods, we achieved good results in terms of the overall accuracy and r_int (a measure of association between the ordinal variables true class and predicted class) (Pinto da Costa et al., 2008; Pinto da Costa et al., 2014). However, we found that all
classifiers had more difficulty in correctly predicting
cases from higher classes and that this was related to
our data imbalance. Note that, since most people are
willing to pay less and only a small number of peo-
ple are willing to pay more for a bottle of wine, lower
classes are much more frequent than higher ones. In
this context, identifying consumers who are willing to
pay more for a bottle of wine corresponds to predict-
ing rare events. Here, we present our first attempt to
deal with this issue. Our goal is to obtain more bal-
anced classifiers, i.e., with an improved ability to pre-
dict infrequent cases without seriously compromising
the prediction of frequent ones.
It is well known that the ability to predict rare
events remains one of the most challenging tasks to
solve in machine learning (Arafat et al., 2019). Ac-
cording to this reference and also to (Sun et al., 2009;
Haixiang et al., 2017; More and Rana, 2021), a com-
mon strategy to cope with this problem consists in
applying re-sampling methods to balance the train-
ing data. Another possibility consists, for instance,
in assigning different classification costs to different
classes, but deciding those costs is a difficult task and
incorporating them in some data mining algorithms
is not easy. In this paper, we apply three popular re-
sampling techniques: random under-sampling (RUS),
random over-sampling with replacement (ROSWR)
and the synthetic minority over-sampling technique
(SMOTE) (Ganganwar, 2012; Kotsiantis et al., 2006;
Chawla et al., 2002). RUS randomly removes exam-
ples from the most represented classes in the training
set, but, by doing so, it can discard potentially use-
ful data that could be important for the induction pro-
cess, thus leading to underfitting. In turn, ROSWR
and SMOTE randomly add examples to the least rep-
resented classes in the training set, though in dif-
ferent ways: while ROSWR repeats existing exam-
ples, SMOTE generates new artificial ones. The main drawback of ROSWR is that it can increase the likelihood of overfitting, because repeated examples carry more weight during the training phase. This problem is avoided by SMOTE.
Selecting proper evaluation metrics plays a key
role in the task of correctly handling data imbalance.
In (Branco et al., 2016), the authors survey several
metrics and discuss their advantages and disadvan-
tages. For a multi-class classification problem, like
ours, they conclude that the so-called MF_1 score and related measures are suitable for performance assessment. Hence, we use them in this study.
In this work, we wish to compare our previous results in (Alonso and Candeias, 2022) with the new ones we obtained by applying re-sampling strategies
to balance our data set. Therefore, the remainder of
the paper is organized as follows. The next section
describes the data we considered. The re-sampling
strategies are presented in Section 3 and the predic-
tive models and learning methods in Section 4. The
issue of how to assess performance in an imbalanced
problem is addressed in Section 5 and we define suit-
able metrics for that purpose. Finally, the results are
shown and compared in Section 6 and the conclusions
and future work are given in Section 7.
2 DATA
The data set considered in this study is the one we in-
troduced in our previous paper (Alonso and Candeias,
2022). It has a total of 228 instances and 9 attributes.
There are 8 predictive attributes or input variables,
nominal and ordinal, corresponding to consumers’
characteristics: gender, age, marital status, education
level, region of residence, income, wine knowledge
and consumption frequency. The target attribute or
output variable is ordinal and corresponds to the bottle
price class. The full data set is partitioned into train-
ing and test subsets, with 2/3 and 1/3 of all available
instances, respectively. The partitioning is stratified
and so the a priori class distribution is roughly the
same in the three sets. The classes are C_1 = ]0, 2.99[ euros, C_2 = [3, 4.99[ euros, C_3 = [5, 9.99[ euros and C_4 = [10, +∞[ euros, and their relative and absolute frequencies in the three sets are given in Table 1. Remark that the data are imbalanced: the distribution is
skewed to the right, with the two lower classes being
much more frequent than the two higher ones. The
reason is that most people are willing to pay less and
only a small number of people are willing to pay more
for a bottle of wine. Further details about the data can
be found in (Alonso and Candeias, 2022).
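As an illustration of the stratified partitioning described above, the following Python sketch reproduces a 2/3–1/3 split with scikit-learn. The integer encoding of the attributes and the fixed random seed are our own assumptions, not details taken from the original study.

```python
# Minimal sketch of the stratified 2/3-1/3 partition; stand-in data, not
# the authors' original Matlab code.
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical encoding: 228 instances with 8 categorical attributes encoded
# as integers, and the class counts of Table 1 (86, 95, 37, 10).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(228, 8))
y = np.repeat([1, 2, 3, 4], [86, 95, 37, 10])

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=1/3,    # 1/3 of the instances form the test set
    stratify=y,       # keep the a priori class distribution in both subsets
    random_state=0,   # fixed seed for reproducibility (our assumption)
)
```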
3 RE-SAMPLING STRATEGIES
Re-sampling strategies are used to balance an im-
balanced training set like ours. In the following,
we briefly describe three popular techniques: ran-
dom under-sampling, random over-sampling with re-
placement and the synthetic minority over-sampling
technique (Ganganwar, 2012; Kotsiantis et al., 2006;
Chawla et al., 2002).
Random under-sampling consists in withdrawing
from the training set instances randomly chosen from
the most frequent classes, potentially until all classes
have the same number of cases or roughly the same.
In turn, random over-sampling with replacement
consists in adding to the training set instances ran-
domly chosen from the least frequent classes, poten-
tially until all classes have the same number of cases
or roughly the same. Note that, when an instance
is added to the augmented training set, it is always
drawn with replacement from the initial training set.
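To make the two strategies just described concrete, the following Python sketch balances a labelled data set along these lines. It is a minimal illustration with numpy, not the authors' implementation: it equalizes all classes to the minimum class count (RUS) or to the maximum one (ROSWR).

```python
import numpy as np

def random_under_sample(X, y, rng):
    """RUS: randomly drop instances from the more frequent classes until
    every class has as many cases as the least frequent one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

def random_over_sample(X, y, rng):
    """ROSWR: randomly replicate instances (drawn with replacement) from the
    less frequent classes until every class matches the most frequent one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        np.concatenate([
            np.flatnonzero(y == c),                                   # keep originals
            rng.choice(np.flatnonzero(y == c), size=target - n,
                       replace=True),                                 # add copies
        ])
        for c, n in zip(classes, counts)
    ])
    return X[idx], y[idx]
```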
Table 1: Frequency distribution of the bottle price class variable in the full, training and test data sets.

                                        Bottle price class
                                     C_1    C_2    C_3    C_4    Total
Percentage of instances              38%    42%    16%     4%     100%
Number of instances   Full set        86     95     37     10      228
                      Training set    57     63     25      7      152
                      Test set        29     32     12      3       76

Finally, just like random over-sampling with replacement, the synthetic minority over-sampling technique adds instances to the least frequent classes. However, the way it does so is different, as explained below. Meanwhile, it should be noted that the authors of the method proposed three versions of it: SMOTE, for the case where all predictive attributes are (quantitative) continuous; SMOTE-NC, for the case where there is a mixture of nominal and continuous predictive attributes; and SMOTE-N, for the case where all predictive attributes are nominal. Here, since our data set has a mixture of nominal and ordinal predictive attributes, we treat the ordinal variables as if they were nominal and consider the SMOTE-N version, which we describe next, together with the way we applied it.
In SMOTE-N, in order to add a new instance x̃ to class C, we start by selecting an instance x in C from the initial training set and determining its k nearest neighbors x_1, ..., x_k in C from that set. The nearest neighbors are computed using the modified version of the Value Difference Metric (Stanfill and Waltz, 1986) proposed by (Cost and Salzberg, 1993), incorporating the suggestions in (Chawla et al., 2002). Then, a new instance x̃ added to the augmented training set is given by a vector whose i-th component is the most frequent value among the i-th components of x and of k′ of the k neighbors x_1, ..., x_k, where 1 ≤ k′ ≤ k. In this paper, we take k = 5. Remark that the value of k must be lower than the number of instances in the least represented class, in our case 7 (see Table 1). Then, for each possible choice of x in a class C for which we want to add instances, we generate as many new instances as possible and necessary in the following way: the neighbors x_1, ..., x_k are split into two sets with ⌈k/2⌉ and k − ⌈k/2⌉ elements; new instances x̃_1 and x̃_2 are generated by combining x with the neighbors in the first and second sets, respectively; and the process of splitting and generation is repeated for all possible splits.
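The Python sketch below illustrates the generation step just described for a single minority class. For brevity, it uses a plain Hamming distance between nominal vectors instead of the modified Value Difference Metric, and it performs one random split of the k neighbors per seed instance rather than iterating over all possible splits, so it approximates the procedure above rather than implementing it faithfully.

```python
import numpy as np
from collections import Counter

def majority_vote(x, neighbours):
    """One synthetic instance: each component is the most frequent value
    among the corresponding components of x and the chosen neighbours
    (ties broken arbitrarily by Counter)."""
    stack = np.vstack([x, neighbours])
    return np.array([Counter(col).most_common(1)[0][0] for col in stack.T])

def smoten_once(X_c, k=5, rng=None):
    """Simplified SMOTE-N pass over X_c, the instances of one minority class:
    for each seed x, find its k nearest neighbours in the class (Hamming
    distance as a stand-in for the modified VDM), split them into two sets of
    ceil(k/2) and k - ceil(k/2) elements, and generate one synthetic instance
    per set by majority vote with x."""
    rng = rng or np.random.default_rng(0)
    synthetic = []
    for i, x in enumerate(X_c):
        dist = (X_c != x).sum(axis=1)       # Hamming distance to all instances
        dist[i] = dist.max() + 1            # exclude the seed itself
        nn = X_c[np.argsort(dist)[:k]]      # k nearest neighbours within the class
        order = rng.permutation(k)          # one random split (paper: all splits)
        split = k - k // 2                  # ceil(k/2); 3 when k = 5
        synthetic.append(majority_vote(x, nn[order[:split]]))
        synthetic.append(majority_vote(x, nn[order[split:]]))
    return np.array(synthetic)
```

With k = 5, each pass over the seeds of a class adds two synthetic instances per seed; repeated passes (or, as in the paper, repeated splits) can be run until the class reaches the desired size.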
4 PREDICTIVE MODELS AND
LEARNING METHODS
As was mentioned earlier, we wish to compare our previous results in (Alonso and Candeias, 2022) with the new ones we obtained by applying re-sampling
strategies to balance our data set. For this reason,
we consider the same predictive models and learning
methods.
Hence, we consider here three types of predictive
models: artificial neural networks, support vector ma-
chines and decision trees (Hastie et al., 2009). Two
advantages of decision trees are their interpretability
and the ease with which they deal with qualitative pre-
dictive variables. Artificial neural networks and sup-
port vector machines are not as easily interpretable,
but very often they have better generalization results.
Details about these models are given in the previous
reference.
Regarding the learning methods, we consider the
conventional approach to supervised classification,
where the order relation between the classes is not
taken into account (Hastie et al., 2009), and two ordi-
nal supervised classification approaches, namely the
so-called unimodal binomial model (Pinto da Costa
et al., 2008) and a modification of Frank and Hall’s
method (Frank and Hall, 2001), proposed in (Cardoso
and Pinto da Costa, 2007). These two ordinal learning methods and the way they are applied to our problem are briefly described next.
4.1 The Unimodal Binomial Model
The unimodal model is a machine learning paradigm
intended for supervised classification problems where
the classes are ordered. Introduced in (Pinto da Costa
et al., 2008), the main idea behind this model is that
the random variable class associated with a given
query should follow a unimodal distribution, so that
the order relation between the classes is respected.
In this context, the output of a classifier where the a
posteriori class probabilities are estimated is obliged
to be unimodal, i.e., to have only one local maxi-
mum. There are different ways to impose unimodality
and, in (Pinto da Costa et al., 2008), the authors sug-
gested two approaches. In the parametric approach, a
unimodal discrete distribution, like the binomial and
Poisson’s, is assumed and its parameters are estimated
by the classifier. In the non-parametric approach, no
distribution is assumed and the classifier is trained
so that its output becomes unimodal. In all practical
experiments conducted by the authors, the paramet-
ric approach led to better results, in particular when
the binomial distribution was considered. The supe-
rior performance achieved with this distribution was
also justified in theoretical terms. For these reasons,
our focus here is on the binomial model. Furthermore, since the classifiers chosen by us are artificial neural networks, support vector machines and decision trees, we refer hereafter to binomial networks, binomial support vector machines and binomial trees, respectively. For the sake of conciseness, we only present next a detailed description of the binomial networks applied to our problem.
As mentioned before, given information about a consumer, we are interested in predicting how much he/she is willing to spend on a bottle: less than EUR 2.99; between EUR 3 and 4.99; between EUR 5 and 9.99; EUR 10 or more. Representing these K = 4 bottle price classes by C_1, ..., C_K, respectively, and the information given about the consumer by x, Bayes' decision theory (Hastie et al., 2009) suggests classifying the case in the class maximizing the a posteriori probability P(C_k | x). To that end, the a posteriori probabilities P(C_1 | x), ..., P(C_K | x) need to be estimated. In the binomial network, these probabilities are calculated from the binomial distribution B(K-1, p). As this distribution takes values in the set {0, 1, ..., K-1}, we take the value 0 to represent class C_1, 1 to represent C_2, and so on, up to K-1 for C_K. Now, since K is known, the only unknown parameter is the probability of success p. Hence, we consider a network architecture as in Figure 1 and train it to adjust all connection weights from layer 1 to layer 3. Note that the connections from layer 3 to layer 4 have a fixed weight equal to 1 and serve only to forward the value of p to the output layer of the network, where the probabilities from the binomial distribution are calculated. For a given query x = (x_1, ..., x_J), the output of layer 3 will be a single numerical value in [0, 1], denoted by p_x. Then, the probabilities in layer 4 are calculated from the binomial distribution:

P(C_k | x) = B_{k-1}(K-1, p_x),  k = 1, ..., K,   (1)

where

B_{k-1}(K-1, p_x) = [(K-1)! / ((k-1)! (K-k)!)] p_x^{k-1} (1 - p_x)^{K-k}.   (2)
When p_x is in [0, 1/K[, the highest a posteriori probability is P(C_1 | x) and, therefore, the predicted bottle price class is C_1. More generally, when p_x is in [(i-1)/K, i/K[, for some i in {1, ..., K}, the highest a posteriori probability is P(C_i | x) and, therefore, the predicted bottle price class is C_i. Hence, in order to train the network on a training set T = {(x_n, C_{x_n})}_{n=1}^{N} ⊂ χ × {C_k}_{k=1}^{K}, where χ is the feature space, we replace C_k by the value of p corresponding to the midpoint of [(k-1)/K, k/K[, i.e., p_k = (k - 0.5)/K, and apply a suitable optimization algorithm, like Marquardt's method (Rao, 2019), to find connection weights that minimize the mean squared error

(1/N) Σ_{n=1}^{N} (p_{x_n}^{target} - p_{x_n}^{network}(w))²,   (3)

where p_{x_n}^{target} is the value of p replacing C_{x_n} and p_{x_n}^{network}(w) is the output of layer 3 given the query x_n when the network has the weights w.
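A small Python sketch of the output stage just described may help: given the single value p_x produced by layer 3, it evaluates (1)-(2) and returns the predicted class, which, as stated above, coincides with the interval rule for p_x. The helper names are ours.

```python
import numpy as np
from math import comb

def binomial_posteriors(p_x, K=4):
    """Equations (1)-(2): a posteriori probabilities
    P(C_k | x) = B_{k-1}(K-1, p_x), computed from the single
    network output p_x in [0, 1]."""
    return np.array([comb(K - 1, k) * p_x**k * (1 - p_x)**(K - 1 - k)
                     for k in range(K)])

def predicted_class(p_x, K=4):
    """Predicted class index i such that p_x lies in [(i-1)/K, i/K[;
    this coincides with the argmax of the binomial posteriors."""
    return int(np.argmax(binomial_posteriors(p_x, K))) + 1

# Example: p_x = 0.6 lies in [2/4, 3/4[, so the predicted class is C_3.
assert predicted_class(0.6) == 3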
4.2 Modified Frank and Hall’s Method
Frank and Hall’s method was originally introduced
in (Frank and Hall, 2001). Just like the unimodal
model approach previously presented, the method is
intended for supervised classification problems where
the classes are ordered. As before, suppose that the
K = 4 ordered bottle price classes are represented by C_1, ..., C_K. Frank and Hall propose to use K-1 binary classifiers to address the K-class ordinal problem. In order to train the classifiers, such as artificial neural networks, support vector machines and decision trees, K-1 data sets are derived from the original data set. The i-th classifier is trained to discriminate C_1, ..., C_i from C_{i+1}, ..., C_K. Given an unseen instance x = (x_1, ..., x_J), i.e., information about a new consumer, the a posteriori probabilities P(C_1 | x), ..., P(C_K | x) of the original K classes can be estimated by combining the outputs of the K-1 binary classifiers for that instance. As noticed in (Cardoso and Pinto da Costa, 2007), the combination scheme suggested by Frank and Hall may lead to negative probabilities, but the problem can be overcome in the following manner: identifying the output p_i of the i-th classifier with the conditional probability P(C_x > C_i | C_x > C_{i-1}), the classes can be ranked according to the following formulas:

P(C_x > C_1) = p_1,
P(C_1 | x) = 1 - p_1,
P(C_x > C_j) = p_j P(C_x > C_{j-1}),
P(C_j | x) = (1 - p_j) P(C_x > C_{j-1}),  j = 2, ..., K-1,
P(C_K | x) = P(C_x > C_{K-1}).   (4)
This is known as the modified Frank and Hall’s
method. Its implementation using networks is illus-
trated in Figure 2.
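A minimal Python sketch of the combination step in (4) follows (the function name is ours): it maps the K-1 binary outputs to the K class probabilities, which are non-negative and sum to one by construction.

```python
import numpy as np

def combine_frank_hall(p):
    """Equation (4): turn the outputs p_1, ..., p_{K-1} of the K-1 binary
    classifiers, read as P(C_x > C_i | C_x > C_{i-1}), into the K class
    probabilities P(C_1 | x), ..., P(C_K | x)."""
    K = len(p) + 1
    probs = np.empty(K)
    exceed = 1.0                               # P(C_x > C_0) = 1 by convention
    for j, p_j in enumerate(p, start=1):
        probs[j - 1] = (1.0 - p_j) * exceed    # P(C_j | x)
        exceed *= p_j                          # P(C_x > C_j)
    probs[K - 1] = exceed                      # P(C_K | x)
    return probs

# Example with K = 4 and binary outputs (0.9, 0.5, 0.2):
print(combine_frank_hall([0.9, 0.5, 0.2]))    # [0.1, 0.45, 0.36, 0.09]
```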
Figure 1: Binomial network.
Figure 2: Implementation of the modified Frank and Hall’s method using networks.
5 PERFORMANCE ASSESSMENT
The use of traditional metrics, like the overall accu-
racy, to assess the performance of a classifier in an
imbalanced test set is not appropriate, because they
tend to focus the model evaluation on the most frequent class(es) (Branco et al., 2016). In this sec-
tion, we present suitable metrics for an imbalanced
and multi-class problem like ours. For further details
about the metrics and performance assessment in im-
balanced domains, the reader should refer to (Branco
et al., 2016) and references therein.
Suppose that there are n test instances. For the i-th test case, given the observed vector x_i of the predictive attributes, a classifier makes a prediction Ĉ_{x_i} of the true class C_{x_i}. Let I be the indicator function that returns 1 if its argument is true and 0 otherwise. Then, the classifier has an overall accuracy, or simply accuracy, given by

accuracy = (1/n) Σ_{i=1}^{n} I(Ĉ_{x_i} = C_{x_i}),   (5)
which corresponds to the proportion of cases that are
correctly classified. As mentioned before, this is not
an appropriate metric for an imbalanced test set. In
an imbalanced and multi-class problem, if we focus
on a single class C, then we can introduce the recall
and precision for that class and the corresponding F_β score as

recall(C) = Σ_{i=1}^{n} I(C_{x_i} = C) I(Ĉ_{x_i} = C) / Σ_{i=1}^{n} I(C_{x_i} = C),   (6)

precision(C) = Σ_{i=1}^{n} I(C_{x_i} = C) I(Ĉ_{x_i} = C) / Σ_{i=1}^{n} I(Ĉ_{x_i} = C),   (7)

F_β(C) = (1 + β²) precision(C) recall(C) / (β² precision(C) + recall(C)).   (8)
Hence, recall(C) represents the proportion of cases from class C that are correctly classified and precision(C) the proportion of cases predicted as being from class C that are correctly classified. Moreover, F_β(C) is a combination of recall(C) and precision(C), where β is a parameter set by the user to adjust the relative importance of the former with respect to the latter. Usually, β = 1, and so the same importance is given to recall(C) and precision(C). Remark that F_β(C) has a high value when both recall(C) and precision(C) are high.
If there are K classes, C_1, ..., C_K, one then averages F_β(C_1), ..., F_β(C_K) to obtain the so-called MF_β score:

MF_β = (1/K) Σ_{k=1}^{K} F_β(C_k).   (9)
This single scalar metric is considered suitable to
compare the performance of different classifiers in an
imbalanced test set. For this reason, we use it in the
next section to analyze the results we obtained in our
problem.
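For concreteness, a small Python sketch of (6)-(9) follows (names are ours); a guard is added for classes that receive no predictions, where the precision denominator is zero, and F_β is set to 0 by convention when both quantities vanish.

```python
import numpy as np

def mf_beta(y_true, y_pred, beta=1.0):
    """Equations (6)-(9): per-class recall, precision and F_beta,
    averaged over all classes occurring in y_true to give MF_beta."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_true == c) & (y_pred == c))
        recall = tp / np.sum(y_true == c)
        precision = tp / max(np.sum(y_pred == c), 1)  # guard: no predictions for c
        denom = beta**2 * precision + recall
        scores.append((1 + beta**2) * precision * recall / denom if denom else 0.0)
    return np.mean(scores)

# Toy usage: a classifier that never predicts the rare class 4 is penalised.
y_true = [1, 1, 2, 2, 3, 4]
y_pred = [1, 1, 2, 2, 3, 3]
print(mf_beta(y_true, y_pred))   # MF_1 well below 1, since F_1(C_4) = 0
```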
6 RESULTS
All computer experiments were carried out using
Matlab R2021a with the Statistics and Machine
Learning Toolbox. We fitted artificial neural net-
works (NNs), support vector machines (SVMs) and
decision trees to perfectly balanced training data, obtained by applying the three re-sampling strategies previously described, namely random under-sampling (RUS), random over-sampling with replacement
(ROSWR) and the synthetic minority over-sampling
technique for nominal predictive attributes (SMOTE-
N). The models’ hyperparameters, such as a regular-
ization term strength in the case of NNs, the scale
and type of kernel (Gaussian, linear or polynomial)
in the case of SVMs and several parameters related
to tree depth control in the case of decision trees,
were chosen so as to obtain the lowest estimate of the
prediction error, calculated by applying stratified 5-
fold cross-validation to the training set (Hastie et al.,
2009). In this way, we avoided underfitting and over-
fitting. This was done in the conventional approach
to supervised classification, in the unimodal binomial
paradigm (binomial model, for short) and in the mod-
ified Frank and Hall’s method. The trained models
were then applied to the test data.
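As an illustration of this model-selection scheme, the sketch below tunes an SVM with stratified 5-fold cross-validation in Python with scikit-learn; the hyperparameter grid and variable names are our assumptions, since the experiments reported here were run in Matlab.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Illustrative hyperparameter grid (not the grid used in the paper).
param_grid = {"kernel": ["linear", "rbf", "poly"], "C": [0.1, 1.0, 10.0]}

search = GridSearchCV(
    SVC(),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",  # i.e. pick the lowest cross-validated error estimate
)
# Assuming X_train, y_train hold the (re-sampled) training data:
# search.fit(X_train, y_train); the selected model is search.best_estimator_.
```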
Table 2 presents the results we obtained using the MF_1 score to assess the performance, in our test set, of different approaches to our problem of predicting how much a consumer is willing to pay for a bottle of wine. Remark that MF_1 is given by (9) with β = 1, i.e., when we average F_β over all wine price classes to obtain MF_β, we give the same importance to the combination of recall (6) and precision (7) in F_β (8). Note that, the higher the value of these measures, the better. In Table 2, if we consider the results we obtained in (Alonso and Candeias, 2022) with the original imbalanced training data set, it can be seen that the best one, MF_1 = 0.4382, was achieved by a NN in the modified Frank and Hall's method (best imbalanced classifier). If we look at the results we obtained in this paper by applying re-sampling strategies to balance the original training data set, it can be seen that, for each re-sampling strategy, the best result was also achieved by a NN in the modified Frank and Hall's method, with MF_1 = 0.4836 for RUS, MF_1 = 0.4614 for ROSWR and MF_1 = 0.5528 for SMOTE-N (best balanced classifier). Hence, we were able to increase the MF_1 score from 0.4382 in the imbalanced data approach to as much as 0.5528 when we applied re-sampling strategies.
Table 3 shows the F_1 score per class, in the test set, for the best imbalanced classifier and the best balanced one. It can be seen that the application of the SMOTE-N re-sampling strategy led to an improvement of F_1 in the least represented classes in the test set (C_1, C_3 and C_4) and did not decrease it significantly in the most represented one (C_2). Therefore, we achieved our goal of obtaining a more balanced classifier, i.e., one with an improved ability to predict infrequent cases without seriously compromising the prediction of frequent ones.
The analysis we present next highlights the impor-
tance of a careful choice of the re-sampling strategy
and the learning method in our imbalanced problem.
From Table 2, remark that, in general, the use of RUS and ROSWR did not improve the corresponding imbalanced data results when we considered the conventional and binomial learning methods; the only exception was the conventional NN case, where the MF_1 score was slightly better with ROSWR. However, the use of RUS and ROSWR always improved the corresponding imbalanced data results when we considered the modified Frank and Hall's method, regardless of the type of classifier implemented; moreover, if we compare RUS with ROSWR, we can say that the former is preferable in most cases. Now, if we focus on the use of SMOTE-N, it is clear that, for all combinations of learning method and classifier considered, it always led to better results than those obtained by applying RUS and ROSWR. Furthermore, if we compare it with the imbalanced data approach, it can be seen that the only case where the results did not improve was the one corresponding to the binomial NN. Finally, note that the best SMOTE-N results were
Table 2: Performance assessment in the test set using the MF_1 score.

                                                 MF_1
Learning method     Classifier   Imbalanced        Re-sampling strategy
                                 data         RUS      ROSWR    SMOTE-N
Conventional        Tree         0.3142       0.2592   0.2584   0.3680
                    SVM          0.4355       0.4054   0.3411   0.4857
                    NN           0.4089       0.3949   0.4152   0.4550
Binomial            Tree         0.3960       0.2107   0.3623   0.4236
                    SVM          0.3807       0.2772   0.3479   0.4184
                    NN           0.4141       0.2255   0.3527   0.3923
Modified Frank      Tree         0.3464       0.3965   0.4387   0.4888
and Hall's          SVM          0.3783       0.4756   0.3863   0.5386
                    NN           0.4382       0.4836   0.4614   0.5528
Table 3: F_1 score per class, in the test set, for the best classifier fitted to the imbalanced training data (best imbalanced) and for the best classifier obtained by applying a re-sampling strategy to balance the training data, namely SMOTE-N (best balanced). The best classifiers are those exhibiting a higher MF_1.

                      F_1 per class
Classifier            C_1      C_2      C_3      C_4
Best imbalanced       0.6415   0.6667   0.4444   0.0000
Best balanced         0.7059   0.6269   0.5926   0.2857
always achieved when the modified Frank and Hall's method was used as the learning method. One difference between this method and the others lies in the fact that it is the only algorithm that, given an instance, combines the outputs of several classifiers in order to produce a final prediction of the true instance class; the other algorithms make the prediction based on only one classifier. We believe that this may be a reason for its success.
7 CONCLUSIONS AND FUTURE
WORK
In this paper, we presented a first approach to the issue
of dealing with data imbalance in the multi-class or-
dinal classification problem of predicting how much
a Portuguese consumer is willing to pay for a bot-
tle of wine. More precisely, we applied several re-
sampling strategies intended to balance the training
data to which various predictive models were fit under
different learning methods. In this context, we carried out a comparative study using performance measures adequate to the imbalanced nature of the problem. We were able to obtain more balanced classifiers, i.e., models with an improved ability to predict infrequent cases without seriously compromising the prediction of frequent ones. Furthermore, we concluded that the best balanced classifiers were the ones associated with the application of the SMOTE-N re-sampling strategy and the modified Frank and Hall's learning method.
Motivated by the good results of this method and the
fact that it was the only one we applied where the
outputs of several classifiers are combined in order
to produce a final prediction of the true class, in the
future, we plan to apply ensemble methods like bag-
ging and boosting, where a set of individual learners
are combined to create one learner with a better per-
formance than the individual ones (see, for instance,
(Galar et al., 2012; Tanha et al., 2020)). Moreover,
we may apply a combination of under-sampling and
over-sampling.
ACKNOWLEDGEMENTS
This work was partially supported by the Center for Research and Development in Mathematics and Applications (CIDMA) through the Portuguese Foundation for Science and Technology (FCT, Fundação para a Ciência e a Tecnologia), references UIDB/04106/2020 and UIDP/04106/2020.
REFERENCES
Alonso, H. and Candeias, T. (2022). Predicting how much a
consumer is willing to pay for a bottle of wine: a pre-
liminary study. In Procedia Computer Science, vol-
ume 204, pages 836–843.
Arafat, M. Y., Hoque, S., Xu, S., and Farid, D. M. (2019).
Machine learning for mining imbalanced data. IAENG
International Journal of Computer Science, 46:332–
348.
Branco, P., Torgo, L., and Ribeiro, R. P. (2016). A survey
of predictive modeling on imbalanced domains. ACM
Computing Surveys, 49:1–50.
Bruwer, J., Li, E., and Reid, M. (2002). Segmentation of the Australian wine market using a wine-related lifestyle approach. Journal of Wine Research, 13:217–242.
Cardoso, J. S. and Pinto da Costa, J. F. (2007). Learning
to classify ordinal data: the data replication method.
Journal of Machine Learning Research, 8:1393–1429.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). SMOTE: Synthetic minority over-
sampling technique. Journal of Artificial Intelligence
Research, 16:321–357.
Cost, S. and Salzberg, S. (1993). A weighted nearest neigh-
bor algorithm for learning with symbolic features.
Machine Learning, 10:57–78.
Frank, E. and Hall, M. (2001). A simple approach to or-
dinal classification. In Proceedings of the 12th Eu-
ropean Conference on Machine Learning, volume 1,
pages 145–156.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H.,
and Herrera, F. (2012). A review on ensembles for the
class imbalance problem: Bagging-, boosting-, and
hybrid-based approaches. IEEE Transactions on Sys-
tems, Man and Cybernetics Part C: Applications and
Reviews, 42:463–484.
Ganganwar, V. (2012). An overview of classification algo-
rithms for imbalanced datasets. International Journal
of Emerging Technology and Advanced Engineering,
2:42–47.
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue,
H., and Bing, G. (2017). Learning from class-
imbalanced data: Review of methods and applica-
tions. Expert Systems With Applications, 73:220–239.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The
Elements of Statistical Learning: Data Mining, In-
ference, and Prediction. Springer-Verlag, New York,
USA, 2nd edition.
Koksal, M. H. (2021). Segmentation of wine consumers
based on level of involvement: a case of Lebanon.
British Food Journal, 123:926–942.
Kolyesnikova, N., Dodd, T. H., and Duhan, D. F. (2008).
Consumer attitudes towards local wines in an emerg-
ing region: A segmentation approach. International
Journal of Wine Business Research, 20:321–334.
Kotler, P. and Keller, K. L. (2006). Marketing management.
Prentice Hall, Upper Saddle River, USA, 12th edition.
Kotsiantis, S., Kanellopoulos, D., and Pintelas, P. (2006).
Handling imbalanced datasets: A review. GESTS In-
ternational Transactions on Computer Science and
Engineering, 30:25–36.
More, A. S. and Rana, D. P. (2021). Review of imbalanced
data classification and approaches relating to real-time
applications. In Rana, D. and Mehta, R., editors, Data
Preprocessing, Active Learning, and Cost Perceptive
Approaches for Resolving Data Imbalance, chapter 1,
pages 1–22. IGI Global, Pennsylvania, United States.
Payini, V., Bolar, K., Mallya, J., and Kamath, V. (2022).
Modeling hedonic motive–based segments of wine
festival visitors using decision tree approach. Interna-
tional Journal of Wine Business Research, 34:19–36.
Pinto da Costa, J. F., Alonso, H., and Cardoso, J. S. (2008). The unimodal model for the classification of ordinal data. Neural Networks, 21:78–91.
Pinto da Costa, J. F., Alonso, H., and Cardoso, J. S. (2014). Corrigendum to 'The unimodal model for the classification of ordinal data' [Neural Netw. 21 (2008) 78–91]. Neural Networks, 59:73–75.
Rao, S. S. (2019). Engineering Optimization: Theory and
Practice. John Wiley & Sons, Inc, New Jersey, USA,
5th edition.
Rouzet, E. and Seguin, G. (2004). Il marketing del vino. Il
mercato. Le strategie commerciali. La distribuzione.
Il Sole 24 ORE Edagricole, Bologna, Italia.
Stanfill, C. and Waltz, D. (1986). Toward memory-based
reasoning. Communications of the ACM, 29:1213–
1228.
Sun, Y., Wong, A. K. C., and Kamel, M. S. (2009). Classi-
fication of imbalanced data: A review. International
Journal of Pattern Recognition and Artificial Intelli-
gence, 23:687–719.
Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., and Asad-
pour, M. (2020). Boosting methods for multi-class im-
balanced data classification: an experimental review.
Journal of Big Data, 7:1–47.
Thach, E. C. and Olsen, J. E. (2005). The search for
new wine consumers: Marketing focus on consumer
lifestyle or lifecycle? International Journal of Wine
Marketing, 16:44–57.
Thach, E. C. and Olsen, J. E. (2006). Market segment anal-
ysis to target young adult wine drinkers. Agribusiness,
22:307–322.