pdi-Bagging: A Proposal of Bagging-Type Ensemble Method
Generating Virtual Data
Honoka Irie and Isao Hayashi
Graduate School of Informatics, Kansai University, Takatsuki, Osaka, Japan
Keywords:
Fuzzy Inference, Virtual Data, Ensemble Method, Bagging, Clustering.
Abstract:
For pattern classification problems, ensemble learning identifies multiple weak classifiers from the learning data and combines them to improve the discrimination rate on testing data. We have already proposed pdi-Bagging (Possibilistic Data Interpolation-Bagging), which improves the discrimination rate on testing data by adding virtually generated data to the learning data. In this paper, we propose a new method that specifies the generation area of virtual data and changes the class assigned to the generated virtual data. The discriminant accuracy is improved because five new bagging methods, which generate virtual data around correctly discriminated data and erroneously discriminated data, are formulated, and the class of each virtual datum is determined with a newly proposed evaluation index in multidimensional space. We formulate the new pdi-Bagging algorithm and discuss the usefulness of the proposed method using numerical examples.
1 INTRODUCTION
Recently, ensemble learning methods(Polikar, 2006;
Rokach, 2009), which are useful for pattern clas-
sification problems, have been proposed. The en-
semble method learns multiple weak classifiers us-
ing training data and can improve the classification
accuracy of the evaluation data by combining multi-
ple weak classifiers over the layers. Ensemble learn-
ing can be broadly categorized into two types: the
classifier combination model and the attribute com-
bination model. In particular, the classifier combi-
nation model can be classified into an independent
type in which each classifier is combined indepen-
dently and a dependent type in which each classifier is
combined while maintaining a dependency relation-
ship. In the independent type, each classifier is trained
with individual training data, so it is possible to inte-
grate them independently and obtain a high classifica-
tion rate. The independent type includes the bagging
method(Breiman, 1996), Random Forests(Breiman,
2001), Error-Correcting Output Codes(Dietterich and
Bakiri, 1995). The bagging method represents boot-
strap aggregation. The learning data for a classi-
fier are obtained via bootstrap sampling, and multiple
classifiers are learned independently from the learn-
ing data. Finally, the final result is obtained based on
the majority vote involving all the integrated classi-
fiers. Since the bagging method is a simple ensemble
method that uses multiple classifiers, the algorithm
is simple and offers high applicability. For exam-
ple, it is often used as a clustering model for medical
data(Breiman, 1996).
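As a concrete illustration of this scheme (independent of the method proposed in this paper), a minimal bagging loop with bootstrap sampling and majority voting might look as follows; the base classifier produced by make_classifier is a placeholder with fit/predict methods, not an API taken from the cited works.

import numpy as np
from collections import Counter

def bagging_predict(train_x, train_y, test_x, make_classifier, n_classifiers=10,
                    rng=np.random.default_rng(0)):
    """Train n_classifiers on bootstrap samples and combine them by majority vote."""
    train_x, train_y = np.asarray(train_x), np.asarray(train_y)
    models = []
    for _ in range(n_classifiers):
        # bootstrap sample: draw len(train_x) indices with replacement
        idx = rng.integers(0, len(train_x), size=len(train_x))
        clf = make_classifier()
        clf.fit(train_x[idx], train_y[idx])
        models.append(clf)
    votes = np.array([m.predict(test_x) for m in models])
    # majority vote over the independently trained classifiers
    return np.array([Counter(votes[:, i]).most_common(1)[0][0]
                     for i in range(votes.shape[1])])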
On the other hand, the boosting method (Freund and Schapire, 1997; Friedman et al., 2000) and the adaptive mixture of local experts (Jacobs et al., 1991) are dependent types of the classifier combination model. Boosting improves the classification rate by sequentially learning weak classifiers. In particular, AdaBoost (Freund and Schapire, 1997) is widely used and has the advantage that the features of a dataset are easy to analyze. In this way, the dependent type, represented by boosting, trains multiple weak classifiers while maintaining sequential interdependence through the training data and can identify the input-output relationship using this dependence. In contrast, in the independent type, represented by bagging, each weak classifier is trained independently on its own training data, and the processing algorithm is relatively simple while still achieving high accuracy.
We have proposed a new bagging algorithm for
the generation and interpolation of data around mis-
classified data using a specified membership func-
tion(Hayashi and Tsuruse, 2010; Hayashi et al.,
2012). We name this method possibilistic data in-
terpolation bagging (pdi-bagging). The interpola-
tion of data around misclassified data is called vir-
tual data. In pdi-bagging, data misclassified by the
classifier model are not weighted as in AdaBoost,
nor are they added to the next training data. The
classes of virtual data are estimated from their lo-
cations(Irie and Hayashi, 2019b; Irie and Hayashi,
2020) and the virtual data are added to training data
to estimate discriminant lines using weak classifiers
based on fuzzy inference(Nomura et al., 1991; Irie
and Hayashi, 2019a). Similarly, in the next layer,
the class of virtual data is estimated and added to the
training data to estimate the discriminant line. This
series of operations is repeated, and finally, the clas-
sification rate for the evaluation data is obtained by a
majority vote of the multiple weak classifiers. Since the number of data increases with the addition of virtual data during training, the bias in the amount of data between classes is reduced and the amount of data in each class is equalized, which improves the accuracy of identifying the discriminant line. In this paper, we
formulate five types of virtual data generation meth-
ods and discuss their usefulness using numerical ex-
amples.
2 pdi-Bagging
A conceptual diagram of pdi-Bagging is shown in Fig. 1. In pdi-Bagging, first, a weak classifier $M_0$ based on fuzzy inference is learned using training data probabilistically extracted from the whole dataset, and the discriminant rate of the training data $TRD$ is calculated. Next, virtual data are generated around the misclassified data using membership functions. The generated virtual data are added to the original training data, which increases the number of training data $TRD$. Using the original training data and the virtual data, the classification rate is calculated by a weak classifier $M_1$ based on fuzzy inference. Increasing the number of $TRD$ improves the discriminant accuracy of the weak classifiers. These operations are repeated and finish after the $L$-th layer, when the end condition is satisfied. Finally, the evaluation data ($CHD$) are input to the weak classifiers $M_0, M_1, \cdots, M_l, \cdots, M_L$, and the final result is calculated by majority rule. Since pdi-Bagging adds virtual data to the training data and calculates the discriminant rate with multiple weak classifiers, its discriminant rate is higher than that of the conventional bagging method and AdaBoost (Hayashi and Tsuruse, 2010; Hayashi et al., 2012).
In pdi-Bagging, fuzzy clustering by simplified fuzzy inference (Nomura et al., 1991) is adopted as the weak classifier. Fuzzy inference has excellent learning ability and allows visualization of the learning results through its rule description, which is why it is adopted here as the weak classifier. Simplified fuzzy inference expresses rules in if-then form, uses fuzzy sets defined by membership functions in the antecedent part, and defines the consequent part in singleton form with real numbers. We use a trapezoidal fuzzy set as the membership function.
!"#$%&'
(&$&
!
!
"#$$%&'
#$
()&#$$%&'
!
*
!
+
"#$$%&'
#$
()&#$$%&'
"#$$%&'
#$
()&#$$%&'
!"
!"#
!"#
!"#
$%#
!"#$%&'(
)*+&,&$-
!"#$%
&'()'(
!"#$%&
!"#$%&
!"#$%&
!!!!!!!!
!!!!!!!!
!!!!!!!!
.'*%"'&$-/0123*%
)&,+%&2&-"-'/4"'*
!"#$%&'(")*+,"
*+$,
-%$.."/"+0
Figure 1: pdi-Bagging Algorithm.
Let $z$ be the output variable and $p_i$ be the singleton in the consequent part. The fuzzy rule $r_i$, $i = 1, 2, \cdots, R$, is expressed as follows:
$$r_i: \text{if } x_1 \text{ is } \mu_{F_{i1}}(x_1) \text{ and } \cdots \text{ and } x_n \text{ is } \mu_{F_{in}}(x_n) \text{ then } C = \{C_{ik} \mid z = p_i\}$$
where $C$ is the output class, and $C_{ik}$ indicates that the class value is $C_k$ in rule $r_i$.
Suppose we have obtained the input data $x = (x_1, x_2, \cdots, x_n)$. The input data $x$ is input to the antecedent part of the $i$-th fuzzy rule $r_i$, and the degree of the antecedent part, $\mu_i(x) = \mu_{F_{i1}}(x_1) \cdot \mu_{F_{i2}}(x_2) \cdot \cdots \cdot \mu_{F_{in}}(x_n)$, is calculated. The result of fuzzy inference, $\hat{z}$, and the class $C$ are calculated by the following equations:
$$\hat{z} = \frac{\sum_{i=1}^{R} \mu_i(x) \cdot p_i}{\sum_{i=1}^{R} \mu_i(x)}, \qquad C = \{C_k \mid \min |\hat{z} - z|\}$$
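As an illustration, the following sketch implements this simplified fuzzy inference; the trapezoidal membership function parameters, the rule set, and the class values used in the example are hypothetical, not the ones used later in the experiments.

import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

class SimplifiedFuzzyClassifier:
    """Rules: antecedent = one trapezoid per input attribute, consequent = singleton p_i."""
    def __init__(self, rules):
        # rules: list of (list of trapezoid parameters per attribute, p_i)
        self.rules = rules

    def infer(self, x):
        num, den = 0.0, 0.0
        for mfs, p in self.rules:
            # degree of the antecedent part: product of the attribute memberships
            mu = np.prod([trapezoid(xj, *mf) for xj, mf in zip(x, mfs)])
            num += mu * p
            den += mu
        return num / den if den > 0 else 0.0

    def classify(self, x, class_values):
        # assign the class whose representative value is closest to the inferred z
        z_hat = self.infer(x)
        return min(class_values, key=lambda c: abs(z_hat - class_values[c]))

# hypothetical two-rule, one-attribute example with class values 2.0 and 3.0
clf = SimplifiedFuzzyClassifier([
    ([(0.0, 0.0, 0.3, 0.6)], 2.0),
    ([(0.4, 0.7, 1.0, 1.0)], 3.0),
])
print(clf.classify([0.2], {"red": 2.0, "blue": 3.0}))  # -> "red"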
Now, let us explain how virtual data are generated in pdi-Bagging. Let $x^D(d) = (x^D_1(d), x^D_2(d), \cdots, x^D_j(d), \cdots, x^D_n(d))$ denote the $d$-th data point in the data set $D$ consisting of $W$ data. Virtual data $x^V(d)$ are generated around correctly discriminated data (correctly classified data) $x^C(d)$ and misclassified data $x^E(d)$. For a certain real number $h$, $0 \leq h \leq 1$, the virtual data $x^V_j(d)$ of the $j$-th attribute of $x^V(d)$ is generated using the membership function $\mu_F(x_j)$ of the fuzzy number $F$ as follows:
$$x^V_j(d) = \{x_j \mid \mu_F(x_j) = h, \; \mu_F(x^S_j(d)) = 1\}, \qquad h \sim N(1, 1), \; 0 \leq h \leq 1$$
where $x^S_j(d)$ means the correctly classified data $x^C_j(d)$ or the misclassified data $x^E_j(d)$. In addition, the membership function $\mu_F(x_j)$ is defined by the following normal distribution whose center is $x^S_j(d)$ and whose standard deviation is $\sigma$:
$$\mu_F(x_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_j - x^S_j(d))^2}{2\sigma^2}\right) \quad (1)$$
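A minimal sketch of this virtual data generation is given below. It assumes the Gaussian membership function is scaled so that its peak at $x^S_j(d)$ equals 1 (which is what the condition $\mu_F(x^S_j(d)) = 1$ requires), and it picks one of the two solutions of $\mu_F(x_j) = h$ at random; the function names are illustrative.

import numpy as np

def generate_virtual_attribute(x_s_j, sigma=0.5, rng=np.random.default_rng()):
    """Generate one virtual attribute value around the source value x_s_j.

    h is drawn from N(1, 1) and clipped to (0, 1]; the virtual value is a point
    where the Gaussian membership function (peak 1 at x_s_j) equals h.
    """
    h = rng.normal(loc=1.0, scale=1.0)
    h = min(max(h, 1e-6), 1.0)           # keep 0 < h <= 1
    # mu_F(x) = exp(-(x - x_s_j)^2 / (2 sigma^2)) = h  =>  |x - x_s_j| = sigma * sqrt(-2 ln h)
    offset = sigma * np.sqrt(-2.0 * np.log(h))
    sign = rng.choice([-1.0, 1.0])        # the equation has two solutions; pick one side
    return x_s_j + sign * offset

def generate_virtual_point(x_s, sigma=0.5, rng=np.random.default_rng()):
    """Generate a virtual data point attribute by attribute around the source point x_s."""
    return np.array([generate_virtual_attribute(v, sigma, rng) for v in x_s])

# example: one virtual point around a hypothetical source point
print(generate_virtual_point(np.array([0.3, 0.6])))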
We propose the following five methods for gener-
ating virtual data.
(1) CA: Virtual data generation method with correctly classified data in the whole space
When the training data $x^S(d)$ is correctly classified by the weak classifier, virtual data $x^V(d)$ are generated around the correctly classified data $x^C(d)$.
(2) CC: Virtual data generation method with correctly classified data at the cluster center
When the training data $x^S(d)$ is misclassified by the weak classifier, the midpoint between the closest correctly classified data and the farthest correctly classified data from $x^E(d)$, whose classes are the same as that of the misclassified data $x^E(d)$, is calculated. Virtual data $x^V(d_0)$ are generated around the correctly classified data $x^C(d_0)$ closest to the midpoint.
$$x^C(d_0) = \{x^C(e) \mid \min_e |x^C(e) - \tfrac{1}{2}(\max_f |x^E(d) - x^C(f)| + \min_g |x^E(d) - x^C(g)|)|, \text{ for } e, f, g\}$$
(3) E: Virtual data generation method with misclassified data
When the training data $x^S(d)$ is misclassified by the weak classifier, virtual data $x^V(d)$ are generated around the misclassified data $x^E(d)$.
(4) MA: Virtual data generation method mixing correctly classified data and misclassified data in the whole space
By alternately using the CA type and the E type in each layer of bagging, virtual data $x^V(d)$ are generated around $x^C(d)$ and $x^E(d)$.
(5) MC: Virtual data generation method mixing correctly classified data and misclassified data at the cluster center
By alternately using the CC type and the E type in each layer of bagging, virtual data $x^V(d)$ are generated around $x^C(d)$ and $x^E(d)$.
Regarding CC in particular, we explain how virtual data are generated around correctly classified data at the cluster center using Fig. 2. We assume the clustering problem of a total of 8 data points into two classes, a green class and a yellow class, in Fig. 2. The yellow training datum with a green frame located at the bottom of the figure is misclassified as the green class, although its true class is the yellow class. Since the true class of the misclassified datum $x^E(d)$ is the yellow class, the midpoint between the closest and the farthest correctly classified yellow data from $x^E(d)$ is calculated. Virtual data $x^V(d_0)$ are generated around the correctly classified yellow datum $x^C(d_0)$ closest to this midpoint. With this generation method, many of the virtual data produced by the CC method tend to lie near the center of the cluster. Therefore, the generation of virtual data by the CC method tends to affect the discriminant line more strongly than CA, which generates virtual data in the entire space. In addition, the degree of influence on the discriminant line can be controlled by moving the coordinate position currently set as the midpoint to an arbitrary interpolation or extrapolation point between or beyond the endpoints.
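A sketch of this CC-type target selection, following the description above with Euclidean distances, is shown below; the variable names are illustrative.

import numpy as np

def cc_target(x_e, correct_same_class):
    """Select the correctly classified point (of the same true class as the
    misclassified point x_e) that is closest to the midpoint between the
    nearest and the farthest correctly classified points from x_e."""
    correct_same_class = np.asarray(correct_same_class)
    dists = np.linalg.norm(correct_same_class - x_e, axis=1)
    nearest = correct_same_class[np.argmin(dists)]
    farthest = correct_same_class[np.argmax(dists)]
    midpoint = 0.5 * (nearest + farthest)
    # virtual data are then generated around this correctly classified point
    idx = np.argmin(np.linalg.norm(correct_same_class - midpoint, axis=1))
    return correct_same_class[idx]

# hypothetical example: misclassified yellow point and the correctly classified yellow points
print(cc_target(np.array([0.5, 0.1]),
                [[0.4, 0.8], [0.6, 0.7], [0.5, 0.9]]))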
!"#$!%&" '&&()*+&#
!"#$"%&" '&&()*+&#
!
!
,#-
!"#$./01#1&$.0((#2&
./311)4)#5$63&3
!"#$73(&"#1&$.0((#2&
./311)4)#5$63&3
!"#$%&'()&$&
.0((#2&$
./311)4)#5$63&3
.0((#2&$./311)4)#5$63&3
.0((#2&$./311)4)#5$63&3
!
"
,$-
!
"
,%-
!
"
,#&-
!
#
,#&-
!
"
,#&-
8 90(:3/
6)1&()*+&)0;
8 <;)40(:
6)1&()*+&)0;
=)12/311)4)#5$63&3
:)5>0);&
Figure 2: Generation of Virtual Data with Correct Data
around Cluster Center.
3 FORMULATION FOR CLASS
MODIFICATION
We propose here a new class determination method for assigning correct classes to virtual data. Suppose that virtual data $x^V(d)$ are generated from the correctly classified data $x^C(d)$ and the misclassified data $x^E(d)$. Basically, the class of the virtual data $x^V(d)$ should be the same as the output class of the source data $x^S(d) = \{x^{C,k}(d), x^{E,k}(d)\}$. However, virtual data may be generated at locations far from the source data, or in areas where data of different classes are dense. Therefore, the class $k'$ of the virtual data $x^V(d)$ is determined by an integrated evaluation formula using the following three evaluation criteria: the evaluation of the correct/misclassified data ($E_1$), the evaluation of the class centers ($E_2$), and the evaluation of neighborhood data classes ($E_3$).
(1) Evaluation of Correct/Misclassified Data ($E_1$)
The evaluation value $E_1$ is defined by the distance between the virtual data $x^V(d)$ and the source data $x^{S,k}(d)$ with class $k$. The smaller this evaluation value $E_1$, the higher the dependence of $x^V(d)$ on class $k$.
$$E_1^k = \frac{|x^V(d) - x^{S,k}(d)|}{\max_e |x^{S,k}(d) - x^{D+V}(e)| - \min_f |x^{S,k}(d) - x^{D+V}(f)|}, \quad \text{for } e, f$$
$$E_1^p = 1 - E_1^k, \quad \text{for } p \neq k$$
(2) Evaluation of Class Centers ($E_2$)
The evaluation value $E_2$ is defined by the distance between the virtual data $x^V(d)$ and the center of class $k$. The smaller this evaluation value $E_2$, the higher the dependence of $x^V(d)$ on class $k$. When the center of class $k$ is represented by $x^k_c$,
$$E_2^k = \frac{|x^V(d) - x^k_c|}{\max_{e,f} |x^{D+V}(e) - x^{D+V}(f)|}, \quad \text{for } e, f$$
(3) Evaluation of Neighborhood Data Classes ($E_3$)
The evaluation value $E_3$ is defined by the distance between the virtual data $x^V(d)$ and the closest correct/misclassified data $x^{S,k}(e)$ with class $k$. The smaller this evaluation value $E_3$, the higher the dependence of $x^V(d)$ on class $k$.
$$E_3^k = \frac{\min_e |x^V(d) - x^{S,k}(e)|}{\max_{f,g} |x^{D+V}(f) - x^{D+V}(g)|}, \quad \text{for } e, f, g$$
According to these three criteria, $E_1$ favors class $k$ when the virtual data are generated near the source data, whereas $E_2$ favors class $k$ when the virtual data are generated near the center of that class.
By integrating these three evaluation criteria, the overall evaluation value $E^k$ is obtained. The virtual data $x^V(d)$ is assigned the class $k'$ that minimizes the following overall evaluation value $E^k$:
$$k' = \{k \mid \min_k E^k = \min_k (w_1 E_1^k + w_2 E_2^k + w_3 E_3^k)\} \quad (2)$$
where $w_1, w_2, w_3$ are the weights of the evaluation values.
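The class determination of Equation (2) can be sketched as follows, using the definitions of $E_1$, $E_2$, and $E_3$ above; the data structures and the function name are illustrative assumptions.

import numpy as np

def determine_class(x_v, x_src, src_class, points_by_class, centers, all_points,
                    weights=(1/3, 1/3, 1/3)):
    """Assign a class k' to the virtual point x_v by minimizing w1*E1 + w2*E2 + w3*E3 (Eq. (2)).

    x_v            : the virtual data point
    x_src          : the source (correct or misclassified) point it was generated from
    src_class      : class k of that source point
    points_by_class: dict class -> array of data points of that class (for E3)
    centers        : dict class -> class center (for E2)
    all_points     : array of all original plus virtual points D+V (for normalization)
    """
    w1, w2, w3 = weights
    all_points = np.asarray(all_points)
    diameter = max(np.linalg.norm(p - q) for p in all_points for q in all_points)

    d_src_to_all = np.linalg.norm(all_points - x_src, axis=1)
    e1_src = np.linalg.norm(x_v - x_src) / (d_src_to_all.max() - d_src_to_all.min())

    scores = {}
    for k, pts in points_by_class.items():
        e1 = e1_src if k == src_class else 1.0 - e1_src   # E1^k and E1^p = 1 - E1^k
        e2 = np.linalg.norm(x_v - centers[k]) / diameter
        e3 = np.linalg.norm(np.asarray(pts) - x_v, axis=1).min() / diameter
        scores[k] = w1 * e1 + w2 * e2 + w3 * e3
    return min(scores, key=scores.get)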
We formulate the pdi-Bagging algorithm as follows (a code-level sketch of the procedure is given after the step list).
Step 1 We assume that the $W$ data $D$ are obtained. The data $D$ are divided into two datasets: $W_{TRD}$ training data $D_{TRD}$ and $W_{CHD}$ check data $D_{CHD}$. In addition, the interpolated (virtual) data are represented by $D_V$.
Step 2 The training data $D_{TRD}$ are used as input to the $l$-th weak classifier $M_l$, and the discriminant rate $r^{TRD}_l$ is obtained, where $M_0$ is the initial weak classifier.
Step 3 The $d$-th data point, which was correctly classified or misclassified, is temporarily extracted from $D_{TRD}$; assume, for example, that it is misclassified. For the $j$-th attribute value $x^S_j(d)$ of the correctly classified or misclassified data, virtual data $x^V_j(d)$ are generated by the membership function $\mu_F(x_j)$.
Step 4 Calculate the class $k'$ of the virtual data $x^V(d)$ by Equation (2). For $l > 2$, remove the virtual data $x^V(d)$ from the $(l-1)$-th $D_V$, and add the virtual data $x^V(d)$ with class $k'$ to the $l$-th $D_V$.
Step 5 Extract $v$ virtual data from $D_V$ at random and add them to $D_{TRD}$.
Step 6 Steps 2 to 5 are repeated with $l = l + 1$, and the algorithm is terminated at $K = l$ satisfying $r^{CHD}_l \geq \theta$ for a threshold $\theta$. Alternatively, the algorithm ends when $l \geq K$ is satisfied for the number of weak classifiers $L$ and the number of iterations $K$, $K \leq L$.
Step 7 To obtain the final discrimination result, $D_{CHD}$ is applied to $M_0, M_1, \cdots, M_l, \cdots, M_K$, and the discriminant rate $r^{CHD}_K$ is obtained by majority rule.
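Under the assumption that a trainable weak classifier and the helper functions sketched earlier are available, the overall procedure of Steps 1 to 7 can be outlined as below. This is a structural sketch only: it uses the E-type generation and a fixed number of layers instead of the threshold-based termination, and is not the exact implementation used in the experiments.

import numpy as np
from collections import Counter

def pdi_bagging(train_x, train_y, check_x, check_y, make_classifier,
                generate_virtual, assign_class, n_layers=5, n_virtual=1):
    """Structural sketch of the pdi-Bagging loop (Steps 1-7)."""
    train_x, train_y = np.asarray(train_x), np.asarray(train_y)
    cur_x, cur_y = train_x, train_y
    classifiers = []

    for layer in range(n_layers + 1):                        # weak classifiers M_0, ..., M_L
        clf = make_classifier()
        clf.fit(cur_x, cur_y)                                # Step 2: train M_l on current TRD
        classifiers.append(clf)

        pred = clf.predict(cur_x)
        v_x, v_y = [], []
        for x, y_true, y_pred in zip(cur_x, cur_y, pred):    # Step 3: generate virtual data
            if y_pred != y_true:                             # E-type: around misclassified data
                x_v = generate_virtual(x)
                v_y.append(assign_class(x_v, x, y_true))     # Step 4: class k' by Eq. (2)
                v_x.append(x_v)
        if v_x:                                              # Step 5: add v virtual data to TRD
            take = np.random.choice(len(v_x), size=min(n_virtual, len(v_x)), replace=False)
            cur_x = np.vstack([train_x, np.asarray(v_x)[take]])
            cur_y = np.concatenate([train_y, np.asarray(v_y)[take]])

    votes = np.array([clf.predict(check_x) for clf in classifiers])   # Step 7: majority vote
    final = [Counter(votes[:, i]).most_common(1)[0][0] for i in range(votes.shape[1])]
    return float(np.mean(np.asarray(final) == np.asarray(check_y)))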
4 VERIFICATION AND
DISCUSSION USING
NUMERICAL DATA
To explain the pdi-Bagging algorithm, we discuss a two-dimensional classification problem. It is assumed that 200 training data points and 200 checking data points exist in the two-dimensional space [0, 1] × [0, 1], and that these data can be categorized into two classes. Fig. 3 shows the numerical data used as training data and checking data. These numerical data were constructed by adding values of ±0.05 to the basic data using random numbers. We thus deal with a two-input, two-class discrimination problem. For this discrimination problem, the real values of the consequent part of the fuzzy inference rules are set to 2.0 (red class) and 3.0 (blue class).
Simplified fuzzy inference is used as the weak classifier, and five trapezoidal membership functions are set on each input interval [0, 1]. Since the data space is two-dimensional, 25 rules are constructed over the whole space. In addition, in order to verify the classification rate when rules are added to specific areas of the data space, 49 rules are added to $G_1 = \{(x_1, x_2) \mid [0.4, 0.7] \times [0.4, 0.7]\}$ as the specific area $G_1$, and 4 rules are added to $G_2 = \{(x_1, x_2) \mid [0.7, 0.8] \times [0.3, 0.7]\}$ as the specific area $G_2$. As a result, the total number of rules is 78. The additional rules improve the accuracy of the discriminant rate in regions away from the discriminant line where the data are dense, and thus the overall discriminant rate. The discriminant rate was calculated for a total of three settings: no additional rules, additional rules with trapezoidal membership functions, and additional rules with right-angled trapezoidal membership functions at both ends of the specific regions. When the membership functions in a specific region are set as right-angled trapezoids at both ends of the region, the size of the specific region does not change even if the membership functions are learned. On the other hand, when trapezoidal membership functions are set at both ends of a specific region, the size of the region changes as the membership functions are learned. Therefore, when right-angled trapezoidal membership functions are set in the additional rules, the membership functions do not move outside the specific region even when they are learned, and learning is concentrated within the specific region.
!"#$%&'()*+",$-."'/,($0"1"$$2 !3#$%&'()*+",$-."'/,($0"1"$4
Figure 3: Numerical Example Training and Testing Data.
The initial values of the antecedent part of the fuzzy inference are set by the default method, and the learning order of the antecedent and consequent parts is as follows: the consequent part is learned first, and then the antecedent and consequent parts are learned alternately. In the learning process, the learning coefficients of the x-coordinates $x_b$ and $x_c$ of the two vertices of the upper base of the trapezoidal membership function are denoted by $K_b$ and $K_c$, and were set to 0.01 (Irie and Hayashi, 2019a). In addition, the learning coefficients of the differences $\alpha$ and $\beta$ between the x-coordinates of the upper and lower bases are denoted by $K_\alpha$ and $K_\beta$, and were set to 0.01 (Irie and Hayashi, 2019a). On the other hand, the learning coefficient $K_p$ of the singleton of the consequent part was set to 0.4 for the first consequent learning and 0.6 for the alternating learning. The number of epochs of the consequent learning is set to 10, and the numbers of epochs for the alternating learning of the antecedent and consequent parts are set to (10, 10).
As the membership function $\mu_F(x_j)$ for generating virtual data, the normal distribution of Equation (1) with a standard deviation of $\sigma = 0.5$ was selected, and the number of virtual data generated was basically one. However, in preliminary experiments, the discriminant rate of fuzzy inference was about 87%. As a result, about 26 of the 200 checking data are erroneously classified, so about 8 virtual data per misclassified point would be required to make the total number of virtual data comparable to the 200 training data. Therefore, we also examined the discriminant rate when the number of generated virtual data was changed from 1 to 10.
The weights of the evaluation values for class estimation of virtual data are $(w_1, w_2, w_3)$ = {(1/3, 1/3, 1/3), (0.2, 0.4, 0.4), (0.2, 0.3, 0.5), (0.2, 0.5, 0.3), (0.5, 0.25, 0.25), (0.01, 0.495, 0.495), (0.05, 0.475, 0.475)}. In determining the weights, the weight $w_1$ of the distance from the source data has a large effect on the class estimation. Therefore, we examined the discriminant rate for a total of seven settings: $w_1 = 1/3$ when $w_1 = w_2 = w_3$, $w_1 = 0.5$, and five settings with reduced values of $w_1$.
The algorithm is terminated by the termination rule with the number of iterations K = 5. In the mixing types, the type for the misclassified data was adopted in the odd layers, and the type for the correctly classified data was adopted in the even layers. In the learning process of fuzzy inference, the order of the data is shuffled by random numbers every epoch. Since the number of epochs for the learning of the consequent part and for the alternating learning of the antecedent and consequent parts is 10 and (10, 10), respectively, the total number of epochs is 150 in the five-layer learning. Since 2-fold cross-validation is used here, 150 epochs of learning for each dataset result in a total of 300 epochs. We compared the average discriminant rates obtained in 10 trials for each of the five types CA, CC, E, MA, and MC.
The discriminant rates for the evaluation data obtained by the five virtual data generation methods, namely correctly classified data in the whole space (CA), correctly classified data at the cluster center (CC), misclassified data (E), mixing of correctly classified and misclassified data in the whole space (MA), and mixing of correctly classified and misclassified data at the cluster center (MC), are shown in Table 1 and Figures 4 to 6. Table 1 shows the discriminant rate for each weight of the evaluation index, with and without additional rules, and for each shape of the membership function within the specific regions. We also calculated the difference from the discriminant rate obtained when 25 rules were set with trapezoidal membership functions. Figures 4 to 6 show the average discriminant rate for each weight of the evaluation index, with and without additional rules, and for each shape of the membership function within the specific regions.
!"
!#
!$
!!
!%
%&
%'
%(
)* )) + ,* ,)
!"#$%"&"'(')*+(),
Figure 4: Average Discriminant Rates of 5 Methods in 25
Basic Rules.
First, from the results in Table 1 and Fig. 4, the
following characteristics of the discriminant rate are
clear for the case of 25 rules with trapezoidal mem-
bership functions. The discriminant rate by 2-fold
cross validation of fuzzy inference with 25 rules was
84.40%. The discriminant rate of all five methods that
generate virtual data is higher than the result of this
fuzzy inference, so the generation of virtual data is
effective in improving the discriminant rate.
For the 25 rules with trapezoidal membership functions, the discriminant rate of fuzzy inference alone is not necessarily high, whereas the discriminant rates of the five methods are all higher than that of the 25 rules. Among the types using correctly classified data, the discriminant rate of CC is higher than that of CA, and among the mixing types, the discriminant rate of MC is higher than that of MA. The reason is that in CC and MC the virtual data are generated near the center of the cluster, so the fuzzy rules near the center of the class are learned with high accuracy.
Table 1 and Fig. 5 show the characteristics of the discriminant rate for the 78 rules, where the additional rules within the specific regions use trapezoidal membership functions.
!"
!#
!$
!!
!%
%&
%'
%(
)* )) + ,* ,)
!"#$%"&"'(')*+(),
Figure 5: Average Discrimination Rates of 5 Methods in 78
Total Rules Added by Trapezoidal Membership Function.
The discriminant rate of 2-fold cross valida-
tion of simple fuzzy inference with 78 rules using the
trapezoidal membership function was 89.68%. On the
other hand, among the five types of virtual data gen-
eration methods, the discriminant rates of three types,
CC, E, and MC are higher than simple fuzzy infer-
ence. Therefore, methods other than generating vir-
tual data in the entire space are effective.
!"
!#
!$
!!
!%
%&
%'
%(
)* )) + ,* ,)
!"#$%"&"'(')*+(),
Figure 6: Average Discrimination Rates of 5 Methods in
78 Total Rules Added by Right Trapezoidal Membership
Function.
In addition, Table 1 and Fig. 6 show the characteristics of the discriminant rate for the 78 rules, where the additional rules within the specific regions use right-angled trapezoidal membership functions. The discriminant rate of 2-fold cross validation of simple fuzzy inference with 78 rules using the right-angled trapezoidal membership functions was 89.73%. Among the five types of virtual data generation methods, the discriminant rates of four types, CC, E, MA, and MC, are higher than that of simple fuzzy inference. In particular, the discriminant rates of MC and CC are more than 0.45% higher. Therefore, in the case of 78 rules with right-angled trapezoidal membership
Table 1: Comparison of Discriminant Rates According to 5 Methods.
Dis.R. = discriminant rate (%); Dif. (a) = difference from (a); Dif. (b) = difference from (b).

(a) Trap. M.F., 25 Rules (Dis.R. only)
Weights (w1, w2, w3)   CA     CC     E      MA     MC
1/3, 1/3, 1/3          86.73  87.70  86.61  87.05  87.29
0.2, 0.4, 0.4          86.50  87.60  87.03  87.28  87.52
0.2, 0.3, 0.5          87.00  87.55  86.70  87.03  87.10
0.2, 0.5, 0.3          86.85  87.70  86.70  87.08  87.15
0.5, 0.25, 0.25        86.40  87.45  86.95  86.85  87.40
0.01, 0.495, 0.495     87.18  87.55  86.55  86.58  86.88
0.05, 0.475, 0.475     87.45  87.48  86.85  86.63  87.30
Average                86.87  87.58  86.77  86.93  87.23

(b) Trap. M.F., 78 Rules (Dis.R. / Dif. (a))
Weights (w1, w2, w3)   CA            CC            E             MA            MC
1/3, 1/3, 1/3          89.53 / 2.80  89.83 / 2.13  89.80 / 3.18  89.78 / 2.73  89.79 / 2.50
0.2, 0.4, 0.4          89.33 / 2.83  89.93 / 2.33  90.15 / 3.13  90.00 / 2.73  89.95 / 2.43
0.2, 0.3, 0.5          89.03 / 2.03  90.15 / 2.60  90.30 / 3.60  89.78 / 2.75  89.93 / 2.83
0.2, 0.5, 0.3          88.95 / 2.10  89.65 / 1.95  90.05 / 3.35  89.85 / 2.78  89.83 / 2.67
0.5, 0.25, 0.25        89.18 / 2.77  89.48 / 2.03  90.05 / 3.10  89.38 / 2.53  90.23 / 2.83
0.01, 0.475, 0.475     87.40 / 0.22  89.80 / 2.25  88.63 / 2.08  88.55 / 1.97  89.43 / 2.55
0.05, 0.475, 0.475     88.70 / 1.25  90.00 / 2.52  89.83 / 2.97  89.85 / 3.23  89.85 / 2.55
Average                88.87 / 2.00  89.83 / 2.26  89.83 / 3.06  89.60 / 2.67  89.86 / 2.62

(c) R.A. Trap. M.F., 78 Rules (Dis.R. / Dif. (a) / Dif. (b))
Weights (w1, w2, w3)   CA                    CC                    E                     MA                    MC
1/3, 1/3, 1/3          90.03 / 3.30 / 0.50   90.33 / 2.63 / 0.50   89.93 / 3.32 / 0.14   90.23 / 3.18 / 0.45   90.15 / 2.86 / 0.36
0.2, 0.4, 0.4          89.83 / 3.33 / 0.50   90.20 / 2.60 / 0.27   90.35 / 3.33 / 0.20   90.28 / 3.00 / 0.27   90.28 / 2.76 / 0.32
0.2, 0.3, 0.5          90.45 / 3.45 / 1.43   90.10 / 2.55 / -0.05  90.05 / 3.35 / -0.25  90.10 / 3.08 / 0.32   90.30 / 3.20 / 0.37
0.2, 0.5, 0.3          89.95 / 3.10 / 1.00   90.35 / 2.65 / 0.70   90.05 / 3.35 / 0.00   90.30 / 3.23 / 0.45   89.98 / 2.82 / 0.15
0.5, 0.25, 0.25        90.18 / 3.78 / 1.00   90.28 / 2.83 / 0.80   89.93 / 2.97 / -0.13  90.05 / 3.20 / 0.67   90.35 / 2.95 / 0.13
0.01, 0.475, 0.475     87.55 / 0.37 / 0.15   90.40 / 2.85 / 0.60   88.63 / 2.07 / 0.00   88.40 / 1.82 / -0.15  90.18 / 3.30 / 0.75
0.05, 0.475, 0.475     89.83 / 2.37 / 1.13   90.35 / 2.87 / 0.35   89.95 / 3.10 / 0.12   90.03 / 3.40 / 0.17   90.03 / 2.73 / 0.17
Average                89.69 / 2.81 / 0.81   90.29 / 2.71 / 0.45   89.84 / 3.07 / 0.01   89.91 / 2.99 / 0.31   90.18 / 2.94 / 0.32
functions, the average discriminant rate is high for
CC and MC. Relative to the discriminant rates of the 25 rules with trapezoidal membership functions, the average discriminant rate increased by 2.71% to 3.07% for all five methods. However, the
rate of increase in the average discriminant rate of CA
and CC is slightly lower than the other methods. In
addition, the average discriminant rate of the 78 rules
of the right-angled trapezoidal membership function
is 0.38% higher than that of the 78 rules of the trape-
zoidal membership function. On the other hand, the
maximum discriminant rate was 90.35% for CC when
the weight of the evaluation index was (0.2, 0.5, 0.3)
and MC when the weight of the evaluation index was
(0.5, 0.25, 0.25). In the specific area, there are a lot
of singular point data, so the learning of the rules in
this area increases the overall discriminant rate. In ad-
dition, when the right-angled trapezoidal membership
functions are set in this specific region, the size of the
specific region does not change, so the membership
functions are efficiently learned within the specific re-
gion, and the overall discriminant rate increases.
Table 2: Results of t-Test between 5 Methods in 25 Basic
Rules.
Virtual Data
Generation Method   CA             CC             E              MA             MC
CA                  --             *1 (0.1779)    *1 *2          *1 *2          *1 *2
CC                  *1 (0.1779)    --             *1 *2          *1 *2          *2 (0.0291)
E                   *1 *2          *1 *2          --             *2 (0.1978)    *1 *2
MA                  *1 *2          *1 *2          *2 (0.1978)    --             *1 (0.2106)
MC                  *1 *2          *2 (0.0291)    *1 *2          *1 (0.2106)    --
Table 2 shows the results of the t-test on the discriminant rates of the five virtual data generation methods using the 25 rules with trapezoidal membership functions. The numerical data in Fig. 3 were used alternately as training data and checking data by 2-fold cross validation. In Table 2, significance is indicated by *1 and *2 when there is a significant difference between two methods in the one-tailed t-test at a significance level of 5%. In addition, the average value of p is shown when only one of *1 and *2 is significant. From Table 1, the discriminant rates of CA, E, and MA are low, and the discriminant rates of CC and MC are high. Therefore, CC and MC are useful methods with a higher discriminant rate than the other methods.
Summarizing the results, the methods with the highest discriminant rate were CC and MC with 78 rules using right-angled trapezoidal membership functions in the specific regions. In these two methods, the discriminant rate was improved by adding rules to specific regions where singular data exist. In addition, since the membership functions were defined as right-angled trapezoids, the specific regions were not expanded and the membership functions were learned intensively within them. These factors led to the high discriminant rates.
5 CONCLUSIONS
In this paper, we discussed a method of generating
virtual data and a method of changing classes in pdi-
Bagging. In addition, we discussed the accuracy of
the generation method of virtual data and the class
change using numerical examples.
In the future, it is necessary to discuss how to gen-
erate virtual data when there is a bias in the amount
of data between classes, and how to generate virtual
data with directionality. In addition, it is necessary
to discuss the usefulness of pdi-Bagging in practical
applications using actual measurement data.
ACKNOWLEDGEMENTS
This work was partly supported by JST SPRING, Grant Number JPMJSP2150. In addition, this work was partly supported by JSPS KAKENHI Grant Number JP20K11981 (Grant-in-Aid for Scientific Research (C)). This work was also partly supported by the Kansai University Fund for Supporting Outlay Research Centers and the Kansai University Fund for Domestic and Overseas Research.
REFERENCES
Breiman, L. (1996). Bagging predictors. Machine Learn-
ing, 24(2):123–140.
Breiman, L. (2001). Random forests. Machine Learning,
45(1):5–32.
Dietterich, T. G. and Bakiri, G. (1995). Solving mul-
ticlass learning problems via error-correcting output
codes. Journal of Artificial Intelligence Research,
2:263–286.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic
generalization of on-line learning and an application
to boosting. Journal of Computer and System Sci-
ences, 55(1):119–139.
Friedman, J., Hastie, T., and Tibshirani, R. (2000). Addi-
tive logistic regression: A statistical view of boosting.
Annals of Statistics, 28(2):337–374.
Hayashi, I. and Tsuruse, S. (2010). A proposal of boosting
algorithm for brain-computer interface using proba-
bilistic data interpolation. IEICE Technical Report,
109(461):303–308 (in Japanese).
Hayashi, I., Tsuruse, S., Suzuki, J., and Kozma, R. T.
(2012). A proposal for applying pdi-boosting to brain-
computer interfaces. In Proceedings of 2012 IEEE
International Conference on Fuzzy Systems (FUZZ-
IEEE2012) in 2012 IEEE World Congress on Com-
putational Intelligence (WCCI2012), pages 635–640.
Irie, H. and Hayashi, I. (2019a). Design evaluation of learn-
ing type fuzzy inference using trapezoidal member-
ship function. Journal of Japan Society for Fuzzy
Theory and Intelligent Informatics, 31(6):908–917 (in
Japanese).
Irie, H. and Hayashi, I. (2019b). Performance evaluation
of pdi-bagging by generation of correct - error vir-
tual data. In The 29th Symposium on Fuzzy, Artifi-
cial Intelligence, Neural Networks and Computational
Intelligence(FAN2019), pages Paper ID:No.A3–3 (in
Japanese).
Irie, H. and Hayashi, I. (2020). Proposal of class determina-
tion method for generated virtual data in pdi-bagging.
In The 34th Annual Conference of the Japanese Soci-
ety for Artificial Intelligence, pages Paper ID:No.103–
GS–8–04 (in Japanese).
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., and Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3:79–87.
Nomura, H., Hayashi, I., and Wakami, N. (1991). A self-
tuning method of fuzzy control by descent method.
In The 4th International Fuzzy Systems Association
Congress, Engineering, pages 155–158.
Polikar, R. (2006). Ensemble based systems in deci-
sion making. IEEE Circuits and Systems Magazine,
6(3):21–45.
Rokach, L. (2009). Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Computational Statistics & Data Analysis, 53(12):4046–4072.