The Impact of the Diversity on Multiple Classifier System

Performance

Identifying Changes in the Amount of Fuel in the Fleet Management System

Rafał Łysiak and Marek Kurzyński

Wroclaw University of Technology, Department of Systems and Computer Networks,

Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland

Keywords: Multiple Classifier System, Diversity, Imbalanced Data, Random Reference Classifier, Dynamic Ensemble

Selection, Classifier Competence.

Abstract: When it comes to the use of any recognition systems in the real world environment, it turns out that the

reality differs from the theory. There is an assumption that the distribution of the incoming data will be at

least similar to the distribution of the data, which were used during the learning process and that learning

dataset represents the entire space of the problem. In fact, the incoming data differ from the training set and

usually cover only a part of the feature space. Very often we have to deal with imbalanced datasets which

leads to underfitting of classifiers in the final ensemble. In this paper we present the Multiple Classifier

System based on Random Reference Classifier in the problem of fuel level change detection in the fleet

management systems. The ensemble selection process uses probabilistic measures of competence and

diversity at the same time. We compare different methods to determine the diversity within the ensemble.

1 INTRODUCTION

It's hard to imagine today's world without the

information systems and decision support systems to

monitor and support business processes. Each of us

has contact with them every day. These systems are

used in practically every area of industry, i.e.:

supervision over the quality of production, automatic

production lines, all kinds of security systems,

power systems or mobile communication systems.

They are also used in the transportation industry and

automotive. The best known system that supports

the automotive industry is GPS or any other types of

the systems on the vehicle's board , which are

designed to increase safety and comfort, and in

recent years more often, to minimize the costs. The

main component of the costs is generated by the fuel

usage, which directly affects the financial balance of

the transportation companies.

Due to the large amount of the information that

is important for logistics companies within the

regular working day, Fleet Management System

(FMS) become a very important, popular and useful

IT applications. FMS integrates in one place, all the

information that is required to make key business

decisions. As the most important, we should mention

such data as current location of the vehicle, vehicle

status, driver's working time, the status of transport,

the current amount of the fuel, average fuel

consumption or recently famous ecodriving, which

analyzes the driving style to minimize fuel

consumption. FMS provides this information with

the clear and easy to understand user interface, and

recently, more often, makes decisions based on those

information. In this way, the FMS may be able to

increase the profits. For the analysis of the fuel, we

should not forget about environmental issues.

Minimizing the fuel consumption, the CO2

emissions into the atmosphere is reduced as well.

In this paper we will focus on the issue of fuel

analysis, and more specifically on the detection of

the fuel level changes – if refuel or any loss of the

fuel occurred. It should be noted that very often the

fuel from the vehicle is stolen. This phenomenon is

very common especially in the transport market, in

the case of large trucks and tractors, where the

amount of fuel in the tank exceed 1000 liters. The

problem of analysis of changes in the amount of fuel

in the tank becomes more complicated when we

realize the conditions under which measurements are

taken and with what kind of devices. More on this

topic have been written in the Section 2.

Despite the fact that there are ongoing works to

348

Łysiak R. and Kurzy

nski M..

The Impact of the Diversity on Multiple Classiﬁer System Performance - Identifying Changes in the Amount of Fuel in the Fleet Management System.

DOI: 10.5220/0005066703480354

In Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2014), pages 348-354

ISBN: 978-989-758-039-0

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

improve the quality of measurement devices, using

the opportunity that we had to create the whole Fleet

Management System, we decided to go one step

further and focus on the analysis of the final data in

order to decide whether there had been refueling,

fuel theft or maybe the fuel level change was caused

by the regular usage of the vehicle.

Intelligent decision support systems are able to

classify an event based on the feature vector with a

very good accuracy. Unfortunately, using those

systems in the industry is still minor, especially in

the areas where Quality of Service has crucial role,

due to concerns that an unexpected error may be

committed. Wrong decisions in the case of real

systems are mainly due to the lack of appropriate

datasets. Very often, we have to deal with

imbalanced datasets, where the size of dataset that

represents one class of the problem is much bigger

than the other classes. The problem of analysis of

the fuel level in the tank is also such example. It is

very easy to get information about changes in the

level of fuel in typical operating conditions of the

vehicle, but it is difficult to gather learning data that

represent different ways of fuel theft or extreme

operating conditions. In this paper, in order to

increase the accuracy of the decision, two techniques

have been used. The first one is very well known

Multiple Classifiers System idea (MCS) (Dietterich,

2000), (Kuncheva, 2004) that makes decisions based

on the fusion of the outputs from all of the classifiers

in the ensemble. MCS are very strongly developed,

mostly because of the fact that committee, also

known as an ensemble, can outperform its members

(Kittler, et al., 2006). Due to the fact that we are

dealing with imbalanced datasets, diversity measure

was also used. Note that even the best MCS will not

be able to outperform its members if classifiers in

the team are identical. A very important issue is to

increase the diversity between the members in case

of wrong output, while maintaining high accuracy of

individual classifiers in the pool. Furthermore,

diversity measure doesn't bring any benefits if all of

the members in the ensemble have a very good

accuracy.

In this paper, it is shown that the MCS built with

Random Reference Classifier (RRC) (Woloszynski et

al., 2010) used in the analysis of the fuel level

changes can provide very good results. In the first

part of Section 5, it is shown that the RRC behaves

as expected for imbalanced and balanced datasets

(Wang et al., 2013). For this purpose artificially

created datasets were used. The next step was to

verify if the created MCS, constructed with RRC is

able to identify the fuel level changes correctly

based on the actual data. The different types of

diversity measures have been used, both the pairwise

and nonpairwise measures (Kuncheva, 2004) and the

influence on the MCS performance was shown.

The paper is organized as follows. In the section

2 methods of measuring fuel with their advantages

and disadvantages are discussed. In the section 3, the

whole architecture of the IT system that has been

created in order to analyze the data was described. In

the section 4 the exact problem description is

presented. The following sections discuss the

experiments that have been carried out, we present

the results and conclusions that may be noticed. We

also present opportunities for the further research.

1.1 Motivation

It may not be clear why such comprehensive

technique as MCS was used to answer relatively

simple question. It has to be noted that identifying

the typical fuel increase, especially when the fuel

amount change is big, is not a problem. The most

difficult task is to detect a small refueling or

advanced method of fuel stealing. Small refueling

may occur when the vehicle cannot reach the home

station where the cost of the fuel is relatively low. In

such circumstances it may be necessary to detect i.e.

40 liters change in 800 liter tank.

Detecting the fuel stealing is even more

complicated. There are multiple ways to steal the

fuel from the car. It has to be noted that the simplest

method, where filler flap is opened and closed, and

subsequently a fuel decrease is detected is rare. The

very common method is to drain fuel from the fuel

wire into external container, during the 8 hours stop.

In this case the MCS is more like the decision

support system. The system can analyse if the fuel

was stolen or it was used by external fuel device, i.e.

Webasto. This approach can improve the process of

detecting the changes of the fuel, which in most

cases is executed by the FMS system user.

2 FUEL LEVEL MEASUREMENT

METHODS

There are several methods to measure the fuel level

in the tank and its consumption by the vehicle.

Depending on the operating conditions and type of

the vehicle only some of them may be used. Below

the overview of the most popular methods with

information about their advantages and

disadvantages is presented.

TheImpactoftheDiversityonMultipleClassifierSystemPerformance-IdentifyingChangesintheAmountofFuelinthe

FleetManagementSystem

349

2.1 Flowmeter

The most accurate measurement of fuel consumption

can be achieved by the flowmeter which, even with

an accuracy of 1/1000 liter, measures the amount of

fuel that was transported to the engine.

Unfortunately, the flowmeter is not installed in the

tank but in the fuel system, and hence it is not

possible to detect whether refill or fuel theft

occurred. Such decisions can be taken only after the

deep analysis of the data from the long period of

time and compare them to the list of invoices for the

fuel. Installing the additional flowmeter in the

vehicle where it is not pre-installed may be very

complicated and needs a lot of changes in the

vehicle’s fuel system.

2.2 CAN based Analysis

To detect refill and possible loss of the fuel it is

necessary to monitor the level of fuel in the tank.

There are two possible methods to do it. The first

one is the float, which is pre-installed, but the

quality of the measurement is not accurate. A lot of

false changes may occur during the normal operating

of the vehicle. In addition, the float does not gather

any data of the level of the fuel when the engine is

turned off. The information about the fuel level

using the float can be obtained in two ways. You can

either use the CAN bus in the vehicle, which is not

possible in some cases, or use the electrical wires

from the float directly and hook them to the proper

device to analyze the voltage. The undoubted

advantage of the float is the fact that nowadays it is

installed in almost every vehicle.

2.3 Fuel Probes

The alternative solution is to use the additional fuel

probes which provide much more accurate results.

The correct installation of the probe and a suitable

choice of the type of probe allows to measure the

amount of the fuel with high accuracy.

Manufacturers of fuel probes assure measurement

accuracy of 0.5% - 0.1%. However, such good

results may be only obtained in the laboratory

conditions. In the real world we have to deal with

the situations and working conditions, which

increase the error. The examples such as temperature

changes (extra 1.5% - 2% error), natural movement

of the fuel in the tank, because of the operation of

the vehicle (0.5% - 1%) and the tank calibration

error (0.5% to even 3% with the irregular shape of

the tank) should be mentioned. Despite the above

problems, which directly affect the accuracy of the

measurements, fuel probes provide the most accurate

measurements of the fuel level in the tank.

Additionally, there is the possibility to change the

sampling frequency, so it is possible to correct the

signal from the fuel probes using the well-known

signal processing mechanisms. Modification to the

fuel system and vehicle’s electronics during the

installation of the fuel probe is very little. The only

problem may be the limited physical access to the

fuel tank. In case, when the vehicle has more tanks

than one, each of them may be monitored separately.

3 FMS ARCHITECTURE

DESCRIPTION

In this section the architecture of the Fleet

Management System that was created is presented.

Figure 3.1: Fleet Management System Architecture.

The basic element of the system is the recorder

installed in the vehicle. The recorder has multiple

functions. First of all is the device that collects data

from all peripheral devices in the vehicle: on-board

computer, CAN bus, fuel probes, GPS and many

others depends on the purpose of the installation.

There is telemetry SIM card in each recorder, so that

the device has constant access to the Internet using

GPRS connection and is able to send all collected

data to the server. There is a mechanism for

receiving data from the device on the server side,

which decodes the transmitted information and saves

them in a database system. Thereafter there are

numerous modules that are responsible for specific

functionalities of the FMS and one of them is the

analysis of the fuel level on which we focus in this

work. Analysis of the fuel and its changes is divided

into several small subtask to create the well

optimized system that is working in a fast and

efficient way.

It is important to understand that one vehicle

generates on average one frame of data each minute

so it is easy to calculate that the system has to gather

1440 frames per day for only one vehicle. With 100

vehicles in the system there are almost 4.5 million of

records to analyze within a month. In the very

ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics

350

Figure 3.2: Screenshots from the Fleet Management

System. The top image presents the current position of the

vehicles on the OpenStreetMap. The bottom image

presents the plot of the fuel level in one vehicle for the

selected period of the time.

beginning the system rejects all incorrect frames.

Incorrect frames may be created during collecting

the data in the vehicle, because of the problem with

GPS positioning or during transferring data through

the GPRS tunnel. In the next step, all of the

collected frames are aggregated in the events. There

are three types of events: stop at the engine off, stop

at the engine on and drive. For the purposes of this

paper, we focus exclusively on the analysis of

changes in fuel level during the standstill. This is the

time when the vehicle is being refueled, but it is also

the best opportunity for fuel theft. However, it

should be noted, that it is also possible to analyze

fuel changes while driving, so the incorrect

operation of the engine or excessive fuel

consumption may be noticed.

4 PROBLEM DESCRIPTION

In our study, the main problem that needs to be

solved is to obtain the information if the fuel level

change in the tank occurred due to refueling or fuel

theft, or is the result of natural changes due to the

operating condition of the vehicle. The above

description of the problem can be used to define the

binary classification problem. First Class C

represents the real fuel level change in the tank

(refuel or fuel theft), second class C

means there

were no change. It should be noted that MCS does

not specify the type of the change - refuel or fuel

theft - MCS only determines whether the change

actually exists. The type of change is determined

from the sign of the difference between the fuel

amount at the end V

and at the beginning V

of the

analyzed i-th event E

. As previously was described,

the single event should be understood as period of

time in which the vehicle's state was constant.

With that being said, our problem can be

described as follows. The set of events is given

E = {E

, E

, …, E

} and also the pool of the base

classifiers is given L = {L

, L

, …, L

}. Each i-th

event E

in the set E is described by the given vector

of features: event type, amount of the fuel at the

beginning of the event - V

, the amount of the fuel at

the end of the event - V

, the event duration in

minutes - T, the difference in the fuel level F

diff

where:

total

diff



 ,

(4.1)

the status of the fuel filler flap, where 0 means that

the filler flap is closed, and 1 means that the filler

flap is opened, the difference between the maximum

and minimum temperatures during the event - T

and the state of digital inputs which may indicate the

usage of external fuel device. There is the soft

output of the MCS, with supports for both, C

and

class which is the probability of the correct

classification. It should be noted that the maximum

duration of the event is 24 hours.

The base classifiers are Neural Networks. To

determine the support for classes C

and C

the RRC

was used. As it was stated in the work (Lysiak et al.,

2014) supports for both classes will be determined in

the validation points, then using the normalized

Gaussian potential function will be extended to the

entire feature space.

According to what has been written in the section

1, our datasets from the FMS are imbalanced

datasets. The reason of such state was the fact that

there were limited possibilities to verify that the fuel

change was a result of fuel theft. Table 4.1 provides

the information about the datasets that were

provided by the FMS.

Table 4.1: Properties of datasets used for teaching,

validating and testing process in the Multiple Classifiers

System.

Dataset

name

# of

examples

# C

Car 1

1428 37 1391

Car 2

826 60 766

Car 3

568 46 522

Car 4

2415 131 2284

Car 5

576 65 511

It should be noted that each datasets provides the

information about the specific vehicle, because of

the unique character of the measurements. Each

TheImpactoftheDiversityonMultipleClassifierSystemPerformance-IdentifyingChangesintheAmountofFuelinthe

FleetManagementSystem

351

vehicle had other conditions of usage. It is obvious

that the datasets are imbalanced in favor of class C

5 EXPERIMENTS

The classification accuracies (the percentage of

correctly classified objects) is the averaged accuracy

over 20 runs (10 replications of two-fold cross

validation) for both of the experiments. Statistical

differences between the results were evaluated using

Student's t-test (Dietterich, 1988). The level

of p < 0.05 was considered as statistically

significant. The problem of selecting the members of

the ensemble for the MCS that made the decision by

majority voting was solved with the Simulated

Annealing Algorithm (Lysiak et al., 2014)., which

proved to be a fast and providing good results

algorithm. In the following experiments the

modification of the DES-CD

d-opt

algorithm was used

with several different measures of diversity. The

threshold α = 1/2 was used.

Multiple Classifier System with homogeneous

ensemble consisted of 20 Neural Networks (NN) (2

layers with 8 neurons each) has been created. To

prevent overlearning and obtaining diversity

between classifiers, each classifier was trained using

randomly selected 70% of objects from the training

dataset.

5.1 Experiment 1

The first experiment was intended only to show that

the behavior of the RRC, which statistically behaves

as the corresponding base classifier is consistent

with that shown in (Wang, 2013). For this purpose

artificial balanced and imbalanced, two-class

datasets were generated. Table 5.1 provides the

information about the generated datasets.

Table 5.1: Artificial datasets generated for the purpose of

the experiment 1.

Dataset name # of examples # C

400-50

450 400 50

400-400

800 400 400

For the experiment 1, the pairwise diversity measure

from the original DES-CD

d-opt

algorithm was used.

Two different MCS algorithm were compared, the

DES-CD

d-opt

and DES (Woloszynski et al., 2010).

5.2 Experiment 2

In the first experiment the impact of taking into

account the diversity measure for the dynamic

classifiers ensemble selection for artificially

generated imbalanced datasets is shown. In the

second experiment, the accuracy of the MCS, which

uses different methods of determining the diversity

measure (pair- and non-pairwise) of classifiers in the

ensemble was tested. The methods used for

determining the diversity are presented in the Table

5.2.1. All simulations were performed on datasets

that have been presented in the Table 4.1. The

experiment schema was exactly the same as used in

the experiment 1, which is consistent with the

description in the introduction to section 5. In each

test case the modified DES-CS

d-opt

was used. The

difference was only in the manner of determining

the diversity measure.

Table 5.2.1: The information about the selected methods

to determine the diversity measure used in Experiment 2.

Method name Designation

DES-CD

d-opt

(Lysiak et al., 2013)

Correlation

(Kuncheva, 2004)

The Q Statistic

(Kuncheva, 2004)

Generalized Diversity

(Partridge et al., 1997)

6 RESULTS AND DISCUSSION

In this section the results for experiment 1 and 2 are

presented.

6.1 Experiment 1

Below in Table 6.1.1 the results for the experiment 1

are shown and the analysis of them is presented.

Table 6.1.1: Results for the experiment 1.

Dataset

name

DES-CD #1 DES-CD

d-opt

400-50

58,4* 8 69,2* 4

400-400

92,8 16 90,6 7

Based on the results, it is easy to noticed that the

inclusion of the diversity in the process of dynamic

selection of classifiers has a significant impact on

the quality of the output of the MCS when dealing

with imbalanced dataset. It should be also noted that

the improvement of the classification quality have

been obtained with reduced number of classifiers in

the ensemble. The values #1 and #2, presented in the

ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics

352

Table 6.2.1: Results for the experiment 2.

DES A B C D

Car 1

49,8 68,2*

BCD

62,9*

53,3

58,6*

Car 2

48,4 57,9*

55,1* 54,2 53,4

Car 3

52,9 54,8

56,2 55,9*

61,2*

Car 4

46,1 57,2* 55,3* 53,9* 58,6*

Car 5

58,3 65,1*

63,8 66,7*

59,9

Table 6.1.1 are, respectively, the number of

classifiers in the ensemble for DES and DES-CD

d-opt

algorithm. The sign * means that the difference

between algorithms for specific dataset was

statistically significant.

6.2 Experiment 2

Above in the Table 6.2.1 the results for the

experiment 1 are shown and the analysis of them is

presented.

Similarly to the first experiment, also for datasets

from the real-life Fleet Management System, the

impact of the diversity for the quality of

classification of the MCS for imbalanced datasets

may be noticeable. The sign * means that the

difference between the DES and the indicated

algorithm was statistically significant.

The indexes A, B, C and D represent that the result

is statistically different from those of the indicated

modification of the CD-DES algorithm, which takes

into account the method A, B, C or D in order to

determine the diversity measure. 13 out of the 20

results of the modified DES-CD proved to be

significantly better than DES algorithm for

imbalanced datasets. 12 out of the 20 results turned

out to be significantly different than the result of the

other modification of DES-CD algorithm for the

same dataset. Therefore it can be concluded that the

method of calculating the diversity of the ensemble

has a significant impact on the performance of the

analyzed MCS.

7 CONCLUSIONS AND FURTHER

WORK

In this paper, the actual implementation of the

system that supports decision on the determination

of the type of fuel change in vehicle's tank is

presented. It has been shown that the MCS can be

used in the analysis of such kind of changes. It has

been shown that in the case of imbalanced datasets,

the usage of the diversity measure for the dynamic

ensemble selection process has a significant impact

on the quality of the output of the MCS.

The issue described in this paper can be further

investigated. In example, the presented method may

be also used for analysis of the fuel level changes,

based on the data provided by the CAN bus. There

are a lot of other information in the database,

therefore, another feature vector may be used. Our

results also confirm that the various methods of

determining the diversity of classifiers ensemble

have the impact on the output of the MCS.

Therefore, further work on methods for determining

the measure is justified.

ACKNOWLEDGEMENTS

This work was developed under a project

"PRELUDIUM" funded by the National Science

Centre.

REFERENCES

Dietterich, T. G., 2000, Ensemble methods in machine

learning. In J. Kittler and F. Roli, editors, Multiple

Classifier Systems, volume 1857 of Lecture Notes in

Computer Science, Cagliari, Italy, Springer, pp. 1–15.

Kuncheva, L. I., 2004. Combining Pattern Classifiers -

Methods and Algorithms, John Wiley & Sons, Inc.

Kittler, J., Hatef, M., Duin and R., Matas, J., 2006, On

combining classifiers, IEEE Transactions on Pattern

Analysis and Machine Intelligence 20, pp. 226–239.

Woloszynski, T., Kurzynski, M., A measure of

competence based on randomized reference classifier

for dynamic ensemble selection, 20th International

Conference on Pattern Recognition (ICPR), IEEE

Computer.

Wang, S. and Yao, X., 2013, Relationships between

Diversity of Classification Ensembles and Single-

Class Performance Measures, Transactions On

Knowledge And Data Engineering, IEEE, vol. 25, no.

1, pp. 206-218.

Lysiak, R., Kurzynski, M., Woloszynski, T., 2013,

Optimal selection of ensemble classifiers using

measures of competence and diversity of base

classifiers, Neurocomputing Vol. 126 pp. 29-35.

Partridge, D. and Krzanowski, W. J., 1997, Software

diversity: practical statistics for its measurement and

TheImpactoftheDiversityonMultipleClassifierSystemPerformance-IdentifyingChangesintheAmountofFuelinthe

FleetManagementSystem

353

exploitation, Information & Software Technology, 39,

pp. 707–717.

Dietterich, T., 1988, Approximate statistical tests for

comparing supervised classification learning

algorithms, Neural Computation 10 1895-1923.

ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics

354