The Impact of the Diversity on Multiple Classifier System
Performance
Identifying Changes in the Amount of Fuel in the Fleet Management System
Rafał Łysiak and Marek Kurzyński
Wroclaw University of Technology, Department of Systems and Computer Networks,
Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland
Keywords: Multiple Classifier System, Diversity, Imbalanced Data, Random Reference Classifier, Dynamic Ensemble
Selection, Classifier Competence.
Abstract: When it comes to the use of any recognition systems in the real world environment, it turns out that the
reality differs from the theory. There is an assumption that the distribution of the incoming data will be at
least similar to the distribution of the data, which were used during the learning process and that learning
dataset represents the entire space of the problem. In fact, the incoming data differ from the training set and
usually cover only a part of the feature space. Very often we have to deal with imbalanced datasets which
leads to underfitting of classifiers in the final ensemble. In this paper we present the Multiple Classifier
System based on Random Reference Classifier in the problem of fuel level change detection in the fleet
management systems. The ensemble selection process uses probabilistic measures of competence and
diversity at the same time. We compare different methods to determine the diversity within the ensemble.
1 INTRODUCTION
It's hard to imagine today's world without the
information systems and decision support systems to
monitor and support business processes. Each of us
has contact with them every day. These systems are
used in practically every area of industry, i.e.:
supervision over the quality of production, automatic
production lines, all kinds of security systems,
power systems or mobile communication systems.
They are also used in the transportation industry and
automotive. The best known system that supports
the automotive industry is GPS or any other types of
the systems on the vehicle's board , which are
designed to increase safety and comfort, and in
recent years more often, to minimize the costs. The
main component of the costs is generated by the fuel
usage, which directly affects the financial balance of
the transportation companies.
Due to the large amount of the information that
is important for logistics companies within the
regular working day, Fleet Management System
(FMS) become a very important, popular and useful
IT applications. FMS integrates in one place, all the
information that is required to make key business
decisions. As the most important, we should mention
such data as current location of the vehicle, vehicle
status, driver's working time, the status of transport,
the current amount of the fuel, average fuel
consumption or recently famous ecodriving, which
analyzes the driving style to minimize fuel
consumption. FMS provides this information with
the clear and easy to understand user interface, and
recently, more often, makes decisions based on those
information. In this way, the FMS may be able to
increase the profits. For the analysis of the fuel, we
should not forget about environmental issues.
Minimizing the fuel consumption, the CO2
emissions into the atmosphere is reduced as well.
In this paper we will focus on the issue of fuel
analysis, and more specifically on the detection of
the fuel level changes – if refuel or any loss of the
fuel occurred. It should be noted that very often the
fuel from the vehicle is stolen. This phenomenon is
very common especially in the transport market, in
the case of large trucks and tractors, where the
amount of fuel in the tank exceed 1000 liters. The
problem of analysis of changes in the amount of fuel
in the tank becomes more complicated when we
realize the conditions under which measurements are
taken and with what kind of devices. More on this
topic have been written in the Section 2.
Despite the fact that there are ongoing works to
348
Łysiak R. and Kurzy
´
nski M..
The Impact of the Diversity on Multiple Classifier System Performance - Identifying Changes in the Amount of Fuel in the Fleet Management System.
DOI: 10.5220/0005066703480354
In Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics (ICINCO-2014), pages 348-354
ISBN: 978-989-758-039-0
Copyright
c
2014 SCITEPRESS (Science and Technology Publications, Lda.)
improve the quality of measurement devices, using
the opportunity that we had to create the whole Fleet
Management System, we decided to go one step
further and focus on the analysis of the final data in
order to decide whether there had been refueling,
fuel theft or maybe the fuel level change was caused
by the regular usage of the vehicle.
Intelligent decision support systems are able to
classify an event based on the feature vector with a
very good accuracy. Unfortunately, using those
systems in the industry is still minor, especially in
the areas where Quality of Service has crucial role,
due to concerns that an unexpected error may be
committed. Wrong decisions in the case of real
systems are mainly due to the lack of appropriate
datasets. Very often, we have to deal with
imbalanced datasets, where the size of dataset that
represents one class of the problem is much bigger
than the other classes. The problem of analysis of
the fuel level in the tank is also such example. It is
very easy to get information about changes in the
level of fuel in typical operating conditions of the
vehicle, but it is difficult to gather learning data that
represent different ways of fuel theft or extreme
operating conditions. In this paper, in order to
increase the accuracy of the decision, two techniques
have been used. The first one is very well known
Multiple Classifiers System idea (MCS) (Dietterich,
2000), (Kuncheva, 2004) that makes decisions based
on the fusion of the outputs from all of the classifiers
in the ensemble. MCS are very strongly developed,
mostly because of the fact that committee, also
known as an ensemble, can outperform its members
(Kittler, et al., 2006). Due to the fact that we are
dealing with imbalanced datasets, diversity measure
was also used. Note that even the best MCS will not
be able to outperform its members if classifiers in
the team are identical. A very important issue is to
increase the diversity between the members in case
of wrong output, while maintaining high accuracy of
individual classifiers in the pool. Furthermore,
diversity measure doesn't bring any benefits if all of
the members in the ensemble have a very good
accuracy.
In this paper, it is shown that the MCS built with
Random Reference Classifier (RRC) (Woloszynski et
al., 2010) used in the analysis of the fuel level
changes can provide very good results. In the first
part of Section 5, it is shown that the RRC behaves
as expected for imbalanced and balanced datasets
(Wang et al., 2013). For this purpose artificially
created datasets were used. The next step was to
verify if the created MCS, constructed with RRC is
able to identify the fuel level changes correctly
based on the actual data. The different types of
diversity measures have been used, both the pairwise
and nonpairwise measures (Kuncheva, 2004) and the
influence on the MCS performance was shown.
The paper is organized as follows. In the section
2 methods of measuring fuel with their advantages
and disadvantages are discussed. In the section 3, the
whole architecture of the IT system that has been
created in order to analyze the data was described. In
the section 4 the exact problem description is
presented. The following sections discuss the
experiments that have been carried out, we present
the results and conclusions that may be noticed. We
also present opportunities for the further research.
1.1 Motivation
It may not be clear why such comprehensive
technique as MCS was used to answer relatively
simple question. It has to be noted that identifying
the typical fuel increase, especially when the fuel
amount change is big, is not a problem. The most
difficult task is to detect a small refueling or
advanced method of fuel stealing. Small refueling
may occur when the vehicle cannot reach the home
station where the cost of the fuel is relatively low. In
such circumstances it may be necessary to detect i.e.
40 liters change in 800 liter tank.
Detecting the fuel stealing is even more
complicated. There are multiple ways to steal the
fuel from the car. It has to be noted that the simplest
method, where filler flap is opened and closed, and
subsequently a fuel decrease is detected is rare. The
very common method is to drain fuel from the fuel
wire into external container, during the 8 hours stop.
In this case the MCS is more like the decision
support system. The system can analyse if the fuel
was stolen or it was used by external fuel device, i.e.
Webasto. This approach can improve the process of
detecting the changes of the fuel, which in most
cases is executed by the FMS system user.
2 FUEL LEVEL MEASUREMENT
METHODS
There are several methods to measure the fuel level
in the tank and its consumption by the vehicle.
Depending on the operating conditions and type of
the vehicle only some of them may be used. Below
the overview of the most popular methods with
information about their advantages and
disadvantages is presented.
TheImpactoftheDiversityonMultipleClassifierSystemPerformance-IdentifyingChangesintheAmountofFuelinthe
FleetManagementSystem
349
2.1 Flowmeter
The most accurate measurement of fuel consumption
can be achieved by the flowmeter which, even with
an accuracy of 1/1000 liter, measures the amount of
fuel that was transported to the engine.
Unfortunately, the flowmeter is not installed in the
tank but in the fuel system, and hence it is not
possible to detect whether refill or fuel theft
occurred. Such decisions can be taken only after the
deep analysis of the data from the long period of
time and compare them to the list of invoices for the
fuel. Installing the additional flowmeter in the
vehicle where it is not pre-installed may be very
complicated and needs a lot of changes in the
vehicle’s fuel system.
2.2 CAN based Analysis
To detect refill and possible loss of the fuel it is
necessary to monitor the level of fuel in the tank.
There are two possible methods to do it. The first
one is the float, which is pre-installed, but the
quality of the measurement is not accurate. A lot of
false changes may occur during the normal operating
of the vehicle. In addition, the float does not gather
any data of the level of the fuel when the engine is
turned off. The information about the fuel level
using the float can be obtained in two ways. You can
either use the CAN bus in the vehicle, which is not
possible in some cases, or use the electrical wires
from the float directly and hook them to the proper
device to analyze the voltage. The undoubted
advantage of the float is the fact that nowadays it is
installed in almost every vehicle.
2.3 Fuel Probes
The alternative solution is to use the additional fuel
probes which provide much more accurate results.
The correct installation of the probe and a suitable
choice of the type of probe allows to measure the
amount of the fuel with high accuracy.
Manufacturers of fuel probes assure measurement
accuracy of 0.5% - 0.1%. However, such good
results may be only obtained in the laboratory
conditions. In the real world we have to deal with
the situations and working conditions, which
increase the error. The examples such as temperature
changes (extra 1.5% - 2% error), natural movement
of the fuel in the tank, because of the operation of
the vehicle (0.5% - 1%) and the tank calibration
error (0.5% to even 3% with the irregular shape of
the tank) should be mentioned. Despite the above
problems, which directly affect the accuracy of the
measurements, fuel probes provide the most accurate
measurements of the fuel level in the tank.
Additionally, there is the possibility to change the
sampling frequency, so it is possible to correct the
signal from the fuel probes using the well-known
signal processing mechanisms. Modification to the
fuel system and vehicle’s electronics during the
installation of the fuel probe is very little. The only
problem may be the limited physical access to the
fuel tank. In case, when the vehicle has more tanks
than one, each of them may be monitored separately.
3 FMS ARCHITECTURE
DESCRIPTION
In this section the architecture of the Fleet
Management System that was created is presented.
Figure 3.1: Fleet Management System Architecture.
The basic element of the system is the recorder
installed in the vehicle. The recorder has multiple
functions. First of all is the device that collects data
from all peripheral devices in the vehicle: on-board
computer, CAN bus, fuel probes, GPS and many
others depends on the purpose of the installation.
There is telemetry SIM card in each recorder, so that
the device has constant access to the Internet using
GPRS connection and is able to send all collected
data to the server. There is a mechanism for
receiving data from the device on the server side,
which decodes the transmitted information and saves
them in a database system. Thereafter there are
numerous modules that are responsible for specific
functionalities of the FMS and one of them is the
analysis of the fuel level on which we focus in this
work. Analysis of the fuel and its changes is divided
into several small subtask to create the well
optimized system that is working in a fast and
efficient way.
It is important to understand that one vehicle
generates on average one frame of data each minute
so it is easy to calculate that the system has to gather
1440 frames per day for only one vehicle. With 100
vehicles in the system there are almost 4.5 million of
records to analyze within a month. In the very
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
350
Figure 3.2: Screenshots from the Fleet Management
System. The top image presents the current position of the
vehicles on the OpenStreetMap. The bottom image
presents the plot of the fuel level in one vehicle for the
selected period of the time.
beginning the system rejects all incorrect frames.
Incorrect frames may be created during collecting
the data in the vehicle, because of the problem with
GPS positioning or during transferring data through
the GPRS tunnel. In the next step, all of the
collected frames are aggregated in the events. There
are three types of events: stop at the engine off, stop
at the engine on and drive. For the purposes of this
paper, we focus exclusively on the analysis of
changes in fuel level during the standstill. This is the
time when the vehicle is being refueled, but it is also
the best opportunity for fuel theft. However, it
should be noted, that it is also possible to analyze
fuel changes while driving, so the incorrect
operation of the engine or excessive fuel
consumption may be noticed.
4 PROBLEM DESCRIPTION
In our study, the main problem that needs to be
solved is to obtain the information if the fuel level
change in the tank occurred due to refueling or fuel
theft, or is the result of natural changes due to the
operating condition of the vehicle. The above
description of the problem can be used to define the
binary classification problem. First Class C
1
represents the real fuel level change in the tank
(refuel or fuel theft), second class C
2
means there
were no change. It should be noted that MCS does
not specify the type of the change - refuel or fuel
theft - MCS only determines whether the change
actually exists. The type of change is determined
from the sign of the difference between the fuel
amount at the end V
E
and at the beginning V
B
of the
analyzed i-th event E
i
. As previously was described,
the single event should be understood as period of
time in which the vehicle's state was constant.
With that being said, our problem can be
described as follows. The set of events is given
E = {E
1
, E
2
, E
3
, …, E
n
} and also the pool of the base
classifiers is given L = {L
1
, L
2
, L
3
, …, L
m
}. Each i-th
event E
i
in the set E is described by the given vector
of features: event type, amount of the fuel at the
beginning of the event - V
B
, the amount of the fuel at
the end of the event - V
E
, the event duration in
minutes - T, the difference in the fuel level F
diff
,
where:
V
VV
F
total
BE
diff
,
(4.1)
the status of the fuel filler flap, where 0 means that
the filler flap is closed, and 1 means that the filler
flap is opened, the difference between the maximum
and minimum temperatures during the event - T
A
,
and the state of digital inputs which may indicate the
usage of external fuel device. There is the soft
output of the MCS, with supports for both, C
1
and
C
2
class which is the probability of the correct
classification. It should be noted that the maximum
duration of the event is 24 hours.
The base classifiers are Neural Networks. To
determine the support for classes C
1
and C
2
the RRC
was used. As it was stated in the work (Lysiak et al.,
2014) supports for both classes will be determined in
the validation points, then using the normalized
Gaussian potential function will be extended to the
entire feature space.
According to what has been written in the section
1, our datasets from the FMS are imbalanced
datasets. The reason of such state was the fact that
there were limited possibilities to verify that the fuel
change was a result of fuel theft. Table 4.1 provides
the information about the datasets that were
provided by the FMS.
Table 4.1: Properties of datasets used for teaching,
validating and testing process in the Multiple Classifiers
System.
Dataset
name
# of
examples
# C
1
#C
2
Car 1
1428 37 1391
Car 2
826 60 766
Car 3
568 46 522
Car 4
2415 131 2284
Car 5
576 65 511
It should be noted that each datasets provides the
information about the specific vehicle, because of
the unique character of the measurements. Each
TheImpactoftheDiversityonMultipleClassifierSystemPerformance-IdentifyingChangesintheAmountofFuelinthe
FleetManagementSystem
351
vehicle had other conditions of usage. It is obvious
that the datasets are imbalanced in favor of class C
2
.
5 EXPERIMENTS
The classification accuracies (the percentage of
correctly classified objects) is the averaged accuracy
over 20 runs (10 replications of two-fold cross
validation) for both of the experiments. Statistical
differences between the results were evaluated using
Student's t-test (Dietterich, 1988). The level
of p < 0.05 was considered as statistically
significant. The problem of selecting the members of
the ensemble for the MCS that made the decision by
majority voting was solved with the Simulated
Annealing Algorithm (Lysiak et al., 2014)., which
proved to be a fast and providing good results
algorithm. In the following experiments the
modification of the DES-CD
d-opt
algorithm was used
with several different measures of diversity. The
threshold α = 1/2 was used.
Multiple Classifier System with homogeneous
ensemble consisted of 20 Neural Networks (NN) (2
layers with 8 neurons each) has been created. To
prevent overlearning and obtaining diversity
between classifiers, each classifier was trained using
randomly selected 70% of objects from the training
dataset.
5.1 Experiment 1
The first experiment was intended only to show that
the behavior of the RRC, which statistically behaves
as the corresponding base classifier is consistent
with that shown in (Wang, 2013). For this purpose
artificial balanced and imbalanced, two-class
datasets were generated. Table 5.1 provides the
information about the generated datasets.
Table 5.1: Artificial datasets generated for the purpose of
the experiment 1.
Dataset name # of examples # C
1
#C
2
400-50
450 400 50
400-400
800 400 400
For the experiment 1, the pairwise diversity measure
from the original DES-CD
d-opt
algorithm was used.
Two different MCS algorithm were compared, the
DES-CD
d-opt
and DES (Woloszynski et al., 2010).
5.2 Experiment 2
In the first experiment the impact of taking into
account the diversity measure for the dynamic
classifiers ensemble selection for artificially
generated imbalanced datasets is shown. In the
second experiment, the accuracy of the MCS, which
uses different methods of determining the diversity
measure (pair- and non-pairwise) of classifiers in the
ensemble was tested. The methods used for
determining the diversity are presented in the Table
5.2.1. All simulations were performed on datasets
that have been presented in the Table 4.1. The
experiment schema was exactly the same as used in
the experiment 1, which is consistent with the
description in the introduction to section 5. In each
test case the modified DES-CS
d-opt
was used. The
difference was only in the manner of determining
the diversity measure.
Table 5.2.1: The information about the selected methods
to determine the diversity measure used in Experiment 2.
Method name Designation
DES-CD
d-opt
(Lysiak et al., 2013)
A
Correlation
(Kuncheva, 2004)
B
The Q Statistic
(Kuncheva, 2004)
C
Generalized Diversity
(Partridge et al., 1997)
D
6 RESULTS AND DISCUSSION
In this section the results for experiment 1 and 2 are
presented.
6.1 Experiment 1
Below in Table 6.1.1 the results for the experiment 1
are shown and the analysis of them is presented.
Table 6.1.1: Results for the experiment 1.
Dataset
name
DES-CD #1 DES-CD
d-opt
#2
400-50
58,4* 8 69,2* 4
400-400
92,8 16 90,6 7
Based on the results, it is easy to noticed that the
inclusion of the diversity in the process of dynamic
selection of classifiers has a significant impact on
the quality of the output of the MCS when dealing
with imbalanced dataset. It should be also noted that
the improvement of the classification quality have
been obtained with reduced number of classifiers in
the ensemble. The values #1 and #2, presented in the
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
352
Table 6.2.1: Results for the experiment 2.
DES A B C D
Car 1
49,8 68,2*
BCD
62,9*
AC
53,3
AB
58,6*
A
Car 2
48,4 57,9*
D
55,1* 54,2 53,4
A
Car 3
52,9 54,8
D
56,2 55,9*
D
61,2*
A
Car 4
46,1 57,2* 55,3* 53,9* 58,6*
Car 5
58,3 65,1*
D
63,8 66,7*
D
59,9
AC
Table 6.1.1 are, respectively, the number of
classifiers in the ensemble for DES and DES-CD
d-opt
algorithm. The sign * means that the difference
between algorithms for specific dataset was
statistically significant.
6.2 Experiment 2
Above in the Table 6.2.1 the results for the
experiment 1 are shown and the analysis of them is
presented.
Similarly to the first experiment, also for datasets
from the real-life Fleet Management System, the
impact of the diversity for the quality of
classification of the MCS for imbalanced datasets
may be noticeable. The sign * means that the
difference between the DES and the indicated
algorithm was statistically significant.
The indexes A, B, C and D represent that the result
is statistically different from those of the indicated
modification of the CD-DES algorithm, which takes
into account the method A, B, C or D in order to
determine the diversity measure. 13 out of the 20
results of the modified DES-CD proved to be
significantly better than DES algorithm for
imbalanced datasets. 12 out of the 20 results turned
out to be significantly different than the result of the
other modification of DES-CD algorithm for the
same dataset. Therefore it can be concluded that the
method of calculating the diversity of the ensemble
has a significant impact on the performance of the
analyzed MCS.
7 CONCLUSIONS AND FURTHER
WORK
In this paper, the actual implementation of the
system that supports decision on the determination
of the type of fuel change in vehicle's tank is
presented. It has been shown that the MCS can be
used in the analysis of such kind of changes. It has
been shown that in the case of imbalanced datasets,
the usage of the diversity measure for the dynamic
ensemble selection process has a significant impact
on the quality of the output of the MCS.
The issue described in this paper can be further
investigated. In example, the presented method may
be also used for analysis of the fuel level changes,
based on the data provided by the CAN bus. There
are a lot of other information in the database,
therefore, another feature vector may be used. Our
results also confirm that the various methods of
determining the diversity of classifiers ensemble
have the impact on the output of the MCS.
Therefore, further work on methods for determining
the measure is justified.
ACKNOWLEDGEMENTS
This work was developed under a project
"PRELUDIUM" funded by the National Science
Centre.
REFERENCES
Dietterich, T. G., 2000, Ensemble methods in machine
learning. In J. Kittler and F. Roli, editors, Multiple
Classifier Systems, volume 1857 of Lecture Notes in
Computer Science, Cagliari, Italy, Springer, pp. 1–15.
Kuncheva, L. I., 2004. Combining Pattern Classifiers -
Methods and Algorithms, John Wiley & Sons, Inc.
Kittler, J., Hatef, M., Duin and R., Matas, J., 2006, On
combining classifiers, IEEE Transactions on Pattern
Analysis and Machine Intelligence 20, pp. 226–239.
Woloszynski, T., Kurzynski, M., A measure of
competence based on randomized reference classifier
for dynamic ensemble selection, 20th International
Conference on Pattern Recognition (ICPR), IEEE
Computer.
Wang, S. and Yao, X., 2013, Relationships between
Diversity of Classification Ensembles and Single-
Class Performance Measures, Transactions On
Knowledge And Data Engineering, IEEE, vol. 25, no.
1, pp. 206-218.
Lysiak, R., Kurzynski, M., Woloszynski, T., 2013,
Optimal selection of ensemble classifiers using
measures of competence and diversity of base
classifiers, Neurocomputing Vol. 126 pp. 29-35.
Partridge, D. and Krzanowski, W. J., 1997, Software
diversity: practical statistics for its measurement and
TheImpactoftheDiversityonMultipleClassifierSystemPerformance-IdentifyingChangesintheAmountofFuelinthe
FleetManagementSystem
353
exploitation, Information & Software Technology, 39,
pp. 707–717.
Dietterich, T., 1988, Approximate statistical tests for
comparing supervised classification learning
algorithms, Neural Computation 10 1895-1923.
ICINCO2014-11thInternationalConferenceonInformaticsinControl,AutomationandRobotics
354