only historical machine logs and production
indicators. In particular, the approach enables health
monitoring and failure prediction on older equipment
without the need to install new sensors. These
attributes allow companies to reduce the cost of buying
new monitoring components and speed up the process
of analyzing the machine behavior and deploying the
predictive solution, because the monitoring system
(of log and production data) has been on the machine
since the beginning of its operation, generating
historical data.
The paper is organized as follows: Section 2
reviews the literature on predictive maintenance
and fault diagnosis. Section 3 briefly describes the
application scenario and the data available for it.
Section 4 presents the core solution proposed in this
paper. Finally, Section 5 shows the results and
Section 6 concludes the paper by summarizing and
discussing the work.
2 BACKGROUND
Modern machines are able to monitor a large set
of parameters, variables or indicators. Such
production data is useful for building analytical
solutions, such as decision support systems or
predictive maintenance solutions (Rosaria et al., 2021).
Among the operational data, machine failures and
alarms are some of the most common data sources on
the shop floor. In fact, the PLCs continuously produce
log information about the machine, including internal
events, warnings, alarms, errors, and the status or
cycles of the machine and its components. Logs are
generated automatically at a very high rate (daily or
hourly) and contain timestamps for the reported
information. These log data can be stored in databases
or files, providing valuable information for machine
diagnostics (Xiang et al., 2018). Such diagnostic
algorithms can include degradation models or log-based
predictive maintenance (Gutschi et al., 2019; Wang
et al., 2017). Regardless of the structure of the
log file, managing this information can be important
for extracting knowledge about different aspects
of the machine production. As shown in Section 3,
log files can also be involved in failure prediction.
3 DATASET
The data used for this work comes from a woodworking
drilling machine described in detail in Subsection
3.1. The machine generates two types of data,
1) event log data and 2) production data, which
are described in Subsection 3.2.
3.1 Scenario
The machine of interest is a woodworking drilling
machine (Brema VEKTOR15), composed of a set of
drill bits divided between two spindles. The total
number of different drills is about 40 to 50. The life,
in hours, of a drill bit depends on multiple factors,
such as the hardness and quality of the wood. The
quality of the material depends on the suppliers and
on what is indicated in the specifications of the
purchased wood, for instance the percentage of metal
residues in the chipboards.
The shape of the drilled hole and the noise emitted
by the saw when cutting are good indicators of the
health of these tools. Because these measurements are
difficult to obtain from the machinery, the drill bits
are normally replaced either at regular intervals or
based on the operator’s experience.
3.2 Exploratory Data Analysis
The dataset used to design the pipeline is composed
of two parts: 1) the production data and 2) the log
data. The first contains all the articles produced,
and the second all the events that occurred on the
machine. The files have the extensions .ter and .btk,
which are particular types of text files
exported/generated by the machine.
3.2.1 Production Data
The production dataset, shown in Figure 1a, contains
all the pieces of wood worked in a particular time
interval. The columns are the following: 1)
"Programma", the file that contains all the drilling
operations that must be performed on the piece, 2)
"Commento", details about the drilling, 3) the
dimensions of the board, L for length, H for height,
and S for width, and 4) the start and end times of the
two working phases (Start1, End1, Start2, End2).
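As an illustration only, one row of the production data could be represented in Python as follows. This is a minimal sketch, not the authors' implementation: the class name ProductionRecord, the helper names, the use of datetime objects for the timestamps, and the example values are all assumptions, and the actual layout of the files exported by the machine is not reproduced here. The two helpers compute quantities that follow directly from the columns: the board volume (L x H x S) and the duration of a working phase in seconds.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProductionRecord:
    """One row of the production data; field names mirror the columns above."""
    programma: str    # file with the drilling operations for the piece
    commento: str     # details about the drilling
    L: float          # board length (unit not stated in the source)
    H: float          # board height
    S: float          # board width
    start1: datetime  # start of the first working phase
    end1: datetime    # end of the first working phase
    start2: datetime  # start of the second working phase
    end2: datetime    # end of the second working phase

    @property
    def volume(self) -> float:
        """Board volume as the product of the three dimensions."""
        return self.L * self.H * self.S

    def phase_seconds(self, phase: int) -> float:
        """Duration of working phase 1 or 2, in seconds (end minus start)."""
        if phase == 1:
            start, end = self.start1, self.end1
        elif phase == 2:
            start, end = self.start2, self.end2
        else:
            raise ValueError("phase must be 1 or 2")
        return (end - start).total_seconds()


# Made-up example values, used only to exercise the helpers.
record = ProductionRecord(
    programma="example_program",
    commento="example board",
    L=800.0, H=400.0, S=18.0,
    start1=datetime(2021, 1, 11, 8, 0, 0),
    end1=datetime(2021, 1, 11, 8, 0, 42),
    start2=datetime(2021, 1, 11, 8, 0, 50),
    end2=datetime(2021, 1, 11, 8, 1, 20),
)

print(record.volume)            # product of L, H and S
print(record.phase_seconds(1))  # End1 minus Start1, in seconds
```

Keeping the raw columns in the record and exposing derived values as helpers means the original data stays untouched while further derived variables can be added later.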
One of the goals of this preliminary statistical
analysis is to evaluate to what degree the working
time is influenced by the material (type of wood, such
as poplar, ebony, walnut, etc.), the dimensions of the
board and the number of drills. A plausible starting
point is the computation of derived variables like
the volume, which combines the length, the height and
the width, and the time intervals T1, T2 and INT. The
value of T1 is the difference, in seconds, between
End1 and Start1, T2