relation to the target variable. These low-correlation
features were then dropped from the dataset to opti-
mise the model’s training process. Additionally, to
mitigate multicollinearity issues, the correlation ma-
trix of the remaining features was visualised through
a heatmap, as can be seen in Figure 2, and pairs of
features with high correlation coefficients were identified using a threshold of |0.8| to denote high correlation. Finally, the features that exhibited a high correlation with others were removed. This feature selection
and multicollinearity mitigation process aimed to op-
timise the model’s predictive performance and ensure
the robustness of the failure prediction system for in-
dustrial applications.
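The correlation-based selection described above can be sketched as follows; the DataFrame and column names are illustrative assumptions, not the actual dataset schema:

```python
# Sketch of the multicollinearity-mitigation step: for each pair of features
# with |correlation| above the threshold, one feature of the pair is dropped.
# Column names below are hypothetical placeholders.
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop one feature from each pair whose |correlation| exceeds the threshold."""
    corr = df.corr().abs()
    # Inspect only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Example with two nearly identical columns, one of which should be dropped:
data = pd.DataFrame({
    "oil_temperature": [60.0, 65.0, 70.0, 75.0],
    "oil_temperature_copy": [60.1, 65.2, 69.9, 75.3],
    "dv_pressure": [-0.03, -0.01, 0.02, -0.02],
})
reduced = drop_highly_correlated(data, threshold=0.8)
print(list(reduced.columns))  # the near-duplicate column is removed
```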
The next step involved dividing the dataset into training and testing sets using a temporal split. The testing data, comprising 638,486 samples from June 4,
2020, onward, represented approximately 42% of the
dataset. Six machine learning algorithms were eval-
uated to determine the most suitable method for ac-
curate failure prediction: Random Forest, XGBoost,
CatBoost, Gradient Boosting Machine (GBM), Light-
GBM and a voting algorithm. The voting algorithm
was used to combine the strengths of multiple mod-
els, aiming to enhance overall prediction reliability.
Due to their strong overall performance, the four al-
gorithms chosen for the ensemble were XGBoost,
CatBoost, GBM, and LightGBM. The models were
trained using the pre-processed data, and k-fold cross-
validation was implemented to ensure that the model
generalises well to unseen data. Additionally, grid
search was employed to optimise the model param-
eters, enhancing its performance. Recall was used as
the primary metric to evaluate the model’s effective-
ness in accurately identifying failures, ensuring a fo-
cus on minimising false negatives in the predictions.
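The evaluation pipeline above can be sketched as follows. The estimators and parameter grid here are illustrative stand-ins from scikit-learn so the sketch is self-contained; the paper's ensemble combines XGBoost, CatBoost, GBM, and LightGBM:

```python
# Sketch: temporal split (no shuffling), a soft-voting ensemble, grid search
# with k-fold cross-validation, and recall as the selection metric.
# The synthetic data and the small parameter grid are assumptions.
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Temporal split: earlier samples train, later samples test (here ~58%/42%,
# mirroring the split proportion reported above).
cut = int(len(X) * 0.58)
X_train, X_test, y_train, y_test = X[:cut], X[cut:], y[:cut], y[cut:]

ensemble = VotingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across members
)

# Grid search over member hyperparameters, scored by recall to penalise
# false negatives, with k-fold cross-validation on the training set.
grid = GridSearchCV(
    ensemble,
    param_grid={"gbm__n_estimators": [50, 100]},
    scoring="recall",
    cv=3,
)
grid.fit(X_train, y_train)
recall_on_test = grid.score(X_test, y_test)
```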
For the knowledge-driven approach, rules were
defined based on expertise and experience to adjust
the model’s predictions. These rules are crucial for
improving accuracy and handling edge cases, en-
suring that rare but critical scenarios are correctly
addressed. The system applies these rules post-
prediction, refining the initial outputs. The imple-
mented rules are designed to enhance the model’s pre-
dictive accuracy by incorporating domain-specific in-
sights, particularly concerning the median oil temper-
ature and pressure readings. These rules adjust the
model’s initial predictions based on predefined con-
ditions indicative of potential failures. The specific
rules applied are as follows:
• Rule 1: If the initial prediction is 0 (no failure)
but the median oil temperature exceeds 83°C, the
prediction is adjusted to 1 (failure).
• Rule 2: If the initial prediction is 1 (failure) but
the median oil temperature is below 67.25°C, the
prediction is adjusted to 0 (no failure).
• Rule 3: If the initial prediction is 0 (no fail-
ure) and both the median oil temperature exceeds
75.65°C and the median differential pressure exceeds -0.02 bar, the prediction is adjusted to 1
(failure).
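The three rules above amount to a small post-prediction layer; a minimal sketch, assuming per-sample median readings are available as plain floats (the function and argument names are illustrative):

```python
# Post-prediction rule layer. Thresholds are taken directly from the rules
# stated above; the function signature is a hypothetical interface.
def apply_rules(prediction: int, median_oil_temp: float,
                median_diff_pressure: float) -> int:
    # Rule 1: override "no failure" when the oil temperature is critical.
    if prediction == 0 and median_oil_temp > 83.0:
        return 1
    # Rule 2: override "failure" when the oil temperature is clearly normal.
    if prediction == 1 and median_oil_temp < 67.25:
        return 0
    # Rule 3: flag a failure when both the temperature and the differential
    # pressure exceed their warning thresholds.
    if prediction == 0 and median_oil_temp > 75.65 and median_diff_pressure > -0.02:
        return 1
    return prediction

print(apply_rules(0, 84.0, -0.05))  # Rule 1 fires -> 1
print(apply_rules(1, 60.0, -0.05))  # Rule 2 fires -> 0
print(apply_rules(0, 76.0, 0.0))    # Rule 3 fires -> 1
print(apply_rules(0, 70.0, 0.0))    # no rule fires -> 0
```

Ordering the rules this way means at most one override is applied per sample, keeping the adjustment easy to audit.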
Subsequently, XAI methods like LIME reveal the
rules and contributions of individual features in pre-
dictions, as illustrated in Figure 3.
A user interface (UI) displays these explanations,
allowing experts to understand the model’s behaviour
behind specific predictions. The visual representa-
tion of feature contributions provides clear insights
into the factors influencing each prediction, making
the model’s decision-making process transparent and
comprehensible.
The system also empowers experts to implement
new rules based on their observations and insights
from the model explanations. These insights can then
be codified into new rules integrated into the system
to refine its predictive accuracy. This continuous feed-
back loop is crucial in enhancing the system’s perfor-
mance. As experts identify and address new scenar-
ios or anomalies, the rules evolve, making the model
more robust over time.
This dynamic interaction between the model and
the experts promotes a proactive maintenance strat-
egy. The system can predict failures more accu-
rately by preemptively adjusting the model based on
real-world observations, reducing downtime and op-
erational costs. This integration of human expertise
with machine learning not only optimises the pre-
dictive model but also ensures that the system re-
mains aligned with the evolving operational context
and complexities of the industrial environment.
The interface also allows experts to submit a fail-
ure report whenever they identify a failure. This re-
porting mechanism is crucial for maintaining the sys-
tem’s accuracy and responsiveness. When a failure re-
port is submitted, the system triggers a retraining pro-
cess for the model. The model adapts and learns from
recent occurrences by incorporating the new failure
data, continuously improving its predictive accuracy
and reducing future prediction errors. This process
not only updates the model with the latest data but
also enhances its ability to recognise similar patterns
and anomalies in the future. The integration of real-
time feedback ensures that the model remains relevant
and effective in an ever-changing operational environ-
ment. This capability is particularly important in in-
dustrial settings, where conditions and failure modes
can evolve rapidly.
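The report-triggered retraining loop can be sketched as follows; the wrapper class, the model choice, and the synthetic data are illustrative assumptions, not the deployed implementation:

```python
# Sketch: each expert-submitted failure report is appended to the training
# pool with a positive label, and the model is refit on the enlarged set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class RetrainingSystem:
    def __init__(self, X, y):
        self.X, self.y = [list(row) for row in X], list(y)
        self.model = RandomForestClassifier(random_state=0)
        self.model.fit(np.array(self.X), np.array(self.y))

    def submit_failure_report(self, features):
        """Expert-confirmed failure: store it and retrain on the enlarged set."""
        self.X.append(list(features))
        self.y.append(1)  # the reported sample is labelled as a failure
        self.model.fit(np.array(self.X), np.array(self.y))

    def predict(self, features):
        return int(self.model.predict(np.array([features]))[0])

# Hypothetical usage: train on synthetic history, then submit one report.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 3))
y0 = (X0[:, 0] > 0.5).astype(int)
system = RetrainingSystem(X0, y0)
system.submit_failure_report([0.9, -0.1, 0.2])  # triggers a retrain
```

In production the retraining would typically run asynchronously on the full pre-processed dataset rather than synchronously on each report, but the data flow is the same.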
ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics