
cation. They aim to establish a baseline for testing bias mitigation through regularization while ensuring that the models remain interpretable and efficient. All models share the following architecture and are trained with the Adam optimizer (a code sketch follows the list):
• Input Layer. Accepts a vector of six features as in-
put: three relevant features (skill, experience, and
education) and three sensitive features (race, gen-
der, and disability status).
• Hidden Layer. A fully connected dense layer with 10 hidden units and ReLU (Rectified Linear Unit) activation.
• Output Layer. A fully connected dense layer with
a single sigmoid output that is interpreted as the
likelihood of the sample belonging to the higher-
skilled job class (class 1). A threshold of 0.5 is
applied to classify samples into class 0 or class 1.
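A minimal sketch of this shared architecture, assuming a Keras implementation and a binary cross-entropy loss (neither the framework nor the loss is specified here), is the following:

    import tensorflow as tf

    def build_model():
        # 3 relevant + 3 sensitive input features, one hidden layer of
        # 10 ReLU units, and a single sigmoid output read as P(class 1).
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(6,)),
            tf.keras.layers.Dense(10, activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        # Adam optimizer as stated; the loss function is an assumption.
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model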
To evaluate generalization and robustness, 10-fold cross-validation is applied. Training meta-parameters such as the batch size, the training/validation split percentages, and the learning rate are identical across all experiments. Each model is trained for a fixed number of 200 epochs in each cross-validation fold. The choice of 200 epochs is based on initial experiments, which suggested that the models reach stable accuracy and loss convergence; no early-stopping criterion is applied. For each cross-validation fold, we proceed as follows (a code sketch follows the list):
• The data is partitioned into 90% for training and
10% for validation.
• A new model is initialized and trained from scratch in each fold.
• After each fold, the training and validation metrics
are accumulated.
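A sketch of this loop, assuming scikit-learn's KFold and the illustrative build_model() helper defined above, is the following:

    from sklearn.model_selection import KFold

    def cross_validate(X, y, n_folds=10, epochs=200):
        histories = []
        for train_idx, val_idx in KFold(n_splits=n_folds, shuffle=True).split(X):
            model = build_model()                     # fresh model per fold
            h = model.fit(X[train_idx], y[train_idx],
                          validation_data=(X[val_idx], y[val_idx]),
                          epochs=epochs, verbose=0)   # fixed epochs, no early stopping
            histories.append(h.history)               # accumulate per-fold metrics
        return histories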
Moreover, accuracy and saliency values are collected for each sensitive and relevant feature, both for profiles that match the filter and for those that do not.
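For reference, a gradient-based per-feature saliency (the mean absolute gradient of the output with respect to each input feature) could be computed as sketched below; the exact saliency definition used in this work may differ, so this is only an assumption:

    import tensorflow as tf

    def feature_saliency(model, X):
        # Mean |d output / d input| per feature, taken as a simple
        # per-feature saliency score over a batch of profiles.
        X = tf.convert_to_tensor(X, dtype=tf.float32)
        with tf.GradientTape() as tape:
            tape.watch(X)
            y = model(X)
        grads = tape.gradient(y, X)
        return tf.reduce_mean(tf.abs(grads), axis=0).numpy()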
To interpret our results correctly, it is important to note that we are dealing with classification tasks whose ground-truth labels come from an unfair generation process that assigns them with inherent biases. As a result, comparing regularized and non-regularized models under otherwise identical experimental conditions may show a decrease in accuracy for the regularized model. Such a decrease may reflect the impact of bias mitigation, which promotes fair decision-making that differs from the biased labels assigned by the unfair data generation mechanism. The following paragraphs report our analysis and the corresponding results in detail.
3.1 Saliency Analysis
We directly assessed the impact of regularization on saliency maps. In particular, we evaluated the effect of model type on bias mitigation by using the estimated amount of bias reduction as the dependent variable of ANOVA models.
To proceed, we initially tested the null hypothesis that there is no difference in mean saliency values between sensitive and relevant features of penalized subjects, at a conservative 1% significance level. The p-value of 0.0139, obtained from a two-sample t-test comparing the accumulated mean sensitive and mean relevant saliency values of the non-regularized model (see Tab. 1), was greater than this conservative threshold; thus, there is no statistical evidence to reject the null hypothesis.
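A sketch of this test, with placeholder arrays standing in for the accumulated per-fold mean saliency values of Tab. 1, is the following:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Placeholders for the accumulated per-fold mean saliency values
    # (the real inputs are the values reported in Tab. 1).
    mean_sensitive = rng.normal(0.5, 0.1, size=10)
    mean_relevant = rng.normal(0.5, 0.1, size=10)

    t_stat, p_value = stats.ttest_ind(mean_sensitive, mean_relevant)
    print(f"p = {p_value:.4f}; reject H0 at 1%: {p_value < 0.01}")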
Note that while Algorithm 1 establishes a linear correlation between relevant features and job classes (for each profile) and penalizes filtered subjects based on sensitive features, the bias-generating mechanism applied here shows no statistical evidence of differing feature-importance values when assigning labels to different job classes, according to the interpretation of the saliency maps (SM). This offered a valuable scenario for our estimation: a situation in which bias perpetuates through the neural processes, with sensitive and relevant features contributing equally to the unfair decision-making about penalized subjects. In other words, by assuming equal feature contributions to unfair decisions, we can reasonably estimate the amount of bias reduction achieved by regularization as the difference between mean relevant and mean sensitive saliency values for profiles matching the filter.
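In symbols (notation introduced here for clarity), the estimated bias reduction for a given model is

$\Delta_{\text{saliency}} = \bar{S}_{\text{relevant}} - \bar{S}_{\text{sensitive}}$,

where $\bar{S}_{\text{relevant}}$ and $\bar{S}_{\text{sensitive}}$ are the mean saliency values of the relevant and sensitive features, respectively, computed per fold over the profiles matching the filter.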
Following the above considerations, we conducted an ANOVA with a post-hoc test to assess the effect of model type and to estimate the amount of bias reduction provided by the regularized models. To this end, we used the difference between mean relevant and mean sensitive saliency values (referred to as the delta of saliency in this analysis) as the ANOVA dependent variable. Delta values were accumulated over folds for the models considered and are reported in Fig. 1. Only profiles that match the biased filter are included. The ANOVA in Fig. 2 indicates a statistically significant effect of model type on delta values (p < 0.01). Therefore, we conducted a post-hoc test, with the following results.
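A sketch of this analysis, assuming a one-way ANOVA followed by Tukey's HSD (the specific post-hoc procedure is an assumption) and placeholder per-fold delta values, is the following:

    import numpy as np
    import pandas as pd
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    # Placeholder per-fold delta-of-saliency values for two model types;
    # the real inputs are the accumulated deltas shown in Fig. 1.
    df = pd.DataFrame({
        "model": ["baseline"] * 10 + ["regularized"] * 10,
        "delta": np.concatenate([rng.normal(0.0, 0.05, 10),
                                 rng.normal(0.2, 0.05, 10)]),
    })

    groups = [g["delta"].values for _, g in df.groupby("model")]
    f_stat, p_value = stats.f_oneway(*groups)
    if p_value < 0.01:                 # significant effect of model type
        print(pairwise_tukeyhsd(df["delta"], df["model"]))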