where the last statement follows from Lemma 1 (see Appendix) with $\sum_{i=1}^{m} \frac{c_i - \hat{\mu}}{\hat{\sigma}} \in [-m, m]$. The extreme value $\pm 2$ is reached if $c_1 = \dots = c_m = -c_{m+1} = \dots = -c_{2m}$, which can be obtained by setting $x_1 = \dots = x_m = -y_1 = \dots = -y_m$, independently of $A$ and $B$ as long as $A \neq B$ and $\sum_{a_i \in A} a_i / \|a_i\| \neq 0 \neq \sum_{b_i \in B} b_i / \|b_i\|$.
The proof of Theorem 4 shows that the effect size reaches its extreme values only if all $x \in X$ achieve the same similarity score $s(x, A, B)$ and $s(y, A, B) = -s(x, A, B)$ for all $y \in Y$; i.e., the smaller the variance of $s(x, A, B)$ and $s(y, A, B)$, the higher the effect size. This implies that we can influence the effect size by changing the variance of $s(t, A, B)$ without changing whether the groups are separable in the embedding space, as the sketch below illustrates. Furthermore, the proof of Theorem 3 shows that WEAT can report no bias even if the embeddings contain associations with the bias attributes. This problem occurs because WEAT is only sensitive to the stereotype "X is associated with A and Y is associated with B" and will overlook biases deviating from this hypothesis.
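To make the variance effect concrete, the following minimal sketch computes the effect size $d(X, Y, A, B)$ directly from synthetic $s(t, A, B)$ scores (the group sizes, gap, and noise levels are illustrative assumptions, not values from our experiments). Two settings with an identical gap between the group means, differing only in within-group variance, yield very different effect sizes, even though the groups are equally separable:

```python
import numpy as np

def effect_size(s_x, s_y):
    # WEAT effect size: difference of the group means of s(t, A, B),
    # normalized by the standard deviation over all targets.
    pooled = np.concatenate([s_x, s_y])
    return (s_x.mean() - s_y.mean()) / pooled.std(ddof=1)

rng = np.random.default_rng(0)
gap = 0.2  # identical separation of the group means in both settings
s_x_wide  = rng.normal( gap / 2, 0.10, size=8)  # large within-group variance
s_y_wide  = rng.normal(-gap / 2, 0.10, size=8)
s_x_tight = rng.normal( gap / 2, 0.01, size=8)  # same gap, tiny variance
s_y_tight = rng.normal(-gap / 2, 0.01, size=8)

print(effect_size(s_x_wide,  s_y_wide))   # moderate effect size
print(effect_size(s_x_tight, s_y_tight))  # close to the extreme value 2
```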
4.2 Analysis of the Direct Bias
For the Direct Bias, the following theorems show
that it is Magnitude-Comparable, but not Unbiased-
Trustworthy. The proof of Theorem 6 shows that the first principal component used by the Direct Bias does not necessarily represent individual bias directions appropriately. This can lead to both over- and underestimation of bias by the Direct Bias.
Theorem 5. The Direct Bias is Magnitude-Comparable for $c \geq 0$.

Proof. For $c \geq 0$ the individual bias $|\cos(t, g)|^c$ lies in $[0, 1]$. Calculating the mean over all targets in $T$ does not change this bound. The statement follows.
Theorem 6. The Direct Bias is not Unbiased-Trustworthy.

Proof. For the Direct Bias, $b_0 = 0$ indicates no bias. Consider a setup with two attribute sets $A = \{a_1, a_2\}$ and $C = \{c_1, c_2\}$. Using the notation from Section 2.2, this gives us two defining sets $D_1 = \{a_1, c_1\}$ and $D_2 = \{a_2, c_2\}$. Let $a_1 = (-x, rx)^T = -c_1$, $a_2 = (-x, -rx)^T = -c_2$ and $r > 1$. The bias direction is obtained by computing the first principal component over all $(a_i - \mu_i)$ and $(c_i - \mu_i)$ with $\mu_i = \frac{a_i + c_i}{2} = 0$. Due to $r > 1$, $b = (0, 1)^T$ is a valid solution for the first principal component, as it maximizes the variance

$$b = \operatorname*{argmax}_{\|v\| = 1} \sum_i \left( (v \cdot a_i)^2 + (v \cdot c_i)^2 \right). \tag{26}$$
According to the definition in Section 3.1, any word $t = (0, w_y)^T$ would be considered neutral towards groups $A$ and $C$, with $s(t, A) = s(t, C)$ and being equidistant to each word pair $\{a_i, c_i\}$. But with the bias direction $b = (0, 1)^T$, the Direct Bias would report $b_{\max} = 1$ instead of $b_0 = 0$, which contradicts Definition 3.4. On the other hand, we would consider a word $t = (w_x, 0)^T$ maximally biased, but the Direct Bias would report no bias. Showing that the bias reported for single words $t$ is not Unbiased-Trustworthy proves that the Direct Bias is not Unbiased-Trustworthy.
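The counterexample can be verified numerically. The following sketch uses scikit-learn's PCA (as in Section 5) with $x = 1$, $r = 2$, and strictness $c = 1$, all arbitrary choices for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Counterexample from the proof of Theorem 6; x = 1 and r = 2 are
# arbitrary choices satisfying r > 1.
x, r = 1.0, 2.0
a1 = np.array([-x,  r * x]); c1 = -a1
a2 = np.array([-x, -r * x]); c2 = -a2

# The means mu_i of the defining sets D_i = {a_i, c_i} are zero, so the
# centered vectors equal the attribute vectors themselves.
pca = PCA(n_components=1).fit(np.stack([a1, c1, a2, c2]))
b = pca.components_[0]  # bias direction, ~(0, 1) up to sign, since r > 1

def individual_bias(t, g, c=1.0):
    # Individual Direct Bias |cos(t, g)|^c; c = 1 chosen for illustration.
    return np.abs(t @ g / (np.linalg.norm(t) * np.linalg.norm(g))) ** c

print(individual_bias(np.array([0.0, 1.0]), b))  # ~1, although t is neutral to A and C
print(individual_bias(np.array([1.0, 0.0]), b))  # ~0, although t is maximally biased
```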
5 EXPERIMENTS
In the experiments, we show that the limitations of WEAT and the Direct Bias identified in Section 4 do occur with state-of-the-art language models. We show that the effect size of WEAT can be misleading when comparing bias in different settings. Furthermore, we highlight how attribute embeddings differ between models, which impacts WEAT's individual bias, and that the Direct Bias can obtain a misleading bias direction through Principal Component Analysis (PCA). We use different pretrained language models from Huggingface (Wolf et al., 2019) and the PCA implementation from Scikit-learn (Pedregosa et al., 2011), and observe gender bias based on 25 attributes per gender, such as (man, woman).
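For reference, the sketch below shows one plausible way to obtain the association scores $s(t, A, B)$ with a Huggingface model; mean-pooling the last hidden states and the tiny attribute sets are simplifying assumptions for illustration, not our exact setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(word: str) -> torch.Tensor:
    # Mean-pool the last hidden states over all tokens of the word
    # (one possible pooling strategy, assumed here).
    inputs = tok(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden[0].mean(dim=0)

def s(t: str, A: list, B: list) -> float:
    # WEAT association s(t, A, B): mean cosine similarity of t to the
    # attributes in A minus the mean cosine similarity to those in B.
    cos = torch.nn.functional.cosine_similarity
    e_t = embed(t)
    sim_a = torch.stack([cos(e_t, embed(a), dim=0) for a in A]).mean()
    sim_b = torch.stack([cos(e_t, embed(b), dim=0) for b in B]).mean()
    return (sim_a - sim_b).item()

# Tiny illustrative attribute sets; the experiments use 25 attributes per gender.
print(s("nurse", ["man", "he", "him"], ["woman", "she", "her"]))
```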
5.1 WEAT's Effect Size
In a first experiment, we demonstrate that the effect size does not quantify social bias in terms of the separability of stereotypical targets. We use embeddings of distilbert-base-uncased and openai-gpt to compute gender bias according to $s(t, A, B)$ and the effect size $d(X, Y, A, B)$ for stereotypically male/female job titles. Figure 1 shows the distribution of $s(t, A, B)$ for DistilBERT, where stereotypically male/female targets are clearly distinct based on the sample bias. Figure 2 shows the distribution of $s(t, A, B)$ for GPT, where stereotypically male and female terms are similarly distributed. First, we focus on the DistilBERT model (Figure 1), which is clearly biased with regard to the tested words. We compare two cases with different targets, such that the stereotypical target groups are better separable in one case (left plot), which one may describe as a more severe or more obvious bias compared to the second case, where the target groups are only almost separable (right plot). However, the effect size behaves contrary to this.
Despite this, when comparing Figures 1 and 2,