where the last statement follows from Lemma 1 (see Appendix) with $\sum_{i=1}^{m} \frac{c_i - \hat{\mu}}{\hat{\sigma}} \in [-m, m]$. The extreme value $\pm 2$ is reached if $c_1 = \dots = c_m = -c_{m+1} = \dots = -c_{2m}$, which can be obtained by setting $x_1 = \dots = x_m = -y_1 = \dots = -y_m$, independently of $A$ and $B$ as long as $A \neq B$ and $\sum_{a_i \in A} a_i / \|a_i\| \neq 0 \neq \sum_{b_i \in B} b_i / \|b_i\|$.
The proof of Theorem 4 shows that the effect size reaches its extreme values only if all $x \in X$ achieve the same similarity score $s(x, A, B)$ and $s(y, A, B) = -s(x, A, B)$ for all $y \in Y$; i.e., the smaller the variance of $s(x, A, B)$ and $s(y, A, B)$, the higher the effect size. This implies that we can influence the effect size by changing the variance of $s(t, A, B)$ without changing whether the groups are separable in the embedding space, as the sketch below illustrates. Furthermore, the proof of Theorem 3 shows that WEAT can report no bias even if the embeddings contain associations with the bias attributes. This problem occurs because WEAT is only sensitive to the stereotype "X is associated with A and Y is associated with B" and will overlook biases deviating from this hypothesis.
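To make the variance effect concrete, the following minimal sketch computes the effect size $d(X, Y, A, B)$ directly from synthetic $s(t, A, B)$ scores (the group sizes, gap, and noise levels are illustrative assumptions, not values from our experiments). Two settings with an identical gap between the group means, differing only in within-group variance, yield very different effect sizes, even though the groups are equally separable:

```python
import numpy as np

def effect_size(s_x, s_y):
    # WEAT effect size: difference of the group means of s(t, A, B),
    # normalized by the standard deviation over all targets.
    pooled = np.concatenate([s_x, s_y])
    return (s_x.mean() - s_y.mean()) / pooled.std(ddof=1)

rng = np.random.default_rng(0)
gap = 0.2  # identical separation of the group means in both settings
s_x_wide  = rng.normal( gap / 2, 0.10, size=8)  # large within-group variance
s_y_wide  = rng.normal(-gap / 2, 0.10, size=8)
s_x_tight = rng.normal( gap / 2, 0.01, size=8)  # same gap, tiny variance
s_y_tight = rng.normal(-gap / 2, 0.01, size=8)

print(effect_size(s_x_wide,  s_y_wide))   # moderate effect size
print(effect_size(s_x_tight, s_y_tight))  # close to the extreme value 2
```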
4.2 Analysis of the Direct Bias
For the Direct Bias, the following theorems show
that it is Magnitude-Comparable, but not Unbiased-
Trustworthy. The proof of Theorem 6 shows that the first principal component used by the Direct Bias does not necessarily represent individual bias directions appropriately. This can lead to both over- and underestimation of bias by the Direct Bias.
Theorem 5. The Direct Bias is Magnitude-Comparable for $c \geq 0$.

Proof. For $c \geq 0$ the individual bias $|\cos(t, g)|^c$ lies in $[0, 1]$. Calculating the mean over all targets in $T$ does not change this bound. The statement follows.
Theorem 6. The Direct Bias is not Unbiased-Trustworthy.

Proof. For the Direct Bias, $b_0 = 0$ indicates no bias. Consider a setup with two attribute sets $A = \{a_1, a_2\}$ and $C = \{c_1, c_2\}$. Using the notation from Section 2.2, this gives us two defining sets $D_1 = \{a_1, c_1\}$ and $D_2 = \{a_2, c_2\}$. Let $a_1 = (-x, rx)^T = -c_1$, $a_2 = (-x, -rx)^T = -c_2$ and $r > 1$. The bias direction is obtained by computing the first principal component over all $(a_i - \mu_i)$ and $(c_i - \mu_i)$ with $\mu_i = \frac{a_i + c_i}{2} = 0$. Due to $r > 1$, $b = (0, 1)^T$ is a valid solution for the first principal component, as it maximizes the variance

$$b = \operatorname*{argmax}_{\|v\| = 1} \sum_i \left( (v \cdot a_i)^2 + (v \cdot c_i)^2 \right). \tag{26}$$
According to the definition in Section 3.1, any word $t = (0, w_y)^T$ would be considered neutral towards groups $A$ and $C$, with $s(t, A) = s(t, C)$ and being equidistant to each word pair $\{a_i, c_i\}$. But with the bias direction $b = (0, 1)^T$, the Direct Bias would report $b_{\max} = 1$ instead of $b_0 = 0$, which contradicts Definition 3.4. On the other hand, we would consider a word $t = (w_x, 0)^T$ maximally biased, but the Direct Bias would report no bias. Showing that the bias reported for single words $t$ is not Unbiased-Trustworthy proves that the Direct Bias is not Unbiased-Trustworthy.
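The counterexample can be verified numerically. The following sketch uses scikit-learn's PCA (as in Section 5) with $x = 1$, $r = 2$, and strictness $c = 1$, all arbitrary choices for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Counterexample from the proof of Theorem 6; x = 1 and r = 2 are
# arbitrary choices satisfying r > 1.
x, r = 1.0, 2.0
a1 = np.array([-x,  r * x]); c1 = -a1
a2 = np.array([-x, -r * x]); c2 = -a2

# The means mu_i of the defining sets D_i = {a_i, c_i} are zero, so the
# centered vectors equal the attribute vectors themselves.
pca = PCA(n_components=1).fit(np.stack([a1, c1, a2, c2]))
b = pca.components_[0]  # bias direction, ~(0, 1) up to sign, since r > 1

def individual_bias(t, g, c=1.0):
    # Individual Direct Bias |cos(t, g)|^c; c = 1 chosen for illustration.
    return np.abs(t @ g / (np.linalg.norm(t) * np.linalg.norm(g))) ** c

print(individual_bias(np.array([0.0, 1.0]), b))  # ~1, although t is neutral to A and C
print(individual_bias(np.array([1.0, 0.0]), b))  # ~0, although t is maximally biased
```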
5 EXPERIMENTS
In the experiments, we show that the limitations of WEAT and the Direct Bias identified in Section 4 do occur with state-of-the-art language models. We show that the effect size of WEAT can be misleading when comparing bias in different settings. Furthermore, we highlight how attribute embeddings differ between models, which impacts WEAT's individual bias, and that the Direct Bias can obtain a misleading bias direction through Principal Component Analysis (PCA). We use different pretrained language models from Huggingface (Wolf et al., 2019) and the PCA implementation from Scikit-learn (Pedregosa et al., 2011), and observe gender bias based on 25 attributes per gender, such as (man, woman).
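For reference, the sketch below shows one plausible way to obtain the association scores $s(t, A, B)$ with a Huggingface model; mean-pooling the last hidden states and the tiny attribute sets are simplifying assumptions for illustration, not our exact setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def embed(word: str) -> torch.Tensor:
    # Mean-pool the last hidden states over all tokens of the word
    # (one possible pooling strategy, assumed here).
    inputs = tok(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden[0].mean(dim=0)

def s(t: str, A: list, B: list) -> float:
    # WEAT association s(t, A, B): mean cosine similarity of t to the
    # attributes in A minus the mean cosine similarity to those in B.
    cos = torch.nn.functional.cosine_similarity
    e_t = embed(t)
    sim_a = torch.stack([cos(e_t, embed(a), dim=0) for a in A]).mean()
    sim_b = torch.stack([cos(e_t, embed(b), dim=0) for b in B]).mean()
    return (sim_a - sim_b).item()

# Tiny illustrative attribute sets; the experiments use 25 attributes per gender.
print(s("nurse", ["man", "he", "him"], ["woman", "she", "her"]))
```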
5.1 WEAT's Effect Size
In a first experiment, we demonstrate that the effect size does not quantify social bias in terms of the separability of stereotypical targets. We use embeddings of distilbert-base-uncased and openai-gpt to compute gender bias according to $s(t, A, B)$ and the effect size $d(X, Y, A, B)$ for stereotypically male/female job titles. Figure 1 shows the distribution of $s(t, A, B)$ for DistilBERT, where stereotypically male/female targets are clearly distinct based on the sample bias. Figure 2 shows the distribution of $s(t, A, B)$ for GPT, where stereotypically male and female terms are similarly distributed. First, we focus on the DistilBERT model (Figure 1), which is clearly biased with regard to the tested words. We compare two cases with different targets, such that the stereotypical target groups are better separable in one case (left plot), which one may describe as a more severe or more obvious bias compared to the second case, where the target groups are only almost separable (right plot). However, the effect size behaves contrary to this.
Despite this, when comparing Figures 1 and 2,