CTypiClust: Confidence-Aware Typical Clustering for Budget-Agnostic Active Learning with Confidence Calibration

Takuya Okano, Yohei Minekawa and Miki Hayakawa

Hitachi High-Tech Corporation, Japan

Keywords: Deep Learning, Active Learning, Confidence Calibration.

Abstract:

Active Learning (AL) has been widely studied to reduce annotation costs in deep learning. In AL, the appropriate method varies depending on the amount of data that can be annotated (the budget). In low-budget settings, it is appropriate to prioritize sampling typical data, while in high-budget settings, it is better to prioritize sampling data with high uncertainty. This study proposes Confidence-aware Typical Clustering (CTypiClust), an AL method that performs well regardless of the budget. CTypiClust dynamically switches between typical data sampling and low-confidence data sampling based on confidence. Additionally, to mitigate the overconfidence problem in low-budget settings, we propose a new confidence calibration method, Cluster-Enhanced Confidence (CEC). By applying CEC to CTypiClust, we suppress the occurrence of overconfidence in low-budget settings. To evaluate the effectiveness of the proposed method, we conducted experiments using multiple benchmark datasets and confirmed that CTypiClust consistently shows high performance regardless of the budget.

1 INTRODUCTION

Reducing annotation costs is one of the critical challenges in deep learning. To enhance the performance of deep learning models, a large amount of data is required, but annotating all the data is very costly. This problem is particularly severe in fields requiring expertise, such as manufacturing and healthcare, where accurate labeling demands enormous costs and time.

Active Learning (AL) has been widely studied as a method to minimize annotation costs (Ren et al., 2021). In AL, a fixed budget of data is sampled from a large pool of unlabeled data, based on its usefulness for improving model performance. This process is repeated to optimize the model with minimal labeled data. Traditional methods include AL methods using uncertainty (low confidence) (Roth and Small, 2006; Gal et al., 2017; Pop and Fulop, 2018), methods considering data diversity (Sener and Savarese, 2018; Zhdanov, 2019), and methods considering both uncertainty and diversity (Sinha et al., 2019). These methods assumed relatively high-budget settings, but recent advances in self-supervised learning (Jaiswal et al., 2021) have led to the development of AL methods in cold-start settings with no labeled data (Chen et al., 2023; Yi et al., 2022), and research on AL in low-budget settings (Hacohen et al., 2022).

https://orcid.org/0009-0006-3749-3883

https://orcid.org/0009-0006-8980-4356

https://orcid.org/0009-0004-3547-8896

However, Hacohen et al. (2022) show that the optimal method varies depending on the budget. In low-budget settings, it is necessary to learn from limited data, so prioritizing typical data makes it easier to capture the overall characteristics of the dataset, leading to faster improvement in model accuracy. On the other hand, in high-budget settings, the main features of the dataset can be learned from a large amount of data, so learning from data with high uncertainty near the decision boundary is effective for improving accuracy (Hacohen et al., 2022).

Therefore, in this study, we propose Confidence-aware Typical Clustering (CTypiClust), which performs well regardless of the budget. This method extends Typical Clustering (TypiClust) (Hacohen et al., 2022), a method for low-budget settings, by dynamically switching between sampling typical data and low-confidence data based on confidence, making it effective in high-budget settings as well. Additionally, to address the issue of overconfidence in low-budget settings, we propose a new confidence calibration method, Cluster-Enhanced Confidence (CEC), and apply it to CTypiClust. To confirm that CTypiClust performs well regardless of the budget, we evaluate its effectiveness using CIFAR-10, CIFAR-100, and STL-10.

Okano, T., Minekawa, Y. and Hayakawa, M.

CTypiClust: Confidence-Aware Typical Clustering for Budget-Agnostic Active Learning with Confidence Calibration.

DOI: 10.5220/0013139400003912

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2025) - Volume 2: VISAPP, pages 340-347

ISBN: 978-989-758-728-3; ISSN: 2184-4321

The contributions of this paper are as follows:

1. We propose CTypiClust, an AL method that consistently demonstrates high performance regardless of budget constraints.

2. We propose CEC, a confidence calibration method that effectively mitigates overconfidence even in low-budget settings.

3. We demonstrate the effectiveness of CTypiClust and CEC through experiments using multiple benchmark datasets.

2 RELATED WORK

2.1 Active Learning

High-Budget Active Learning. Many AL methods assume a high-budget setting. They include methods that sample data with high uncertainty (Roth and Small, 2006; Gal et al., 2017; Pop and Fulop, 2018), methods that consider diversity (Sener and Savarese, 2018; Zhdanov, 2019), and methods that consider both uncertainty and diversity (Kirsch et al., 2019; Sinha et al., 2019).

Among these, methods that sample data with high uncertainty have been widely proposed (Roth and Small, 2006; Gal et al., 2017; Pop and Fulop, 2018). Roth and Small (2006) proposed Margin, which prioritizes sampling data with a small difference between the highest and second-highest predicted probabilities, considering such data as having high uncertainty. Gal et al. (2017) proposed DBAL, which utilizes a Bayesian approach.
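The margin criterion described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation of Roth and Small (2006); the function names `margin_scores` and `select_by_margin` and the toy probabilities are assumptions made for the example.

```python
import numpy as np

def margin_scores(probs: np.ndarray) -> np.ndarray:
    """Gap between the highest and second-highest class probability per sample."""
    top2 = np.sort(probs, axis=1)[:, -2:]  # two largest values, ascending
    return top2[:, 1] - top2[:, 0]

def select_by_margin(probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` samples with the smallest margin (most uncertain)."""
    return np.argsort(margin_scores(probs), kind="stable")[:budget]

# Toy predicted probabilities for three unlabeled samples (hypothetical values).
probs = np.array([[0.50, 0.45, 0.05],
                  [0.90, 0.05, 0.05],
                  [0.40, 0.30, 0.30]])
selected = select_by_margin(probs, budget=2)
```

Samples 0 and 2 have the smallest margins here, so they would be queried before the confidently classified sample 1.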

Sener and Savarese (2018) proposed CoreSet, an AL method that prioritizes diversity. CoreSet achieves diversity-aware sampling by sampling representative data based on the core-set approach. Zhdanov (2019) proposed a mini-batch active learning method that incorporates data diversity using K-means clustering to enhance the efficiency of label selection in large-scale datasets.

Approaches that consider both uncertainty and diversity have also been proposed (Kirsch et al., 2019; Sinha et al., 2019). Kirsch et al. (2019) proposed BALD, which, in addition to taking a Bayesian approach, balances uncertainty and diversity by sampling to maximize mutual information among data points within a batch. Sinha et al. (2019) introduced VAAL, a method that focuses on both diversity and uncertainty using a variational autoencoder.

Cold and Low-Budget Active Learning. In settings like cold-start and low-budget, where there is little or no labeled data, methods effective in high-budget settings perform worse than random sampling (Hacohen et al., 2022). As an effective method in cold-start settings, Yi et al. (2022) introduced a method that samples data with high loss in pretext tasks, considering it to have high learning efficiency. Chen et al. (2023) proposed a method based on contrastive learning that samples data that is difficult to distinguish as typical data. For cold-start and low-budget settings, Hacohen et al. (2022) proposed TypiClust. TypiClust prioritizes sampling data with high density in the feature space of unlabeled data as typical data.

Typicality-prioritized AL methods like TypiClust perform poorly in high-budget settings. Therefore, in this study, we propose CTypiClust, which performs well regardless of the budget by combining typicality-based sampling with uncertainty-based sampling. Uncertainty is generally calculated from the model's confidence, but in low-budget settings, there is a problem of overconfidence, where the model becomes excessively confident.

2.2 Conﬁdence Calibration

Deep learning models are known to exhibit overconfidence, where the predicted confidence significantly exceeds the actual accuracy. This issue is particularly prevalent in low-budget settings with very limited training data. To address this problem, numerous calibration methods have been proposed to align predicted probabilities with actual accuracies (Wang, 2024).

Calibration methods can be broadly categorized into post-hoc methods and regularization methods. Post-hoc methods perform calibration using a large amount of data after the model has been trained. For example, temperature scaling (Platt, 2000; Mozafari et al., 2019) optimizes the temperature parameter of the softmax function using validation data. On the other hand, regularization methods add penalties to the model's loss function to suppress overconfidence (Guo et al., 2017; Pereyra et al., 2017).
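To make the post-hoc idea concrete, the following sketch fits a single softmax temperature by a grid search over the negative log-likelihood on held-out logits. It is only an illustration under stated assumptions: the helper names, the search grid, and the toy logits are hypothetical, and practical implementations usually optimize the temperature with a gradient-based solver instead of a grid.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    p = softmax(logits, temperature)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the temperature on the grid that minimizes validation NLL."""
    grid = np.linspace(0.5, 5.0, 46) if grid is None else grid
    losses = [nll(val_logits, val_labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Toy validation set: the model is about 99% confident but only 75% accurate.
val_logits = np.array([[5.0, 0.0]] * 4)
val_labels = np.array([0, 0, 0, 1])
t_fit = fit_temperature(val_logits, val_labels)
calibrated_conf = softmax(val_logits, t_fit).max(axis=1)
```

On this toy set the fitted temperature is well above 1, which lowers the reported confidence toward the observed accuracy. The sketch also shows why such methods need labeled validation data, which is scarce in the low-budget settings considered here.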

Many calibration methods assume settings with a large amount of labeled data or aim to improve the model itself or the loss function. However, they do not consider special settings like low-budget, where labeled data is extremely limited.

In this study, we propose a new calibration method, CEC, to mitigate overconfidence in low-budget settings and apply it to CTypiClust. CEC corrects the model's confidence based on the clustering results of intermediate layer features.


Figure 1: Illustration of CTypiClust and CEC. (a) CTypiClust, like TypiClust, obtains features from the unlabeled dataset $U$ and performs clustering. From each cluster $K_i$, it retrieves the data point with the highest typicality, $x_i^{t}$, and the data point with the lowest CEC, $x_i^{c}$. It decides whether to sample $x_i^{t}$ or $x_i^{c}$ based on $\mathrm{CEC}(x_i^{t})$, the CEC of the data point with the highest typicality. (b) In CEC, the input data $x$ is fed into the model to obtain the features $f(x)$ (black circle) and the output of the classification model $g(y \mid f(x))$. From $g(y \mid f(x))$, the pseudo-label $\hat{y}$ and the confidence $\hat{c}$ are calculated. The features of the unlabeled data are clustered, and the cluster to which $x$ belongs is assigned the pseudo-label $\tilde{y}$. The confidence $\tilde{c}$ is calculated based on the relative distance to the center $\mu_i$ (star) of each cluster. Finally, $\mathrm{CEC}(x)$ is calculated from the two pseudo-labels and the two confidences.

3 METHOD

In this section, we introduce Confidence-aware Typical Clustering (CTypiClust). The detailed methodology of CTypiClust is explained in Section 3.2. Additionally, we propose a new confidence calibration method called Cluster-Enhanced Confidence (CEC), which is used in CTypiClust and is discussed in Section 3.3. The methods are illustrated in Figure 1.

3.1 Notation

Let $X$ be the set of all input data, and each data point $x \in X$ is included in the unlabeled dataset $U \subseteq X$. Although the data in the unlabeled dataset $U$ are not labeled, there exists a set of class labels $Y = \{1, 2, \ldots, |Y|\}$ corresponding to the data. Each data point $x$ has a corresponding label $y \in Y$. The model used in this study is divided into a feature extractor $f(\cdot)$ and a classifier $g(\cdot)$. First, the feature extractor $f$ extracts features $f(x)$ from the input $x$. The classifier $g$ takes the features $f(x)$ as input and outputs the probability distribution $g(y \mid f(x))$ for the label $y$.

3.2 Confidence-Aware Typical Clustering

We propose CTypiClust, an extension of TypiClust (Hacohen et al., 2022) that considers confidence. While TypiClust always samples the data with high typicality, CTypiClust determines whether to sample data with high typicality or low-confidence data based on the confidence of the data with high typicality. If the confidence of the data with high typicality is high, CTypiClust assumes that little is gained by learning from the typical data and samples low-confidence data instead. As a result, in immature stages such as low-budget settings, typicality-prioritized sampling is expected, while in mature stages such as high-budget settings, low-confidence-prioritized sampling is expected. Additionally, CTypiClust uses CEC as the confidence measure to mitigate overconfidence in low-budget settings.

The specific steps of CTypiClust are explained below. CTypiClust consists of four steps. Steps 1 and 2 are the same as in TypiClust.

Step 1: Pre-train the model $f$ using the unlabeled data $U$ with self-supervised learning methods (e.g., SimCLR (Chen et al., 2020)).

Step 2: Input the unlabeled data $U$ into the model $f$ pre-trained in Step 1 to obtain features. Perform clustering on the obtained features using a method such as K-means.

Step 3: From each cluster $K_i$, extract the data point with the highest typicality, $x_i^{t} = \arg\max_{x \in K_i} \{\mathrm{Typicality}(x)\}$, and the data point with the lowest CEC, $x_i^{c} = \arg\min_{x \in K_i} \{\mathrm{CEC}(x)\}$. The calculation method of CEC is explained in Section 3.3. $\mathrm{Typicality}(x)$ is calculated as the local density in the feature space of $x$, as in (Hacohen et al., 2022). Specifically, it is defined by the following equation:

$$\mathrm{Typicality}(x) = \left( \frac{1}{K} \sum_{x_j \in K\text{-NN}(x)} \lVert x - x_j \rVert_2 \right)^{-1}$$

Here, $K$ is the number of data points in the $K$-nearest neighbors ($K$-NN) of $x$, and $x_j$ is one of the data points in the neighborhood. $\lVert x - x_j \rVert_2$ is the Euclidean distance between the data point $x$ and its neighboring point $x_j$.

Step 4: If $\mathrm{CEC}(x_i^{t})$, the CEC of the data point with the highest typicality $x_i^{t}$, is at or below the threshold $T$, sample $x_i^{t}$ as is. If $\mathrm{CEC}(x_i^{t})$ is higher than the threshold $T$, sample $x_i^{c}$, the data point with the lowest CEC in the same cluster.
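The typicality used in Step 3 is the inverse of the mean distance to the nearest neighbors in feature space. The sketch below computes it with brute-force pairwise distances, which is only practical for small feature sets; the function name `typicality`, the argument `k`, and the toy features are assumptions for illustration, and a nearest-neighbor index would be needed at dataset scale.

```python
import numpy as np

def typicality(features: np.ndarray, k: int) -> np.ndarray:
    """Inverse of the mean Euclidean distance to the k nearest neighbors.

    Assumes k <= n - 1 and no exact duplicate features (a zero mean distance
    would make the inverse undefined).
    """
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))   # (n, n) pairwise distances
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbor
    knn = np.sort(dists, axis=1)[:, :k]          # k smallest distances per row
    return 1.0 / knn.mean(axis=1)

# Toy 1-D features: three points close together and one outlier.
features = np.array([[0.0], [1.0], [2.0], [10.0]])
scores = typicality(features, k=2)
most_typical = int(np.argmax(scores))
```

The middle point of the dense group receives the highest score and the outlier the lowest, matching the intuition that typical points lie in dense regions.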

The algorithm of CTypiClust is shown in Algorithm 1.

Data: Unlabeled pool $U$, Budget $B$
Result: Queries
Embedding ← Representation_Learning($U$);
Clust ← Clustering_algorithm(Embedding, $B$);
Queries ← ∅;
for $i = 1$ to $B$ do
    $x_i^{t}$ ← $\arg\max_{x \in K_i}$ {Typicality($x$)};
    $x_i^{c}$ ← $\arg\min_{x \in K_i}$ {CEC($x$)};
    if CEC($x_i^{t}$) > $T$ then
        Add $x_i^{c}$ to Queries;
    else
        Add $x_i^{t}$ to Queries;
    end
end

Algorithm 1: CTypiClust.
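The selection loop of Algorithm 1 can be sketched as follows, assuming that cluster assignments, typicality scores, and CEC values have already been computed for every unlabeled point. The function name `ctypiclust_select` and the toy arrays are hypothetical, and the sketch leaves out practical details of TypiClust such as skipping clusters that already contain labeled points.

```python
import numpy as np

def ctypiclust_select(cluster_ids, typicality, cec, threshold=0.8):
    """Pick one query per cluster following Steps 3 and 4.

    The most typical point is chosen unless its CEC exceeds the threshold,
    in which case the lowest-CEC point of the same cluster is chosen.
    """
    queries = []
    for k in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == k)
        x_typ = members[np.argmax(typicality[members])]  # most typical point
        x_low = members[np.argmin(cec[members])]         # least confident point
        queries.append(int(x_low if cec[x_typ] > threshold else x_typ))
    return queries

# Toy pool with two clusters of three points each (hypothetical values).
cluster_ids = np.array([0, 0, 0, 1, 1, 1])
typicality = np.array([0.9, 0.5, 0.2, 0.3, 0.8, 0.4])
cec = np.array([0.95, 0.60, 0.10, 0.00, 0.50, 0.70])

queries = ctypiclust_select(cluster_ids, typicality, cec, threshold=0.8)
queries_typiclust = ctypiclust_select(cluster_ids, typicality, cec, threshold=1.0)
```

In cluster 0 the typical point is already confidently handled (CEC 0.95), so the low-confidence point 2 is queried; in cluster 1 the typical point 4 is kept. With the threshold set to 1.0 the rule never switches, which reproduces typicality-only selection.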

3.3 Cluster-Enhanced Conﬁdence

In low-budget settings, where the amount of training data is limited, the model tends to overfit and become overconfident. To mitigate overconfidence, we propose a confidence calibration method, CEC, which corrects the confidence of the classification model's output using the clustering results of intermediate layer features.

The pseudo-label obtained from the classifier $g$ is $\hat{y} = \arg\max_{y} g(y \mid f(x))$, and the confidence is $\hat{c} = \max_{y} g(y \mid f(x))$. Additionally, the features of the unlabeled data $U$ are clustered, and the center of each cluster $K_i$ is $\mu_i = \frac{1}{|K_i|} \sum_{x \in K_i} f(x)$. The pseudo-label $\tilde{y}$ is the label obtained from clustering. In K-means, $\tilde{y} = \arg\min_{i} D(f(x), \mu_i)$, where $D$ is any distance function (e.g., Euclidean distance or cosine distance). The correspondence between the model's pseudo-label $\hat{y}$ and the clustering pseudo-label $\tilde{y}$ is based on the frequency of label occurrence within each cluster. Specifically, after each data point $x$ is assigned a pseudo-label $\tilde{y}$ by the clustering method, the model's pseudo-label $\hat{y}$ that appears most frequently within each cluster $K_i$ is assigned as the representative label of that cluster. The confidence $\tilde{c}$ is calculated based on the relative distance to the center of each cluster, similar to Prototypical Networks (Snell et al., 2017):

$$\tilde{c} = \max_{i} \frac{\exp(-D(f(x), \mu_i))}{\sum_{j} \exp(-D(f(x), \mu_j))}$$

By using these values, $\mathrm{CEC}(x)$ is defined as follows:

$$\mathrm{CEC}(x) = \mathbb{1}[\hat{y} = \tilde{y}] \cdot \hat{c} \cdot \tilde{c}. \tag{1}$$

Here, $\mathbb{1}[\hat{y} = \tilde{y}]$ is an indicator function that returns 1 if the two pseudo-labels match and 0 if they do not. The number of clusters $|K|$ is set equal to the number of classes $|Y|$. Equation (1) ensures that CEC is 0 if the labels do not match, and that CEC is high only when both confidences are high. The algorithm of CEC is shown in Algorithm 2.
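As a worked illustration of Equation (1), consider three hypothetical cases (the numbers are chosen for the example and are not taken from the experiments), compared against the threshold $T = 0.8$ used in Section 4.1:

```latex
\begin{align*}
\hat{y} = \tilde{y},\ \hat{c} = 0.90,\ \tilde{c} = 0.60
  &\;\Rightarrow\; \mathrm{CEC}(x) = 1 \cdot 0.90 \cdot 0.60 = 0.54 \le T, \\
\hat{y} = \tilde{y},\ \hat{c} = 0.95,\ \tilde{c} = 0.90
  &\;\Rightarrow\; \mathrm{CEC}(x) = 1 \cdot 0.95 \cdot 0.90 = 0.855 > T, \\
\hat{y} \neq \tilde{y},\ \hat{c} = 0.95,\ \tilde{c} = 0.90
  &\;\Rightarrow\; \mathrm{CEC}(x) = 0 \cdot 0.95 \cdot 0.90 = 0 \le T.
\end{align*}
```

Only the second case counts as confident, so a high classifier confidence alone is not sufficient: the clustering must agree with the classifier and the point must also lie clearly closest to one cluster center.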

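Data: Data $x$, Clust $K$, Models $f$, $g$
Result: CEC($x$)
for $i = 1$ to $|K|$ do
    $\mu_i$ ← $\frac{1}{|K_i|} \sum_{x' \in K_i} f(x')$;
end
$\hat{y}$ ← $\arg\max_{y} g(y \mid f(x))$;
$\tilde{y}$ ← $\arg\min_{i} D(f(x), \mu_i)$;
$\hat{c}$ ← $\max_{y} g(y \mid f(x))$;
$\tilde{c}$ ← $\max_{i} \frac{\exp(-D(f(x), \mu_i))}{\sum_{j} \exp(-D(f(x), \mu_j))}$;
CEC($x$) ← $\mathbb{1}[\hat{y} = \tilde{y}] \cdot \hat{c} \cdot \tilde{c}$;

Algorithm 2: CEC.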
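A vectorized sketch of Algorithm 2 is given below, assuming Euclidean distance for $D$ and precomputed features, classifier probabilities, and cluster centers. The function name `cec_scores` and the toy inputs are hypothetical; the majority-vote step that maps each cluster to a class label follows the description in the text, while tie-breaking (lowest label wins here) is an assumption of this sketch.

```python
import numpy as np

def cec_scores(features, probs, centers):
    """CEC for every row: indicator(y_hat == y_tilde) * c_hat * c_tilde.

    features: (n, d) feature vectors f(x)
    probs:    (n, C) classifier outputs g(y | f(x))
    centers:  (K, d) cluster centers mu_i
    """
    y_hat = probs.argmax(axis=1)
    c_hat = probs.max(axis=1)

    # Euclidean distance of every feature to every cluster center: (n, K).
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
    cluster = dists.argmin(axis=1)

    # c_tilde: largest entry of the softmax over negative distances.
    logits = -dists
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    c_tilde = (weights / weights.sum(axis=1, keepdims=True)).max(axis=1)

    # Representative label of each cluster: most frequent classifier pseudo-label.
    cluster_label = np.full(centers.shape[0], -1)
    for k in range(centers.shape[0]):
        votes = y_hat[cluster == k]
        if votes.size:
            cluster_label[k] = np.bincount(votes).argmax()
    y_tilde = cluster_label[cluster]

    return (y_hat == y_tilde) * c_hat * c_tilde

# Toy example: two clusters on a line, one point whose classifier label disagrees.
features = np.array([[0.0], [0.2], [4.0], [3.8], [4.2]])
centers = np.array([[0.0], [4.0]])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
values = cec_scores(features, probs, centers)
```

The fourth point sits in the cluster whose representative label is 1 but is classified as 0, so its CEC is exactly 0, while the first point keeps most of its classifier confidence because it lies on its cluster center.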

4 EXPERIMENT AND DISCUSSION

To verify whether CTypiClust performs well regardless of the budget, we use multiple datasets and compare it with related methods under various budget settings. Additionally, we conduct an ablation study of CTypiClust and CEC using the CIFAR-10 dataset.


(a) CIFAR-10

(b) CIFAR-100

Figure 2: Comparison of the ACC difference between each method and Random for each dataset. Results are shown from left

to right for low, medium, and high budgets. The shaded area reﬂects standard error.

4.1 Experimental Settings

In this experiment, we base our evaluation on the AL framework proposed by Munjal et al. (2022). The datasets used for evaluation are CIFAR-10 (Krizhevsky and Hinton, 2009), CIFAR-100 (Krizhevsky and Hinton, 2009), and STL-10 (Coates et al., 2011). The comparison methods are TypiClust (Hacohen et al., 2022), Margin (Roth and Small, 2006), DBAL (Gal et al., 2017), BALD (Kirsch et al., 2019), and Random. We set three types of budgets (low, medium, and high) and configure them for each dataset as follows: low=10, medium=100, high=1000 for CIFAR-10; low=100, medium=1000, high=3000 for CIFAR-100; and low=10, medium=100, high=500 for STL-10. We use ResNet-18 (He et al., 2016) as the model. For TypiClust and CTypiClust, we use features extracted from models pre-trained with SimCLR (Chen et al., 2020). The models trained on each dataset are also pre-trained with SimCLR. The parameter $T$ for CTypiClust is set to 0.8, and the distance function $D$ is the Euclidean distance. The evaluation metrics are accuracy (ACC) and Area Under the Budget Curve (AUBC) (Zhan et al., 2021). AUBC is the area under the curve of ACC plotted against the budget. Other detailed settings are described in the Appendix.
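For intuition about AUBC, the sketch below integrates an accuracy-versus-budget curve with the trapezoidal rule and divides by the budget range, so the result stays on the same scale as accuracy. The normalization and the function name `aubc` are assumptions made for this illustration; the exact evaluation protocol is the one defined by Zhan et al. (2021) and the settings in the Appendix.

```python
import numpy as np

def aubc(budgets, accuracies):
    """Trapezoidal area under the accuracy-vs-budget curve, normalized by budget range."""
    b = np.asarray(budgets, dtype=float)
    a = np.asarray(accuracies, dtype=float)
    area = np.sum((b[1:] - b[:-1]) * (a[1:] + a[:-1]) / 2.0)
    return area / (b[-1] - b[0])

# Hypothetical accuracies (in %) measured after 10, 20, and 30 labeled samples.
score = aubc([10, 20, 30], [40.0, 50.0, 60.0])
flat = aubc([10, 20, 30], [70.0, 70.0, 70.0])
```

A method that reaches high accuracy early in the labeling process accumulates more area than one that catches up only at the end, which is why AUBC summarizes performance across a whole budget range in a single number.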

4.2 Performance Comparison of Different Methods

To evaluate whether CTypiClust performs well regardless of the budget compared to other methods, we compare it with related methods under various budget settings for multiple datasets.


Table 1: Mean and standard deviation of AUBC for each method under low, medium, and high budgets. The numbers in parentheses after the budget names indicate the budget size. The highest performance in each column is marked with * and the second highest with †.

CIFAR-10
Method        Low (10)          Medium (100)      High (1000)
Random        49.77 (±1.67)     74.20 (±0.70)     82.56 (±0.12)
Margin        46.78 (±7.11)     74.89 (±0.05)     83.52 (±0.20)*
DBAL          33.58 (±6.65)     68.75 (±1.97)     82.10 (±0.07)
BALD          29.32 (±2.86)     56.03 (±1.46)     80.34 (±0.27)
TypiClust     60.84 (±1.28)†    75.66 (±0.26)†    82.47 (±0.18)
CTypiClust    61.13 (±1.70)*    76.29 (±0.22)*    83.39 (±0.12)†

CIFAR-100
Method        Low (100)         Medium (1000)     High (3000)
Random        20.90 (±0.44)     44.83 (±0.13)†    53.94 (±0.20)
Margin        20.37 (±0.16)     43.48 (±0.13)     54.33 (±0.18)*
DBAL          13.58 (±1.38)     37.84 (±0.43)     51.05 (±0.26)
BALD          15.28 (±0.43)     38.48 (±0.16)     50.16 (±0.35)
TypiClust     28.77 (±0.15)†    44.53 (±0.11)     52.48 (±0.19)
CTypiClust    29.04 (±0.24)*    45.63 (±0.14)*    54.03 (±0.42)†

STL-10
Method        Low (10)          Medium (100)      High (500)
Random        41.88 (±1.98)     71.00 (±0.85)     83.67 (±0.42)
Margin        42.39 (±1.83)     72.21 (±1.66)     85.14 (±0.08)*
DBAL          30.89 (±1.06)     62.30 (±1.82)     83.85 (±0.35)
BALD          33.24 (±2.45)     55.38 (±1.73)     82.17 (±0.14)
TypiClust     56.00 (±1.26)*    73.83 (±0.49)†    83.72 (±0.24)
CTypiClust    55.97 (±0.59)†    74.07 (±0.28)*    84.84 (±0.17)†

Figure 2 compares each method with Random under different budgets for each dataset. TypiClust performs well when the budget is small, such as in low-budget settings, but it performs worse than Random as the budget increases. On the other hand, Margin, which samples data with high uncertainty, performs better than Random in high-budget settings but worse than Random in low-budget settings. The proposed method, CTypiClust, performs better than Random in most cases regardless of the budget.

Table 1 compares the results of each method in terms of AUBC. Similar to Figure 2, TypiClust performs well in low-budget settings but poorly in high-budget settings. Margin performs well in high-budget settings but relatively poorly in low-budget and medium-budget settings. CTypiClust ranks first or second in all budget settings, demonstrating high performance regardless of the budget.

4.3 Ablation Study

In this section, we conduct an ablation study on CTypiClust and CEC using the CIFAR-10 dataset. First, we compare CTypiClust using confidence $\hat{c}$ and CEC to evaluate the necessity of CEC in CTypiClust. Next, we assess whether CEC mitigates the issue of overconfidence. Finally, we examine the performance differences based on the parameter $T$ in CTypiClust.

Comparison of CTypiClust Using Confidence $\hat{c}$ and CEC. To evaluate the necessity of CEC in CTypiClust, we compare the performance of CTypiClust using simple confidence $\hat{c}$ (w/o CEC) and CTypiClust using CEC (w/ CEC). Table 2 shows the AUBC of w/o CEC and w/ CEC. In low-budget settings, the AUBC of w/o CEC is 59.34% and that of w/ CEC is 61.13%, an improvement of 1.79 percentage points from using CEC. In high-budget settings, the AUBC of w/o CEC is 83.43%, slightly better than the 83.39% of w/ CEC, but the difference of 0.04 percentage points is negligible. This is likely because in high-budget settings, the model's ACC improves and overconfidence is mitigated, allowing w/o CEC to perform well. Thus, in scenarios with relatively low overconfidence, such as high-budget settings, the necessity of CEC is low, but in scenarios prone to overconfidence, such as low-budget settings, the necessity of CEC is high.

Table 2: Comparison of AUBC for CTypiClust without CEC (w/o CEC) and with CEC (w/ CEC). The numbers in parentheses indicate the budget size. The highest value for each budget is marked with *.

Budget      Low (10)    Medium (100)    High (1000)
w/o CEC     59.34       75.81           83.43*
w/ CEC      61.13*      76.29*          83.39

Verification of Overconfidence Mitigation by CEC. To verify the extent to which CEC mitigates overconfidence, we compare the overconfidence of simple confidence $\hat{c}$ and CEC. We use Expected Calibration Error (ECE) (Pakdaman Naeini et al., 2015) to quantitatively evaluate overconfidence. ECE ranges from 0 to 1, and a lower ECE indicates better calibration, which here means less overconfidence. Figure 3 compares ECE for each budget between confidence $\hat{c}$ and CEC on the CIFAR-10 test data.
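A common way to compute ECE is to bin predictions by confidence and average the gap between accuracy and mean confidence in each bin, weighted by the fraction of samples in the bin. The sketch below uses equal-width bins; the bin count, the function name, and the toy inputs are assumptions for illustration and do not reproduce the exact evaluation settings of this study. The same function applies to either confidence measure by passing $\hat{c}$ or CEC values as `confidences`.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins (lo, hi]; a confidence of 0 goes to the first bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_ids = np.clip(np.ceil(confidences * n_bins).astype(int) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Overconfident toy model: 95% confidence but only 50% accuracy.
ece_over = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 0, 0])
# Well-calibrated toy model: 75% confidence and 75% accuracy.
ece_good = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
```

The overconfident toy model has an ECE of 0.45, while the calibrated one has an ECE of 0, which is the kind of gap that the comparison in Figure 3 measures.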

Figure 3: Comparison of ECE between confidence $\hat{c}$ and CEC.

Figure 3 shows that CEC has a smaller ECE than $\hat{c}$ for all budgets. This indicates that CEC reduces overconfidence compared to $\hat{c}$. The difference in ECE between CEC and $\hat{c}$ is particularly large in settings with small budgets of 10 to 100.

CEC likely mitigates overconfidence because it uses the agreement between the pseudo-label $\hat{y}$ obtained from the classifier $g$ and the pseudo-label $\tilde{y}$ obtained by clustering the features. When the budget is low and learning is insufficient (low ACC), features are not well separated by class, leading to many mismatched pseudo-labels and CEC values of 0 (from Equation 1). As the budget and ACC increase, features separate better, pseudo-labels match more often, and CEC values rise.

To verify this, we visualized the relationship between ACC, the test data feature space, and the two pseudo-labels. Figure 4 shows that with low ACC (left side of Figure 4), features are poorly separated and pseudo-labels often mismatch, resulting in many CEC values of 0. With higher budgets and ACC (right side of Figure 4), features separate better, pseudo-labels match more often, and CEC values increase. Confidence $\tilde{c}$ also rises as clusters become more distinct.

Figure 4: Visualization of CIFAR-10 test data features using t-SNE (Laurens and Hinton, 2008), with the features labeled by $\hat{y}$ and $\tilde{y}$. $\hat{c}_{\mathrm{mean}}$ represents the mean of $\hat{c}$ for the test data, and $\mathrm{CEC}_{\mathrm{mean}}$ represents the mean of CEC for the test data. The left and right halves show the feature space when ACC is low and high, respectively. The top row represents the unlabeled feature space, the middle row represents the feature space labeled by $\hat{y}$, and the bottom row represents the feature space labeled by $\tilde{y}$.

Comparison of CTypiClust Performance for Different $T$. To investigate the impact of the parameter $T$ on the performance of CTypiClust, we evaluate CTypiClust using various values of $T$. In CTypiClust, the CEC of the most typical data point in a cluster determines whether that point is sampled for training. The threshold for this decision is $T$, so we compare values from 0.5 to 0.9, excluding $T = 1$ as it corresponds to TypiClust. Figure 5 shows the performance differences of CTypiClust for each $T$ on CIFAR-10. From Figure 5, CTypiClust performs better than Random for all budgets, from low to high, for any $T$. This is likely because the indicator $\mathbb{1}[\hat{y} = \tilde{y}]$ in Equation 1 becomes 0 when the labels do not match, so CEC is 0 and the sampling decision is the same regardless of $T$.
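The two boundary cases mentioned above follow directly from Equation 1. Writing $x_i^{t}$ for the most typical point of cluster $K_i$ (a notation assumed here for the sketch), and using the fact that both confidences lie in $[0, 1]$:

```latex
\begin{align*}
0 \le \hat{c} \le 1,\quad 0 \le \tilde{c} \le 1
  &\;\Rightarrow\; 0 \le \mathrm{CEC}(x) = \mathbb{1}[\hat{y} = \tilde{y}] \cdot \hat{c} \cdot \tilde{c} \le 1, \\
T = 1
  &\;\Rightarrow\; \mathrm{CEC}(x_i^{t}) > T \text{ never holds, so } x_i^{t} \text{ is always sampled (TypiClust)}, \\
\hat{y} \neq \tilde{y}
  &\;\Rightarrow\; \mathrm{CEC}(x_i^{t}) = 0 \le T \text{ for every } T \ge 0 .
\end{align*}
```

The last line is the case that dominates when the model is immature: as long as the classifier and the clustering disagree on the typical point, the choice of $T$ has no effect on what is sampled.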

Figure 5: Graph showing the difference between CTypiClust and Random for each threshold $T$. Results are shown from left to right for low, medium, and high budgets.

Additionally, Table 3 shows the AUBC for each $T$. The difference between the maximum and minimum values for each budget is 0.12 percentage points (63.35 - 63.23) for low-budget, 0.71 percentage points (77.13 - 76.42) for medium-budget, and 0.27 percentage points (83.62 - 83.35) for high-budget, indicating that CTypiClust performs stably regardless of the parameter.

Table 3: AUBC for each $T$ under different budgets. The numbers in parentheses indicate the budget size. The highest value for each budget is marked with *.

Budget \ T       0.5       0.6       0.7       0.8       0.9
Low (10)         63.25     63.25     63.35*    63.23     63.23
Medium (100)     77.13*    76.92     76.53     76.48     76.42
High (1000)      83.50     83.61     83.55     83.35     83.62*

4.4 Limitations

Since CEC used in CTypiClust depends on the agreement between the model's classification results and the clustering results of the features, CTypiClust is specialized for classification problems and cannot be easily applied to regression tasks.

In the future, we aim to overcome this limitation and extend the method to make it applicable to various tasks, including regression tasks.

5 CONCLUSION

We proposed CTypiClust, which performs well regardless of the budget. CTypiClust performs well in both low-budget and high-budget settings by incorporating confidence into TypiClust. Additionally, to address overconfidence in immature models, as in low-budget settings, we proposed a confidence calibration method, CEC, and applied it to CTypiClust. We evaluated CTypiClust on CIFAR-10, CIFAR-100, and STL-10, and found that it performs well across various budgets. We also experimentally verified that CEC mitigates overconfidence. Since CTypiClust is specialized for classification problems, we plan to extend CTypiClust to make it applicable to various tasks, including regression tasks, in the future.


REFERENCES

Chen, L., Bai, Y., Huang, S., Lu, Y., Wen, B., Yuille, A., and Zhou, Z. (2023). Making your first choice: To address cold start problem in medical active learning. In Medical Imaging with Deep Learning.

Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning.

Coates, A., Ng, A., and Lee, H. (2011). An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics.

Gal, Y., Islam, R., and Ghahramani, Z. (2017). Deep Bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning.

Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning.

Hacohen, G., Dekel, A., and Weinshall, D. (2022). Active learning on a budget: Opposite strategies suit high and low budgets. In Proceedings of the 39th International Conference on Machine Learning.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Jaiswal, A., Babu, A. R., Zadeh, M. Z., Banerjee, D., and Makedon, F. (2021). A survey on contrastive self-supervised learning. arXiv preprint arXiv:2011.00362.

Kirsch, A., van Amersfoort, J., and Gal, Y. (2019). Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning. In Advances in Neural Information Processing Systems.

Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto. Online.

Laurens, v. d. M. and Hinton, G. (2008). Visualizing data using t-sne. Journal of Machine Learning Research.

Mozafari, A. S., Gomes, H. S., Leão, W., Janny, S., and Gagné, C. (2019). Attended temperature scaling: A practical approach for calibrating deep neural networks. arXiv preprint arXiv:1810.11586.

Munjal, P., Hayat, N., Hayat, M., Sourati, J., and Khan, S. (2022). Towards robust and reproducible active learning using neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Pakdaman Naeini, M., Cooper, G., and Hauskrecht, M. (2015). Obtaining well calibrated probabilities using bayesian binning. Proceedings of the AAAI Conference on Artificial Intelligence, 29.

Pereyra, G., Tucker, G., Chorowski, J., Kaiser, Ł., and Hinton, G. (2017). Regularizing neural networks by penalizing confident output distributions. arXiv preprint arXiv:1701.06548.

Platt, J. (2000). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers.

Pop, R. and Fulop, P. (2018). Deep ensemble bayesian active learning: Addressing the mode collapse issue in monte carlo dropout via ensembles. arXiv preprint arXiv:1811.03897.

Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Gupta, B. B., Chen, X., and Wang, X. (2021). A survey of deep active learning. arXiv preprint arXiv:2009.00236.

Roth, D. and Small, K. (2006). Margin-based active learning for structured output spaces. In Proceedings of the European Conference on Machine Learning.

Sener, O. and Savarese, S. (2018). Active learning for convolutional neural networks: A core-set approach. In 6th International Conference on Learning Representations, ICLR 2018.

Sinha, S., Ebrahimi, S., and Darrell, T. (2019). Variational adversarial active learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

Snell, J., Swersky, K., and Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems.

Wang, C. (2024). Calibration in deep learning: A survey of the state-of-the-art. arXiv preprint arXiv:2308.01222.

Yi, J. S. K., Seo, M., Park, J., and Choi, D.-G. (2022). Using self-supervised pretext tasks for active learning. In Proceedings of the European Conference on Computer Vision (ECCV).

Zhan, X., Liu, H., Li, Q., and Chan, A. B. (2021). A comparative survey: Benchmarking for pool-based active learning. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence.

Zhdanov, F. (2019). Diverse mini-batch active learning. arXiv preprint arXiv:1901.05954.
