
weight on the basis of the total number of objects
in the dataset to optimize the object-detection task.
Specifically, a smaller class weight is set for classes
with a large number of samples, while a larger class
weight is set for classes with a small number of sam-
ples, thereby placing more emphasis on classes with a
small number of samples in the loss calculation. The
implementation of these optimized class weights for
the object-detection task mitigates the class-imbalance
problem.
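As a concrete illustration, inverse-frequency weighting of this kind can be computed as follows (the sample counts and the normalization scheme here are our own assumptions for the sketch, not values from the paper):

```python
import numpy as np

# Hypothetical per-class sample counts; in driving datasets, frequent
# classes (e.g., cars) vastly outnumber rare ones (e.g., trains).
class_counts = np.array([50000, 12000, 3000, 400], dtype=np.float64)

# Inverse-frequency weighting: fewer samples -> larger class weight.
inv_freq = 1.0 / class_counts

# Normalize so the weights average to 1, keeping the overall loss
# scale stable while re-balancing the per-class contributions.
class_weights = inv_freq * len(inv_freq) / inv_freq.sum()
```

The rarest class receives the largest weight, so its samples contribute more to the loss.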
In our evaluation with the BDD100K dataset (Yu
et al., 2020), we demonstrate that semi-supervised
object detection is effective for in-vehicle camera im-
ages. We also show that our proposed loss function
outperforms the conventional supervised and semi-
supervised approaches.
2 RELATED WORK
Herein, we briefly describe related works on super-
vised object detection and semi-supervised object de-
tection methods.
2.1 Supervised Object Detection
Supervised object detection methods have been
widely studied in the field of computer vision. Among
the various methods that have been proposed, super-
vised methods can be categorized into the following two
types: those that use anchor boxes (Ren et al., 2016;
He et al., 2018; Lin et al., 2018; Tan and Le, 2020)
and those that are anchor-free (Tian et al., 2019;
Bochkovskiy et al., 2020; Tan et al., 2020; Zhou
et al., 2019). Anchor boxes are rect-
angular frames used to indicate regions where objects
may exist. Multiple anchor boxes of different sizes
and aspect ratios can be defined for each anchor on
the feature map. However, the use of anchor boxes
presents several problems, such as the existence of
multiple hyperparameters including the number of an-
chor boxes, aspect ratios, and sizes, and the fact that
most anchor boxes are treated as negative samples,
making computation inefficient.
Various anchor-free methods have been proposed
to address the disadvantages of anchor boxes. Fully
convolutional one-stage object detection (FCOS) uses
a unique index called center-ness instead of anchor
boxes. Center-ness is defined as follows:
$$\text{centerness}^{*} = \sqrt{\frac{\min(l^{*}, r^{*})}{\max(l^{*}, r^{*})} \times \frac{\min(t^{*}, b^{*})}{\max(t^{*}, b^{*})}} , \quad (1)$$
where $l^{*}$, $r^{*}$, $t^{*}$, and $b^{*}$ represent the distances from
the object center to the left, right, top, and bottom
edges of the bounding box, respectively. By using center-ness, it is
possible to prevent the prediction of bounding boxes
centered on positions far from the object center.
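Equation 1 can be sketched directly (the function name is ours; the four arguments follow the definitions above):

```python
import math

def centerness(l, r, t, b):
    # Center-ness from Eq. 1: l, r, t, b are the distances from a
    # location inside the ground-truth box to its left, right, top,
    # and bottom edges.
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

# At the exact box center all four distances pair up equally,
# so both ratios are 1 and center-ness is 1.
print(centerness(10, 10, 5, 5))  # 1.0

# Near the left edge, min(l, r) / max(l, r) shrinks,
# so center-ness decays toward 0.
print(centerness(1, 19, 5, 5))   # ~0.23
```

Locations far from the object center thus receive low scores, down-weighting the bounding boxes they predict.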
In this paper, we use the anchor-free approach (i.e.,
FCOS) as the object detector in our semi-supervised
learning framework.
2.2 Semi-Supervised Object Detection
Object-detection methods for the semi-supervised
learning framework have been proposed (Sohn et al., 2020;
Xu et al., 2021; Chen et al., 2022; Liu et al., 2021; Liu
et al., 2022). The major approach to semi-supervised ob-
ject detection is pseudo-labeling.
One of the major pseudo-labeling approaches is self-
training with augmentation-driven consistency reg-
ularization (STAC) and its variants (Liu et al., 2021;
Liu et al., 2022), which introduce strong augmen-
tation. STAC prepares two object-detection models,
Teacher and Student, and trains the Student model by us-
ing pseudo-labeling and strong data augmentation.
In this approach, Teacher is trained on labeled data
only, while Student is trained on both labeled and
unlabeled data. The process starts with a burn-in
stage, where Teacher is trained. After this stage, the
weights of Teacher are fixed, and data are input to
Teacher to make predictions. Non-maximum suppres-
sion is executed to remove labels with high uncer-
tainty, and the remaining labels are treated as pseudo-
labels for Student. Strong data augmentation is then
applied to data similar to those predicted by Teacher,
and Student makes predictions. The loss is calcu-
lated by comparing the predictions with the pseudo-
labels, and Student is trained using this loss. This
method can improve accuracy through a sim-
ple learning procedure and a large amount of unlabeled
data. However, during Student’s learning stage, the
weights of Teacher are fixed, which means that the per-
formance heavily depends on how accurately Teacher
can be trained during the burn-in stage.
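The pseudo-label filtering step described above can be sketched as follows (the function names, the tuple layout of detections, and the threshold value are illustrative assumptions, not the paper's implementation):

```python
def make_pseudo_labels(teacher_predict, unlabeled_images, score_thresh=0.7):
    # Sketch of the pseudo-labeling step: the frozen Teacher predicts on
    # unlabeled images, and only confident detections survive as
    # pseudo-labels for Student. `teacher_predict` is assumed to return,
    # per image, post-NMS detections as (box, label, score) tuples.
    pseudo_labels = []
    for image in unlabeled_images:
        detections = teacher_predict(image)
        # Discard high-uncertainty (low-score) predictions rather than
        # using them as supervision.
        kept = [(box, label) for box, label, score in detections
                if score >= score_thresh]
        pseudo_labels.append(kept)
    return pseudo_labels

# Toy usage with a stub Teacher: the low-confidence detection is discarded.
def stub_teacher(image):
    return [((0, 0, 10, 10), "car", 0.9), ((5, 5, 8, 8), "bus", 0.3)]

print(make_pseudo_labels(stub_teacher, ["img0"]))
# [[((0, 0, 10, 10), 'car')]]
```

Student is then trained to reproduce these surviving labels on strongly augmented versions of the same images.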
The method called Unbiased Teacher (Liu et al.,
2021) addresses this dependency on the burn-in stage.
Unbiased Teacher updates the
weights of Teacher on the basis of an exponential
moving average of Student’s weights, even after
the burn-in stage, which enables the feedback of Stu-
dent’s learned knowledge to Teacher. The update
formula for Teacher’s weights using the exponential
moving average is shown in Equation 2.
$$\theta_t = \alpha \theta_t + (1 - \alpha) \theta_s \quad (2)$$
where $\theta_t$ represents Teacher’s weights, $\theta_s$ represents
Student’s weights, and $\alpha$ is a hyperparameter. By
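Equation 2 amounts to a simple per-parameter update, sketched below with scalar weights standing in for full tensors (the parameter names are illustrative):

```python
def ema_update(teacher, student, alpha=0.99):
    # Eq. 2 applied per parameter:
    #   theta_t = alpha * theta_t + (1 - alpha) * theta_s
    # Large alpha keeps Teacher stable; the (1 - alpha) term slowly
    # feeds Student's learned knowledge back into Teacher.
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}

# Toy usage with a single scalar "parameter":
teacher_w = {"conv.weight": 1.0}
student_w = {"conv.weight": 0.0}
teacher_w = ema_update(teacher_w, student_w, alpha=0.9)
print(teacher_w["conv.weight"])  # 0.9
```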
VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications