Loss function and model training process
This paper uses a dual-card NVIDIA GeForce RTX 3060 Ti GPU hardware environment, with Python 3.7, PyTorch 1.9.0, Torchvision 0.10.0, CUDA 11.1, and cuDNN 8.0.5.
The loss function differs from that used for image classification in machine vision: object detection requires not only the classification of objects, but also the regression of the coordinates of their rectangular bounding boxes. On this basis, two parallel output layers are used to produce the corresponding output variables. The first output layer produces a discrete category confidence $p = (p_0, p_1, \ldots, p_K)$; for $K$ categories there are $(K+1)$ outputs, comprising the confidences of the $K$ categories and the confidence that the proposal belongs to the background. Here the confidence $p$ is obtained by applying softmax to the $(K+1)$-way output of the FC layer. The second output layer produces the offsets of the target bounding box: for each of the $K$ target classes $k$, the offset is $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$, where $t^k$ does not refer to the absolute position coordinates of the regressed bounding box, but to offsets relative to the corresponding proposal positions generated by the RPN.
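As a minimal PyTorch sketch of these two parallel output layers (the feature dimension feat_dim=1024 and num_classes=20 are illustrative assumptions, not values from the paper):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Two parallel output layers over a shared RoI feature vector:
    a (K+1)-way classifier (K classes plus background) and a per-class
    box-offset regressor producing (t_x, t_y, t_w, t_h)."""

    def __init__(self, feat_dim: int = 1024, num_classes: int = 20):
        super().__init__()
        self.num_classes = num_classes
        self.cls_fc = nn.Linear(feat_dim, num_classes + 1)  # (K+1) FC outputs
        self.reg_fc = nn.Linear(feat_dim, num_classes * 4)  # 4 offsets per class

    def forward(self, roi_feat: torch.Tensor):
        # Softmax over the (K+1) FC outputs yields the confidence vector.
        conf = torch.softmax(self.cls_fc(roi_feat), dim=-1)
        # Offsets t^k are relative to the RPN proposal, not absolute coordinates.
        offsets = self.reg_fc(roi_feat).view(-1, self.num_classes, 4)
        return conf, offsets
```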
Each training RoI (region of interest) is annotated with a ground-truth classification label $u$ and a ground-truth bounding-box regression vector $g$. For the object detection task described above, we use a multi-task loss function, computed per RoI, to jointly train the classification labels and the bounding-box regression:
$$L(m, u, p, g) = L_{\mathrm{cls}}(m, u) + \lambda\,[u \ge 1]\,L_{\mathrm{reg}}(p, g) \quad (1)$$

$$L_{\mathrm{cls}}(m, u) = -\log m_u \quad (2)$$

where equation (2) is the log loss for the true category label $u$.
For the regression loss of the bounding box, the ground-truth variable $g = (g_x, g_y, g_w, g_h)$ defines the bounding box of the target category $u$, and $p = (p_x, p_y, p_w, p_h)$ is the predicted bounding-box location variable. The indicator $[u \ge 1]$ is defined as

$$[u \ge 1] = \begin{cases} 1, & u \ge 1 \\ 0, & \text{otherwise} \end{cases} \quad (3)$$
Here $u$ is the target category label; $u = 0$ means that the box framed by the proposal in the training sample does not contain a target category from the set but rather the background. In that case there is no ground-truth box for the RoI, so the regression error is meaningless and is excluded from the loss by the indicator.
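A minimal PyTorch sketch of this multi-task loss, assuming smooth L1 for the regression term $L_{\mathrm{reg}}$, whose exact form the paper does not specify (the function name and the 1e-8 stabilizer are ours):

```python
import torch
import torch.nn.functional as F

def multi_task_loss(conf, pred_box, u, g, lam=1.0):
    """Per-RoI loss L(m, u, p, g) = L_cls(m, u) + lam * [u >= 1] * L_reg(p, g).

    conf:     (N, K+1) softmax confidences m
    pred_box: (N, 4) predicted offsets p for the labeled class
    u:        (N,) ground-truth class labels, 0 = background
    g:        (N, 4) ground-truth regression targets
    """
    # Classification term (2): log loss -log m_u for the true class u.
    l_cls = -torch.log(conf[torch.arange(conf.size(0)), u] + 1e-8)
    # Indicator (3): regression only counts for foreground RoIs (u >= 1).
    fg = (u >= 1).float()
    # Smooth L1 (as in Fast R-CNN) is assumed for the regression term.
    l_reg = F.smooth_l1_loss(pred_box, g, reduction="none").sum(dim=1)
    return (l_cls + lam * fg * l_reg).mean()
```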
2.2 In the process of model training, the commonly used metrics for measuring the training effect include Precision, Recall, AP, and mAP. TP is defined as a positive sample that is correctly predicted, FN as a positive sample that is falsely predicted as negative, and FP as a negative sample that is falsely predicted as positive. The above assessment measures are defined as:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (4)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (5)$$

$$\mathrm{AP} = \frac{\sum_i p_i}{n} \quad (6)$$

$$\mathrm{mAP} = \frac{\sum \mathrm{AP}}{k} \quad (7)$$
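Equations (4)-(7) transcribe directly to plain Python (a sketch only; practical evaluations sample precision over a recall grid, e.g. with the COCO tooling):

```python
def precision_recall(tp, fp, fn):
    """Equations (4) and (5): Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(precisions):
    """Equation (6): AP as the mean of the n sampled precision values p_i."""
    return sum(precisions) / len(precisions)

def mean_average_precision(aps):
    """Equation (7): mAP as the mean AP over the k object classes."""
    return sum(aps) / len(aps)
```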
3 CONCLUSION
The learning process of the twin network is divided into two stages. In the pre-training stage, the model's multi-class image recognition ability is improved by repeated training on an existing standard large-sample dataset. In this paper, performance is evaluated with the existing mAP criterion, and the pre-trained model reaches 24.0 mAP on the COCO dataset. In the second stage, the parameters of the neural network are fine-tuned on the EDP data. The basic structure of ResNet is composed of four layers. In the retraining stage, the weights of the first two layers of ResNet are frozen, and, utilizing the information in the EDP dataset, the latter two layers and the fully connected classification layer are adjusted to transfer the model to the new dataset.
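A minimal sketch of this freezing scheme in PyTorch/Torchvision, assuming a torchvision ResNet whose four residual stages are named layer1 through layer4 (resnet50 and the optimizer settings are illustrative assumptions, not choices stated in the paper):

```python
import torch
import torchvision

# Load a pre-trained backbone (resnet50 is an assumption; the paper
# only says "ResNet" with four residual stages).
model = torchvision.models.resnet50(pretrained=True)

# Freeze the stem and the first two of the four residual stages.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False

# layer3, layer4 and the fully connected classification layer remain
# trainable and are fine-tuned on the EDP dataset.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```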
The improved RPN algorithm is used for small-sample recognition with the twin network. In this method, the features of the support image set are extracted by the dual-branch network, and object-related features are generated. There is still a large gap between the twin-sub-network small-sample model proposed in this project and existing large-data object detection techniques. In the future, we will further explore how to eliminate the interference of complex environments, investigate new model evaluation methods, and bridge the gap between training samples and test samples.