Figure 4. Structure comparison of FPN, PANet, and BiFPN.
BiFPN (Bidirectional Feature Pyramid Network), which uses weighted bidirectional feature fusion, adopts a node connection pattern different from that of PANet (Path Aggregation Network). Its optimizations of the cross-scale connections include:
(1) removing the nodes in PANet that have only a single input edge. Since a node with a single input performs no feature fusion, the corresponding nodes at P3 and P6 are eliminated, yielding a simplified bidirectional network.
(2) adding a skip connection between the input and output nodes at the same scale, so that features on the same level can be fused along more paths with little extra computation.
(3) unlike PANet (Path Aggregation Network), which has only a single top-down and a single bottom-up path, BiFPN (weighted bidirectional feature pyramid) treats each bidirectional path as one feature network layer and repeats this layer several times, thereby achieving higher-level feature fusion; a sketch of the weighted fusion performed at each node is shown below.
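For illustration only, the following PyTorch-style sketch shows the fast normalized (weighted) fusion at one BiFPN-style node. The class name WeightedFusionNode, the convolution block, and all hyperparameters are assumptions made for this sketch, not the exact configuration used in this work; the inputs are assumed to be feature maps already resized to a common resolution and channel count.

import torch
import torch.nn as nn

class WeightedFusionNode(nn.Module):
    """Fast normalized fusion at one BiFPN-style node (illustrative sketch).

    Each input feature map receives a learnable non-negative weight; the
    weighted average is then refined by a small convolution block.
    """
    def __init__(self, num_inputs, channels, eps=1e-4):
        super().__init__()
        self.fusion_weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, inputs):
        # ReLU keeps the fusion weights non-negative, then normalize them.
        w = torch.relu(self.fusion_weights)
        w = w / (w.sum() + self.eps)
        # Weighted sum of the (already aligned) input feature maps.
        fused = sum(wi * x for wi, x in zip(w, inputs))
        return self.conv(fused)

Because the weights are normalized per node, each node learns how much each input scale should contribute, which is the essence of the weighted bidirectional fusion described above.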
The prediction head is improved on the basis of the Swin Transformer encoder. Swin Transformer computes self-attention within non-overlapping local windows instead of over the whole feature map, and uses shifted windows across successive layers to aggregate features from neighbouring windows.
In object detection, the Transformer must process high-resolution images, and the complexity of its global attention is roughly quadratic in the image size. On this basis, a sparse representation method based on multi-scale features is adopted.
Swin Transformer merges adjacent small image patches to build hierarchical feature maps for the deeper layers. Because the number of patches inside each window is fixed, the computational complexity grows linearly with the image size.
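A minimal PyTorch-style sketch of this patch merging step follows; the class name PatchMerging and the use of a linear reduction layer follow the common formulation of hierarchical vision transformers and are assumptions for illustration, not a description of this paper's exact implementation.

import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Merge each 2x2 group of neighbouring patches into one token (sketch).

    Halves the spatial resolution and doubles the channel dimension,
    producing the hierarchical feature maps described above.
    """
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):
        # x: (B, H, W, C) with H and W even.
        x0 = x[:, 0::2, 0::2, :]   # top-left patch of each 2x2 group
        x1 = x[:, 1::2, 0::2, :]   # bottom-left
        x2 = x[:, 0::2, 1::2, :]   # top-right
        x3 = x[:, 1::2, 1::2, :]   # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))       # (B, H/2, W/2, 2C)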
This method borrows the hierarchical construction commonly used in convolutional neural networks and the idea of local image regions to compute self-attention within non-overlapping image windows. The procedure is analogous to convolution in a CNN: where a CNN applies a convolution to each local window to obtain that window's features, Swin Transformer applies self-attention within each window to obtain new window features; the resulting windows are then merged, and the attention-and-merge step is repeated in the next stage.
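The window partition that makes this per-window attention possible can be sketched as follows; the helper name window_partition and the tensor layout are illustrative assumptions.

import torch

def window_partition(x, M):
    """Split a (B, H, W, C) feature map into non-overlapping M x M windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // M, M, W // M, M, C)
    # One token sequence of length M*M per window: (B * num_windows, M*M, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, M * M, C)

Self-attention is then computed independently inside each window, so every token attends only to the other M*M - 1 tokens of its own window rather than to the whole image.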
In this model, traditional multi-head self-attention (MSA) is replaced by a shifted-window variant. A Swin Transformer block consists of a (shifted-)window-based MSA module followed in series by a two-layer multilayer perceptron (MLP).
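The block structure just described can be sketched as below, assuming PyTorch; the shift operation and the relative position bias of the real shifted-window attention are deliberately omitted here, and the class name SwinBlockSketch is hypothetical.

import torch.nn as nn

class SwinBlockSketch(nn.Module):
    """One (W-)MSA + two-layer MLP block with pre-norm and residuals (sketch)."""
    def __init__(self, dim, num_heads, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):                 # x: (num_windows * B, M*M, dim)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)         # attention within each window
        x = x + h                         # residual around window attention
        x = x + self.mlp(self.norm2(x))   # residual around the MLP
        return x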
Unlike the Swin Transformer, the traditional Transformer framework performs global self-attention over the whole image, which consumes a large amount of computing resources. Swin Transformer instead divides the feature map into non-overlapping windows of M × M patches. For a feature map of h × w patches with channel dimension C, the computational complexities of global MSA and window-based W-MSA are:
Ω(MSA) = 4hwC² + 2(hw)²C        (4)

Ω(W-MSA) = 4hwC² + 2M²hwC       (5)
From formulas (4) and (5) we can see that the complexity of MSA is quadratic in the number of patches hw, whereas the complexity of window-based W-MSA is linear in the number of patches.
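As a worked numerical example of formulas (4) and (5), the short Python snippet below evaluates both costs for an assumed 56 × 56 patch grid with C = 96 channels and 7 × 7 windows; these values are chosen only for illustration.

def msa_cost(h, w, C):
    """Global MSA cost: 4*h*w*C^2 + 2*(h*w)^2*C, i.e. formula (4)."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C

def wmsa_cost(h, w, C, M):
    """Window MSA cost: 4*h*w*C^2 + 2*M^2*h*w*C, i.e. formula (5)."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

print(msa_cost(56, 56, 96))       # ~2.0e9, dominated by the quadratic (hw)^2 term
print(wmsa_cost(56, 56, 96, 7))   # ~1.4e8, linear in the number of patches hw

For this configuration the window-based attention is roughly an order of magnitude cheaper, and the gap widens as the image (and hence hw) grows.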
3 SUMMARY
With the wide application of deep learning and machine vision, transmission line inspection is shifting from traditional manual inspection to intelligent inspection. This paper studies object detection and fault identification in transmission line inspection, with a focus on the detection of small objects. On this basis, the model is improved with the Transformer, the Swin Transformer, the weighted bidirectional feature pyramid (BiFPN), and a convolutional attention module; defective samples are augmented using saliency maps, and an enhanced feature pyramid with deep semantic embedding is adopted.
ACKNOWLEDGEMENTS
This work was supported by the National Key
Research and Development Program of China under
Grant 2020AAA0107500.