
4.2.2 Running Time Analysis
This part reports the frame rates of different VSR models at inference time. The experimental results are shown in Table 2. The second column lists the number of parameters of each VSR model, and the fifth column reports the corresponding computation cost. The total computation cost required by our REPVSR at inference is only 31.17% of that of VESPCN, 16.71% of SOFVSR, and
10.58% of FRVSR and TecoGAN, not to mention heavier models whose computation cost reaches 338.5 GFLOPs. The last columns report the average FPS at different output resolutions. When generating 1080p video, the proposed method runs in real time on an NVIDIA GeForce GTX 1080-level graphics card. Owing to structural re-parameterization, our REPVSR model runs at least twice as fast as the other deep models on the GPU platform.
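For context, frame rates of this kind are typically obtained by timing repeated forward passes on the GPU with explicit synchronization. The following is a minimal PyTorch sketch of such a measurement, assuming a generic VSR model and a 4x upscale from a 480x270 LR input to 1080p; it illustrates the general protocol and is not the exact benchmark code used here.

# Minimal timing sketch (assumptions: PyTorch, a generic VSR model `model`,
# and a 4x upscale from a 480x270 LR frame to 1920x1080).
import time
import torch

def measure_fps(model, lr_size=(1, 3, 270, 480), n_warmup=10, n_runs=100):
    device = torch.device("cuda")
    model = model.to(device).eval()
    x = torch.randn(lr_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):          # warm-up to exclude start-up overhead
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            model(x)
        torch.cuda.synchronize()           # wait for all CUDA kernels to finish
    return n_runs / (time.time() - start)  # average frames per second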
5 CONCLUSION
In this paper, we design a recurrent VSR network based on structural re-parameterization (REPVSR), which re-parameterizes a multi-branch training structure into a compact inference model. Experimental results show a favorable speed-accuracy trade-off compared with existing VSR models. In the future, we aim to embed the re-parameterization mechanism into other efficient VSR architectures.
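To make the re-parameterization mechanism concrete, the core idea can be illustrated with a simplified RepVGG-style fusion (Ding et al., 2021b): a training-time block with parallel 3x3 and 1x1 convolution branches is merged into a single 3x3 convolution for inference, producing identical outputs at a lower runtime cost. The sketch below is an illustrative PyTorch example rather than the exact REPVSR block; the helper name fuse_branches and the channel sizes are placeholders.

# Simplified RepVGG-style fusion sketch (illustrative, not the exact REPVSR block):
# a parallel 3x3 conv and 1x1 conv (same channels, stride 1) are merged into one
# 3x3 conv whose output matches the two-branch sum for any input.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_branches(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv3x3.in_channels, conv3x3.out_channels,
                      kernel_size=3, padding=1, bias=True)
    with torch.no_grad():
        # Pad the 1x1 kernel to 3x3 so the two kernels can simply be added.
        w1x1_padded = F.pad(conv1x1.weight, [1, 1, 1, 1])
        fused.weight.copy_(conv3x3.weight + w1x1_padded)
        fused.bias.copy_(conv3x3.bias + conv1x1.bias)
    return fused

# Equivalence check on random data:
conv3 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
conv1 = nn.Conv2d(16, 16, kernel_size=1)
x = torch.randn(1, 16, 32, 32)
y_branches = conv3(x) + conv1(x)
y_fused = fuse_branches(conv3, conv1)(x)
print(torch.allclose(y_branches, y_fused, atol=1e-5))  # True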
REFERENCES
Caballero, J., Ledig, C., Aitken, A., Acosta, A., Totz, J.,
Wang, Z., and Shi, W. (2017). Real-time video super-
resolution with spatio-temporal networks and motion
compensation. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition,
pages 4778–4787.
Chan, K. C., Wang, X., Yu, K., Dong, C., and Loy, C. C.
(2021). Basicvsr: The search for essential components
in video super-resolution and beyond. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 4947–4956.
Chan, K. C., Zhou, S., Xu, X., and Loy, C. C. (2022). Ba-
sicvsr++: Improving video super-resolution with en-
hanced propagation and alignment. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 5972–5981.
Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., and Thuerey, N. (2020). Learning temporal coherence via self-supervision for gan-based video generation. ACM Transactions on Graphics (TOG), 39(4):75–1.
Ding, X., Chen, H., Zhang, X., Huang, K., Han, J.,
and Ding, G. (2022). Re-parameterizing your op-
timizers rather than architectures. arXiv preprint
arXiv:2205.15242.
Ding, X., Guo, Y., Ding, G., and Han, J. (2019). Acnet:
Strengthening the kernel skeletons for powerful cnn
via asymmetric convolution blocks. In Proceedings of
the IEEE/CVF International Conference on Computer
Vision, pages 1911–1920.
Ding, X., Zhang, X., Han, J., and Ding, G. (2021a). Diverse
branch block: Building a convolution as an inception-
like unit. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
10886–10895.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., and Sun,
J. (2021b). Repvgg: Making vgg-style convnets great
again. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
13733–13742.
Fuoli, D., Danelljan, M., Timofte, R., and Van Gool,
L. (2023). Fast online video super-resolution with
deformable attention pyramid. In Proceedings of
the IEEE/CVF Winter Conference on Applications of
Computer Vision, pages 1735–1744.
Haris, M., Shakhnarovich, G., and Ukita, N. (2019).
Recurrent back-projection network for video super-
resolution. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 3897–3906.
Sun, J., Xu, Z., and Shum, H.-Y. (2008). Image super-
resolution using gradient profile prior. In 2008 IEEE
Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR 2008), 24-26 June
2008, Anchorage, Alaska, USA.
Liu, H., Zhao, P., Ruan, Z., Shang, F., and Liu, Y. (2021).
Large motion video super-resolution with dual subnet
and multi-stage communicated upsampling. In Pro-
ceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 2127–2135.
Marnerides, D., Bashford-Rogers, T., Hatchett, J., and De-
battista, K. (2018). Expandnet: A deep convolu-
tional neural network for high dynamic range expan-
sion from low dynamic range content. In Computer
Graphics Forum, volume 37, pages 37–49. Wiley On-
line Library.
Sajjadi, M. S., Schölkopf, B., and Hirsch, M. (2017). En-
hancenet: Single image super-resolution through au-
tomated texture synthesis. In Proceedings of the IEEE
International Conference on Computer Vision, pages
4491–4500.
Sajjadi, M. S., Vemulapalli, R., and Brown, M. (2018).
Frame-recurrent video super-resolution. In Proceed-
ings of the IEEE Conference on Computer Vision and
Pattern Recognition, pages 6626–6634.
Tian, Y., Zhang, Y., Fu, Y., and Xu, C. (2020). Tdan:
Temporally-deformable alignment network for video
super-resolution. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recog-
nition, pages 3360–3369.
Wang, L., Guo, Y., Liu, L., Lin, Z., Deng, X., and An, W. (2020). Deep video super-resolution using HR optical flow estimation. IEEE Transactions on Image Processing, 29:4323–4336.