Deep Learning Methods for Video to Text Converter Applications with Phytorch Library
Komang Indah, Ida Manuaba, I. Komang Wiratama
2022
Abstract
The core tasks of computer vision include image classification, object detection, and segmentation. A multi-task deep learning method converts images into text through a multi-layer process that expresses complex image understanding. In this study, the system converts video into text and sentences. The conversion is based on a multi-task deep learning method that combines a Convolutional Neural Network (CNN) for the image domain, a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) for the sentence domain, and a Caption Content Network (CCN) and Recurrent Convolutional Network (RCN) for labeling and for the relationships between objects, together with a structured objective that aligns the two modalities through multimodal embedding. PyTorch builds on the Torch framework, which was originally written in the Lua programming language. PyTorch's syntax offers functionality broadly similar to that of other frameworks. The image-to-text conversion results, produced by multi-task deep learning with an RNN using LSTM or BERT, are evaluated using F1-score, precision, and recall. The results will be plotted as ROC (Receiver Operating Characteristic) curves and summarized with AUC (Area Under the Curve).
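The scoring named in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes each prediction has already been reduced to a binary label (for example, whether a given object label appears in a generated caption) together with a confidence score; that reduction, the 0.5 threshold, the toy data, and the function names `precision_recall_f1` and `roc_auc` are assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive example is scored above a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: ground-truth labels and model confidence scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of measure are commonly reported together.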
Paper Citation
in Harvard Style
Indah K., Manuaba I. and Komang Wiratama I. (2022). Deep Learning Methods for Video to Text Converter Applications with Phytorch Library. In Proceedings of the 5th International Conference on Applied Science and Technology on Engineering Science - Volume 1: iCAST-ES; ISBN 978-989-758-619-4, SciTePress, pages 42-49. DOI: 10.5220/0011711300003575
in Bibtex Style
@conference{icast-es22,
author={Komang Indah and Ida Manuaba and I. Komang Wiratama},
title={Deep Learning Methods for Video to Text Converter Applications with Phytorch Library},
booktitle={Proceedings of the 5th International Conference on Applied Science and Technology on Engineering Science - Volume 1: iCAST-ES},
year={2022},
pages={42-49},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011711300003575},
isbn={978-989-758-619-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Applied Science and Technology on Engineering Science - Volume 1: iCAST-ES
TI - Deep Learning Methods for Video to Text Converter Applications with Phytorch Library
SN - 978-989-758-619-4
AU - Indah K.
AU - Manuaba I.
AU - Komang Wiratama I.
PY - 2022
SP - 42
EP - 49
DO - 10.5220/0011711300003575
PB - SciTePress