Analyse Protein Model of the SARS-CoV-2 Virus using Data Mining Methods

Tiur Gantini, Hans Christian

2021

Abstract

Since the SARS-CoV-2 (COVID-19) pandemic began in December 2019, the National Center for Biotechnology Information (NCBI) has recorded information related to this virus in its database. This research focuses on identifying the protein in a dataset of the species Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), genus Betacoronavirus, family Coronaviridae, taken from the NCBI database, using data mining classification models based on the naive Bayes and J48 algorithms. The dataset comprises 1,149,217 records registered from December 1, 2019, to April 13, 2021. After cleaning, it contains 517,834 records of SARS-CoV-2 in human hosts, each consisting of nucleotide length, nucleotide completeness, geographic location, and protein. This data was used for training, and 475 randomly chosen records were used for testing. The result is that all of the proteins could be predicted using the J48 algorithm but not using naive Bayes. From these data mining results, we conclude that J48, rather than naive Bayes, is the better method for predicting the protein in humans infected with SARS-CoV-2.
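
The two classifier families named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: J48 is Weka's implementation of C4.5, whereas the tree below is a simpler ID3-style stand-in that splits on plain information gain with no pruning or gain ratio. The toy rows, the binning of nucleotide length into `short`/`medium`/`long`, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy rows: (length_bin, completeness, location) -> protein label.
# These values are invented for illustration; they are NOT from the NCBI dataset.
TRAIN = [
    (("long", "complete", "USA"), "ORF1ab polyprotein"),
    (("long", "complete", "India"), "ORF1ab polyprotein"),
    (("medium", "complete", "USA"), "surface glycoprotein"),
    (("medium", "partial", "India"), "surface glycoprotein"),
    (("short", "complete", "USA"), "nucleocapsid phosphoprotein"),
    (("short", "partial", "India"), "nucleocapsid phosphoprotein"),
]

def train_naive_bayes(rows):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attr index, label) -> value counts
    domains = defaultdict(set)           # attr index -> set of observed values
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict_naive_bayes(model, features):
    """Return the label with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(features):
            seen = value_counts[(i, label)][value]
            score += math.log((seen + 1) / (count + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def entropy(rows):
    """Shannon entropy (in bits) of the label distribution in rows."""
    counts = Counter(label for _, label in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def partition(rows, attr):
    """Group rows by their value for the attribute at index attr."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0][attr]].append(row)
    return parts

def build_tree(rows, attrs):
    """ID3-style tree: split on the attribute with the highest information gain."""
    labels = Counter(label for _, label in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attrs:
        return majority  # leaf node: a plain label string

    def gain(attr):
        parts = partition(rows, attr).values()
        return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    children = {v: build_tree(p, rest) for v, p in partition(rows, best).items()}
    return (best, children, majority)

def predict_tree(tree, features):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(features[attr], majority)
    return tree
```

Calling `train_naive_bayes(TRAIN)` or `build_tree(TRAIN, [0, 1, 2])` returns a model that the matching `predict_*` function can apply to a new `(length_bin, completeness, location)` tuple. On real data, the numeric nucleotide length would need either binning, as assumed here, or a classifier that handles numeric splits directly, as C4.5 does.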

Paper Citation


in Harvard Style

Gantini T. and Christian H. (2021). Analyse Protein Model of the SARS-CoV-2 Virus using Data Mining Methods. In Proceedings of the 1st International Conference on Emerging Issues in Technology, Engineering and Science - Volume 1: ICE-TES, ISBN 978-989-758-601-9, pages 95-103. DOI: 10.5220/0010744800003113


in Bibtex Style

@conference{ice-tes21,
author={Tiur Gantini and Hans Christian},
title={Analyse Protein Model of the SARS-CoV-2 Virus using Data Mining Methods},
booktitle={Proceedings of the 1st International Conference on Emerging Issues in Technology, Engineering and Science - Volume 1: ICE-TES},
year={2021},
pages={95-103},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010744800003113},
isbn={978-989-758-601-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Emerging Issues in Technology, Engineering and Science - Volume 1: ICE-TES
TI - Analyse Protein Model of the SARS-CoV-2 Virus using Data Mining Methods
SN - 978-989-758-601-9
AU - Gantini T.
AU - Christian H.
PY - 2021
SP - 95
EP - 103
DO - 10.5220/0010744800003113