3 RESULT AND DISCUSSION
The results of the protein analysis are shown in Table 5, which summarizes the predictions of the model built with the J48 algorithm. Twelve protein types were predicted using J48: Envelope protein, membrane glycoprotein, nucleocapsid phosphoprotein, ORF1a polyprotein, ORF1ab protein, ORF3a protein, ORF6 protein, ORF7a protein, ORF7b protein, ORF8 protein, ORF10 protein, and surface glycoprotein. Not all proteins were predicted correctly: misclassifications occurred for ORF3a protein and ORF7a protein. Of the 42 ORF3a proteins, 1 could not be predicted correctly, and of the 57 ORF7a proteins, 15 could not be predicted correctly. In total, the J48 algorithm correctly predicted 459 proteins across the 12 protein types.
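The per-protein counts reported above can be cross-checked with simple arithmetic. The following Python sketch uses only the figures stated in this section (42 ORF3a instances with 1 misclassified, 57 ORF7a instances with 15 misclassified) and assumes, as the text implies, that the other ten protein types had no misclassifications. The helper name `recall` is illustrative, not part of the original study.

```python
# Misclassification counts reported for the J48 model in the text above.
# Assumption: every other protein type was classified without error.
j48_reported = {
    "ORF3a protein": {"total": 42, "wrong": 1},
    "ORF7a protein": {"total": 57, "wrong": 15},
}


def recall(total: int, wrong: int) -> float:
    """Fraction of instances of one protein type that were predicted correctly."""
    return (total - wrong) / total


total_wrong = sum(c["wrong"] for c in j48_reported.values())
print(f"J48 misclassified instances: {total_wrong}")
for name, c in j48_reported.items():
    correct = c["total"] - c["wrong"]
    print(f"{name}: {correct}/{c['total']} correct "
          f"(recall {recall(c['total'], c['wrong']):.3f})")
```

The 16 misclassifications add up to the FALSE count reported for J48 in Table 8.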
Table 6 summarizes the predictions of the model built with the Naïve Bayes algorithm. Naïve Bayes predicted 11 protein types, all of which were also predicted by the J48 algorithm. Only one protein type could not be predicted using Naïve Bayes, namely ORF7a protein. The ORF8 protein was not predictable at all: 42 out of 59 proteins could not be predicted. In total, 433 proteins from the 11 protein types were successfully predicted using the Naïve Bayes algorithm.
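The Naïve Bayes figures can be checked the same way. This Python sketch uses only the numbers stated above (433 correct predictions overall, 42 misclassified out of 59 instances for the failing protein type) and assumes that all Naïve Bayes errors fall in that single protein type; the variable names are illustrative.

```python
# Figures reported for the Naive Bayes model in the text above.
nb_correct_total = 433    # correctly predicted instances over all protein types
failing_class_total = 59  # instances of the protein type reported as unpredictable
failing_class_wrong = 42  # misclassified instances of that protein type

failing_class_recall = (
    (failing_class_total - failing_class_wrong) / failing_class_total
)

# Assumption: every Naive Bayes error falls in this single protein type,
# so the test-set size is the correct count plus these errors.
nb_test_size = nb_correct_total + failing_class_wrong

print(f"Recall for the failing protein type: {failing_class_recall:.3f}")
print(f"Implied test-set size: {nb_test_size}")
```

The implied size of 475 equals the J48 total in Table 8 (459 + 16), which is consistent with both models being scored on the same test data.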
The results of the data testing prediction are shown in Table 7. The J48 algorithm produces more predictions that match the original data than the Naïve Bayes algorithm does. The ORF7a protein can be predicted using the J48 algorithm, albeit with many false predictions, but cannot be predicted using the Naïve Bayes algorithm. The comparison between the two algorithms is summarized in Table 7.
Table 8: Result of Data Testing Prediction.
          ALGORITHM
RESULT    J48    NB
TRUE      459    433
FALSE     16     42
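The counts in Table 8 can be turned into accuracy and error rate. The following Python sketch assumes that TRUE counts correctly classified test instances and FALSE counts misclassified ones; the `accuracy` helper is illustrative.

```python
# Totals from Table 8 (TRUE = correct predictions, FALSE = errors).
results = {
    "J48": {"true": 459, "false": 16},
    "NB": {"true": 433, "false": 42},
}


def accuracy(true: int, false: int) -> float:
    """Share of test instances predicted correctly."""
    return true / (true + false)


for name, r in results.items():
    acc = accuracy(r["true"], r["false"])
    print(f"{name}: accuracy {acc:.2%}, error rate {1 - acc:.2%}")
```

Under this reading, J48 is correct on about 96.6% of the 475 test instances and Naïve Bayes on about 91.2%.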
4 CONCLUSIONS
This study concludes that data mining methods, specifically the J48 and Naïve Bayes algorithms, can be used to analyse SARS-CoV-2 proteins.
Comparing the two algorithms, we recommend J48, because it predicts all protein types, even though it still makes prediction errors for the ORF7a protein. With the Naïve Bayes algorithm, by contrast, the ORF7a protein cannot be predicted at all. Future research could compare J48 with algorithms other than Naïve Bayes, in the hope of obtaining even better predictions.
ACKNOWLEDGEMENTS
This paper and the research behind it would not have been possible without the extraordinary support of the Information Systems study program, the Faculty of Information Technology, and the staff of LPPM, Maranatha Christian University, Bandung. I also thank the head of the Maranatha Christian University library, who has always supported the literature needs of this research and checked it for plagiarism. Finally, I thank my husband, my daughter, and my student Hans Adrian, who accompanied me as a partner in carrying out the research and in writing this paper through to completion.