The Investigation of Packet Header Field Importance on Malware Classification Following Nprint Processing

Fangzhou Xing

2023

Abstract

In 2021, Nprint was introduced to standardize and automate the use of machine learning in network traffic analysis. Nprint converts complete packets into a bit-level representation (1s, 0s, and -1s) and feeds the processed data into an AutoML system. That work reported strong performance across a range of network traffic analysis tasks, including malware classification, but it did not investigate how excluding particular packet header fields affects the results. This research therefore examines how restricting Nprint-processed data to selected packet header fields influences the outcome of the malware classification task. A random forest was trained on Nprint-processed network traffic to determine the importance of each header field for malware classification. Only the top n most important header fields were then fed into AutoGluon to measure how classification accuracy and training time changed. The research found that using only 3 of the packet header fields still achieved 99.9% of the accuracy obtained with all header fields, while cutting the training time of the best-performing model selected by AutoGluon by more than half.
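The ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes that each Nprint column is named as a header field followed by a bit index (for example `ipv4_ttl_0`), that per-column importance scores are already available (for instance from the `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier`), and that a field's importance is the sum of its bit-level scores. The helper names and the toy numbers are hypothetical.

```python
from collections import defaultdict


def field_of(column: str) -> str:
    """Strip a trailing bit index, e.g. 'ipv4_ttl_3' -> 'ipv4_ttl'."""
    name, _, bit = column.rpartition("_")
    return name if name and bit.isdigit() else column


def field_importance(columns, importances):
    """Sum per-bit importance scores into one score per header field.

    Summation is an assumption made for this sketch; the abstract does not
    state how bit-level scores are combined.
    """
    totals = defaultdict(float)
    for column, score in zip(columns, importances):
        totals[field_of(column)] += score
    return dict(totals)


def top_n_fields(columns, importances, n):
    """Return the n header fields with the highest summed importance."""
    totals = field_importance(columns, importances)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(totals, key=lambda field: (-totals[field], field))
    return ranked[:n]


def columns_for_fields(columns, fields):
    """Keep only the bit columns that belong to the selected header fields."""
    keep = set(fields)
    return [column for column in columns if field_of(column) in keep]


# Toy data (hypothetical scores, not results from the paper).
columns = ["ipv4_ttl_0", "ipv4_ttl_1", "tcp_wsize_0", "tcp_wsize_1", "ipv4_id_0"]
importances = [0.30, 0.20, 0.25, 0.15, 0.10]

best = top_n_fields(columns, importances, n=2)
kept = columns_for_fields(columns, best)
```

The reduced column list `kept` is what would then be passed to the AutoML stage in place of the full feature set.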


Paper Citation


in Harvard Style

Xing F. (2023). The Investigation of Packet Header Field Importance on Malware Classification Following Nprint Processing. In Proceedings of the 1st International Conference on Data Analysis and Machine Learning - Volume 1: DAML; ISBN 978-989-758-705-4, SciTePress, pages 343-348. DOI: 10.5220/0012808500003885


in Bibtex Style

@conference{daml23,
author={Fangzhou Xing},
title={The Investigation of Packet Header Field Importance on Malware Classification Following Nprint Processing},
booktitle={Proceedings of the 1st International Conference on Data Analysis and Machine Learning - Volume 1: DAML},
year={2023},
pages={343-348},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012808500003885},
isbn={978-989-758-705-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Data Analysis and Machine Learning - Volume 1: DAML
TI - The Investigation of Packet Header Field Importance on Malware Classification Following Nprint Processing
SN - 978-989-758-705-4
AU - Xing F.
PY - 2023
SP - 343
EP - 348
DO - 10.5220/0012808500003885
PB - SciTePress