Authors:
Ramin Fuladi (1) and Khadija Hanifi (2)
Affiliations:
(1) Ericsson Research, Istanbul, Turkey; (2) Sabanci University, Istanbul, Turkey
Keyword(s):
Software Vulnerability Prediction, CodeGrapher, ML Algorithms, Semantic Relations, Source Code Analysis, Similarity Distance Metrics, Image Generation.
Abstract:
Contemporary software systems face a severe threat from vulnerabilities, prompting the exploration of innovative solutions. Machine Learning (ML) algorithms have emerged as promising tools for predicting software vulnerabilities. However, source code varies widely in size, so its numerical vector representations vary in size as well. This variation disrupts the uniform input that ML models require and leads to information loss and to more false positives and false negatives, which reduces the accuracy of vulnerability analysis. In response, we propose CodeGrapher, an approach that preserves the semantic relations within source code during vulnerability prediction. Our approach converts numerical vector representations into sets of images for ML input and incorporates similarity distance metrics to maintain vital code relationships. Using Abstract Syntax Tree (AST) representation and skip-gram embedding for numerical vector conversion, CodeGrapher demonstrates the potential to significantly enhance prediction accuracy. Because images can be scaled and resized, this representation addresses the challenges that varying numerical vector sizes pose for ML-based vulnerability prediction. By converting input vectors to images of a fixed size, CodeGrapher preserves semantic relations, promising improved software security and more resilient systems.
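The idea of turning variable-length vector sequences into fixed-size images can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which similarity distance metric or resizing method CodeGrapher uses, so cosine distance and nearest-neighbour resampling are assumptions chosen for simplicity, and all function names (`cosine_distance`, `distance_image`, `resize_nearest`, `vectors_to_image`) are hypothetical. The input is assumed to be a list of per-token embedding vectors, such as those a skip-gram model might produce for AST nodes.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_image(vectors):
    """Build an n x n matrix of pairwise distances between n embedding vectors.

    Assumption: pixel (i, j) encodes how dissimilar token i is from token j.
    """
    n = len(vectors)
    return [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def resize_nearest(image, size):
    """Resample a square matrix to size x size with nearest-neighbour lookup."""
    n = len(image)
    if n == 0 or size <= 0:
        raise ValueError("image and target size must be non-empty")
    # Map each target index to a source index in [0, n - 1].
    idx = [min(n - 1, (k * n) // size) for k in range(size)]
    return [[image[r][c] for c in idx] for r in idx]


def vectors_to_image(vectors, size=8):
    """Convert a variable-length list of vectors into a fixed-size 'image'."""
    return resize_nearest(distance_image(vectors), size)
```

With this sketch, a 3-token snippet and a 20-token snippet both yield an 8 x 8 matrix, so a downstream model would see inputs of one shape regardless of code length, while each pixel still reflects a pairwise relation between tokens.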