tures from different areas of a file were needed. After
thorough research, a final selection of four tools was
made:
1. Pefile (pef, 2024) is a Python library used for pars-
ing and analyzing portable executable files. This tool
can extract information like PE headers, PE sections,
and imported and exported symbols and can be con-
sidered the base of the four.
2. Flare-Floss (fla, 2024a) is designed to automat-
ically extract and deobfuscate strings from malware
binaries utilizing advanced static analysis techniques.
It is similar to the traditional Linux "strings" command
but can additionally handle obfuscation, an approach
commonly used by ransomware authors to
hide the true intentions and functionality of their pro-
gram.
3. Exiftool (exi, 2024) is a command-line utility used
to get meta information about a file. Unlike the other
three tools, which specialize in portable executables,
this one accepts files of any type. The command was
run with the "-n" option, which outputs numeric values
without formatting and makes parsing easier.
4. Dependencies (dep, 2024) is a modern and faster
rewrite of Dependency Walker, available as open-
source software. This tool focuses on the extraction
of all Dynamic Link Libraries (DLLs) that a program
depends on.
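As an illustration of how the output of one of these tools might be turned into features, the sketch below runs Exiftool with the "-n" option and keeps only the numeric fields. The "-j" (JSON output) flag, the helper names, the "exif_" prefix and the sample JSON are assumptions made for this example and are not taken from the original script.

```python
import json
import subprocess


def run_exiftool(path, timeout=60):
    """Run exiftool with -n (numeric values) and -j (JSON output, an assumption)."""
    result = subprocess.run(
        ["exiftool", "-n", "-j", path],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


def numeric_features(exiftool_json, prefix="exif_"):
    """Keep only int/float fields from exiftool's JSON output."""
    records = json.loads(exiftool_json)  # -j prints a list with one object per file
    features = {}
    for key, value in records[0].items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            features[prefix + key] = value
    return features


# Hypothetical output fragment, for illustration only
sample = '[{"SourceFile": "a.exe", "FileSize": 73802, "CodeSize": 4096, "FileType": "Win32 EXE"}]'
features = numeric_features(sample)
```

Separating the subprocess call from the parsing step keeps the parser testable without Exiftool installed.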
To speed up the static analysis, a script was developed
that runs four threads concurrently. For every file in a
directory, each thread executes one of the tools listed
above, parses its output and saves the information to a
global dictionary. Depending on the tool, a value can be
the number of times the key appears in the program (as
with Floss output), a presence flag with a value of 1, or
an actual integer or float value. For each sample, the
script then generates a single CSV file with exactly one
row, writing the dictionary keys as the column headers
and the values in the row beneath them. This approach
ensures that no information is lost for already analyzed
samples in case of an error. Because Floss and
Dependencies may need more time to analyze a file
correctly, a timeout mechanism was implemented. The
script waits at most four minutes for the Floss process
and eight minutes for Dependencies; samples that take
longer are dropped.
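The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the authors' script: the function names, the stub extractors and the feature keys are hypothetical, and only the two timeout values come from the text.

```python
import csv
import io
import subprocess
import threading

# Timeouts in seconds, from the text: 4 minutes for Floss, 8 for Dependencies
TIMEOUTS = {"floss": 4 * 60, "dependencies": 8 * 60}


def run_with_timeout(cmd, timeout):
    """Return the tool's stdout, or None if it exceeds the timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return done.stdout
    except subprocess.TimeoutExpired:
        return None


def analyze_sample(path, extractors):
    """Run every extractor in its own thread and merge the results.

    Returns None if any extractor reports a timeout, so the sample is dropped.
    """
    features, lock, timed_out = {}, threading.Lock(), []

    def worker(extract):
        result = extract(path)
        with lock:  # the dictionary is shared between threads
            if result is None:
                timed_out.append(extract)
            else:
                features.update(result)

    threads = [threading.Thread(target=worker, args=(e,)) for e in extractors]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return None if timed_out else features


def write_row(features, stream):
    """Write one header line (the keys) and one data row (the values)."""
    writer = csv.DictWriter(stream, fieldnames=sorted(features))
    writer.writeheader()
    writer.writerow(features)


# Stub extractors standing in for the real tool wrappers (hypothetical)
def fake_pefile(path):
    return {"pe_sections": 5}


def fake_floss(path):
    return {"str_CreateFileW": 2}


def fake_slow(path):
    return None  # simulates a tool that hit its timeout


row = analyze_sample("sample.exe", [fake_pefile, fake_floss])
dropped = analyze_sample("sample.exe", [fake_pefile, fake_slow])
buffer = io.StringIO()
write_row(row, buffer)
csv_text = buffer.getvalue()
```

In a real wrapper, each extractor would call `run_with_timeout` with the matching entry of `TIMEOUTS` and parse the returned text. Writing one file per sample means a crash on a later sample leaves earlier results on disk.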
3.5 Dynamic Extraction
An approach that is straightforward, always available,
and easy for users to install was desired for the dy-
namic analysis. For these reasons, Cuckoo Sandbox,
a popular open-source automated malware analysis
system that executes samples in a controlled environ-
ment, was excluded. Moreover, Cuckoo can be con-
sidered outdated nowadays, as it only supports Python
2 and Ubuntu 18.04. Subsequently, API approaches
were researched because such a solution would meet
the proposed requirements. Among the limited op-
tions, the two best candidates were selected: VirusTo-
tal and Hybrid-Analysis, from which only one should
be chosen. Both alternatives have similar functional-
ities, providing endpoints to submit a file and get its
behavioral report. The API calls can be utilized in a
script to extract the dynamic features of a sample.
VirusTotal is primarily known for aggregating
multiple antivirus engines to scan a given file con-
currently, with each engine determining whether it
is malicious or safe. The documentation states that
the submitted samples are automatically executed in a
sandboxed environment with their behavior recorded.
However, the dynamic report is available instantly af-
ter submitting a sample, which raises suspicions that
the sample may not actually be run in a controlled
environment. Additionally, the reports are inconsistent
in the information they provide: some samples are
executed in multiple sandboxes and thus offer more
details, while others are executed in only one.
No documentation was found regarding how Virus-
Total decides which sandboxes to run the sample in,
or how the dynamic report is available instantly. For
these reasons, Hybrid-Analysis was chosen.
Hybrid-Analysis (hyb, 2024) offers a Falcon
Sandbox public API with various endpoints, though
a free account has restricted access to them. Permissions
are granted for the essential ones, allowing the upload
of files for analysis and fetching the report summary of
a sample. Other endpoints, such as those retrieving the
extracted binary files or memory dumps, which would
have provided additional information, could not be
used. Another notable limitation is
that the API only permits 100 daily file submissions,
thus slowing down the analysis process.
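With a cap of 100 submissions per day, a sample set has to be spread over several days. A minimal sketch of that bookkeeping follows; the function name and file names are hypothetical.

```python
import math

DAILY_LIMIT = 100  # submission cap stated in the text


def daily_batches(samples, limit=DAILY_LIMIT):
    """Split the sample list into consecutive batches of at most `limit` items."""
    return [samples[i:i + limit] for i in range(0, len(samples), limit)]


samples = [f"sample_{i:04d}.exe" for i in range(250)]  # hypothetical file names
batches = daily_batches(samples)
days_needed = math.ceil(len(samples) / DAILY_LIMIT)
```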
Instead of submitting one sample at a time and
waiting for the behavior analysis to be completed, a
script was employed to submit 100 samples simultaneously,
taking advantage of Hybrid-Analysis’s ability to
process submissions in parallel. The script makes API
calls to the '/submit/file' endpoint with the following
supplementary input parameters: environment_id was
set to 160, which specifies the operating system of the
sandbox, in this case Windows 10 64-bit;
experimental_anti_evasion was set to true, which applies
experimental techniques to counter malware evasion
tactics that detect a sandbox environment and avoid
execution; script_logging was
WEBIST 2024 - 20th International Conference on Web Information Systems and Technologies