Intra-Vehicular Network Security Datasets Evaluation
Achref Haddaji (1, a), Samiha Ayed (2) and Lamia Chaari Fourati (3)
(1) National School of Electronics and Telecommunications of Sfax, Tunisia
(2) LIST3N-ERA, University of Technology of Troyes, France
(3) Digital Research Center of Sfax (CRNS), SM@RTS (Laboratory of Signals, systeMs, aRtificial Intelligence and neTworkS), Sfax University, Tunisia
Keywords:
Vehicular Networks, Intra-Vehicular Networks, Security, Datasets, Cyber-Attacks, Artificial Intelligence.
Abstract:
Vehicular networks are increasingly connected to the outside world, which makes them an easy
target and highly vulnerable to a range of cyber-attacks. Consequently, the cybersecurity risk of
intra-vehicular networks has risen as well. Artificial Intelligence (AI) based solutions have been
proposed to address these issues, but their effectiveness relies mainly on the available data
sources and datasets. A significant challenge remains: the existing intra-vehicular network
security datasets have received little study. To tackle this issue, this paper examines and assesses
existing intra-vehicular network security datasets. In addition, we provide a detailed resource on
the existing datasets and present a comparative study. This paper also discusses dataset
preprocessing, usability, and strengths to guide researchers.
1 INTRODUCTION
1.1 General Context
In the last decade, the rapid adoption of intelligent vehicles (Shokravi et al., 2020), also
known as connected vehicles, has revolutionized their networks and security. These advanced
vehicles employ sophisticated technologies based on Artificial Intelligence (AI) (Haddaji et al.,
2022) that cooperate with intelligent vehicle components (e.g., sensors). AI is involved in
different tasks, such as communicating with other vehicles, infrastructure, and the internet, and
enhancing user safety, comfort, and performance efficiency. This advancement of connected
vehicles in the automotive environment relies explicitly on in-vehicle networks (Rajapaksha et
al., 2023).
Intelligent cars rely heavily on in-vehicle networks, which link many functions to sensors and
processors within the vehicle. They enable various electronic systems and control components to
communicate and exchange data that support features like adaptive cruise control, lane departure
warning, and blind spot detection. As in-vehicle networks become more intricate and integrated,
cyber-attackers gain many entry points. Vulnerabilities can be exploited via the vehicle’s systems
and the variety of interactions between them, such as the CAN bus (Jichici et al., 2022), the
primary communication channel used by the majority of in-vehicle systems. An adversary who
obtains access to the CAN bus may be able to manipulate the data sent between the various vehicle
systems, causing the vehicle to behave erratically or even become uncontrollable. The vehicle’s
wireless interfaces, such as Bluetooth, Wi-Fi, or cellular networks, are also potential attack
vectors. An attacker with access to these interfaces may be able to launch remote attacks, such as
injecting malicious code or commands into the vehicle’s systems. In addition, physical access to
the car, such as through the diagnostic port or other external interfaces, can facilitate attacks.
Denial-of-service (DoS) attacks (Shah et al., 2022), remote code execution, and physical attacks
(Duo et al., 2022) that enable an attacker to control the vehicle’s steering, stopping, or
acceleration are examples of attacks demonstrated on in-vehicle networks. Ensuring the security of
in-vehicle networks is therefore required to overcome these issues, protect the safety and privacy
of intelligent vehicles and their occupants, and prevent cyber-attacks.
a https://orcid.org/0000-0002-0388-9840
Haddaji, A., Ayed, S. and Fourati, L.
Intra-Vehicular Network Security Datasets Evaluation.
DOI: 10.5220/0012131200003546
In Proceedings of the 13th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH 2023), pages 401-408
ISBN: 978-989-758-668-2; ISSN: 2184-2841
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
[Figure: the telematic, infotainment, chassis, body, sensor, and powertrain domains, with their automotive protocols (CAN, LIN, Automotive Ethernet, MOST) and their core and control units.]
Figure 1: Intra-Vehicle Network Architecture: Automotive Protocols and Units.
1.2 Motivation and Problem Statement
Since in-vehicle networks are vulnerable to both internal and external attacks, it is crucial to
have robust security measures in place to protect these systems from cyber risks. To address these
issues, experts in vehicular security have created many AI-based techniques and strategies for
in-vehicle network security. AI-based solutions apply different Machine Learning (ML) and Deep
Learning (DL) algorithms to track and analyze network traffic, identify abnormalities and
suspicious behavior, and respond to threats promptly. Specifically, researchers have shown
considerable interest in attack detection for intra-vehicular networks. Moreover, recent
advancements in AI could assist the vehicle by identifying and repairing system and network flaws
before attackers can take advantage of them. However, the development of AI solutions depends on
the available resources (e.g., simulations, datasets, and experiment information), and considerably
more resources are needed to validate these approaches. The lack of available datasets and open
resources therefore represents a major challenge for vehicular network security, and for
intra-vehicular networks specifically. This challenge calls for research studies that assess and
survey public datasets for intra-vehicular network security. Indeed, only a limited number of
studies concentrate on vehicular network security datasets, and on intra-vehicular datasets in
particular, which might significantly decrease the efficacy of security solutions. To tackle this
challenge, the primary objective of this paper is to review and analyze the currently available
datasets in vehicular network security. In addition, the value of this work lies in providing a
comprehensive exposition of the different existing datasets utilized in AI-based solutions to
enhance vehicular communication security.
1.3 Contributions and Outline
This paper offers an in-depth exploration of intra-vehicular network security datasets. The major
contributions are as follows:
• Present an overview of intra-vehicular network principles, protocols, and security issues.
• Assess and evaluate the available intra-vehicular network security datasets.
• Highlight the preprocessing phase and its major steps and characteristics.
• Discuss the available datasets’ usage norms, benefits, and limitations.
The remainder of this paper is organized as follows: Section 2 presents an overview of
intra-vehicular networks. Section 3 lists and reviews the existing intra-vehicular network security
datasets. Section 4 provides a discussion and identifies the potential of each dataset, followed by
a conclusion in Section 5.
[Figure: entry surfaces (remote access, physical access, combined remote and physical access, and sensor attacks) and security threats to the automotive protocols LIN, FlexRay, MOST, Ethernet, and CAN, including DoS, masquerading, injection, eavesdropping, replay, spoofing, collision, response collision, integrity, confidentiality, network access, jamming, and synchronization disruption attacks.]
Figure 2: Classification of Possible Security Threats: Entry Surfaces and Automotive Protocols.
2 INTRA-VEHICULAR NETWORKS: OVERVIEW
This section presents brief background on intra-vehicular networks and their related security
concerns.
2.1 Intra-Vehicular Networks Preliminaries
An intra-vehicular network consists of several core components (e.g., gateways, sensors, and
actuators) distributed across various domains, namely the sensor domain, chassis domain,
telematics domain, powertrain domain, etc. Communication between these components relies on
protocols, which play a major role. As described in (Rathore et al., 2022), there are three major
types of intra-vehicular architectures. The first type is based on a central gateway and is known
as the distributed electrical and electronics (E/E) architecture. The second consists of multiple
operational domains linked through a central gateway and is known as the domain-centralized E/E
architecture. The last type is known as the future E/E architecture or zonal architecture. It
consists of a centralized high-performance computing unit (HPCU) that aims to reduce the complexity
of the two previous architectures. Figure 1 illustrates the general architecture of intra-vehicular
networks.
2.2 Intra-Vehicular Networks Automotive Protocols
Connected vehicles are equipped with an advanced sensor platform capable of transmitting a high
number of signals internally. This sensor data is processed by approximately 70 Electronic Control
Units (ECUs) interconnected within the vehicle. The intra-vehicular network enables the exchange of
data between sensors, ECUs, and actuators, which is crucial for the vehicle’s proper functioning.
The primary communication systems make substantial use of five intra-vehicular network protocols
(Aksu and Aydin, 2022): (1) Local Interconnect Network (LIN), (2) Controller Area Network (CAN),
(3) FlexRay, (4) Ethernet, and (5) Media Oriented Systems Transport (MOST). Each protocol has
advantages and disadvantages (see Table 1). For example, LIN offers a low communication speed and
is suited for applications that do not demand precise timing performance, such as battery
monitoring and window actuator control. In addition, LIN has limited fault-tolerance capability. On
the other hand, FlexRay, Ethernet, and MOST offer greater bandwidth than LIN, making them well
suited for time-sensitive and bandwidth-intensive applications. FlexRay, for instance, is employed
in safety systems such as steering angle sensors and safety radar. In contrast, Ethernet and MOST
are commonly utilized in the infotainment system and the ECU flash interface. CAN is the most
popular network due to its low cost, mature tool support, and acceptable noise-resistance and
fault-tolerance performance (Al-Jarrah et al., 2019).
Table 1: Classification of Intra-vehicular Networks Communication Protocols.
Network | Speed/Bandwidth | Topology | Max Supported Nodes | Advantages | Limitations
--- | --- | --- | --- | --- | ---
CAN | 25 Kbps – 1 Mbps | Star, Ring, Linear bus | 30 | High reliability, low cost | Limited bandwidth, vulnerable to attacks
LIN | 25 Kbps – 1 Mbps | Linear bus | 16 | Low cost, low power | Limited data rate and distance
FlexRay | Up to 10 Mbps | Star, Linear bus, hybrid | 22 | High reliability, high bandwidth | Higher cost, limited interoperability
Ethernet | Up to 100 Mbps | Star, Linear bus | Depends on switch ports | High bandwidth, scalable | Higher cost, high power consumption
MOST | Up to 150 Mbps | Ring | 64 | High bandwidth, low latency | Limited distance, higher cost
2.3 Intra-Vehicular Networks Security Issues
Intra-vehicular networks are known to be an easy target for attackers owing to their complexity and
the open issues caused by existing vulnerabilities. These security issues are described as follows:
• Lack of adequate bus protection, leaving messages vulnerable to interception, modification, and
fabrication, and lacking necessary protections such as confidentiality, integrity, authenticity,
and non-repudiation.
• Authentication issues, allowing unauthorized reprogramming of ECU firmware, posing safety risks
and enabling control over critical components.
• Protocol implementation issues, where deviations from safety rules and guidelines compromise
system reliability and safety.
• Data leakage issues, enabling unauthorized access to private vehicle data, violating privacy and
compromising security.
• Misuse of protocols, leveraging mechanisms like bus arbitration and fault detection to launch
disruptive attacks on the network.
Intra-vehicular network attacks can enter through different entry points (e.g., sensors, physical
surfaces, and remote mediums). Moreover, automotive protocols themselves present an easy target.
Figure 2 depicts some attacks and gives a graphical representation of the entry points.
3 INTRA-VEHICULAR NETWORKS DATASETS
As vehicles become more connected and rely more heavily on ECUs, the importance of intra-vehicular
network security is becoming increasingly obvious. Indeed, intra-vehicular networks rely on the
automotive protocols presented in Section 2. Within this context, this section examines the
available intra-vehicular network datasets. In addition, it presents the most important phase,
which is the preprocessing phase. Moreover, Table 2 evaluates the datasets reviewed in this section
and analyzes their advantages and limitations.
3.1 Existing Datasets
3.1.1 Car-Hacking Dataset
The Car-Hacking dataset (Seo et al., 2018) consists of CAN packets collected from the OBD-II
terminal. Each CAN packet is defined by three important features: the CAN ID, which is the packet’s
identifier; DATA[0] to DATA[7], which hold the packet’s 8 bytes; and the flag, which takes one of
two values, T (injected packet) or R (normal packet). The dataset includes normal traffic and three
forms of attack. (1) DoS attack: packets with CAN ID = 0x000 are injected every 0.3 milliseconds.
(2) Fuzzy attack: random ID and DATA values are injected every 0.5 milliseconds. (3) Spoofing
attack (RPM/gear): packets with RPM- and gear-related CAN IDs are injected every millisecond.
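As an illustration, a record with the features described above can be parsed as follows. This is a minimal sketch that assumes a comma-separated row of the hypothetical form `timestamp, CAN ID, DLC, data bytes, flag`; the timestamp and DLC columns, the `CanRecord` type, and the sample rows are assumptions made for the example and should be checked against the actual files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanRecord:
    timestamp: float   # assumed column: capture time in seconds
    can_id: int        # CAN identifier, parsed from hexadecimal
    data: List[int]    # payload bytes DATA[0]..DATA[7] (at most 8)
    injected: bool     # True for flag "T" (injected), False for "R" (normal)


def parse_row(line: str) -> CanRecord:
    """Parse one row of the assumed layout: timestamp,id,dlc,bytes...,flag."""
    fields = [f.strip() for f in line.split(",")]
    timestamp = float(fields[0])
    can_id = int(fields[1], 16)
    dlc = int(fields[2])
    data = [int(b, 16) for b in fields[3:3 + dlc]]
    flag = fields[3 + dlc].upper()
    if flag not in ("T", "R"):
        raise ValueError(f"unexpected flag: {flag!r}")
    return CanRecord(timestamp, can_id, data, flag == "T")


# Hypothetical rows written by hand for this example.
normal = parse_row("0.000000,0316,8,05,21,68,09,21,21,00,6f,R")
dos = parse_row("0.000300,0000,8,00,00,00,00,00,00,00,00,T")
```

Keeping the flag as a boolean makes it straightforward to derive binary labels later in the preprocessing phase.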
3.1.2 OTIDS Dataset
OTIDS (Lee et al., 2017) stands for the Offset Ratio and Time Interval based Intrusion Detection
System, a novel IDS based on the timing of remote frame responses. The basic strategy consists of
transmitting remote frame requests for a given ID, measuring how long an ECU takes to respond, and
determining whether this delay is unusual; the idea is that a compromised ECU under the control of
an adversary would respond with an unusual delay. The dataset was generated by collecting CAN
packets via the OBD-II port. It comprises both normal transmissions and DoS attacks with a CAN ID
of 0x000. It also includes fuzzy and impersonation attacks. The CSV files associated with fuzzy and
impersonation attacks do not indicate whether a packet is normal.
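The timing idea behind this approach can be sketched as follows. This is a simplified illustration and not the detector of Lee et al.: it only measures the delay between a remote frame request and the next data frame with the same ID, and flags delays that lie far from a baseline. The message tuple layout, the function names, and the threshold `k` are assumptions made for the example.

```python
from statistics import mean, stdev
from typing import List, Tuple

# (timestamp in seconds, CAN ID, is_remote_frame) -- assumed layout
Message = Tuple[float, int, bool]


def response_delays(trace: List[Message], can_id: int) -> List[float]:
    """Delay between each remote request for can_id and the next data frame with that ID."""
    delays = []
    pending = None  # timestamp of the latest unanswered remote request
    for ts, mid, is_remote in trace:
        if mid != can_id:
            continue
        if is_remote:
            pending = ts
        elif pending is not None:
            delays.append(ts - pending)
            pending = None
    return delays


def is_unusual(delay: float, baseline: List[float], k: float = 3.0) -> bool:
    """Flag a delay lying more than k standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(delay - mu) > k * sigma


# Hypothetical trace: two requests for ID 0x100, the second answered late.
trace = [
    (0.000, 0x100, True), (0.001, 0x100, False),
    (0.010, 0x200, False),
    (0.020, 0x100, True), (0.025, 0x100, False),
]
delays = response_delays(trace, 0x100)
baseline = [0.0010, 0.0011, 0.0009, 0.0010]  # delays assumed to come from attack-free traffic
```

The baseline would normally be estimated from attack-free captures of the same vehicle, since response times differ between ECUs.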
Table 2: Intra-vehicular Communication Datasets: Comparison.
Dataset | Ref/Year | Objective | Attacks | Nature of Data | Format | Label | Protocol
--- | --- | --- | --- | --- | --- | --- | ---
Car-Hacking | (Seo et al., 2018) | Intrusion Detection | DoS, Fuzzy, Spoofing | Real | CSV | Yes | CAN protocol
OTIDS | (Lee et al., 2017) | Intrusion Detection | DoS, Fuzzy, Impersonation | Real | CSV | No | CAN protocol
Survival | (Han et al., 2018) | Intrusion Detection | Flooding, Fuzzy, Malfunction | Real | CSV | Yes | CAN protocol
SynCAN | (Hanselmann et al., 2020) | Intrusion Detection | Suspension, Fabrication, Masquerade | Synthetic | CSV | No | CAN protocol
TU/e v2 | (Dupont et al., 2019) | Intrusion Detection | DoS, Fuzzy, Diagnostic, Replay, Suspension | Synthetic | CSV | No | CAN protocol
ROAD | (Verma et al., 2020) | Intrusion Detection | Masquerade, Fabrication targeted ID, Accelerator | Real | CSV | Yes | CAN protocol
CrySyS | (Chiscop et al., 2021) | Intrusion Detection | Plateau attack, Continuous change attack, Playback attack, Suppression attack, Flooding attack | Real data and synthetic attacks | CSV | No | CAN protocol, GPS
SIMPLE | (Foruhandeh et al., 2019) | Intrusion Detection | Dominant Impersonation, Complete Impersonation | Real | NA | Yes | CAN protocol
Bi | (Bi et al., 2022) | Intrusion Detection | DoS, Fuzzy, Ulterior Fuzzy, Replay | Real | NA | No | CAN protocol
3.1.3 Survival Dataset
HCRL (Hacking and Countermeasure Research Lab) released two datasets derived from three distinct
vehicles: the "Kia Soul," "Hyundai Sonata," and "Chevrolet Spark." One of the datasets contains
normal driving records, while the other contains driving records that are anomalous due to three
attack scenarios: flooding, fuzzy, and malfunction. These attacks consisted of injecting attack
messages for five seconds every 20 seconds, with each attack captured for 25-100 seconds. The
dataset was used to develop a survival analysis-based detection model (Han et al., 2018) capable of
identifying anomalies in in-vehicle networks. Survival analysis is a statistical technique that
focuses on the timing of an event.
3.1.4 SynCAN
The SynCAN dataset (Hanselmann et al., 2020) is a benchmark for comparing and contrasting various
CAN Intrusion Detection Systems (IDS) using multiple attack scenarios in the signal space. It
consists of a training dataset and six testing datasets, each of which contains columns for labels,
IDs, time, and signal values. The six testing datasets are stored in the following files:
test_normal.zip contains only normal data with a label of 0, for evaluating IDS performance on
unmodified data; test_plateau.zip, in which a signal’s value remains constant over time;
test_continuous.csv, in which a signal’s value progressively deviates from its actual value;
test_playback.zip, in which a signal is overwritten with a recorded time series of the same signal;
test_suppress.zip, in which messages of a specific ID are absent from the CAN traffic because an
attacker prevents an ECU from sending messages; and test_flooding.zip, in which an attacker sends
messages of a specific existing ID to the CAN bus at a high frequency. This dataset is intended to
facilitate the unsupervised training and evaluation of IDS on both normal and anomalous data.
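To make the signal-space scenarios concrete, the plateau and continuous-change behaviours described above can be sketched on a plain list of signal samples. This is only an illustration of the two definitions; the function names, the index-based attack window, and the linear drift are assumptions, and the dataset authors’ actual generation procedure is not reproduced here.

```python
from typing import List


def plateau(signal: List[float], start: int, end: int) -> List[float]:
    """Hold the signal at its value at index `start` for samples start..end-1."""
    out = list(signal)
    for i in range(start, end):
        out[i] = signal[start]
    return out


def continuous(signal: List[float], start: int, end: int, step: float) -> List[float]:
    """Add a drift growing by `step` per sample, so the value progressively
    deviates from its actual value inside the window start..end-1."""
    out = list(signal)
    for i in range(start, end):
        out[i] = signal[i] + step * (i - start + 1)
    return out


# Hypothetical signal samples.
signal = [0.0, 1.0, 2.0, 3.0, 4.0]
frozen = plateau(signal, 1, 4)
drifting = continuous(signal, 2, 5, 0.5)
```

Both functions return a modified copy, so the original samples remain available as the ground truth for labeling.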
3.1.5 TU/e v2 Dataset
In their study, the authors of (Dupont et al., 2019) proposed a framework for evaluating network
intrusion detection systems (NIDSs) for Controller Area Network (CAN) networks. To generate their
dataset, they gathered data from two vehicles, an Opel Astra and a Renault Clio, and from a CAN bus
prototype that they constructed. Additionally, they utilized Kia Soul data from the Car-Hacking
dataset. The dataset is available online from the Eindhoven University of Technology (TU/e Security
Group (Group, 2019)). The authors launched a sequence of attacks against the prototype to generate
attack datasets and then simulated these attacks on the vehicle data. To perform a diagnostic
attack, they randomly injected ten packets with CAN IDs greater than 0x700. Next, they carried out
two fuzzing attacks: injecting ten packets with unknown CAN IDs, and altering the payload of ten
frames with a valid CAN ID. They also performed a replay attack by injecting an arbitrary packet
that occurred 30 times in the dataset and modifying the timestamps to send the packets ten times
faster than usual. To simulate a DoS attack, messages with a CAN ID of 0x000 were sent at a rate of
four packets per millisecond to replace all messages within a 10-second period. Finally, the
authors simulated a suspension attack by deleting all messages containing a specific CAN ID over a
10-second period.
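The suspension and DoS simulations described above amount to simple transformations of a recorded trace. The following is a minimal sketch under stated assumptions: frames are represented as hypothetical `(timestamp, CAN ID, payload)` tuples, the injected DoS payload is assumed to be eight zero bytes, and the function names are illustrative rather than taken from the authors’ framework.

```python
from typing import List, Tuple

# (timestamp in seconds, CAN ID, payload bytes) -- assumed layout
Frame = Tuple[float, int, bytes]


def suspend(trace: List[Frame], can_id: int, start: float,
            duration: float = 10.0) -> List[Frame]:
    """Suspension attack: delete every frame with `can_id` in [start, start + duration)."""
    end = start + duration
    return [f for f in trace if not (f[1] == can_id and start <= f[0] < end)]


def dos(trace: List[Frame], start: float, duration: float = 10.0,
        rate_per_ms: int = 4) -> List[Frame]:
    """DoS attack: replace all frames in the window with ID 0x000 frames
    sent at `rate_per_ms` packets per millisecond."""
    end = start + duration
    kept = [f for f in trace if not (start <= f[0] < end)]
    n = int(round(duration * 1000 * rate_per_ms))
    step = duration / n
    injected = [(start + i * step, 0x000, bytes(8)) for i in range(n)]  # payload is an assumption
    return sorted(kept + injected, key=lambda f: f[0])


# Hypothetical trace.
trace = [(0.0, 0x1A0, b"\x01"), (5.0, 0x1A0, b"\x02"),
         (5.5, 0x2B0, b"\x03"), (12.0, 0x1A0, b"\x04")]
suspended = suspend(trace, 0x1A0, start=0.0)
flooded = dos(trace, start=0.0)
```

With the default parameters, the DoS window contains 40,000 injected frames (10 s at four packets per millisecond).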
3.1.6 ROAD Dataset
The ROAD dataset (Verma et al., 2020) comprises 12 ambient captures with approximately 3 hours of
ambient data and 33 attack captures with a total runtime of around 30 minutes. All the data was
collected from a single vehicle whose make and model are not disclosed. The recorded attacks fall
into three categories. The first category is the fuzzing attack, in which the authors injected
frames with random IDs every 0.005 s. The second category consists of targeted ID fabrication and
masquerade attacks. For targeted ID fabrication, the authors used the flam delivery technique, in
which a message is injected immediately after a legitimate message containing the target ID is
seen. For the masquerade version, the authors deleted the legitimate target ID frames preceding
each injected frame. Finally, accelerator attacks form an additional category in which the attack
exploits a vulnerability particular to the vehicle make/model, compromising the ECUs.
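The relationship between targeted ID fabrication and its masquerade version can be sketched as two trace transformations. This is a simplified illustration, not the authors’ tooling: the `(timestamp, CAN ID, payload)` tuple layout, the function names, and the injection delay are assumptions made for the example.

```python
from typing import List, Tuple

# (timestamp in seconds, CAN ID, payload bytes) -- assumed layout
Frame = Tuple[float, int, bytes]


def fabricate(trace: List[Frame], target_id: int, payload: bytes,
              delay: float = 1e-5) -> List[Frame]:
    """Targeted ID fabrication: inject a frame right after each legitimate
    frame that carries the target ID."""
    out = []
    for ts, cid, data in trace:
        out.append((ts, cid, data))
        if cid == target_id:
            out.append((ts + delay, target_id, payload))  # delay value is an assumption
    return out


def masquerade(trace: List[Frame], target_id: int, payload: bytes,
               delay: float = 1e-5) -> List[Frame]:
    """Masquerade version: same injection, but the legitimate target ID frame
    preceding each injected frame is removed."""
    out = []
    for ts, cid, data in trace:
        if cid == target_id:
            out.append((ts + delay, target_id, payload))
        else:
            out.append((ts, cid, data))
    return out


# Hypothetical trace with target ID 0x0D0.
trace = [(0.00, 0x0D0, b"\x10"), (0.01, 0x123, b"\x20"), (0.02, 0x0D0, b"\x11")]
fab = fabricate(trace, 0x0D0, b"\xff")
masq = masquerade(trace, 0x0D0, b"\xff")
```

In the fabricated trace the target ID appears twice as often as normal, whereas the masquerade trace preserves the message count, which is what makes it harder for frequency-based detectors.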
3.1.7 CrySyS Dataset
The CrySyS Lab created this publicly available dataset (Chiscop et al., 2021) for the SECREDAS
project. It includes seven captures and one extended driving scenario trace, along with 20 message
IDs and varying numbers of signals. In addition to the dataset, the authors created a signal
extractor and an attack generator script that can modify CAN messages in various ways, including
changing a signal to constant or random values, modifying it by a delta, or applying
increment/decrement values. In addition, the attack generator can be used to simulate attacks by
substituting a selected signal in the CrySyS traces with a constant value.
3.1.8 SIMPLE Dataset
The SIMPLE dataset (Foruhandeh et al., 2019) is a collection of data obtained by capturing CAN
messages from two vehicles, a 2016 Nissan Sentra and a 2011 Subaru Outback, through the OBD-II
interface with a Tektronix DPO 3012 oscilloscope. During each round, the vehicles were driven for
approximately 40 minutes in both local and highway traffic. The dataset includes more than 16,000
frames. Each frame comprises six parts: CAN high voltage samples, CAN low voltage samples, time
interval, sample rate, decoded bits, and message ID. Hill climbing-style attacks are included in
this dataset.
3.1.9 Bi’s Dataset
The dataset proposed in (Bi et al., 2022) is generated from various driving situations. It is built
from CAN traffic acquired on the test vehicle’s daily commute route, which included three different
scenarios: country roads, highways, and congested city roads. The dataset contains 29,213,281
messages, covering seven days of CAN traffic gathered during commuter driving. It includes
challenging road conditions such as slippery, congested, rainy, and foggy roads. The authors
injected anomalous data into the CAN bus of the test vehicle using data injection equipment. They
used four attack models, in both the vehicle’s stationary state and its driving state, to generate
the attack dataset. The attack messages included DoS attacks, fuzzy attacks, ulterior fuzzy
attacks, and replay attacks.
3.2 Data Pre-Processing
The preprocessing phase comprises several steps, some of which differ from one dataset to another
because they depend on the dataset format, type, size, and features, among others. Other steps are
shared across datasets and form a general structure without dataset-specific details. Before
discussing the characteristics of existing datasets, it is essential to understand in-vehicle data
clearly. CAN bus data is widely researched because the CAN bus is the primary data source. The CAN
frame structure consists of seven fields: Start of Frame (SOF), Arbitration Field (identifier and
RTR), Control Field, Data Field, CRC Field, Acknowledge Field, and End of Frame. These fields serve
various purposes, such as initiating transmission, prioritizing messages, specifying the payload
length, transmitting data, ensuring message integrity, confirming successful receipt, and signaling
frame termination.
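The seven fields can be summarized with their sizes. The sketch below assumes the classical CAN base frame format with an 11-bit identifier; the bit widths are those of that format rather than values stated in the text above, and bit stuffing and the interframe space are deliberately ignored. The names `FIXED_FIELDS` and `frame_bits` are illustrative.

```python
# Field widths in bits for a classical CAN base frame (11-bit identifier).
FIXED_FIELDS = {
    "start_of_frame": 1,
    "arbitration": 12,   # 11-bit identifier + RTR bit
    "control": 6,        # IDE bit, reserved bit, 4-bit data length code
    "crc": 16,           # 15-bit CRC sequence + CRC delimiter
    "acknowledge": 2,    # ACK slot + ACK delimiter
    "end_of_frame": 7,
}


def frame_bits(data_bytes: int) -> int:
    """Frame length in bits for a payload of 0 to 8 bytes, without stuffing bits."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classical CAN carries 0 to 8 data bytes")
    return sum(FIXED_FIELDS.values()) + 8 * data_bytes


full_frame = frame_bits(8)
```

Only the identifier and the data field usually appear as columns in the datasets reviewed here; the remaining fields are handled by the CAN controller.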
Table 3: Intra-vehicular Communication Datasets: Recommendation.
Dataset | Data source | Data type | Best Attack Detected | Worst Attack Detected | Recommendation
--- | --- | --- | --- | --- | ---
Car-Hacking | Multiple vehicles | Standard CAN data | ID attacks | DoS attack | ⋆⋆
OTIDS | Single vehicle | Standard CAN data | Fuzzy attack | Masquerading attack | ⋆⋆
Survival | Multiple vehicles | Standard CAN data | Fuzzy attack | Malfunction attack |
SynCAN | Single vehicle | Signal | Masquerading attack | - | ⋆⋆
TU/e v2 | Multiple vehicles | Standard CAN data | Suspension, Masquerading | DoS |
ROAD | Single vehicle | Standard CAN data and Signal data | Masquerading | | ⋆⋆
CrySyS | Single vehicle | Standard CAN data, GPS data | Masquerading | - | ⋆⋆
SIMPLE | Single vehicle | Signal | Complete impersonation | Dominant impersonation |
Bi | Single vehicle | Standard CAN data | DoS, Fuzzy, Replay | | ⋆⋆
Intra-vehicular network datasets are generated by simulating vehicle ECUs and injecting CAN
messages in a controlled environment. The data might be collected from a single vehicle or from
multiple vehicles. The data preprocessing phase comes directly after data acquisition. This major
phase comprises steps such as normalization, data cleaning, and feature encoding, which may be
common to many datasets. Other steps, such as feature selection, resizing, and format conversion,
may need to be customized based on the unique characteristics of each dataset. Regarding feature
encoding, it is important to convert qualitative values, such as "normal" or "attack," to integer
values. For binary classification, the values should be mapped to "0" and "1", while for
multi-class classification, the values should range from "1" to "n", where "n" denotes the number
of classes. Meanwhile, for CAN ID data, hexadecimal values should be converted to decimal values
using specific functions (such as the "hex2dec" function). For the data field, spaces between bytes
should be removed using the "gsub" function, and the hexadecimal data value should then be
converted to decimal integers (functions from the "Rmpfr" package could be used for this purpose).
Finally, some datasets may have different file formats that need to be converted to a common format
before preprocessing can be performed.
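The feature-encoding steps described above name R functions; an equivalent can be sketched in Python using only built-ins. This is a minimal illustration under stated assumptions: the function names and the class names in the usage lines are hypothetical, and Python integers are arbitrary-precision, so no multi-precision library is needed for the 64-bit data field.

```python
from typing import Sequence


def encode_binary(label: str) -> int:
    """Map "normal" to 0 and any other (attack) label to 1."""
    return 0 if label.strip().lower() == "normal" else 1


def encode_multiclass(label: str, classes: Sequence[str]) -> int:
    """Map a label to an integer in 1..n, where n is the number of classes."""
    return list(classes).index(label) + 1


def can_id_to_decimal(can_id_hex: str) -> int:
    """Convert a hexadecimal CAN ID string to a decimal integer."""
    return int(can_id_hex, 16)


def data_field_to_decimal(data_hex: str) -> int:
    """Remove the spaces between bytes, then read the field as one hexadecimal number."""
    return int(data_hex.replace(" ", ""), 16)


# Hypothetical values for illustration.
classes = ["normal", "dos", "fuzzy"]
label_id = encode_multiclass("fuzzy", classes)
can_id = can_id_to_decimal("0316")
payload = data_field_to_decimal("05 21 68 09 21 21 00 6f")
```

Whether the data field is better encoded as a single integer or as eight separate byte features depends on the model; the single-integer form shown here simply mirrors the steps described above.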
4 DISCUSSION
Available intra-vehicular network security datasets are very limited. Moreover, the existing
datasets focus almost exclusively on the CAN bus protocol and pay no attention to the other
protocols. The Car-Hacking dataset is the most used in the CAN IDS literature. The attack
recordings in this dataset comprise a large number of instances per attack, and its ID attacks
target the gear and RPM functions. However, the attacks were not simulated while the car was being
driven, which makes the test data different from the training data. In addition, the data are in
different formats, which is undesirable. Furthermore, the available datasets study redundant
attacks (e.g., DoS attacks and fuzzy attacks). OTIDS is the only dataset that provides a slightly
stealthier form of ID spoofing within normal traffic. Therefore, this dataset can be used for
identity spoofing-based systems. In addition, it is the only dataset with remote frames and
responses. However, this dataset is not recommended, for several reasons. First, the injection
intervals are not clarified or explained in the documentation. Second, although the attack was
labeled as a masquerade attack in the paper, it may not meet the criteria of a true masquerade
attack since the legitimate node’s message transmission was not suspended. Finally, remote frame
requests and responses result in minor timing variations, which may pose a challenge when training
and testing a timing-based detector. The Survival dataset includes attacks on three vehicles that
can have a real effect on the vehicle. Therefore, this dataset is a good choice for a simple
timing-based detector. However, as with the Car-Hacking and OTIDS datasets, the attacks are basic
and simple to detect. Furthermore, the ambient captures offer only 60-90 seconds of data per
vehicle, which is inadequate to ensure reliable training and to examine false positive rates.
SynCAN is one of the best-known signal-based datasets. It is quite similar to the ROAD and SIMPLE
datasets. This dataset comprises attacks that target a single signal as well as the full 64-bit
data field, which allows for testing very advanced signal-based IDSs. However, this dataset cannot
be used by IDSs that use CAN data in the standard format (IDs with data fields).
Researchers who wish to simulate diagnostic protocol attacks could use the TU/e v2 dataset. It is
the only dataset that includes suspension attacks in standard CAN data. However, this dataset is
not suitable for simulating DoS attacks. In addition, the data generation process needs to be
clarified for this dataset. Finally, accessing information about the injected packets and their
timing is complicated because the attack labels are stored in an unstructured text file.
The ROAD dataset is one of the recent datasets that address the limitations of the previous
datasets. It provides different types of fuzzy attacks. Furthermore, it is the only dataset in
which the attacks are physically verified. In addition, this dataset provides both CAN data and CAN
signals. However, the masquerading attacks rely on a small amount of simulation. In addition, this
dataset cannot provide a high resolution for testing time-based detectors because the timestamps
are accurate only to 100 µs.
The CrySyS dataset is a good dataset for simulating and detecting masquerading attacks. In
addition, this dataset gives a clear idea of the injection time. CrySyS is the only dataset that
describes the driver’s actions during data capture. The only limitation of this dataset is that the
attacks are added in post-processing, so their effect on vehicle functions is not captured.
Finally, two datasets, namely SIMPLE and Bi’s, are private. They are not accessible to public
users, who need the permission of the creator to use them. The aforementioned datasets are analyzed
and compared based on different criteria, namely their source, their type, the best and worst
detected attacks on each dataset, and a usability recommendation (see Table 3).
5 CONCLUSION
Both research and industry have shown considerable concern about vehicular network security.
Therefore, intra-vehicular network security needs to be addressed as well. In line with current
solutions, studying intra-vehicular security datasets provides a strong base for research and
development to produce enhanced solutions. This paper presents a comprehensive study of various
intra-vehicular network security datasets and their related quality measures. In addition, this
study addresses the major phase of working with datasets, which is preprocessing. Moreover, it
examines the available datasets and presents their impact through comparative analyses that show
their benefits and limitations.
REFERENCES
Aksu, D. and Aydin, M. A. (2022). MGA-IDS: Optimal
feature subset selection for anomaly detection framework
on in-vehicle networks-CAN bus based on genetic
algorithm and intrusion detection approach. Computers &
Security, 118:102717.
Al-Jarrah, O. Y., Maple, C., Dianati, M., Oxtoby, D., and
Mouzakitis, A. (2019). Intrusion detection systems
for intra-vehicle networks: A review. IEEE Access,
7:21266–21289.
Bi, Z., Xu, G., Xu, G., Tian, M., Jiang, R., and Zhang, S.
(2022). Intrusion detection method for in-vehicle CAN
bus based on message and time transfer matrix.
Security and Communication Networks, 2022.
Chiscop, I., Gazdag, A., Bosman, J., and Biczók, G. (2021).
Detecting message modification attacks on the CAN bus
with temporal convolutional networks. arXiv preprint
arXiv:2106.08692.
Duo, W., Zhou, M., and Abusorrah, A. (2022). A survey
of cyber attacks on cyber physical systems: Recent
advances and challenges. IEEE/CAA Journal of Auto-
matica Sinica, 9(5):784–800.
Dupont, G., Den Hartog, J., Etalle, S., and Lekidis, A.
(2019). Evaluation framework for network intrusion
detection systems for in-vehicle CAN. In 2019 IEEE
International Conference on Connected Vehicles and
Expo (ICCVE), pages 1–6. IEEE.
Foruhandeh, M., Man, Y., Gerdes, R., Li, M., and Chantem,
T. (2019). SIMPLE: Single-frame based physical
layer identification for intrusion detection and
prevention on in-vehicle networks. In Proceedings of the
35th Annual Computer Security Applications
Conference, pages 229–244.
Group, T. S. (2019). Eindhoven university of technology.
Haddaji, A., Ayed, S., and Fourati, L. C. (2022). Artifi-
cial intelligence techniques to mitigate cyber-attacks
within vehicular networks: Survey. Computers and
Electrical Engineering, 104:108460.
Han, M. L., Kwak, B. I., and Kim, H. K. (2018). Anomaly
intrusion detection method for vehicular networks
based on survival analysis. Vehicular
Communications, 14:52–63.
Hanselmann, M., Strauss, T., Dormann, K., and Ulmer, H.
(2020). CANet: An unsupervised intrusion detection
system for high dimensional CAN bus data. IEEE Access,
8:58194–58205.
Jichici, C., Groza, B., Ragobete, R., Murvay, P.-S., and
Andreica, T. (2022). Effective intrusion detection and
prevention for the commercial vehicle SAE J1939 CAN
bus. IEEE Transactions on Intelligent Transportation
Systems, 23(10):17425–17439.
Lee, H., Jeong, S. H., and Kim, H. K. (2017). OTIDS: A
novel intrusion detection system for in-vehicle
network by using remote frame. In 2017 15th Annual
Conference on Privacy, Security and Trust (PST),
pages 57–5709. IEEE.
Rajapaksha, S., Kalutarage, H., Al-Kadri, M. O., Petrovski,
A., Madzudzo, G., and Cheah, M. (2023). AI-based
intrusion detection systems for in-vehicle networks: A
survey. ACM Computing Surveys, 55(11):1–40.
Rathore, R. S., Hewage, C., Kaiwartya, O., and Lloret,
J. (2022). In-vehicle communication cyber security:
challenges and solutions. Sensors, 22(17):6679.
Seo, E., Song, H. M., and Kim, H. K. (2018). GIDS: GAN
based intrusion detection system for in-vehicle
network. In 2018 16th Annual Conference on Privacy,
Security and Trust (PST), pages 1–6. IEEE.
Shah, Z., Ullah, I., Li, H., Levula, A., and Khurshid, K.
(2022). Blockchain based solutions to mitigate
distributed denial of service (DDoS) attacks in the Internet
of Things (IoT): A survey. Sensors, 22(3):1094.
Shokravi, H., Shokravi, H., Bakhary, N., Heidarrezaei,
M., Rahimian Koloor, S. S., and Petrů, M. (2020).
A review on vehicle classification and potential
use of smart vehicle-assisted techniques. Sensors,
20(11):3274.
Verma, M. E., Iannacone, M. D., Bridges, R. A., Hollifield,
S. C., Kay, B., and Combs, F. L. (2020). ROAD: The
real ORNL automotive dynamometer controller area
network intrusion detection dataset (with a
comprehensive CAN IDS dataset survey & guide). arXiv preprint
arXiv:2012.14600.
SIMULTECH 2023 - 13th International Conference on Simulation and Modeling Methodologies, Technologies and Applications