Improving Intrusion Detection Systems with Multi-Agent Deep
Reinforcement Learning: Enhanced Centralized and Decentralized
Approaches
Amani Bacha¹, Farah Barika Ktata¹ and Faten Louati²
¹MIRACL Laboratory, ISSATSo, Sousse University, Sousse, Tunisia
²MIRACL Laboratory, FSEGS, Sfax University, Sfax, Tunisia
Keywords:
Multi-Agent Deep Reinforcement Learning (MADRL), Intrusion Detection System (IDS), Deep Q-Network
(DQN), NSL-KDD, MADQN, COCA-MADQN, MADQN-GTN.
Abstract:
Intrusion detection is a crucial task in the field of computer security as it helps protect computer systems against
malicious attacks. New techniques have been developed to cope with the increasing complexity of computer
systems and the constantly evolving threats. Multi-agent reinforcement learning (MARL) is an extension of
Reinforcement Learning (RL) in which agents can learn to detect and respond to intrusions while considering
the actions and decisions of the other agents. In this study, we evaluate MARL's performance in detecting
network intrusions using the NSL-KDD dataset. We propose two approaches, one centralized and one decentralized,
namely COCA-MADQN and MADQN-GTN. Our approaches show good results in terms of Accuracy, Precision,
Recall, and F1-score.
1 INTRODUCTION
Machine learning (ML) offers several methods for in-
trusion detection, such as supervised, unsupervised,
semi-supervised, and Reinforcement learning (RL)
algorithms (Molina-Coronado et al., 2020). RL was
first formalized by (Sutton et al., 1998) and subse-
quently extended to MARL. MADRL is an extension
of Deep Reinforcement Learning (DRL) that allows
multiple agents to interact and learn from each other
in a shared environment. Hence, the MADRL-IDS
approach possesses the capability to acquire knowl-
edge from past experiences and assimilate novel at-
tack patterns, thereby enhancing its efficacy in de-
tecting and thwarting cyber intrusions (Sethi et al.,
2021). However, MARL approaches have limitations
including scalability, non-stationarity, and partial observ-
ability issues. Non-stationarity is a major challenge
in distributed MARL as it can lead to degraded per-
formance or even failure of the learning algorithm.
The latter is caused by the changes in an agent’s pol-
icy during learning and the delay incurred in informa-
tion exchange between agents (Ibrahim et al., 2021).
As for scalability, which is a hurdle in centralized
MARL caused by the large number of agents or the
complexity of the state and action spaces, the com-
putational cost of the centralized approach can
become prohibitive, making it difficult to scale up to
larger problems. Moreover, the centralized approach
requires the sharing of all observations with the cen-
tral agent. Communication overhead can also become
a bottleneck in large-scale MARL problems, and the
central agent must process and integrate all the ob-
servations from each agent to make decisions, which
requires high communication bandwidth (Zhu et al.,
2022). We tackled these problems by em-
ploying two MADQN-based IDS approaches: cen-
tralized and decentralized, namely COCA-MADQN
and MADQN-GTN. We evaluated their effectiveness
using the NSL-KDD dataset.
COCA-MADQN (Common Observation Common
Action MADQN): This centralized approach elimi-
nates the need for a single central agent to coordi-
nate all actions. All agents have equal access to ob-
servations and can communicate with each other to
determine the best course of action. The decision-
making process is shared among the agents, and the
final action is determined by majority voting. The
shared ReplayBuffer further allows agents to learn
from each other’s experiences, leading to better co-
ordination and overall performance improvement in
cooperative MARL problems.
MADQN-GTN (MADQN with Global Target Net-
work): Our decentralized approach is inspired by the
concept of Vertical Federated Reinforcement Learn-
ing (VFRL) (Qi et al., 2021). MADQN-GTN utilizes a
global target network (GTN) instead of individual lo-
cal target networks for each DQN agent. This proved
to be more effective since the weights of the GTN are
averaged across all agents, resulting in a more consis-
tent and stable learning process. This approach leads
to faster convergence and better performance in coop-
erative MARL problems.
The remainder of the paper is organized as follows:
Section 2 discusses the essential concepts and back-
ground. Section 3 provides an overview of the re-
lated work on RL and MARL for intrusion detection.
Section 4 describes the proposed approaches in de-
tail. Section 5 presents the experimental results and
analysis. Finally, Section 6 concludes the paper and
outlines possible directions for future research.
2 BACKGROUND
2.1 Markov Decision Process (MDP)
and Markov Game (MG)
MDP and Markov Game are both important concepts
in the field of RL and have numerous applications in
real-world problems including cybersecurity (Nguyen
et al., 2020). According to (Canese et al., 2021), MDP
is a discrete-time stochastic control process involv-
ing a single agent. It consists of a set of states, ac-
tions, transition probabilities between states, and re-
wards associated with each state-action pair. MG
extends the concept of MDP to multiple interacting
agents in a shared environment, where agents
collaborate or compete to achieve shared or conflict-
ing objectives.
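For reference, and using standard notation rather than a formulation taken verbatim from the cited surveys, the two models can be summarized as the tuples

\[
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, P, R, \gamma \rangle,
\qquad
\mathcal{G} = \langle \mathcal{S}, \{\mathcal{A}_i\}_{i=1}^{n}, P, \{R_i\}_{i=1}^{n}, \gamma \rangle,
\]

where in the MDP the transition kernel is \(P(s' \mid s, a)\) and the reward is \(R(s, a)\), while in the MG with \(n\) agents the transition \(P(s' \mid s, a_1, \dots, a_n)\) depends on the joint action and each agent \(i\) receives its own reward \(R_i\). In the IDS setting of Section 4, the state corresponds to a traffic observation and the action to the predicted traffic class.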
2.2 Reinforcement Learning (RL)
RL algorithms can be broadly categorized into value-
based algorithms and policy-based algorithms. Value-
based methods learn the value function to indirectly
determine the optimal policies (Mnih et al., 2015).
Algorithms like Q-Learning, and DQN are well-
known examples of value-based methods. Policy-
based methods directly optimize the policies them-
selves without the need for a separate value function.
Algorithms like REINFORCE and Proximal Policy
Optimization fall into this category. In addition,
value-based approaches excel in off-policy learning
and discrete action spaces, while policy-based ap-
proaches can handle both discrete and continuous
control and often offer strong performance guaran-
tees (Lee et al., 2022). Actor-Critic (AC) Methods
combine value-based and policy-based approaches.
Notably, the widely used Deep Deterministic Policy
Gradient (DDPG) algorithm employs the Actor-Critic
approach for continuous action-space tasks (Canese
et al., 2021).
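As a concrete example of the value-based family used in the rest of this paper, the standard DQN objective with a separate target network parameterized by \(\theta^{-}\) can be written as

\[
\mathcal{L}(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}
\Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big)^{2} \Big],
\]

where \(\mathcal{D}\) is the replay buffer and \(\theta^{-}\) is periodically copied from \(\theta\). Both proposed approaches adapt exactly this mechanism: per agent in COCA-MADQN and through a globally averaged target in MADQN-GTN.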
3 RELATED WORK
(Lopez-Martin et al., 2022) suggested several exten-
sions and improvements to DQN including double
DQN (DDQN) in Adversarial RL for the intrusion de-
tection field. Likewise, a study in (Nguyen and Reddi,
2021) applied DRL methods such as DQN, Double
DQN, and actor-critic models for network intrusion
detection. It showed that the DRL policy networks
are efficient and responsive, making them suitable for
online learning and quick adaptation in dynamic data
networks. As an extension of DRL, MADRL has be-
come increasingly popular in recent years as it can
solve complex real-world problems that traditional
RL struggled with (Ibrahim et al., 2021). Many sur-
veys examined MADRL from different perspectives
due to its rising popularity. From a mathematical per-
spective, some literature provided theoretical analyses
of MADRL, namely (Zhang et al., 2021) and (Nguyen
et al., 2020). MADRL finds applications in various
domains such as the medical field. Researchers in
(Vlontzos et al., 2019) have introduced an innovative
approach that utilizes cooperative MADRL for detect-
ing multiple landmarks in medical images. However,
few studies explored the use of MADRL in the con-
text of intrusion detection systems. In reference to the
collaborative MARL topic, we consider the (Servin and
Kudenko, 2008) paper a valuable one. It proposes a
distributed model where network sensor agents learn
to communicate signals within a hierarchical struc-
ture. Higher-level agents in the hierarchy interpret
local information from these signals and notify the
network operator of abnormal states when required.
This approach addresses the issue of scalability. (Zhu
et al., 2014) put forth a new approach for adaptive
IDS logic using iterative reinforcement learning and
Multi-armed Bandits (MAB). (Caminero et al., 2019)
and (Suwannalai and Polprasert, 2020) implemented
Adversarial Reinforcement Learning with Deep Q-
network (AR-DQN), which is a technique that com-
bines adversarial learning and DRL to improve the
robustness of an RL agent against adversarial at-
tacks. The authors (Sethi et al., 2021) proposed a
DRL-based IDS that utilized DQN logic across var-
ious distributed network nodes, providing a multi-
view representation; they then designed and imple-
mented a MADQN equipped with an attention mech-
anism. In this paper, we carefully designed and im-
plemented two MADQN-based IDS approaches, one centralized
and one decentralized, using NSL-KDD. Our ap-
proaches demonstrated their effectiveness in achiev-
ing high-performance metrics such as accuracy, pre-
cision, recall, and F1-score, and they overcame some
of the common challenges associated with IDS, such
as dealing with large volumes of network data and de-
tecting a wide range of attack types. On the whole,
our approaches represent a promising solution for en-
hancing the security of networks and protecting them
against malicious attacks.
4 PROPOSED APPROACH
Since we work in a discrete action space, we have
opted for the multi-agent Deep Q-Network (MADQN)
method. In the context of IDS, MADQN can be used
to train multiple DQN-agents to detect and respond to
different types of attacks to improve the overall secu-
rity of the system. By implementing COCA-MADQN
and MADQN-GTN, we can explore their trade-offs
and benefits in IDS.
4.1 COCA-MADQN
This approach eliminates the need for a central agent
and instead, it distributes the decision-making pro-
cess among the agents. All agents have equal ac-
cess to observations and can communicate with each
other to determine the best course of action. The fi-
nal action is determined by majority voting. Indeed,
during each episode, agents interact with the envi-
ronment and collect transitions (state, action, reward,
next state). These transitions are stored in a shared
replay buffer. A batch of transitions (b_s) is randomly
sampled from the replay buffer and used to update the
agents’ neural network parameters (Figure 1).
The use of a shared replay buffer reduces the commu-
nication costs and the training loss. For each agent,
the target network is updated every episode to sta-
bilize action policy formation and provide an esti-
mation of the expected future value for each action.
This estimation is used to update the Q-values asso-
ciated with actions in the prediction network. The
main neural network parameters are optimized using
a cost function that measures the difference between
the main neural network’s predictions and the target
network’s produced values. The use of a target neu-
ral network reduces fluctuations in the main neural
network’s predictions and improves learning stability
Figure 1: COCA-MADQN: Centralized Approach Model.
and performance (Nguyen et al., 2022). The ε-greedy
policy is used to select actions for the agents based
on the main neural network’s prediction. This means
that with probability ε, a random action is selected to
encourage the exploration of the action space, while
with a probability of 1-ε, the optimal action is cho-
sen based on the main neural network’s prediction.
Our approach, COCA-MADQN, involved choosing a
final action through a majority vote process by select-
ing the most commonly predicted action by all agents
and then assigning an average of all rewards. After
that, the state would move to the next state and the
process would be repeated until the training was com-
pleted. We present a detailed algorithm (Algorithm 1)
that outlines the key steps of our approach.
Algorithm 1: COCA-MADQN Algorithm.
Initialize:
    Replay buffer D with capacity N
    Step counter T = 0
    Action-value network Q with random weights θ
    Target action-value network Q̂ with weights θ⁻ = θ
repeat
    for each agent i do
        Receive state s_i, get Q(s_i, a; θ)
        Choose action a_i according to the ε-greedy policy
    end
    Choose final action a by majority voting among all agents
    Execute a, get next state s' and reward r
    for each agent i do
        Receive new state s'_i
        Store (s_i, a_final, r, s'_i) in D
        Randomly sample a minibatch of transitions from D
        Y = r_i + γ max_{a'} Q̂(s'_i, a'; θ⁻)   if s' is not terminal
        Y = r_i                                  if s' is terminal
        Update parameters θ by minibatch gradient descent on (Y − Q(s_i, a_i; θ))²
        s_i ← s'_i
    end
    T ← T + 1
    if T mod max_timestep == 0 then
        Update target network weights θ⁻ ← θ
    end
until T > T_max
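A minimal sketch of the per-step action selection in Algorithm 1, i.e. ε-greedy proposals followed by majority voting, could look as follows (illustrative code with hypothetical names; each agent is assumed to expose a callable returning one Q-value per action):

    import numpy as np

    def select_joint_action(q_functions, state, epsilon, n_actions):
        """ε-greedy proposal per agent, then majority vote over all proposals.

        q_functions: one callable per agent returning an array of Q-values for `state`
        (a hypothetical interface used only for this sketch).
        """
        proposals = []
        for q in q_functions:
            if np.random.rand() < epsilon:
                proposals.append(np.random.randint(n_actions))  # explore
            else:
                proposals.append(int(np.argmax(q(state))))      # exploit
        votes = np.bincount(proposals, minlength=n_actions)
        final_action = int(np.argmax(votes))                    # most commonly proposed action
        return final_action, proposals

The reward assigned to this single executed action is then the average of the rewards observed by the individual agents, as described above.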
4.2 MADQN-GTN
First, we implemented MADQN with a local tar-
get network for each agent, but the results did not
lead to satisfactory convergence because of non-
stationarity. That is why we proposed a new approach,
MADQN-GTN, to overcome this limit. To further
develop our approach, we used a MADQN with a
Global Target Network, i.e. a shared target network
for all agents in our system, which helps stabilize
the training process and improve the overall
performance of our model. This ap-
proach was influenced by the VFRL (Vertical Feder-
ated Reinforcement Learning) described in (Qi et al.,
2021), where each agent updates its model based on
local observations and periodically sends the aggre-
gated local models to the central server. The up-
dated global model is generated by the central server
and distributed to the agents after being combined
with the local models. Back to our approach, as ex-
plained above, we opted for a shared GTN across all
agents to effectively transfer knowledge and experi-
ence gained by one agent to other agents in the sys-
tem, leading to more efficient and effective learning.
Our approach involves training multiple DQN
agents, each with its own set of inputs and out-
puts. Each agent learns to predict the Q-values
for its actions based on the current state of
the environment and then uses these Q-
values to select actions that maximize its expected
rewards (Figure 2).
Figure 2: MADQN-GTN: Decentralized Approach Model.
Overall, the use of a MADQN with a Global
Target Network is a powerful and flexible approach
for learning in multi-agent systems to achieve opti-
mal performance in our particular problem domain.
The Global Target Network is regularly updated based on
the weights of individual agents. In fact, in our im-
plementation, the weights of the Global Target Net-
work are obtained by averaging the weights of all lo-
cal agents. Our approach is outlined in Algorithm 2,
which provides a detailed breakdown of the crucial
steps involved. Finally, we created a custom IDS en-
vironment with OpenAI Gym, a library that enables
the creation of simulation environments for reinforce-
ment learning. We opted for a preprocessed and over-
sampled version of NSL-KDD, a benchmark dataset that
Algorithm 2: MADQN-GTN Algorithm.
for each agent i ∈ {1, ..., n} do
    Initialize:
        Replay buffer D_i with capacity N
        Action-value network Q_i with random weights θ_i
        Global target action-value network Q̂ with weights θ⁻ = θ_i
end
Initialize step counter T ← 0
while T < T_max do
    for each agent i ∈ {1, ..., n} do
        Receive observation o_i
        Choose action a_i according to the ε-greedy policy based on Q_i
        Execute action a_i
        Receive next observation o'_i and reward r_i
        Store transition (o_i, a_i, r_i, o'_i) in D_i
    end
    for each agent i ∈ {1, ..., n} do
        Sample a minibatch of transitions (o_i, a_i, r_i, o'_i) from D_i
        y_i = r_i + γ max_{a'} Q̂(o'_i, a'; θ⁻)   if o'_i is not terminal
        y_i = r_i                                  if o'_i is terminal
        Update parameters θ_i by minibatch gradient descent on (y_i − Q_i(o_i, a_i; θ_i))²
        o_i ← o'_i
    end
    T ← T + 1
    if T mod max_timestep == 0 then
        Update Global Target Network parameters: θ⁻ ← Average(θ_1, ..., θ_n)
    end
end
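The global update at the end of Algorithm 2 amounts to averaging the local networks' parameters. A minimal sketch of this step, assuming for illustration that the local Q-networks are PyTorch modules with identical architectures (the framework choice is our assumption, not stated in the original implementation), could be:

    import torch

    def update_global_target(global_target_net, local_q_nets):
        """Set the Global Target Network weights to the element-wise average
        of all local agents' Q-network weights (θ⁻ ← Average(θ_1, ..., θ_n))."""
        local_states = [net.state_dict() for net in local_q_nets]
        averaged = {
            key: torch.stack([state[key].float() for state in local_states]).mean(dim=0)
            for key in local_states[0]
        }
        global_target_net.load_state_dict(averaged)

Every agent then bootstraps its target values y_i from this shared network, which is what couples the otherwise independent learners.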
enables the evaluation and comparison of IDSs and al-
lows researchers to obtain consistent results (Ahsan et al.,
2023), to serve as the IDS environment. Agents'
observations are the unlabeled rows of the dataset
(Network traffic samples), and actions correspond to
the categories associated with the labels (indicating
the class of attack). In multi-class classification, the
objective is to train agents to classify inputs into five
different categories (actions emitted as 0, 1, 2, 3, or 4
depending on the type of intrusion) based on their ob-
servations, where 0: DoS, 1: Probe, 2: R2L, 3: U2R
and 4: Normal. In the IDS environment, the agent is
rewarded for correctly classifying each input. In our
case, the reward is 1 if the label corresponding to the
current observation matches the action taken by the
agent, and 0 otherwise. When all inputs have been
processed, the environment is reset to a random state
to restart the classification process.
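To make this description concrete, a stripped-down version of such an environment is sketched below (an illustration written against the classic Gym API, with our own class and parameter names; the feature and label arrays are assumed to be the preprocessed NSL-KDD matrices):

    import gym
    import numpy as np
    from gym import spaces

    class NslKddEnv(gym.Env):
        """Each step presents one preprocessed NSL-KDD record; the action is the
        predicted class (0: DoS, 1: Probe, 2: R2L, 3: U2R, 4: Normal)."""

        def __init__(self, features, labels):
            super().__init__()
            self.features = features                       # 2-D array of traffic samples
            self.labels = labels                           # integer class per sample
            self.action_space = spaces.Discrete(5)
            self.observation_space = spaces.Box(
                low=-np.inf, high=np.inf, shape=(features.shape[1],), dtype=np.float32)
            self.index = 0
            self.steps = 0

        def reset(self):
            self.index = np.random.randint(len(self.features))  # restart at a random record
            self.steps = 0
            return self.features[self.index].astype(np.float32)

        def step(self, action):
            # Reward 1 for a correct classification of the current record, 0 otherwise
            reward = 1.0 if action == self.labels[self.index] else 0.0
            self.steps += 1
            done = self.steps >= len(self.features)             # all records have been processed
            self.index = (self.index + 1) % len(self.features)
            return self.features[self.index].astype(np.float32), reward, done, {}
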
5 EXPERIMENTAL RESULTS
AND ANALYSIS
To evaluate the performance of our proposed
MADQN approaches for network intrusion detection,
we utilized multiple metrics such as Accuracy, Pre-
cision, Recall, and F1-score. It is worth noting that
solely relying on Accuracy values to evaluate the
model’s performance may not provide a complete as-
sessment. To conduct a more thorough evaluation, we
analyzed the results based on the other performance
metrics. We implemented our approaches COCA-
MADQN and MADQN-GTN on NSL-KDD for both
binary classification and multi-class classification; they
achieved the high performance metrics reported below with 3
DQN agents (Table 1 and Table 2).
Table 1: Performance metrics of COCA-MADQN.
Metrics for 3 agents Multi-class Binary
Accuracy 0.778 0.985
Precision 0.803 1.0
Recall 0.958 0.985
F1-score 0.874 0.992
Table 2: Performance metrics of MADQN-GTN.
Metrics for 3 agents Multi-class Binary
Accuracy 0.767 0.976
Precision 0.799 0.975
Recall 0.945 0.973
F1-score 0.866 0.974
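For completeness, metrics of this kind can be computed from the agents' final predictions in the usual way, for instance with scikit-learn (an illustrative snippet under our own naming and averaging choice; y_true holds the dataset labels and y_pred the voted actions):

    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    def evaluate(y_true, y_pred, average="weighted"):
        """Weighted averaging for the multi-class case; use average='binary' for 2 classes."""
        return {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
            "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
            "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
        }
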
The results confirm the effectiveness and robust-
ness of these two approaches in intrusion detection.
In addition, the plotted curves (Figure 3, Figure 4)
show the convergence of the sum of rewards during
the episodes for both test and training. As the num-
ber of episodes increases from 0 to 100, the sum of
rewards for both the test and training moves towards
a common point, indicating that the model is learning
and becoming more effective in intrusion detection.
Figure 3: COCA-MADQN convergence curve.
Figure 4: MADQN-GTN convergence curve.
This decentralized approach has overcome the
limitations of a local target network for each agent,
which often results in poor learning performance
and failure of the learning curve to converge during
the episodes for both test and training for all agents
(Figure 5), primarily due to non-stationarity issues.
Figure 5: MADQN with Local Target Network convergence
curve.
Compared to the state-of-the-art works (Table 3),
our two approaches show significant improvement
in performance on the NSL-KDD dataset in terms
of Accuracy, Precision, Recall, and F1-score in bi-
nary classification as well as multi-class classifica-
tion. These approaches were able to converge, even
in the centralized MADRL setting. On the one hand, the proposed
COCA-MADQN approach was able to converge and
eliminate the need for a central agent to coordinate
all actions. By providing all agents equal access to
observations and allowing them to communicate with
each other, decisions were distributed and determined
by majority voting. Furthermore, the shared Replay-
Buffer allowed agents to learn from each other’s expe-
riences, leading to better coordination, scalability, and
improved performance in cooperative MARL prob-
lems. On the other hand, the proposed MADQN-
GTN approach, inspired by Vertical Federated Rein-
forcement Learning (VFRL), indicated that the use of
GTN leads to a more consistent and stable learning
process in cooperative MARL problems. Besides, we
have overcome the challenges of MARL, such as non-
stationarity and confidentiality.
6 CONCLUSION AND
PERSPECTIVES
Our experimental results demonstrated that the pro-
posed centralized and decentralized approaches are
suitable for intrusion detection systems, as shown by
their capability to detect network intrusion attacks
with high accuracy, precision, recall, and F1-score.
Our presented approaches: COCA-MADQN and
MADQN-GTN, demonstrated robustness in address-
ing the challenges of non-stationarity and scalabil-
ity. However, other factors such as computational
complexity should also be considered when deciding
which approach to use. In our future work, we in-
tend to implement our proposed methods in a real-
world cloud-based environment. This will enable our
DQN-based multi-agent system to enhance its self-learning
abilities and accurately detect threats in real time.
Table 3: Comparison of performance metrics for intrusion detection with the NSL-KDD dataset.
Reference Approach Accuracy Precision Recall F1-score
(Caminero et al., 2019) AE-RL 0.8016 0.7974 0.8000 0.7940
(Suwannalai and Polprasert, 2020) AE-RL 0.8000 X X 0.7900
(Sethi et al., 2021) A-DQN 0.9720 0.9650 0.9910 0.9780
Our COCA-MADQN MADQN 0.9850 1.000 0.9850 0.9920
Our MADQN-GTN MADQN 0.9760 0.9750 0.9730 0.9740
REFERENCES
Ahsan, M. M., Raman, S., and Siddique, Z. (2023). Bsgan:
A novel oversampling technique for imbalanced pat-
tern recognitions. arXiv preprint arXiv:2305.09777.
Caminero, G., Lopez-Martin, M., and Carro, B. (2019).
Adversarial environment reinforcement learning algo-
rithm for intrusion detection. Computer Networks,
159:96–109.
Canese, L., Cardarilli, G. C., Di Nunzio, L., Fazzolari, R.,
Giardino, D., Re, M., and Spanò, S. (2021). Multi-
agent reinforcement learning: A review of challenges
and applications. Applied Sciences, 11(11):4948.
Ibrahim, A. M., Yau, K.-L. A., Chong, Y.-W., and Wu, C.
(2021). Applications of multi-agent deep reinforce-
ment learning: Models and algorithms. Applied Sci-
ences, 11(22):10870.
Lee, H., Hong, J., and Jeong, J. (2022). Marl-based dual re-
ward model on segmented actions for multiple mobile
robots in automated warehouse environment. Applied
Sciences, 12(9):4703.
Lopez-Martin, M., Sanchez-Esguevillas, A., Arribas, J. I.,
and Carro, B. (2022). Supervised contrastive learn-
ing over prototype-label embeddings for network in-
trusion detection. Information Fusion, 79:200–228.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve-
ness, J., Bellemare, M. G., Graves, A., Riedmiller, M.,
Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-
level control through deep reinforcement learning. na-
ture, 518(7540):529–533.
Molina-Coronado, B., Mori, U., Mendiburu, A., and
Miguel-Alonso, J. (2020). Survey of network intru-
sion detection methods from the perspective of the
knowledge discovery in databases process. IEEE
Transactions on Network and Service Management,
17(4):2451–2479.
Nguyen, N. D., Nguyen, T. T., Pham, N. T., Nguyen, H.,
Nguyen, D. T., Nguyen, T. D., Lim, C. P., Johnstone,
M., Bhatti, A., Creighton, D., et al. (2022). To-
wards designing a generic and comprehensive deep
reinforcement learning framework. Applied Intelli-
gence, pages 1–22.
Nguyen, T. T., Nguyen, N. D., and Nahavandi, S. (2020).
Deep reinforcement learning for multiagent systems:
A review of challenges, solutions, and applications.
IEEE transactions on cybernetics, 50(9):3826–3839.
Nguyen, T. T. and Reddi, V. J. (2021). Deep reinforce-
ment learning for cyber security. IEEE Transactions
on Neural Networks and Learning Systems.
Qi, J., Zhou, Q., Lei, L., and Zheng, K. (2021). Federated
reinforcement learning: Techniques, applications, and
open challenges. arXiv preprint arXiv:2108.11887.
Servin, A. and Kudenko, D. (2008). Multi-agent rein-
forcement learning for intrusion detection. In Adap-
tive Agents and Multi-Agent Systems III. Adaptation
and Multi-Agent Learning: 5th, 6th, and 7th Euro-
pean Symposium, ALAMAS 2005-2007 on Adaptive
and Learning Agents and Multi-Agent Systems, Re-
vised Selected Papers, pages 211–223. Springer.
Sethi, K., Madhav, Y. V., Kumar, R., and Bera, P. (2021).
Attention based multi-agent intrusion detection sys-
tems using reinforcement learning. Journal of Infor-
mation Security and Applications, 61:102923.
Sutton, R. S., Barto, A. G., et al. (1998). Introduction to
reinforcement learning, volume 135. MIT press Cam-
bridge.
Suwannalai, E. and Polprasert, C. (2020). Network in-
trusion detection systems using adversarial reinforce-
ment learning with deep q-network. In 2020 18th In-
ternational Conference on ICT and Knowledge Engi-
neering (ICT&KE), pages 1–7. IEEE.
Vlontzos, A., Alansary, A., Kamnitsas, K., Rueckert, D.,
and Kainz, B. (2019). Multiple landmark detec-
tion using multi-agent reinforcement learning. In
Medical Image Computing and Computer Assisted
Intervention–MICCAI 2019: 22nd International Con-
ference, Shenzhen, China, October 13–17, 2019, Pro-
ceedings, Part IV 22, pages 262–270. Springer.
Zhang, K., Yang, Z., and Başar, T. (2021). Multi-agent rein-
forcement learning: A selective overview of theories
and algorithms. Handbook of reinforcement learning
and control, pages 321–384.
Zhu, C., Dastani, M., and Wang, S. (2022). A survey of
multi-agent reinforcement learning with communica-
tion. arXiv preprint arXiv:2203.08975.
Zhu, M., Hu, Z., and Liu, P. (2014). Reinforcement learning
algorithms for adaptive cyber defense against heart-
bleed. In Proceedings of the first ACM workshop on
moving target defense, pages 51–58.