An MDP-Based Time Domain Impulse Jamming Mitigation Scheme

Yutao Wang

, Yingtao Niu

, Jianzhong Chen

and Fang Fang

Graduate School, Army Engineering University, Houbiaoying Road, Nanjing, P.R.China

The Sixty-third Research Institute, National University of Defence Technology, Nanjing, P.R.China

Keywords: Markov decision process, Anti-jamming, Impulse jamming.

Abstract: This paper proposed a method through a time domain Markov decision process as a countermeasure of

random periodic impulse jamming for a user in a time slotted environment. First, the random periodic

impulse jamming is modelled. Then the time domain MDP-based anti-jamming communication model is

proposed and the optimal transition probability on each state is calculated. Finally, we proposed an online

learning algorithm to approach the optimal transition probabilities. Simulation results show that our method

is better than other countermeasures of impulse jamming.

1 INTRODUCTION

Impulse jamming can corrupt the data transmission

of communication system(Poisel, 2011) in various

applications like IoT systems(Landa et al., 2017),

OFDM systems(Epple and Schnell, 2017) et al. A

short form periodic jamming (SFPJ) attack can cause

huge reduction of packet delivery ratio (PDR) with

little cost and traditional anti-jamming schemes such

as spread spectrum techniques in frequency domain

is not appropriate to the situation due to the impulse

signal has a wide spectrum density(Debruhl and

Tague, 2013). One usual pulse jamming pattern is

called periodic impulse jamming, which generates

impulse jamming periodically. The jamming source

is widely distributed in practice, such as high-

voltage equipment(Lin et al., 2015). Despite the

interferences generated by nature, impulse jamming

is also commonly used by malicious users to corrupt

communication links. Jie et al. derived a closed form

of BER (Bit Error Rate) of optimal periodic impulse

jamming for QPSK system(Jie et al., 2017). As a

countermeasure, the detection of periodic impulse

jamming is studied. Yuan Yuan He, et al.(He et al.,

2008) used wavelet transforming method to estimate

impulse jamming.

Instead of periodic impulse jamming, malicious

user can use variants of periodic impulse jammings

to improve jamming effect and to avoid being

detected. Random periodic impulse jamming is a

kind of impulse jamming whose occurrence time

obeys some distribution. We proposed a Markov

decision process (MDP) based countermeasures to

mitigate the jam effects.

This paper is organized as follow. The system

model is introduced in Section 2. In Section 3, we

calculated the optimal transmission probability

under certain jamming probability. An online

learning algorithm is provided in section 4 to obtain

the optimal transition probability vector. Section 5

presents computer simulation results. Finally, in

Section 6, some concluding remarks are provided.

2 SYSTEM MODEL

Consider the situation where a synchronized time-

slotted communication system consists 2 licensed

users, one of which sender, the other receiver. In

each time slot, the sender sends a frame with length

. A malicious jamming generates impulse

jamming sequentially with some distributions based

on certain period

T T t

. We consider each pulse

duration is too short than a frame length, but can

corrupt the frame. In this paper, We assume that the

launched time of

’s jamming impulse is

independent normally distributed with a mean of

, 1,2,3,...kT k 

, variance



Wang, Y., Niu, Y., Chen, J. and Fang, F.

An MDP-Based Time Domain Impulse Jamming Mitigation Scheme.

DOI: 10.5220/0008099500490054

In Proceedings of the International Conference on Advances in Computer Technology, Information Science and Communications (CTISC 2019), pages 49-54

ISBN: 978-989-758-357-5





Jamming

Probability

Density

Figure 1: Jamming probability density.

The pdf of the arrival interval is the convolution

of the pdf of 2 adjacent jammings, denoted as

()fx

Apparently

()fx

is normally distributed with mean

and variance



. If current time slot is

t nT

For convenience, we denote the conditional

probability of jamming next time slot is

max

()

( 1)

()

t t t t t









  







   





















(1)

Where

()

Q x e dt











(2)

As a countermeasure which depicted in Figure

2, in each time slot the sender sends a frame

consists 3 elements: payload, verification code

and the Next Action Indicator (NAI). The

verification code part is used to check whether this

frame is corrupted by the jammer, while the NAI

indicates whether to continue send signal or keep

silent next frame, i.e. indicates the receiver

whether a legitimate frame comes next time slot.

We consider the duration of the NAI part is too

small that cannot be influenced by the jamming.

To make the problem clear to understand, we

assume the receiver can immediately sensor the

communication status i.e. whether the channel is

jammed by the malicious attacker.

Payload

Verification Code

Next Action Indicator

Figure 2: Component of a frame.

We define the current the state of the current

time slot

, At the end of each time slot, the sender

observes the state of the current time slot

, and

select the corresponding action

{0,1}

a 

with a

probability

, “0” represents “to silent”, “1”

represents “to continue”. When taking action “to

silent”, the sender would stop sending message until

a jamming is detected; when taking action “to

continue”, the sender would continue sending

message next time slot. If this is the Kth consecutive

slot with successful transmission, the state is

denoted by

SK

. The transmitter receives an

immediate payoff

()Un

in the nth time slot, decided

 

( ) Send (Transmitted) (Jammed)U n R L    1 1 1

(3)

where

represents communication gain, while

represents jammed loss.

()1

is the indicator

function returns 1 when the statement in the

parenthesis holds TRUE and 0 otherwise.

The transition of states can be described as a

Markov chain, as show in Figure 3. The transition

probabilities depend on the action taken by the

transmitter. We use

( ,0|)

P S S

and

( ,1|)

P S S

to represent the transition probability from the

current state

to a new state

when taking action

0 and 1, respectively. Obviously,

( | ,1) 1 ( )

( | ,1) ( )

( | ,0) 1, 0

n n J

P S S p n

P S S n









(4)

CTISC 2019 - International Conference on Advances in Computer Technology, Information Science and Communications

0 1 2 K

...

0 1 2

...

(0)

(1)

(2)

(K)

1-p

(0) 1-p

(1)

1-p

(K-1)

(a) Transition of states when taking action silent

(b) Transition of states when taking action continue

Figure 3: Markov chains of state transitions when different

actions are taken.

Note that the jammer may jam the channel with

some probability, the transmitter will have

possibility being jammed when taking action

“continue” at a certain state. The state of the next

time slot depends on the action of the current time

slot, the jamming condition and the state of current

time slot, hence we can model this scenario as a

Markov decision process (MDP), from which the

defense strategy is obtained.

0 1 2

...

(0)

1-p

(0)

(1-p

(1))p

(1) (1-p

(K-1))p

(K-1)

1-(1-p

(1))p

(1)

1-(1-p

(2))p

(2)

Figure 4: Total transition probability.

An MDP consists of four important components,

namely, a finite set of states, a finite set of actions,

transition probabilities, and immediate payoffs(Wu

et al., 2011). As the attacker jams the channel with

some distribution, if the transmitter continues

sending message, it will be eventually blocked by

the jammer. Thus, the state

will be finite. Denote

the maximum possible According to (3), the

immediate payoff depends on both the state and the

action of previous time slot, i.e.

, if {0,1,2,...}, 1,

( , ) , if {1,2,3,...}, 1,

0, if {0,1,2,...}, 0

n n n n

R S a Unjammed

U S a L S a Jammed







   









(5)

3 CALCULATION

Our goal is to find the appropriate

that can

maximize the sum of immediate payoff and expected

payoff conditioned on the current action probability.

( ) ( , )

U U S a





(6)

In all the scenarios above, the interval between 2

jamming signals is not infinite in practice, so we can

be informed that the states are also finite. Denote

max









as the maximum states count, where

max

()

thresh

f t p

( | )

()

(|)

L L L L

f t n t t n t t t n

f t t n t t n

      



   

(7)

Denote

( , ,... )

S S SS

which represents state

vector,

( , ,... )

a a aa

represents action vector;

( (0), (1),... ( ))

S S S

p p p N

represents vector of

probabilities of each state;

( (0), (1),... ( ))

a a a

p p p N

represents vector of

probabilities of action on each state;

( (0), (1),... ( ))

J J J

p p p N

represents vector of

jamming probability on each state.

From the definition,

(0) 1

p 

( ) 1

pN

( ) 0

pN

. With the transition probability

can derive the total state transition probability matrix

   

 

1 1 (0) (0) 1 (0) (0) ... 0

1 1 (1) (1) 0 ... 0

... ... ... ...

1 0 ... 0

J a J a

p p p p



  





























(8)

Proposition 1: The states probability

determined by

and

Proof: from (8), the following stationary

distribution of the Markov chain can be formulated

as below:

 P

(9)

An MDP-Based Time Domain Impulse Jamming Mitigation Scheme

(0) ( )

(1) (0)

(2) (1)

...

( ) ( 1)

S n S

S N N S

p p p n

p p p

p N p p N











(10)

Subject to

( ) 1





(11)

denotes the transition probability of state

. Thus, the states’ probability can be figured out

(0) ( ) 1

S S k k

p p n p









  











(12)

According to (8)(10)(12),

 

   

(0)

(1) 1 (0) (0)

...

( ) 1 (0) ... 1 ( 1) (0)

S J S

S J J S

p p p

p N p p N p

























     



(13)

Thus, the states’ probability

can be

determined by

and

The returning to state

from state

through

several states is defined as a cycle. It is obvious that

the state will eventually return to

. The total

probability from each path in a cycle is 1, i.e.

0 1 0

Pr{ ... } 1

S S S S





(14)

The expectation of total payoff in one cycle is

denoted as

()U S

, which can be obtained by the

equation below.

 

()

( ) ( )( )

(0)

()

1 ( )

(0)

p n p n nR L

p n nR











































  

















(15)

From the equation, it can be inferred that

()U S

is determined by

and

. Our algorithm aims to

achieve the goal of maximizing

()U S

, the

following equation is needed.

 

argmax ( )

()

( ) ( )( )

(0)

argmax

()

1 ( )

(0)

p n p n nR L

p n nR







































  



















(16)

It is theoretically possible to calculate the

optimal

. However, the calculation is too

complex to solve due to

is a

max





dimension

vector. If the probability resolution is

, the

solution space will be

max

















. To find the max

value and its index in acceptable time, the Simulated

Annealing Algorithm(Ogbu and Smith, 1990) is

introduced into this model. We set the vector

argument. At each epoch, we make a random change

and compare the expectation of corresponding

()U S

. The change is accepted according to

metropolis criterion. After sufficiently large time,

the probability of

 

argmax ( )U

will be 1.

4 ONLINE LEARNING APPROAH

In practice, the jamming probability vector

cannot be obtained from the environment

immediately. To approach the optimal transition

probability vector

, We propose an online

learning algorithm. According to the simulation

results, the optimal

{0,1}

, and all “1” occur

before a certain time slot and all “0” after that

certain time slot. So we translate the problem into

find this certain time slot as boundary before which

CTISC 2019 - International Conference on Advances in Computer Technology, Information Science and Communications

continuously transmitting signal. We formulate the

problem as a multi-armed bandit problem. The user

selects Kth arm represents continuously transmit

time slots. Facing this problem, the trade-off

between exploration and exploitation is the focus of

the problem. If the user chooses the exclusively on

the arm that he thinks is best(exploit), he may miss

the actual best arm. If the user keeps trying out all

the arms and gathering statistics (“exploration”), he

may fail to play the best arm often enough to get a

high return(Auer et al., 2011).

We define 2 variables, exploration rate



and

temperature

to solve the problem. Both the two

variables decrease over iteration. If the exploration

rate is higher, the user is more likely to choose the

new state to calculate the new payoff and vise versa.

If the temperature is higher, the user is more likely

to accept the new arm and vice versa.

The user selects the first arm at first transmitting

cycle and calculate the sum payoff according to

actual jammings. Our algorithm is depicted as

Figure 5.

Init: , 

Calculate 





Update 

according to 





  









    





Update  

Convergence?

Update 

according to 

Calcuate 



  

according to 

end

Figure 5: Online learning algorithm to approach optimal

boundary.

5 SIMULATION RESULTS

First, we calculated the effect of different parameters

to transmitting probability

. We set transmission

gain

10R 

, time slot length

0.01

ts

. For a

normal distributed jamming, the jamming period

0.5T 

and variance

0.1





. With different

10,20,30,40,50L 

. The result is shown in Figure 6.

The transmitting probability goes from 1 to 0 with as

sharp gradient. Different jamming loss result in

different transmitting probability vector. With

greater jamming loss, the transmitting time slot

trends to be shorter.

Figure 6: Optimal transmitting probability with different

jamming loss with

0.1





In Figure 7, we compared our scheme with

different combinations of the attack and the defense

strategy. The transmission gain and jamming loss

are set to 10 and 30 respectively. We plotted the

average sum payoff of a cycle in all the 5 situations.

First, the communication is under a non-jamming

environment. The equivalent average sum payoff of

a cycle is the highest. After that, a periodic impulse

jamming occurs and makes the average sum payoff a

great loss. As a countermeasure, the authorized user

takes periodic transmission to withstand the impulse

jamming. The malicious attacker then chooses

random periodic impulse jamming strategy, which

drops the average sum payoff most. To mitigate the

jamming effect, the authorized user then chooses

MDP-Based impulse jamming mitigation scheme

that rises the average sum payoff curve.

An MDP-Based Time Domain Impulse Jamming Mitigation Scheme

Figure 7: Comparison of different anti-jamming schemes.

In Figure 8, the proposed algorithm in section 4

is verified. We generated normally distributed

jamming with a mean of

, 1,2,3,...kT k 

0.5T 

variance

0.1





. Transmitting gain

10R 

jamming loss

50L 

. The theory value optimal state

is calculated based on optimal

obtained at the

beginning of this section.

Figure 8: Comparison between learned value and

theoretical value.

6 CONCLUSIONS

In this paper, we proposed a Markov decision

process based impulse jamming mitigation scheme

and used simulated annealing algorithm to obtain the

numerical result. The optimal transmitting

probability is either 0 or 1, and the continuously

transmission slot is shorter when jamming loss

grows. We have shown that our scheme is better

than periodic transmission scheme under same

random periodic impulse jamming environment.

REFERENCES

Auer, P., N. Cesa-Bianchi, Y. Freund and R. E.

Schapire, 2011. The Non-Stochastic Multi-

Armed Bandit Problem. In Siam Journal on

Computing 32(1):48-77.

Debruhl, B. and P. Tague, 2013. How to jam without

getting caught: Analysis and empirical study of

stealthy periodic jamming. In IEEE International

Conference on Sensing.

Epple, U. and M. Schnell, 2017. Advanced Blanking

Nonlinearity for Mitigating Impulsive

Interference in OFDM Systems. In IEEE

Transactions on Vehicular Technology

66(1):146-158.

He, Y. Y., C. J. Yu, T. F. Quan and X. Jin, 2008.

Impulsive interference detection method based

on Morlet wavelet and maximum likelihood

estimation. In IEEE International Conference on

Industrial Informatics.

Jie, Y., H. Bingyang and X. Jingying, 2017. Optimal

Periodic Pulse Jamming Signal Design for QPSK

Systems. In Journal of Beijing Institute of

Technology 26(3):381-387.

Landa, I., A. Blazquez, M. Velez and A. Arrinda,

2017. Indoor measurements of IoT wireless

systems interfered by impulsive noise from

fluorescent lamps. In European Conference on

Antennas & Propagation.

Lin, J., T. Pande, I. H. Kim, A. Batra and B. L.

Evans, 2015. Time-Frequency Modulation

Diversity to Combat Periodic Impulsive Noise in

Narrowband Powerline Communications. In

IEEE Transactions on Communications

63(5):1837-1849.

Ogbu, F. A. and D. K. Smith, 1990. The application

of the simulated annealing algorithm to the

solution of the n / m / C max flowshop problem.

In Computers & Operations Research 17(3):243-

253.

Poisel, R. A., 2011. Modern Communications

Jamming Principles and Techniques, Artech

House, Inc.

Wu, Y., B. Wang, K. J. R. Liu and T. C. Clancy,

2011. Anti-Jamming Games in Multi-Channel

Cognitive Radio Networks. In IEEE Journal on

Selected Areas in Communications 30(1):4-15.

CTISC 2019 - International Conference on Advances in Computer Technology, Information Science and Communications