is critical, is very sensitive to the noise level. Fur-
thermore, the median filtering might destroy the pat-
tern of the PRIs, making the classification even harder
in some cases. Also, note that existing classification
methods rely on handcrafted features and/or thresholds that are not robust to a wide range of noise
imposed on PRI sequences. The above shortcomings
can be overcome with a deep Convolutional Neural
Network (CNN) as will be described next.
3 CNN ARCHITECTURE
We propose in this section a CNN, whose architecture
is depicted in Fig. 3, for PRI classification. As opposed to previous neural-network schemes, the in-
put to the proposed CNN is the raw PRI sequence of
fixed size 1 × 1000, assumed to be obtained from a TOA estimation algorithm. That is, both the denoising and
the feature extraction are handled by the network it-
self. This is done via a cascade of 8 convolution
layers and 2 fully connected (dense) layers. Roughly
speaking, the convolution layers play the role of a fea-
ture extraction that is robust to noise, while the dense
layers take care of the classification based on the out-
put of the final convolution layer. The whole network
is trained end-to-end on a dataset of PRI sequences
labeled with ground-truth modulation types indexed
from 0 to 6. Note that before feeding a PRI sequence
p[n] to the network, we normalize it as
\[
p_{\mathrm{norm}}[n] = \frac{p[n]}{\max_i p[i]}, \quad \forall n. \tag{2}
\]
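As an illustration, the following is a minimal NumPy sketch of the normalization in Eq. (2), assuming the PRI values are strictly positive so that the maximum is nonzero (the helper name normalize_pri is ours):

    import numpy as np

    def normalize_pri(p):
        # Peak normalization of Eq. (2): divide every sample by the
        # maximum value of the sequence.
        p = np.asarray(p, dtype=np.float64)
        return p / p.max()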
Following the design philosophy of the VGG-
net (Simonyan and Zisserman, 2015), all filters used
in the 8 convolution layers are of fixed size 1 × 3,
resulting in a receptive field of 8 × 2 + 1 = 17 sam-
ples. The number of filters is decreased from 32 to 4
along the convolution layers. Note that, in each convolution layer, batch normalization is used to combat internal covariate shift, as suggested in (Ioffe and Szegedy, 2015). Moreover, the Rectified Linear
Unit (ReLU) is used as the activation function for all
convolution layers. The result of the final convolu-
tion layer is 4 feature maps, each of size 1 × 1000,
which are then flattened into a single feature vector
of length 4000. This vector is fully connected to a
layer of 256 neurons with ReLU activation. To pre-
vent over-fitting, a dropout layer (Srivastava et al., 2014) with a dropout rate of 0.7 is inserted between these layers. Finally, the output of the Dense-
256 layer is transformed to a score vector of length
7 via the last fully connected layer with the softmax
activation function. The score vector can be thought of as a probability distribution over the 7 classes.
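For concreteness, the Keras sketch below builds the architecture as described. The per-layer filter counts between 32 and 4 are not fully specified in the text, so a halving schedule (32, 32, 16, 16, 8, 8, 4, 4) is assumed here, and 'same' padding is used so that the feature maps keep length 1000 as stated above:

    from tensorflow.keras import layers, models

    def build_pri_cnn(seq_len=1000, n_classes=7):
        # The 1 x 1000 PRI sequence is treated as a length-1000,
        # single-channel 1-D input.
        model = models.Sequential()
        model.add(layers.Input(shape=(seq_len, 1)))
        # 8 convolution layers with 1 x 3 filters; the filter schedule
        # below is an assumption (the text only gives 32 down to 4).
        for n_filters in (32, 32, 16, 16, 8, 8, 4, 4):
            model.add(layers.Conv1D(n_filters, 3, padding='same'))
            model.add(layers.BatchNormalization())
            model.add(layers.ReLU())
        model.add(layers.Flatten())                      # 4 x 1000 -> 4000
        model.add(layers.Dense(256, activation='relu'))
        model.add(layers.Dropout(0.7))                   # dropout rate 0.7
        model.add(layers.Dense(n_classes, activation='softmax'))
        return model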
The network weights are trained by minimizing
a loss function defined as the cross-entropy between
the output vector of the network and the one-hot vec-
tor associated with the ground-truth label of the input
PRI sequence. This optimization procedure can be re-
alized by a stochastic gradient descent algorithm. In
the testing phase, the class of an input PRI sequence is simply determined as the index of the entry of the output vector with the maximum score.
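The loss and decision rule above translate to roughly the following Keras sketch (plain SGD is shown since the text mentions stochastic gradient descent generically, while Section 4 reports that Adam was used in practice; x_test denotes a hypothetical array of normalized test sequences):

    import numpy as np

    model = build_pri_cnn()
    # Cross-entropy between the softmax output and the one-hot labels.
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Testing phase: the predicted class is the index of the maximum
    # score in the length-7 output vector.
    scores = model.predict(x_test)             # shape: (num_sequences, 7)
    predicted = np.argmax(scores, axis=-1)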
4 PERFORMANCE EVALUATION
In order to evaluate the performance of our CNN-
based PRI classification, a set of 140,000 randomly
generated PRI sequences (20,000 for each modulation
type) was used for training and another set of 35,000
(5,000 for each class) was used for testing. Both train-
ing and testing data were generated randomly accord-
ing to a variety of parameters given in Table 1. It
should be noted that the training set and the testing set are disjoint. Therefore, the testing set can be fairly used to verify the accuracy and generalization of the proposed model.
The data generation was performed in Matlab, while
the training was implemented in Python with the Keras library and TensorFlow backend, running on an Nvidia
Tesla P100 GPU. The final CNN model was obtained
after being trained for 100 epochs with the Adam optimizer and a learning rate of $10^{-4}$.
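The reported training configuration corresponds to roughly the following sketch (the batch size is not stated in the text, so the value below is a placeholder; x_train and y_train denote the hypothetical normalized training sequences and their one-hot labels):

    from tensorflow.keras.optimizers import Adam

    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=100,
              batch_size=128)  # batch size assumed, not given in the text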
We compare the proposed CNN with three state-of-the-art competitors, including both decision-tree-based and learning-based methods:
• The proposed scheme in (Song et al., 2010): This
method is able to classify only 5 PRI modula-
tion types (DS, SLD, SLU, JIT, and WOB) by ap-
plying a decision tree on features extracted from
the symbolizations of PRI and differential PRI
sequences. Hence, we refer to this method as
Symbolization-based Decision Tree (SDT). The
thresholds used in this algorithm were carefully
chosen to optimize the classification accuracy.
• The proposed scheme in (Noone, 1999), which
we call Transform-based Neural Network (TNN):
This learning-based method uses features based on the second differences of TOAs as the input to a feed-forward neural network.
• The proposed scheme in (Liu and Zhang, 2017):
This method uses a feed-forward neural network
consisting of an input layer with 3 extracted fea-
tures based on the second differences of TOAs to
classify 4 PRI modulation types (WOB, JIT, DS,
and Sliding). It should be noted that in that pa-