which means that time-related features are good clas-
sifiers to characterize encrypted and VPN traffic.
5.2 Analysis of Scenario B
In this scenario, all encrypted and VPN traffic is
mixed together in one dataset, and the objective is
to characterize the traffic without first separating
VPN from Non-VPN traffic. We therefore have
14 types of traffic: 7 encrypted and 7 VPN traffic
categories. The results are shown in Figure 3 (parts
e, f, g, h).
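The label space of this scenario can be sketched as follows. This is a minimal illustration rather than the tooling used in the experiments: the function names are hypothetical, and the `VPN-` prefix convention is assumed from category names such as VPN-Browsing and Mail that appear in the text.

```python
def scenario_b_label(category: str, is_vpn: bool) -> str:
    """Merge the traffic category and the VPN flag into a single class label."""
    return f"VPN-{category}" if is_vpn else category


def scenario_b_classes(categories):
    """Return every label a one-step classifier has to distinguish."""
    return [scenario_b_label(c, vpn) for vpn in (False, True) for c in categories]


# Only categories named in this section are listed here (assumption: the
# remaining categories of the dataset follow the same naming scheme).
example_categories = ["Browsing", "Chat", "Mail"]
print(scenario_b_classes(example_categories))
```

With the 7 categories of the dataset, the same construction yields the 14 classes of Scenario B.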
In this case, the pattern 'shorter timeout, better
accuracy' is not as clear as in the previous
scenario (5.1). For example, using the C4.5
algorithm, the Pr of VPN-Browsing, VPN-Mail, and
Mail with 15 sec is 0.771, 0.739, and 0.671
respectively, lower than the 0.809, 0.786, and 0.79
obtained with 120 sec. The KNN results are similar:
the Pr of the VPN-Browsing, VPN-Chat, and VPN-Mail
traffic categories is (0.691, 0.501, 0.688) for an
ftm of 15 sec, no higher than the Pr obtained with
120 sec (0.743, 0.501, 0.688). On the other hand,
the highest average Pr across the different ftm
values is around 0.783 for C4.5 and 0.711 for KNN,
around 0.5 points lower than the best values from
Scenario A.
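The C4.5 comparison above can be checked directly from the reported numbers. The sketch below assumes that Pr denotes per-class precision, TP / (TP + FP). The helper name `per_class_precision` is hypothetical, and the dictionaries hold only the values quoted in this section.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Precision per class: correct predictions of a class / all predictions of it."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in predicted.items()}


# C4.5 precision values quoted in the text (15 sec vs. 120 sec flow timeout).
pr_15s = {"VPN-Browsing": 0.771, "VPN-Mail": 0.739, "Mail": 0.671}
pr_120s = {"VPN-Browsing": 0.809, "VPN-Mail": 0.786, "Mail": 0.79}

# A positive gain means the longer timeout gave the higher precision.
gain = {label: round(pr_120s[label] - pr_15s[label], 3) for label in pr_15s}
print(gain)
```

For these three categories the longer timeout improves Pr by 0.038, 0.047, and 0.119 respectively, which is the opposite of the trend observed in Scenario A.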
6 CONCLUSIONS
In this paper we have studied the efficiency of time-
related features to address the challenging problem of
characterization of encrypted traffic and detection of
VPN traffic. We have proposed a set of time-related
features and two common machine learning algo-
rithms, C4.5 and KNN, as classification techniques.
Our results show that the proposed set of time-related
features characterizes this traffic well, achieving
accuracy levels above 80%. C4.5 and KNN performed
similarly in all experiments, although C4.5 achieved
better results. Of the two scenarios proposed,
characterization in two steps (Scenario A) vs.
characterization in one step (Scenario B), the first
one generated better results. In addition to our main
objective, we have also found that our classifiers
perform better when the flows are generated using
shorter timeout values, which contradicts the common
assumption of using 600s as timeout duration. As
future work, we plan to extend our study to other
applications and types of encrypted traffic, and to
further study the application of time-based features
to characterize encrypted traffic.
REFERENCES
Aceto, G., Dainotti, A., de Donato, W., and Pescape, A.
(2010). Portload: Taking the best of two worlds in
traffic classification. In IEEE Conference on Com-
puter Communications Workshops, INFOCOM 2010,
pages 1–5. IEEE.
Aghaei-Foroushani, V. and Zincir-Heywood, A. (2015). A
proxy identifier based on patterns in traffic flows. In
IEEE 16th International Symposium on High Assur-
ance Systems Engineering, HASE 2015, pages 118–
125. IEEE.
Bernaille, L. and Teixeira, R. (2007). Early recognition
of encrypted applications. In Proceedings of the 8th
International Conference on Passive and Active Net-
work Measurement, PAM’07, pages 165–175, Berlin,
Heidelberg. Springer-Verlag.
Bernaille, L., Teixeira, R., Akodkenou, I., Soule, A., and
Salamatian, K. (2006). Traffic classification on the fly.
ACM SIGCOMM Computer Communication Review,
36(2):23–26.
Callado, A., Kamienski, C., Szabo, G., Gero, B., Kelner, J.,
Fernandes, S., and Sadok, D. (2009). A survey on in-
ternet traffic identification. Communications Surveys
& Tutorials, IEEE, 11(3):37–52.
Coull, S. E. and Dyer, K. P. (2014). Traffic analysis of
encrypted messaging services: Apple imessage and
beyond. ACM SIGCOMM Computer Communication
Review, 44(5):5–11.
Gómez Sena, G. and Belzarena, P. (2009). Early traffic
classification using support vector machines. In
Proceedings of the 5th International Latin American
Networking Conference, LANC ’09, pages 60–66, New York,
NY, USA. ACM.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann,
P., and Witten, I. H. (2009). The weka data mining
software: An update. ACM SIGKDD Explorations
Newsletter, 11(1):10–18.
Iliofotou, M., Pappu, P., Faloutsos, M., Mitzenmacher, M.,
Singh, S., and Varghese, G. (2007). Network moni-
toring using traffic dispersion graphs (tdgs). In Pro-
ceedings of the 7th ACM SIGCOMM Conference on
Internet Measurement, IMC ’07, pages 315–320, New
York, NY, USA. ACM.
Karagiannis, T., Papagiannaki, K., and Faloutsos, M.
(2005). Blinc: Multilevel traffic classification in the
dark. In Proceedings of the 2005 Conference on
Applications, Technologies, Architectures, and Proto-
cols for Computer Communications, SIGCOMM ’05,
pages 229–240, New York, NY, USA. ACM.
Kim, H., Claffy, K., Fomenkov, M., Barman, D., Falout-
sos, M., and Lee, K. (2008). Internet traffic classifi-
cation demystified: Myths, caveats, and the best prac-
tices. In Proceedings of the 2008 ACM CoNEXT Con-
ference, CoNEXT ’08, pages 11:1–11:12, New York,
NY, USA. ACM.
Li, W., Canini, M., Moore, A. W., and Bolla, R. (2009). Ef-
ficient application identification and the temporal and
spatial stability of classification schema. Computer
Networks: The International Journal of Computer and
Telecommunications Networking, 53(6):790–809.
Characterization of Encrypted and VPN Traffic using Time-related Features