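In particular we have:
$$\mathrm{Precision} = \frac{t_p}{t_p + f_p} \qquad (4)$$
where $t_p$ and $f_p$ are the true positives and false positives,
respectively, and
$$F\text{-}\mathrm{Measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5)$$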
where Recall is the proportion of actual positives
correctly identified as such (equivalent to True
Positive Rate). From the presented comparison it can
be observed that Naïve Bayes is consistently
outperformed by the other strategies, likely due to the
violation of the class-conditional independence
hypothesis. The remaining three algorithms instead
achieve comparable performance, with J48
proving slightly superior in terms of False Positive
Rate and Random Forests exhibiting better results in
terms of True Positive Rate.
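
As an illustration of how the rates discussed above relate to one another, the following minimal sketch computes them from the four confusion-matrix counts using the standard textbook definitions. The function name and the example counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # True Positive Rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0      # False Positive Rate
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=180, fn=20)
```

A classifier can trade one rate against another, so reporting True Positive Rate and False Positive Rate side by side gives a more complete picture than either value alone.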
Figure 11: ROC curves and areas under ROCs (AUC).
7 CONCLUSIONS
WebRTC technology brings a noteworthy novelty to
the field of web-based multimedia communications,
providing a peer-to-peer, browser-based service for
audio and video streaming without neglecting
security. To this end, the embedded DTLS
protocol, designed to prevent eavesdropping and
information tampering, enables even a non-skilled
user to set up an encrypted real-time multimedia
session relying on just a common web browser. Because
of this capability, the authorities in charge of lawful
interception have to deal with new services
that may elude traditional kinds of control. An
audio/video session based on the encrypted
WebRTC protocol is in fact difficult to detect,
because it can use dynamic port allocation and does
not include any characteristic pattern that allows
semantic-based recognition. In this paper we propose
and assess an automatic decision system capable of
revealing an encrypted WebRTC session with the
support of Weka, a popular open source machine
learning tool. The system includes a purpose-built
Java engine that trains the classifier
on a dataset containing traffic partitioned into the two
classes envisioned: WebRTC and Normal. In our
implementation we considered, as class attributes,
some typical parameters derived during a real-time
traffic processing phase, such as transport protocol,
inter-arrival times, packet lengths and number of
packets in the forward and backward directions.
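
The attributes listed above can be sketched as a small per-flow summary. This is only an illustrative assumption about how such attributes might be derived: the packet tuple layout, the use of means as aggregates, and the function name `flow_features` are hypothetical and do not describe the Java engine itself.

```python
from statistics import mean


def flow_features(packets, protocol):
    """Summarise one flow as a dictionary of class attributes.

    packets  : list of (timestamp_seconds, length_bytes, direction),
               with direction equal to "fwd" or "bwd" (assumed layout).
    protocol : transport protocol label, e.g. "UDP".
    """
    ordered = sorted(packets, key=lambda p: p[0])  # order by timestamp
    times = [t for t, _, _ in ordered]
    # Inter-arrival times are differences between consecutive timestamps.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    lengths = [length for _, length, _ in ordered]
    return {
        "protocol": protocol,
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
        "mean_length": mean(lengths) if lengths else 0.0,
        "fwd_packets": sum(1 for _, _, d in ordered if d == "fwd"),
        "bwd_packets": sum(1 for _, _, d in ordered if d == "bwd"),
    }


# Hypothetical four-packet flow used only to exercise the function.
example = flow_features(
    [(0.00, 120, "fwd"), (0.02, 1100, "bwd"),
     (0.04, 130, "fwd"), (0.06, 1050, "bwd")],
    protocol="UDP",
)
```
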
Through a cross-validation assessment scheme based
on the training dataset, four of the most established
classification algorithms have been compared: J48,
Simple Cart, Naïve Bayes and Random Forests. The
experiment suggests that Random Forests offers the
best results in terms of True Positive Rate, whereas
J48 performs better in terms of False Positive Rate.
Future work will aim to compare a
broader range of algorithms, including distributed
ones, using a dataset large enough to improve the
significance of the results.
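
The cross-validation scheme summarised above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Weka procedure used in the experiments: the number of folds, the single-feature threshold learner standing in for a real classifier, and the synthetic data are all hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_threshold(train_x, train_y, positive="WebRTC"):
    """Toy stand-in learner: threshold one numeric feature at the
    midpoint between the two class means."""
    pos = [x for x, c in zip(train_x, train_y) if c == positive]
    neg = [x for x, c in zip(train_x, train_y) if c != positive]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    mid = (pos_mean + neg_mean) / 2
    pos_is_high = pos_mean > neg_mean
    return lambda x: positive if (x > mid) == pos_is_high else "Normal"


def cross_validate(xs, ys, train_fn, k=10, positive="WebRTC"):
    """Return (TPR, FPR) accumulated over the k held-out folds."""
    tp = fp = tn = fn = 0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [c for i, c in enumerate(ys) if i not in held_out]
        predict = train_fn(train_x, train_y)
        for i in fold:
            predicted_positive = predict(xs[i]) == positive
            if ys[i] == positive:
                tp += predicted_positive
                fn += not predicted_positive
            else:
                fp += predicted_positive
                tn += not predicted_positive
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Synthetic, perfectly separable data (hypothetical mean packet lengths).
xs = [900 + i for i in range(10)] + [200 + i for i in range(10)]
ys = ["WebRTC"] * 10 + ["Normal"] * 10
tpr, fpr = cross_validate(xs, ys, train_threshold, k=5)
```

Each sample is predicted exactly once, by a model that never saw it during training, which is what makes the pooled True Positive Rate and False Positive Rate comparable across algorithms.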