Characterization of Encrypted and VPN Traffic using Time-related Features

Gerard Draper-Gil, Arash Habibi Lashkari, Mohammad Saiful Islam Mamun and Ali A. Ghorbani

University of New Brunswick, Fredericton NB E3B 5A3, New Brunswick, Canada

Keywords:

Traffic Classification, Encrypted Traffic Characterization, Flow Time-based Features, VPN Traffic Characterization, Flow Timeout Value.

Abstract:

Traffic characterization is one of the major challenges in today’s security industry. The continuous evolution and generation of new applications and services, together with the expansion of encrypted communications, makes it a difficult task. Virtual Private Networks (VPNs) are an example of an encrypted communication service that is becoming popular, both as a method for bypassing censorship and for accessing services that are geographically locked. In this paper, we study the effectiveness of flow-based time-related features to detect VPN traffic and to characterize encrypted traffic into different categories according to the type of traffic, e.g., browsing, streaming, etc. We use two different well-known machine learning techniques (C4.5 and KNN) to test the accuracy of our features. Our results show high accuracy and performance, confirming that time-related features are good classifiers for encrypted traffic characterization.

1 INTRODUCTION

Traffic classification technologies have received increased attention over the last decade due to their role in implementing mechanisms for network quality of service (QoS), security, accounting, design and engineering. The networking industry as well as the research community have devoted considerable effort to researching these technologies and have developed several classification techniques (Callado et al., 2009).

However, the continuous expansion of Internet and mobile technologies is creating a dynamic environment where new applications and services emerge every day, and existing ones are constantly evolving. Moreover, encryption is becoming pervasive in today’s Internet, serving as a basis for secure communications. This constant creation, evolution, and securing of applications makes traffic classification a great challenge for the Internet research community.

Traffic classification can be categorized based on its final purpose: associating traffic with encryption (e.g., encrypted traffic) or with protocol encapsulation (e.g., tunneled through VPN or HTTPS); associating it with specific applications (e.g., Skype); or associating it with the application type (e.g., Streaming, Chat), also called traffic characterization. Some applications (e.g., Skype, Facebook) support multiple services like chat, voice call, file transfer, etc. These applications require identifying both the application itself and the specific task associated with it. Very few traffic classification techniques in the literature address these challenging trends (Wang et al., 2014; Rao et al., 2011; Coull and Dyer, 2014).

In the early 1990s, the initial traffic classification techniques associated transport layer ports with specific applications, a simple and fast technique. Its low accuracy and unreliability, however, led to the development of Deep Packet Inspection (DPI) approaches. The DPI approach analyzes packets and classifies them according to stored signatures or patterns. However, DPI techniques that require payload examination are not computationally efficient, especially over high-bandwidth networks. Moreover, they are often circumvented by encapsulated, encrypted, or obfuscated traffic that precludes payload analysis.

Selecting effective and reliable features for traffic analysis is still a serious challenge. Generally speaking, the classification of network traffic falls mainly into two categories: flow-based classification, using properties such as flow bytes per second, duration per flow, etc., and packet-based classification, using properties such as packet size, inter-packet duration of the first n packets, etc.

In this paper, we focus on analyzing regular encrypted traffic and encrypted traffic tunneled through a Virtual Private Network (VPN). The characterization of VPN traffic is a challenging task that remains to be solved. VPN tunnels preserve the privacy of data shared over the physical network connection by applying packet-level encryption, which makes it very difficult to identify the applications running through these VPN services.

Draper-Gil, G., Lashkari, A., Mamun, M. and Ghorbani, A.
Characterization of Encrypted and VPN Traffic using Time-related Features.
DOI: 10.5220/0005740704070414
In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP 2016), pages 407-414
ISBN: 978-989-758-167-0

Our contribution in this paper is twofold. First, we propose a flow-based classification method to characterize encrypted and VPN traffic using only time-related features. Moreover, we reduce the computational overhead by restricting the feature set to features that can be extracted with low computational complexity (Kim et al., 2008; Li et al., 2009). Second, we generate and publish an extensive labeled dataset of encrypted traffic, with 14 different labels (7 for regular encrypted traffic and 7 for VPN traffic). We choose only time-related features to improve efficiency and to obtain an encryption-independent traffic classifier.

The remainder of this paper is organized as follows: Section 2 presents an overview of encrypted traffic classification. Section 3 describes the dataset. Section 4 describes the experiments executed on the captured dataset, while Section 5 presents and discusses the results obtained. Finally, Section 6 presents the conclusions and future work.

2 RELATED WORK

Studies on packet size and flow-based traffic classification began in the early 1990s with Paxson et al. (Paxson, 1994; Paxson and Floyd, 1995), where statistical features like packet length, inter-arrival times and flow duration were considered suitable for tracing protocols. Later, Belzarena et al. (Gómez Sena and Belzarena, 2009) and Li et al. (Li et al., 2009) used statistics from the first few packets of a flow to gain efficiency. Moreover, in order to improve classification efficiency in large-scale, high-speed networks, Nucci et al. (Yeganeh et al., 2012) and Pescapè et al. (Aceto et al., 2010) proposed signature-based traffic identification schemes. Although they reduced the time needed to classify flows, they failed to detect traffic with unknown or manually created signatures.

Traffic characterization techniques are not widely addressed in the current literature. Moreover, most of them focus on specific application types or devices. Wang et al. (Wang et al., 2014) proposed a model to characterize P2P traffic. They extracted features from multiple flows and aggregated flows into clusters to extract P2P application behaviour. Coull and Dyer (Coull and Dyer, 2014) present a study on the iMessage protocol to identify the type of device. In (Rao et al., 2011), Rao et al. propose a network characteristics model for two of the most popular video streaming services, Netflix and YouTube. In (Mauro and Longo, 2015), Mauro and Longo propose a method to detect encrypted WebRTC traffic. Mamun et al. (Mohammad S.I. Mamun and Ghorbani, 2015) proposed a method to identify encrypted traffic by measuring the entropy of the packet payload. Sherry et al. (Sherry et al., 2015) propose a DPI system that can inspect encrypted payload without decrypting it, therefore maintaining the privacy of the communications, but it can only process HTTPS traffic.

A number of machine learning classification methods based on flow features (Bernaille and Teixeira, 2007; Moore and Zuev, 2005) and packet-based features (Iliofotou et al., 2007; Karagiannis et al., 2005) have been proposed in the literature to identify traffic accurately. However, traffic classification for encapsulated protocols (e.g., using a proxy server or VPN tunnels), which are mainly used to hide users’ identities for privacy reasons, is challenging and hence has not been widely explored in the literature. Recently, however, Zincir-Heywood et al. (Aghaei-Foroushani and Zincir-Heywood, 2015) proposed a data-driven classifier to identify traffic coming from clients behind a proxy server using traffic flow information.

To the best of our knowledge, we are the first to propose a method to characterize VPN traffic in a broad sense, identifying 7 different traffic categories.

3 DATASET GENERATION

To create a representative dataset, we captured real traffic generated by our lab members. We created accounts for users Alice and Bob in order to use services like Skype, Facebook, etc. Table 1 provides the complete list of the different types of traffic and applications included in our dataset. For each traffic type (VoIP, P2P, etc.) we captured a regular session and a session over VPN; therefore, we have a total of 14 traffic categories: VOIP, VPN-VOIP, P2P, VPN-P2P, etc. Below, we give a detailed description of the different types of traffic generated:

Browsing: Under this label we have HTTPS traffic generated by users while browsing or performing any task that includes the use of a browser. For instance, when we captured voice calls using Hangouts, even though browsing was not the main activity, we captured several browsing flows.

ICISSP 2016 - 2nd International Conference on Information Systems Security and Privacy


Table 1: List of captured protocols and applications.

Traffic          Content
Web Browsing     Firefox and Chrome
Email            SMTPS, POP3S and IMAPS
Chat             ICQ, AIM, Skype, Facebook and Hangouts
Streaming        Vimeo and YouTube
File Transfer    Skype, FTPS and SFTP using Filezilla and an external service
VoIP             Facebook, Skype and Hangouts voice calls (1h duration)
P2P              uTorrent and Transmission (Bittorrent)

Table 2: List of time-based features.

Feature     Description
duration    The duration of the flow.
fiat        Forward Inter Arrival Time: the time between two packets sent in the forward direction (mean, min, max, std).
biat        Backward Inter Arrival Time: the time between two packets sent in the backward direction (mean, min, max, std).
flowiat     Flow Inter Arrival Time: the time between two packets sent in either direction (mean, min, max, std).
active      The amount of time a flow was active before going idle (mean, min, max, std).
idle        The amount of time a flow was idle before becoming active (mean, min, max, std).
fb psec     Flow bytes per second.
fp psec     Flow packets per second.

Email: Traffic samples generated using a Thunderbird client and Alice’s and Bob’s Gmail accounts. The clients were configured to deliver mail through SMTP/S, and to receive it using POP3/SSL in one client and IMAP/SSL in the other.

Chat: The chat label identifies instant-messaging applications. Under this label we have Facebook and Hangouts via web browser, Skype, and AIM and ICQ using an application called Pidgin.

Streaming: The streaming label identifies multimedia applications that require a continuous and steady stream of data. We captured traffic from YouTube (HTML5 and Flash versions) and Vimeo services using Chrome and Firefox.

File Transfer: This label identifies traffic from applications whose main purpose is to send or receive files and documents. For our dataset we captured Skype file transfers, SSH File Transfer Protocol (SFTP) and FTP over SSL (FTPS) traffic sessions.

VoIP: The Voice over IP label groups all traffic generated by voice applications. Within this label we captured voice calls using Facebook, Hangouts and Skype.

P2P: This label is used to identify file-sharing protocols like Bittorrent. To generate this traffic we downloaded different .torrent files from a public repository (archive.org) and captured traffic sessions using the uTorrent and Transmission applications.

The traffic was captured using Wireshark and tcpdump, generating a total of 28GB of data. For the VPN traffic, we used an external VPN service provider and connected to it using OpenVPN. To generate SFTP and FTPS traffic we also used an external service provider, with Filezilla as the client.

Figure 1: Characterization Scenarios.

4 EXPERIMENTS

We have defined two different scenarios, A and B, depicted in Figure 1. As described in Section 4.1, we have used 4 different flow timeout values to generate our datasets, and we have chosen 2 machine learning algorithms (C4.5 and KNN). Therefore, each experiment is executed 8 times (4 timeout values × 2 algorithms). We have designed a total of 3 experiments, 2 for scenario A and one for scenario B:

Scenario A: The objective of this scenario is to characterize encrypted traffic with VPN identification, i.e., we distinguish between voice calls (VOIP) and voice calls tunneled through VPN (VPN-VOIP). As a result we have 14 different types of traffic: 7 regular types of encrypted traffic and 7 VPN types of traffic. In this scenario we do the characterization in two steps. First, we distinguish between VPN and Non-VPN traffic, and then we characterize each type of traffic separately (VPN and Non-VPN). In order to do this, we have divided our dataset into two different datasets: one with regular encrypted traffic flows and the other with VPN traffic flows.

Scenario B: In this scenario, we use a mixed dataset to do the characterization in one step. The input of our classifier is regular encrypted traffic and VPN traffic, and as output we have the same 14 different categories (Section 3).
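The difference between the two scenarios can be illustrated with a minimal Python sketch. The label names, the `classify_two_step`/`classify_one_step` helpers and the toy threshold "classifiers" are hypothetical stand-ins for trained C4.5 or KNN models; the sketch only shows how Scenario A routes a flow through a VPN detector before a group-specific classifier, while Scenario B relies on a single 14-class classifier.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical label names: 7 traffic categories, each captured with and without VPN.
CATEGORIES = ["BROWSING", "EMAIL", "CHAT", "STREAMING", "FT", "VOIP", "P2P"]
LABELS: List[str] = CATEGORIES + ["VPN-" + c for c in CATEGORIES]  # 14 labels

# A trained model is represented here as a function from a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def classify_two_step(x: Sequence[float], vpn_detector: Classifier,
                      per_group: Dict[str, Classifier]) -> str:
    """Scenario A: detect VPN vs. Non-VPN first, then apply that group's classifier."""
    group = vpn_detector(x)  # "VPN" or "NON-VPN"
    return per_group[group](x)


def classify_one_step(x: Sequence[float], mixed: Classifier) -> str:
    """Scenario B: a single classifier trained on all 14 labels."""
    return mixed(x)


# Toy stand-ins for trained models (thresholds are arbitrary, for illustration only).
detector: Classifier = lambda x: "VPN" if x[0] > 0.5 else "NON-VPN"
per_group: Dict[str, Classifier] = {
    "VPN": lambda x: "VPN-VOIP" if x[1] > 0.5 else "VPN-BROWSING",
    "NON-VPN": lambda x: "VOIP" if x[1] > 0.5 else "BROWSING",
}

print(len(LABELS))
print(classify_two_step([0.9, 0.8], detector, per_group))
print(classify_two_step([0.1, 0.2], detector, per_group))
```

With these stubs the script prints 14, then `VPN-VOIP` and `BROWSING`. In Scenario A each second-stage classifier only has to separate 7 classes, which is one plausible reason the two-step approach could be easier to learn than the single 14-class problem of Scenario B.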

4.1 Flow and Features Generation

We use a common definition of flow, where a flow is defined by a sequence of packets with the same values for {Source IP, Destination IP, Source Port, Destination Port, Protocol (TCP or UDP)}. Flows are considered to be bidirectional (forward and reverse directions), as in most of the reviewed papers (e.g., McGregor et al., 2004; Zander et al., 2005; Bernaille et al., 2006; Williams et al., 2006; Palmieri and Fiore, 2009). Along with the flow generation, we have to calculate the features associated with each flow. Many papers in the literature use a tool called NetMate to generate flows and features, but as part of our work we have developed our own application, ISCXFlowMeter. It is written in Java and gives us more flexibility in terms of choosing the features we want to calculate and adding new ones, and also gives us better control over the flow timeout. ISCXFlowMeter generates bidirectional flows, where the first packet determines the forward (source to destination) and backward (destination to source) directions; hence the statistical time-related features are also calculated separately in the forward and reverse directions. Note that TCP flows are usually terminated upon connection teardown (by a FIN packet), while UDP flows are terminated by a flow timeout.

Figure 2: Scenario A-1: VPN detection. Panels: (a) Scenario A VPN Precision and Recall; (b) Scenario A Non-VPN Precision and Recall.

The flow timeout value can be assigned arbitrarily by each individual scheme, e.g., 600 seconds for both TCP and UDP in (Aghaei-Foroushani and Zincir-Heywood, 2015). In this paper, we study several flow timeout (ftm) values and their corresponding classifier accuracy on the same dataset. In particular, we set the flow timeout, i.e., the maximum duration of a flow, to 15, 30, 60 and 120 seconds.
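The flow definition and timeout rule above can be sketched in Python as follows. This is not ISCXFlowMeter (which is written in Java): the `Packet`/`Flow` records and `build_flows` are hypothetical, packets are assumed to be already parsed from the capture, the timeout is assumed to bound the time elapsed since a flow's first packet, and a TCP flow is closed at the first FIN packet, which simplifies real connection teardown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (source IP, destination IP, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    """Hypothetical minimal packet record; a real tool would parse pcap files."""
    ts: float     # capture timestamp in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: str    # "TCP" or "UDP"
    size: int     # bytes
    fin: bool = False


@dataclass
class Flow:
    key: FiveTuple  # forward direction, fixed by the flow's first packet
    packets: List[Tuple[Packet, bool]] = field(default_factory=list)  # (packet, is_forward)

    @property
    def start(self) -> float:
        return self.packets[0][0].ts


def canonical(p: Packet) -> FiveTuple:
    """Direction-independent key, so both directions map to the same flow."""
    (ip_a, port_a), (ip_b, port_b) = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (ip_a, ip_b, port_a, port_b, p.proto)


def build_flows(packets: List[Packet], timeout: float) -> List[Flow]:
    active: Dict[FiveTuple, Flow] = {}
    done: List[Flow] = []
    for p in sorted(packets, key=lambda q: q.ts):
        k = canonical(p)
        flow = active.get(k)
        # Close the current flow once this packet falls beyond the flow timeout.
        if flow is not None and p.ts - flow.start > timeout:
            done.append(active.pop(k))
            flow = None
        if flow is None:
            flow = Flow(key=(p.src, p.dst, p.sport, p.dport, p.proto))
            active[k] = flow
        forward = (p.src, p.sport) == (flow.key[0], flow.key[2])
        flow.packets.append((p, forward))
        # Simplification: end a TCP flow at the first FIN packet seen.
        if p.proto == "TCP" and p.fin:
            done.append(active.pop(k))
    done.extend(active.values())  # flush flows still open at the end of the capture
    return done


pkts = [
    Packet(0.0, "10.0.0.1", "8.8.8.8", 5000, 443, "TCP", 100),
    Packet(1.0, "8.8.8.8", "10.0.0.1", 443, 5000, "TCP", 1500),
    Packet(20.0, "10.0.0.1", "8.8.8.8", 5000, 443, "TCP", 100),
]
for ftm in (15, 30, 60, 120):
    print(ftm, len(build_flows(pkts, ftm)))
```

With these three packets, a 15 s timeout splits the connection into two flows, while 30, 60 and 120 s keep it as a single flow; this is the mechanism by which the ftm value changes the flows, and therefore the features, that the classifiers see.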

In our experiments, the classifier has a response time of (FT + FE + ML) seconds, where FT is the customized flow time (the flow timeout), FE is the feature extraction time and ML is the time the machine learning algorithm takes to perform the classification. We observed that the maximum accuracy is achieved with FT = 15 s for all the classifiers. In the current implementation, we have found that the average delay is approximately:

$$\text{VPN classifier: } FT + FE + ML = 15 + 0.001 + 0.01 = 15.011\ \text{s (kNN)} \quad \text{or} \quad 15 + 0.001 + 1.26 = 16.261\ \text{s (C4.5)}$$

$$\text{Traffic type classifier: } FT + FE + ML = 15 + 0.001 + 0.01 = 15.011\ \text{s (kNN)} \quad \text{or} \quad 15 + 0.001 + 1.49 = 16.491\ \text{s (C4.5)}$$
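The response-time arithmetic above can be reproduced with a few lines of Python; the timing values are the averages reported in this section, and the `response_time` helper is hypothetical.

```python
def response_time(ft: float, fe: float, ml: float) -> float:
    """Classifier response time in seconds: flow timeout + feature extraction + ML."""
    return ft + fe + ml


FT, FE = 15.0, 0.001  # 15 s flow timeout, ~1 ms feature extraction (values from the text)
timings = {
    "VPN classifier, kNN": 0.01,
    "VPN classifier, C4.5": 1.26,
    "traffic type classifier, kNN": 0.01,
    "traffic type classifier, C4.5": 1.49,
}
for name, ml in timings.items():
    print(f"{name}: {response_time(FT, FE, ml):.3f} s")
```

The script prints 15.011 s, 16.261 s, 15.011 s and 16.491 s. The flow timeout dominates the total, so shortening FT reduces latency far more than switching between kNN and C4.5.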

Figure 3: Precision and Recall of traffic characterization. Panels: (a) Scenario A VPN Precision; (b) Scenario A VPN Recall; (c) Scenario A Non-VPN Precision; (d) Scenario A Non-VPN Recall; (e) Scenario B VPN Precision; (f) Scenario B VPN Recall; (g) Scenario B Non-VPN Precision; (h) Scenario B Non-VPN Recall.

As previously mentioned, we focus on time-related features. When choosing time-related features, we consider two different approaches. In the first approach we measure time, e.g., the time between packets or the time that a flow remains active. In the second approach, we fix the time and measure other variables, e.g., bytes per second or packets per second. Table 2 provides the complete list of features extracted in this work. As Table 2 shows, apart from duration, which gives the total time of one flow, there are six groups of features. The first three groups, fiat, biat and flowiat, focus respectively on the forward, backward and bidirectional packets of a flow. The fourth and fifth groups, idle and active, are calculated from the active-to-idle and idle-to-active state transitions of a flow. Finally, the last group focuses on the number of bytes and packets per second and is named the psec features.
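A minimal Python sketch of how the Table 2 features could be computed for a single bidirectional flow is shown below. It is not the ISCXFlowMeter implementation: the input format, the `_mean/_min/_max/_std` naming, the use of the population standard deviation, and the 5-second `idle_threshold` that separates active from idle periods are all assumptions, since the text does not specify them.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

# One packet of a flow: (timestamp in s, size in bytes, is_forward).
Pkt = Tuple[float, int, bool]


def stats(prefix: str, values: Sequence[float]) -> Dict[str, float]:
    """mean/min/max/std of a list of durations; all zeros when the list is empty."""
    if not values:
        return {f"{prefix}_{s}": 0.0 for s in ("mean", "min", "max", "std")}
    return {
        f"{prefix}_mean": statistics.mean(values),
        f"{prefix}_min": min(values),
        f"{prefix}_max": max(values),
        f"{prefix}_std": statistics.pstdev(values),
    }


def gaps(ts: List[float]) -> List[float]:
    """Inter-arrival times of a sorted list of timestamps."""
    return [b - a for a, b in zip(ts, ts[1:])]


def time_features(pkts: List[Pkt], idle_threshold: float = 5.0) -> Dict[str, float]:
    pkts = sorted(pkts)  # order by timestamp
    ts = [t for t, _, _ in pkts]
    duration = ts[-1] - ts[0]
    feats: Dict[str, float] = {"duration": duration}
    feats.update(stats("fiat", gaps([t for t, _, fwd in pkts if fwd])))
    feats.update(stats("biat", gaps([t for t, _, fwd in pkts if not fwd])))
    feats.update(stats("flowiat", gaps(ts)))

    # Any gap longer than idle_threshold is an idle period; it closes an active period.
    active: List[float] = []
    idle: List[float] = []
    burst_start = ts[0]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > idle_threshold:
            active.append(prev - burst_start)
            idle.append(cur - prev)
            burst_start = cur
    active.append(ts[-1] - burst_start)  # the last active period
    feats.update(stats("active", active))
    feats.update(stats("idle", idle))

    # "Fix the time, measure other variables": bytes and packets per second.
    total_bytes = sum(size for _, size, _ in pkts)
    feats["fb_psec"] = total_bytes / duration if duration > 0 else 0.0
    feats["fp_psec"] = len(pkts) / duration if duration > 0 else 0.0
    return feats


flow = [(0.0, 100, True), (0.5, 1500, False), (1.0, 100, True), (10.0, 100, True)]
f = time_features(flow)
print(f["duration"], f["fiat_mean"], f["idle_max"], f["fb_psec"])
```

For this toy flow the script prints `10.0 5.0 9.0 180.0`: the 9 s gap exceeds the assumed idle threshold, so it is counted as an idle period separating two active periods.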

4.2 Machine Learning Approaches

To execute the experiments we used Weka (Hall et al., 2009), a well-known tool that implements different machine learning algorithms. We used its default settings with 10-fold cross-validation. Although Weka includes many different algorithms for clustering and classification, based on previous research work and on human readability we have selected two supervised classification algorithms: the C4.5 decision tree and KNN.

C4.5 Decision Tree: Developed by Ross Quinlan, this algorithm is one of the most popular classification techniques in machine learning and data mining. It is based on the concept of information entropy. The algorithm requires a set of training pairs {inputs, output}, where the output is the corresponding class. Both numerical and categorical data are supported, and the result is presented as a tree, making it readable for humans.

KNN: The K-Nearest Neighbors algorithm is one of the simplest algorithms in machine learning. It is based on similarity measures, so it depends on the metric used to calculate the distance between examples. The output of the classification is a class membership, which is determined by the majority vote of the K nearest neighbours.
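The two classifiers described above can be illustrated with small Python sketches. The first part shows the entropy-based gain-ratio criterion that C4.5 uses to score a candidate split on a numeric feature (here a single binary threshold); it is not Weka's J48 implementation, which adds threshold search, pruning and other refinements. The second part is a plain KNN majority vote with Euclidean distance. Function names, the toy data and k = 3 are assumptions, and features are left unnormalized here, although features on very different scales (seconds vs. bytes per second) would normally be normalized first.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """C4.5-style gain ratio of the split value <= threshold vs. value > threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    w = [len(left) / n, len(right) / n]
    info_gain = entropy(labels) - w[0] * entropy(left) - w[1] * entropy(right)
    split_info = -sum(p * math.log2(p) for p in w)
    return info_gain / split_info


def knn_predict(train: List[Tuple[Sequence[float], str]], x: Sequence[float],
                k: int = 3) -> str:
    """Majority vote among the k training examples closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy example: one feature (e.g. a mean inter-arrival time) separating two classes.
iat = [0.02, 0.03, 4.0, 6.0]
cls = ["VOIP", "VOIP", "P2P", "P2P"]
print(gain_ratio(iat, cls, threshold=1.0))

train = [([0.0, 0.0], "CHAT"), ([0.1, 0.0], "CHAT"),
         ([5.0, 5.0], "P2P"), ([5.1, 5.0], "P2P"), ([4.9, 5.2], "P2P")]
print(knn_predict(train, [0.2, 0.1]))
```

The threshold 1.0 separates the two toy classes perfectly, so the gain ratio printed is 1.0; the KNN query near the CHAT examples returns `CHAT`, because two of its three nearest neighbours carry that label.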

To evaluate the quality of our classification processes, we use two common metrics: Precision (Pr), or Positive Predictive Value, and Recall (Rc), or Sensitivity:

$$\mathrm{Pr} = \frac{TP}{TP + FP} \qquad \mathrm{Rc} = \frac{TP}{TP + FN}$$

where, for a given class A, TP is the number of instances correctly classified as A, FP is the number of instances incorrectly classified as A, and FN is the number of instances of class A incorrectly classified as Not-A.
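The evaluation protocol can be sketched in Python as follows: per-class Precision and Recall exactly as defined above, plus a plain k-fold split such as the 10-fold cross-validation used in the experiments. The helper names and the fixed shuffling seed are assumptions, and the split shown is unstratified, whereas Weka stratifies folds by class for nominal classes, so its fold assignment will differ.

```python
import random
from typing import List, Sequence, Tuple


def precision_recall(y_true: Sequence[str], y_pred: Sequence[str],
                     cls: str) -> Tuple[float, float]:
    """Pr = TP / (TP + FP) and Rc = TP / (TP + FN) for class `cls`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    pr = tp / (tp + fp) if tp + fp else 0.0
    rc = tp / (tp + fn) if tp + fn else 0.0
    return pr, rc


def kfold_indices(n: int, k: int = 10, seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) and split it into k folds; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f, fold in enumerate(folds) if f != i for j in fold], test)
            for i, test in enumerate(folds)]


y_true = ["VPN", "VPN", "VPN", "NON-VPN"]
y_pred = ["VPN", "VPN", "NON-VPN", "VPN"]
print(precision_recall(y_true, y_pred, "VPN"))  # both values are 2/3
print(len(kfold_indices(25, k=10)))             # 10 train/test splits
```

In a full experiment, each of the 10 splits would be used to train a classifier on the training indices and collect predictions for the test indices; Precision and Recall are then computed per class over the pooled predictions.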

5 ANALYSIS OF THE RESULTS

Figures 2 and 3 show the Precision and Recall of the different experiments. Overall, C4.5 and KNN had similar results, although C4.5 performed slightly better. Interestingly, however, the results depend on the flow timeout value selected; therefore, we focus our attention on this dependence. For each flow timeout value there are two lines: one corresponds to the C4.5 result and the other to the KNN result.

5.1 Analysis of Scenario A

Figure 2 shows the Precision (Pr) and Recall (Rc) results of the first part of scenario A, where we classify traffic into VPN and Non-VPN. We can see a clear relation between the flow timeout (ftm) value and the performance of the classifiers: performance decreases as the ftm grows. In particular, the Pr of the C4.5 VPN traffic classifier decreases from 0.890 using 15 seconds to 0.86 using 120 seconds, and the Pr for Non-VPN traffic decreases from 0.906 to 0.887. We can see a similar behavior in the case of the KNN algorithm, where the Pr for VPN traffic decreases from 0.848 to 0.815, and from 0.846 to 0.837 in the case of Non-VPN traffic. The best results are achieved using the C4.5 algorithm and a 15 s ftm: 0.89 for VPN and 0.906 for Non-VPN. This means that, using time-related features, we can distinguish VPN from Non-VPN traffic with a 15 s delay (the time it takes to build a flow). These results show that when using time-related features for VPN and Non-VPN traffic classification, shorter timeout values improve the accuracy rate.

The second part of scenario A focuses on the characterization of VPN and Non-VPN traffic separately (see Figure 3, parts a, b, c, d). The input is classified according to the traffic categories defined in Section 3. Again, the results for shorter ftm values are better than the results for larger values, although with a few exceptions in the case of the VPN classifier (Figures 3a, 3b), such as VPN-MAIL, where the best result is obtained with an ftm of 30 s. In the case of the Non-VPN classifier (Figures 3c, 3d), this trend can be clearly seen.

The best results (average Pr) are obtained with C4.5 and a 15 s ftm: 0.84 and 0.89 for the VPN and Non-VPN classifiers respectively. Moreover, the average Pr for all traffic categories is at least 0.84, which means that time-related features are good classifiers to characterize encrypted and VPN traffic.

5.2 Analysis of Scenario B

In this scenario, all encrypted and VPN traffic are mixed together in one dataset, and the objective is to characterize the traffic without first separating VPN from Non-VPN traffic; therefore we have 14 types of traffic: 7 encrypted and 7 VPN traffic categories. The results are shown in Figure 3 (parts e, f, g, h).

In this case, the pattern ‘shorter timeout, better accuracy’ is not as clear as in the previous scenario (Section 5.1). For example, using the C4.5 algorithm, the Pr of VPN-Browsing, VPN-Mail and Mail with 15 s is 0.771, 0.739 and 0.671 respectively, lower than the 0.809, 0.786 and 0.79 obtained with 120 s. The KNN results point in the same direction for VPN-Browsing, whose Pr is 0.691 with a 15 s ftm and 0.743 with 120 s, while the reported Pr of VPN-Chat and VPN-Mail (0.501 and 0.688) is the same for both timeouts. On the other hand, the highest average Pr across the different ftm values is around 0.783 for C4.5 and 0.711 for KNN; for C4.5 this is roughly 0.06 to 0.11 lower than the best values from Scenario A (0.84 and 0.89).

6 CONCLUSIONS

In this paper we have studied the effectiveness of time-related features to address the challenging problem of characterizing encrypted traffic and detecting VPN traffic. We have proposed a set of time-related features and used two common machine learning algorithms, C4.5 and KNN, as classification techniques. Our results show that the proposed set of time-related features are good classifiers, achieving accuracy levels above 80% in the best configurations (C4.5 with a 15 s timeout in Scenario A). C4.5 and KNN had similar performance in all experiments, although C4.5 achieved better results. Of the two scenarios proposed, characterization in two steps (scenario A) vs. characterization in one step (scenario B), the first one generated better results. In addition to our main objective, we have also found that our classifiers perform better when the flows are generated using shorter timeout values, which contradicts the common assumption of using 600 s as the timeout duration. As future work we plan to extend our work to other applications and types of encrypted traffic, and to further study the application of time-based features to characterize encrypted traffic.

REFERENCES

Aceto, G., Dainotti, A., de Donato, W., and Pescapè, A. (2010). PortLoad: Taking the best of two worlds in traffic classification. In IEEE Conference on Computer Communications Workshops, INFOCOM 2010, pages 1–5. IEEE.

Aghaei-Foroushani, V. and Zincir-Heywood, A. (2015). A proxy identifier based on patterns in traffic flows. In IEEE 16th International Symposium on High Assurance Systems Engineering, HASE 2015, pages 118–125. IEEE.

Bernaille, L. and Teixeira, R. (2007). Early recognition of encrypted applications. In Proceedings of the 8th International Conference on Passive and Active Network Measurement, PAM’07, pages 165–175, Berlin, Heidelberg. Springer-Verlag.

Bernaille, L., Teixeira, R., Akodkenou, I., Soule, A., and Salamatian, K. (2006). Traffic classification on the fly. ACM SIGCOMM Computer Communication Review, 36(2):23–26.

Callado, A., Kamienski, C., Szabo, G., Gero, B., Kelner, J., Fernandes, S., and Sadok, D. (2009). A survey on internet traffic identification. Communications Surveys & Tutorials, IEEE, 11(3):37–52.

Coull, S. E. and Dyer, K. P. (2014). Traffic analysis of encrypted messaging services: Apple iMessage and beyond. ACM SIGCOMM Computer Communication Review, 44(5):5–11.

Gómez Sena, G. and Belzarena, P. (2009). Early traffic classification using support vector machines. In Proceedings of the 5th International Latin American Networking Conference, LANC ’09, pages 60–66, New York, NY, USA. ACM.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.

Iliofotou, M., Pappu, P., Faloutsos, M., Mitzenmacher, M., Singh, S., and Varghese, G. (2007). Network monitoring using traffic dispersion graphs (TDGs). In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC ’07, pages 315–320, New York, NY, USA. ACM.

Karagiannis, T., Papagiannaki, K., and Faloutsos, M. (2005). BLINC: Multilevel traffic classification in the dark. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’05, pages 229–240, New York, NY, USA. ACM.

Kim, H., Claffy, K., Fomenkov, M., Barman, D., Faloutsos, M., and Lee, K. (2008). Internet traffic classification demystified: Myths, caveats, and the best practices. In Proceedings of the 2008 ACM CoNEXT Conference, CoNEXT ’08, pages 11:1–11:12, New York, NY, USA. ACM.

Li, W., Canini, M., Moore, A. W., and Bolla, R. (2009). Efficient application identification and the temporal and spatial stability of classification schema. Computer Networks: The International Journal of Computer and Telecommunications Networking, 53(6):790–809.


Mauro, M. D. and Longo, M. (2015). Revealing encrypted WebRTC traffic via machine learning tools. In Proceedings of the 12th International Conference on Security and Cryptography, SECRYPT ’15, pages 259–266. SciTePress.

McGregor, A., Hall, M., Lorier, P., and Brunskill, J. (2004). Flow clustering using machine learning techniques. In Passive and Active Network Measurement, volume 3015 of Lecture Notes in Computer Science, pages 205–214. Springer Berlin Heidelberg.

Mohammad S.I. Mamun, N. S. and Ghorbani, A. A. (2015). An entropy-based encrypted traffic classification using machine learning. In Proceedings of the 17th International Conference on Information and Communication Security, ICICS 2015, Berlin, Heidelberg. Springer-Verlag.

Moore, A. W. and Zuev, D. (2005). Internet traffic classification using Bayesian analysis techniques. In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’05, pages 50–60, New York, NY, USA. ACM.

Palmieri, F. and Fiore, U. (2009). A nonlinear, recurrence-based approach to traffic classification. Computer Networks: The International Journal of Computer and Telecommunications Networking, 53(6):761–773.

Paxson, V. (1994). Empirically derived analytic models of wide-area TCP connections. IEEE/ACM Transactions on Networking, 2(4):316–336.

Paxson, V. and Floyd, S. (1995). Wide area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3):226–244.

Rao, A., Legout, A., Lim, Y.-s., Towsley, D., Barakat, C., and Dabbous, W. (2011). Network characteristics of video streaming traffic. In Proceedings of the Seventh COnference on Emerging Networking EXperiments and Technologies, CoNEXT ’11, pages 25:1–25:12, New York, NY, USA. ACM.

Sherry, J., Lan, C., Popa, R. A., and Ratnasamy, S. (2015). BlindBox: Deep packet inspection over encrypted traffic. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pages 213–226, New York, NY, USA. ACM.

Wang, D., Zhang, L., Yuan, Z., Xue, Y., and Dong, Y. (2014). Characterizing application behaviors for classifying P2P traffic. In International Conference on Computing, Networking and Communications, ICNC’14, pages 21–25. IEEE.

Williams, N., Zander, S., and Armitage, G. (2006). A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification. ACM SIGCOMM Computer Communication Review, 36(5):5–16.

Yeganeh, S., Eftekhar, M., Ganjali, Y., Keralapura, R., and Nucci, A. (2012). CUTE: Traffic classification using terms. In 21st International Conference on Computer Communications and Networks, ICCCN ’12, pages 1–9. IEEE.

Zander, S., Nguyen, T., and Armitage, G. (2005). Automated traffic classification and application identification using machine learning. In Proceedings of the IEEE Conference on Local Computer Networks 30th Anniversary, LCN ’05, pages 250–257, Washington, DC, USA. IEEE Computer Society.
