Forecasting Internet Traffic by Neural Networks
Under Univariate and Multivariate Strategies
Paulo Cortez
1
, Miguel Rio
2
, Pedro Sousa
3
and Miguel Rocha
3
1
Department of Information Systems/R&D Algoritmi Centre
University of Minho, 4800-058 Guimar˜aes, Portugal
2
Department of Electronic and Electrical Engineering, University College London
Torrington Place, WC1E 7JE, London, U.K.
3
Department of Informatics, University of Minho, 4710-059 Braga, Portugal
Abstract. By improving Internet traffic forecasting, more efficient TCP/IP traf-
fic control and anomaly detection tools can be developed, leading to economic
gains due to better resource management. In this paper, Neural Networks (NNs)
are used to predict TCP/IP trafc for 39 links of the UK education and research
network, under univariate and multivariate strategies. The former uses only past
values of the forecasted link, while the latter also uses the traffic from neigh-
bor links of the network topology. Several experiments were held by considering
hourly real-world data. The Holt-Winters method was also tested in the com-
parison. Overall, the univariate NN approach produces the best forecasts for the
backbone links, while a Dijkstra based NN multivariate strategy is the best option
for the core to subnetwork links.
1 Introduction
Internet traffic prediction is a key issue for understanding communication networks and
optimizing resources (e.g. adaptive congestion control and proactive network manage-
ment), allowing a better quality of service [1–3]. Moreover, traffic forecasting can help
to detect anomalies (e.g. security attacks, viruses or an irregular amount of SPAM) by
comparing the real traffic with the forecasts [4,5].
TCP/IP traffic prediction is often done intuitively by network administrators, with
the help of marketing information (e.g. future number of costumers) [1]. Yet, this may
not be suited for serious day-to-day network administration and the alternative is to
use Operational Research and Computer Science methods. In particular, the field of
Time Series Forecasting (TSF), deals with the prediction of a chronologically ordered
variable, where the goal is to model a complex system as a black-box, predicting its
behavior based in historical data [6]. The TSF approaches can be divided into univariate
and multivariate, depending if one or more variables are used. Multivariate methods are
likely to produce better results, provided that the variables are correlated [7].
Cortez P., Rio M., Sousa P. and Rocha M. (2008).
Forecasting Internet Traffic by Neural Networks Under Univariate and Multivariate Strategies.
In Proceedings of the 4th International Workshop on Artificial Neural Networks and Intelligent Information Processing, pages 61-70
DOI: 10.5220/0001508600610070
Copyright
c
SciTePress
Several TSF methods have been proposed, such as the Holt-Winters [6] and Neural
Networks (NN) [8,3]. Holt-Winters was developed for series with trended and seasonal
factors and more recently a double seasonal version has been proposed [9]. In contrast
with the conventional TSF methods (e.g. Holt-Winters), NNs can predict nonlinear se-
ries. In the past, several studies have proved the predictability of network traffic by using
similar methods. For instance, the Holt-Winters was used in [4, 10] and NNs have also
been proposed [11,5, 3]. However, these studies only considered univariate (or single
link) data, thus not making use of the topology network. By using data from more than
one link, there is a potential for better predictions.
This study will use recent hourly data from the United Kingdom Education and
Research Network (UKERNA) network. The network includes a backbone made up of
8 core routers that transport data through 21 regional subnetworks. In this paper, we
will explore NNs and two multivariate approaches for data selection: using all direct
neighbor links and selecting the most probable neighbor that is expected to influence
the predicted link. The latter strategy is based in a novel heuristic that uses the Open
Shortest Path First (OSPF) [12] protocol and Dijkstra algorithm. These approaches will
be compared with the NN univariate case and also the classic Holt-Winters method.
Furthermore, we will predict all UKERNA core to core and core to subnetwork links,
in a total of 39 connections.
2 Internet Traffic Data
This work will analyze traffic data (in Mbit/s) from the UK academic network backbone
(UKERNA)
4
, which includes eight core routers and 21 subregional networks. Figure 1
plots the respective direct graph, where bdd, sdd and cdd denote the links within the
backbone core routers, core to subnetwork and subnetwork to core, respectively (d is
a digit number). The data collection was based in the Simple Network Management
Protocol (SNMP), which quantifies the traffic passing through every network interface
with reasonable accuracy [13]. SNMP is widely deployed by every Internet Service
Provider/network and the collection of this data does not induce any extra traffic on the
network. In this work, we will adopt an hourly scale, denoting a short-term forecasting
that is often used to for optimal control or detection of abnormal situations [14]. The
data was recorded from 12 AM of 14th June 2006 to 12 AM of 23th July 2006. In total,
there are 936 hourly observations for each link.
The OSPF is the the most commonlyused intra-domainroutingprotocol[12]. Under
this protocol, every link contains a weight that is assigned by the network administrator.
The Dijkstra algorithm is used to find the shortest paths between any two nodes of
the network and these paths are then used by the routers to direct traffic. Most of the
UKERNA OSPF weights are set to 10 and the few exceptions are listed in Figure 1. For
instance, the OSPF weight between the core routers of Glasgow and Edinburgh is 100
(links b09 and b18); and the shortest path between Warrington and Edinburgh includes
the links b07 and b18.
4
http://www.ja.net
62
core router
CoshamBristol
Reading
Leeds
EdinburghGlasgow
Clydenet
NIRAN
FatMAN
Warrington
NWMAN
C&NLMAN
NNW
UHI
MidMAN
TVN
SWMAN
SWERN1
SWERN2
AbMAN
NorMAN
YMMAN
b18
b01
b02
b04
b05
100
100
100 111
100
100
100
112
100
b03
b06
b07
b08
b09
b10
b11
b12
b13
b14
b15
b16
b17
EastMAN
LeNSE
s17
s18
c18
s15
c17
c16
s16
c15
c10
s10
s11 c11
s20
c20
s13
s02
c02
c13
s19
s03
c03
150
c12
c19
c07
s12
s07
s04
subnetwork
c04
c01
s01
s14
c21
s21
c14
EMMAN
EastNET
KentMAN
s06
c06
s05
c05
c08
s08
London
LMN
c09
s09
100
backbone link
Fig.1. The schematic of the UK academic Internet network.
As an example, the traffic of two neighbor links, Warrington-Glasgow (b07) and
Glasgow-Clydenet (s05), is plotted in Figure 2. In both graphs, there are influences of
two seasonal components due to the the intraday and intraweek cycles.
weekly cycle
daily cycle
0
100
200
300
400
500
600
700
800
0 100 200 300 400 500 600 700 800 900
Traffic (Mbits/s)
Time (hours)
daily cycle
weekly cycle
10
20
30
40
50
60
70
80
90
100
110
0 100 200 300 400 500 600 700 800 900
Traffic (Mbits/s)
Time (hours)
Fig.2. The IP traffic for b07 (Warrington-Glasgow, left) and s03 (Glasgow-Clydenet, right) links.
63
3 Forecasting Methods
A Time Series Forecasting (TSF) model assumes that past patterns will occur in the
future. Let y
t
= (y
1t
, . . . , y
kt
) denote a multivariate series, where y
ij
is the jth chrono-
logical observation on variable i and k is the number of distinct time variables (k = 1
when a univariate setting is used). Then [7]:
by
pt
= F (y
1t1
, . . . , y
1tn
, . . . , y
kt1
, . . . , y
ktn
)
e
pt
= y
pt
by
pt
(1)
where by
pt
denotes the estimated value for the pth variable and time t; F the underlying
function of the forecasting model; and e
pt
is the error (or residual).
The overall performance of a model is evaluated by a global accuracy measure,
namely the Root Mean Squared Error (RMSE) and Relative RMSE (RRMSE), given in
the form [15]:
RM SE
p
=
q
P
P +N
i=P +1
e
2
pi
/N
RRM SE
p
= RMSE
p
/RM SE
y
pt
× 100 (%)
(2)
where P is the presenttime; N is the number of forecasts; and RMSE
y
pt
is the RMSE
given by the simple mean prediction. The last metric (RRMSE) will be adopted in this
work, since it has the advantage of being scale independent, where 100% denotes an
error similar to the mean predictor (y
pt
).
Due to the temporal nature of this domain, a sequential holdout will be adopted for
the forecasting evaluation. Hence, the first T R = 2/3 of the series will be used to fit
(train) the forecasting models and the remaining last 1/3 to evaluate (test) the forecast-
ing accuracies. Also, an internal holdout procedure will be used for model selection,
where the training data will be further divided into training (2/3 of T R) and validation
sets (1/3 of T R). The former will be used to fit the candidate models, while the latter
will be used to select the models with the lowest error (RMSE). After this selection
phase, the final model is readjusted using all training data.
3.1 Neural Networks
Neural Networks (NNs) are innate candidates for forecasting due to their nonlinear and
noise tolerance capabilities. Indeed, the use of NNs for TSF began in the late eighties
with encouraging results and the field has been growing since [8,14, 11,3].
The multilayer perceptron is the most popular NN used within the forecasting do-
main [8, 11]. When adopting this architecture, TSF is achieved by using a sliding time
window
5
. A sliding window is defined by the set of time lags used to build a forecast.
For instance, given the univariate time series 1,2,3,4,5,6 and sliding window {1, 2, 4},
the following training examples can be built: 1, 3, 4 5 and 2, 4, 5 6. In a multi-
variate setting, k sliding windows are used: {L
11
, . . . , L
1W
1
}, . . . , {L
k1
, . . . , L
kW
k
},
where L
ij
denotes a time lag for the ith variable.
In this work, a fully connected multilayer network with one hidden layer of H
hidden nodes and bias connections will be adopted (Figure 3). The logistic activation
5
This combination is also named Time Lagged Feedforward Network (TLFN) in the literature.
64
...
Output
...
pt
...
...
y
y
1t
y
y
1t−L
...
...
1t−L
y
kt−L
y
kt−L
y
kt
...
Series
Multivariate
Layer
Input
Layer
Hidden
...
+1
+1
+1
i
w
i,j
j
11
1W
1
window 1
window k
time
time k1
kW
k
Fig.3. The multilayer perceptron architecture for multivariate time series forecasting.
function is applied on the hidden nodes and the output node uses a linear function [16].
In past work [3], this architecture outperformed conventional univariate methods such
as Holt-Winters and ARMA models. The overall model is given in the form:
by
pt
= w
o,0
+
P
I+H
i=I+1
f(
P
k
s=1
P
W
s
r=1
y
stL
sr
w
i,j
)
(3)
where w
d,s
is the weight from node s to d; (if d = 0 then it is a bias connection); j
{1, . . . , I} is an input node; o is the output node; and f the logistic function (
1
1+e
x
).
Before training, all variables are scaled with a zero mean and one standard devia-
tion. Then, the initial NN weights are randomly set within [0.7, +0.7] [17]. Next, the
training algorithm is applied and stopped when the error slope approaches zero or after
a maximum of E epochs. Since the NN cost function is nonconvex(with multiple min-
ima), NR runs are applied to each neural setup, being selected the NN with the lowest
mean error [16]. After training, the NN outputs are rescaled to the original domain.
Under this setting, the NN performance will depend on the number of hidden nodes
(H), the selection of the k variables used in the multivariate model and the time window
used for each variable. All these parameters can have a crucial effect in the forecasting
performance. Feeding a NN with uncorrelated variables or time lags may affect the
learning process due to the increase of noise. A NN with 0 hidden neurons can only
learn linear relationships and it is equivalent to the classic Auto-Regressive(AR) model.
By increasing the number of hidden neurons, more complex nonlinear functions can be
learned but also it increases the probabilityof overfitting to the data and thus loosing the
generalization capability. Since the search space for these parameters is high, heuristic
procedures will be used during the model selection step.
Three strategies are proposed for the variable selection:
Single-Link NN (SLNN), the simple univariate model where the predictions are
based on the past values of the current link (p);
All Direct Neighbor Link NN (ADNN), based on p plus the previous traffic ob-
served in all direct neighbor links that influence p; and
Dijkstra-Assisted NN (DANN), based on p plus the neighbor that is expected to
influence more the predicted link under the OSPF protocol. First, the Dijkstra algo-
65
rithm is used to compute the shortest OSPF paths between all nodes of the network.
Then, the subset with all paths that include p as an internal or end link is selected.
Finally, the heuristic selects the most common
6
direct preceeding neighbor of p in
the subset.
Regarding the multivariate methods, DANN selects only k = 2 variables, while ADNN
uses a higher numberof links (from3 to 7). For instance, when forecasting the Reading-
TVN (p=s18) traffic, the ADNN variable set is {s18,b05,b01}
7
(Figure 1). There are 16
OSPF paths ending at TVN that include b05 and only 11 paths that go through b01.
Hence, DANN will select the former (i.e. {s18, b05}).
Based on previous univariate IP traffic forecasting work [3], a small range of hidden
nodes will be tested, with H {0, 2, 4, 6}. Also, three sliding windows, based on the
daily (K
1
= 24) and weekly (K
2
= 168) cycles, will be considered: w
1
= {1, 24, 25},
w
2
= {1, 168, 169} and w
3
= {1, 24, 25, 168, 169}. In [3], this sliding window setup
obtained high quality results. When a multivariate model is used, then the same window
is applied to all links.
3.2 Holt-Winters Methods
The Holt-Winters (HW) [6] is a popular univariate forecasting technique from the fam-
ily of Exponential Smoothing methods. The predictive model is based on some under-
lying patterns such as a trend or a seasonal cycle (K
1
), which are distinguished from
random noise by averaging the historical values. Its popularity is due to advantages
such as the simplicity of use, the reduced computational demand and the accuracy of
the forecasts, specially with seasonal series.
The general model is defined by:
Level S
t
= α
y
t
D
tK
1
+ (1 α)(S
t1
+ T
t1
)
Trend T
t
= β(S
t
S
t1
) + (1 β)T
t1
Seasonality D
t
= γ
y
t
S
t
+ (1 γ)D
tK
1
by
pt
= (S
t1
+ T
t1
) × D
tK
1
(4)
where S
t
, T
t
and D
t
stand for the level, trend and seasonal estimates, K
1
for the sea-
sonal period, and α, β and γ for the model parameters. When there is no seasonal
component, the γ is discarded and the D
tK
1
factor in the last equation is replaced
by the unity. More recently, this method has been extended to encompass two seasonal
cycles (K
1
and K
2
) [9]. In this work, four HW variants will be tested in the model
selection phase: n – non seasonal (K
1
= 1); d – daily seasonal (K
1
= 24); w – weekly
seasonal (K
1
= 168); and D – double seasonal (K
1
= 24 and K
2
= 168).
4 Experiments and Results
The experiments were conducted off-line (i.e. after the data was collected) using the
RMiner [18],an open source library for theR statistical environment[19]. Inparticular,
6
In case of a draw (which rarely occurs), the heuristic simply selects one of the contenders.
7
The link c18 is not considered, since its origin (TVN) matches the link destination.
66
the RMiner uses the nnet package [17] to implement the NNs. The NNs were trained
with E = 100 epochs of the BFGS algorithm [20], from the family of quasi-Newton
methods and the number of runs was set to NR = 10. The HW initial values (e.g. level
estimate) were set by averaging the early observations [9] and the internal parameters
(e.g. α) were optimized using a 0.05 grid search for the best training error (RM SE).
Since the intention is to compared univariate and multivariate approaches, only the
links with preceding neighbors will be predicted, i.e., all sdd and bdd, in a total of 39
connections. The selected forecasting models for each method are shown in Table 1.
For the HW, the weekly cycle is the most common model (w) and the double seasonal
variant is never used. The weekly effect (w
2
) is also the most common case for strategy
ADNN. In contrast, the majority of the SLNN and DANN methods use the double
seasonal model (w
3
). Regarding the NN architectures, in general only linear models
are selected. The 15 nonlinear exceptions (H > 0) are listed in the table. These results
confirm the notion that short term IP traffic can be modeled by small networks.
The forecasts with the selected models were performed on the test sets (with 312
elements) for all links shown in Table 1. Thirty runs were applied for the NNs and the
results are shown as the mean RRMSE with the respective 95% t-student confidence
intervals. The range of the best RRMSE values is high, showing that some links are
much harder to predict than others (e.g. 14.1% for b10 versus 82.5% for s02). Overall,
the HW is the worst strategy, since it is the best method for only 2 links (b05 and b14).
Regarding the backbone links, the univariate approach (SLNN) is the best NN choice
in 10 cases, followed by the ADNN (best in 7 links), while the DANN outperforms the
other methods for only one link (b17). This scenario changes when considering the core
to subnetwork links (sdd), where the DANN is the best method (with 12 wins), while
both SLNN and ADNN achieve statistically significant lowest errors in 4 cases.
For demonstrative purposes, the left of Figure 4 presents the average DANN traffic
forecasts for the first 60 hours ofthe s03 series. In this case, a high quality fit is achieved,
since the two curves are close. The observed (x-axis) versus the predicted values for
a given run (y-axis) is also shown. In the figure, the forecasts (points) are near the
diagonal line, which denotes the perfect forecast. Another relevant issue is related with
the computational complexity. The proposed solution is very fast and can be used in
real-time. For the example, with a Pentium Dual Core 3GHz processor, the DANN
model selection phase took 12 seconds, while the 30 runs of the final NN training and
testing required only 2.2 seconds.
5 Conclusions
This work analyses the efficiency of several Neural Network (NN) approaches when
applied to predict hourly TCP/IP traffic, collected from the United Kingdom Education
and Research Network (UKERNA). In particular, three strategies were tested: SLNN –
univariate approach based on past patterns from the current link; ADNN which also
includes the past values from all direct neighbors; and DANN – a novel approach that
includes only one link neighbor, whose selection is based on the Dijkstra algorithm and
OSPF protocol. Also, a comparison was made with the Holt-Winters (HW) method,
which is popular for seasonal series.
67
Table 1. The forecasting RRM SE errors and selected models (in brackets).
Link SLNN ADNN DANN HW
b01 24.8±0.0 (w
3
) 23.9±0.0 (w
2
) 25.0±0.0 (w
3
) 25.2 (w)
b02 63.2±0.0 (w
1
) 89.5±0.0 (w
2
) 63.4±0.0 (w
1
) 68.7 (n)
b03 22.1±0.0 (w
3
) 22.1±0.0 (w
2
) 22.2±0.0 (w
3
) 27.8 (d)
b04 21.3±0.0 (w
3
) 21.5±0.0 (w
2
) 22.2±0.0 (w
2
) 25.2 (w)
b05 34.1±0.0 (w
3
) 34.7±0.0 (w
2
) 35.7±0.0 (w
2
) 34.0 (w)
b06 86.6±2.7 (w
1
, H=4) 58.1± 0.0 (w
1
) 58.4±0.0 (w
3
) 69.0 (w)
b07 19.7±0.0 (w
3
) 30.6±0.0 (w
3
) 20.3±0.0 (w
3
) 25.1 (w)
b08 40.7±0.0 (w
2
) 40.7±0.0 (w
2
) 41.2±0.0 (w
2
) 44.3 (w)
b09 56.5±0.0 (w
2
) 57.2±0.0 (w
2
) 57.5±0.0 (w
2
) 67.5 (w)
b10 14.1±0.0 (w
3
) 17.7±0.0 (w
2
) 15.4±0.0 (w
3
) 15.0 (w)
b11 54.1±0.0 (w
3
) 57.3±0.9 (w
3
, H=2) 54.3±0.0 (w
2
) 58.0 (n)
b12 62.7±5.9 (w
2
, H=2) 36.1± 0.0 (w
1
) 74.6±0.0 (w
3
) 45.4 (w)
b13 30.5±0.0 (w
3
) 31.2±0.0 (w
2
) 30.6±0.0 (w
3
) 31.7 (w)
b14 19.5±0.0 (w
3
) 19.4±0.0 (w
3
) 19.5±0.0 (w
3
) 19.0 (w)
b15 79.9±0.0 (w
3
) 78.7±0.0 (w
2
) 80.4±0.0 (w
3
) 87.0 (n)
b16 48.0±1.0 (w
2
, H=4) 37.5± 0.0 (w
2
) 38.7±0.0 (w
3
) 39.4 (w)
b17 31.5±0.8 (w
3
, H=2) 57.5±3.1 (w
1
) 28.3±0.0 (w
3
) 30.1 (w)
b18 57.3±0.0 (w
3
) 59.2±0.0 (w
2
, H=6) 58.4±0.0 (w
3
) 80.8 (w)
s01 42.3±0.4 (w
2
, H=2) 47.5±2.7 (w
2
, H=2) 41.8±0.0 (w
2
) 45.1 (w)
s02 82.8±0.0 (w
3
) 85.2±0.0 (w
2
) 82.5±0.0 (w
3
) 91.9 (w)
s03 33.6±0.0 (w
3
) 34.6±0.0 (w
3
) 32.4±0.0 (w
3
) 37.3 (d)
s04 41.3±0.1 (w
2
, H=2) 41.4±0.0 (w
2
) 41.8±0.0 (w
2
) 48.0 (w)
s05 41.4±0.0 (w
3
) 41.6±0.0 (w
1
) 42.0±0.0 (w
3
) 47.5 (w)
s06 39.6±0.0 (w
3
) 38.3±0.0 (w
2
) 38.2±0.0 (w
3
) 44.1 (w)
s07 45.1±0.0 (w
2
) 42.8±0.0 (w
2
) 42.9±0.0 (w
2
) 51.7 (w)
s08 27.5±0.0 (w
3
) 28.9±0.0 (w
2
) 28.3±0.0 (w
3
) 34.9 (w)
s09 28.6±0.0 (w
3
) 27.3±0.0 (w
2
) 28.3±0.0 (w
3
) 36.5 (w)
s10 35.9±0.5 (w
2
, H=6) 32.8± 0.0 (w
3
) 38.6±0.0 (w
3
) 33.2 (w)
s11 68.4±0.0 (w
3
) 69.6±0.0 (w
1
) 71.2±1.2 (w
2
, H=2) 74.5 (n)
s12 71.8±8.3 (w
3
, H=4) 69.0±0.0 (w
2
) 56.7±0.0 (w
3
) 65.4 (w)
s13 48.1±0.0 (w
2
) 44.7±0.0 (w
2
) 44.7±0.0 (w
2
) 47.7 (w)
s14 36.8±0.0 (w
3
) 43.8±0.0 (w
1
) 34.0±0.0 (w
2
) 37.9 (w)
s15 26.5±0.0 (w
3
) 24.9±0.0 (w
2
) 23.6±0.0 (w
3
) 27.2 (w)
s16 33.4±0.0 (w
3
) 33.8±0.0 (w
2
) 32.3±0.0 (w
3
) 36.0 (d)
s17 28.3±0.5 (w
2
, H=2) 26.7±0.0 (w
3
) 25.5±0.0 (w
3
) 32.2 (n)
s18 54.2±0.0 (w
2
) 53.2±0.0 (w
3
) 51.9±0.0 (w
3
) 54.7 (n)
s19 39.8±0.0 (w
2
) 41.1±0.0 (w
3
) 40.1±0.0 (w
2
) 39.9 (w)
s20 64.7±0.2 (w
3
, H=2) 65.4±0.0 (w
2
) 61.3±0.0 (w
3
) 70.7 (d)
s21 34.2±0.3 (w
3
, H=2) 40.6±0.0 (w
3
) 32.6±0.0 (w
3
) 32.9 (w)
bold – statistical significance under a pairwise comparison with other NN methods.
underline – best model.
A large number of experiments was conducted, with a total of 39 forecasted links.
Overall, the NN results are quite competitive, outperforming the HW model in all ex-
cept two cases. Regarding the univariate versus multivariate comparison, the results
68
20
25
30
35
40
45
50
55
60
10 20 30 40 50 60
Traffic (Mbits/s)
Time (x 1 hour)
s03 values
DANN forecasts
10 20 30 40 50 60
10 20 30 40 50 60
Fig.4. Example of the forecasts (left) and observed versus predicted values scatter plot (right).
differ according to the link characteristics. Within the backbone links, the SLNN is the
best option in 10 of the 18 series, while DANN only excels other strategies for one con-
nection (b17 of Figure 1). However, for the core to subnetwork links, the multivariate
DANN strategy provides the best forecasts in 12 of 21 cases, while SLNN achieves the
best performance in only 4 links. These results may be explained by the nature of the
network topology. The core to subnetwork links are peripheral funnels, thus they are
more likely to be influenced by one single neighbor. In contrast, the core routers are
large carriers, i.e., they direct traffic from/to a larger number of nodes.
Since small networkswere selected,the NNs arevery fast and can be applied in real-
time. Thus, the proposed approach opens room for producing better traffic engineering
tools and methods to detect anomalies in the traffic patterns. This can be achieved with-
out producing any extra traffic in the network and with minimal use of computation
resources, since this work was designed assuming a passive monitoring system.
In future work, the comparison will be extended to other forecasting techniques
(e.g. ARMA models [21]). Moreover, the proposed approach will be applied to traffic
demands of specific Internet applications, such as Voice over Internet Protocol (VoIP).
Another promising direction is to explore incomplete information scenarios. For in-
stance, to see if it is possible to forecast the backbone link traffic using only the subnet-
work to core connections, i.e., without knowing the past values of the predicted links.
Acknowledgements
This work is supported by the FCT project PTDC/EIA/64541/2006. We would also like
to thank Steve Williams from UKERNA for providing us with part of the data used in
this work.
69
References
1. K. Papagiannaki, N. Taft, Z. Zhang, and C. Diot. Long-Term Forecasting of Internet Back-
bone Traffic. IEEE Trans. on Neural Networks, 16(5):1110–1124, September 2005.
2. V. Alarcon-Aquino and J. Barria. Multiresolution FIR Neural-Network-Based Learning Al-
gorithm Applied to Network Traffic Prediction. IEEE Trans. on Systems, Man and Cyber-
netics - Part C, 36(2):208–220, 2006.
3. P. Cortez, M. Rio, M. Rocha, and P. Sousa. Internet Traffic Forecasting using Neural Net-
works. In Proceedings of the IEEE 2006 International Joint Conference on Neural Networks,
pages 4942–4949, Vancouver, Canada, 2006.
4. B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen. Sketch-based Change Detection: Meth-
ods, Evaluation, and Applications. In Proc. of Internet Measurment Conference (IMC’03),
Miami, USA, October 2003. ACM.
5. J. Jiang and S. Papavassiliou. Detecting Network Attacks in the Internet via Statistical Net-
work Traffic Normality Prediction. Journal of Network and Systems Management, 12:51–72,
2004.
6. S. Makridakis, S. Weelwright, and R. Hyndman. Forecasting: Methods and Applications.
John Wiley & Sons, New York, USA, 1998.
7. G. Reinsel. Elements of Multivariate Time Series Analysis. Springer, San Francisco, CA,
second edition, 2003.
8. A. Lapedes and R. Farber. Non-Linear Signal Processing Using Neural Networks: Prediction
and System Modelling. Tech. Rep. LA-UR-87-2662, Los Alamos National Laboratory, USA,
1987.
9. J. Taylor, L. Menezes, and P. McSharry. A Comparison of Univariate Methods for Fore-
casting Electricity Demand Up to a Day Ahead. Int. Journal of Forecasting, 21(1):1–16,
2006.
10. Q. He, C. Dovrolis, and M. Ammar. On the Predictability of Large Transfer TCPThroughput.
In Proc. of SIGCOMM’05, Philadelphia, USA, August 2005. ACM.
11. H. Tong, C. Li, and J. He. Boosting Feed-Forward Neural Network for Internet Traffic
Prediction. In Proc. of the IEEE 3rd Int. Conf. on Machine Learning and Cybernetics, pages
3129–3134, Shanghai, China, August 2004.
12. T.M. ThomasII. OSPF Network Design Solutions. Cisco Press, 1998.
13. W. Stallings. SNMP, SNMPv2, SNMPv3 and RMON 1 and 2. Addison Wesley, 1999.
14. X. Ding, S. Canu, and T. Denoeux. Neural network based models for forecasting. In Proc.
of Applied Decision Technologies Conf. (ADT’95), pages 243–252, Uxbridge, UK, 1995.
15. I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques
with Java Implementations. Morgan Kaufmann, San Francisco, CA, second edition, 2005.
16. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. Springer-Verlag, NY, USA, 2001.
17. W. Venables and B. Ripley. Modern Applied Statistics with S. Springer, 4th edition, 2003.
18. P. Cortez. RMiner: Data Mining with Neural Networks and Support Vector Machines using
R. In R. Rajesh (Ed.), Introduction to Advanced Scientific Softwares and Toolboxes, IAEng
publishers, Singapore, In Press.
19. R Development Core Team. R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria, 2007. ISBN 3-900051-00-3.
20. M. Moller. A scaled conjugate gradient algorithm for fast supervised learning. Neural
Networks, 6(4):525–533, 1993.
21. G. Box and G. Jenkins. Time Series Analysis: Forecasting and Control. Holden Day, USA,
1976.
70