for the IP datagram to be processed does not exist,
it needs to be established using the Internet Key
Exchange protocol (IKE) (Harkins and Carrell, 1998).
IPSec is often used to create Virtual Private
Networks (VPNs). A VPN is an extension of a private
network over a public network (e.g., the Internet)
(Feghhi and Feghhi, 2001) (Yuan and Strayer, 2001).
The extended part of the network logically behaves
like a private one. Typical usage scenarios for VPNs
are remote user access to a private LAN over the
Internet and the connection of two private networks.
In these cases a virtual secure channel needs to be
created from the user’s PC to the LAN public access
point or from one LAN to the other, respectively. The
public access points of private networks are called
secure gateways. A secure gateway is a router or a
router/firewall that also runs VPN-enabled software
(e.g., an IPSec implementation). Traffic within the
LAN is usually not protected, while traffic leaving or
entering the LAN through the secure gateway is
protected by security mechanisms.
IPSec has proved to be computationally very
intensive (Miltchev et al., 2002) (Ariga et al., 2000)
(Alberto Ferrante et al., 2005b). Thus some hardware
acceleration is needed to support large network
bandwidths, as may be required even in small secure
gateways. Cryptography is often believed to be the
only part of the IPSec suite that requires a large
amount of resources. In reality, IPSec implementations
must also perform other operations, such as header
processing and IPSec database querying. The latter
may become a bottleneck for the system, as it must be
done at least once for each IP packet traversing the
system: the SPD needs to be queried for every IP
packet, whereas the SAD needs to be queried only when
a packet is determined to require IPSec processing.
Considering an overall traffic of 1 Gbit/s and the
worst possible case (i.e., packets are received at the
maximum possible rate and have the smallest possible
size, 40 bytes), the SPD needs to be queried 3,355,443
times per second. On average, a normal system
operating at the same speed usually sees fewer than
one million queries per second. In either case, an
efficient database query unit is vital to achieve high
performance.
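The worst-case figure quoted above can be checked with a short calculation. The sketch below is only an illustration: it assumes that 1 Gbit/s is read as 2^30 bit/s, which is the interpretation that reproduces the number given in the text (a decimal reading of 10^9 bit/s would give 3,125,000 instead). The function and constant names are hypothetical.

```python
# Worst-case SPD query rate: one SPD query per minimum-size packet.
# Assumption: "1 Gbit/s" is interpreted as 2**30 bit/s, since this
# reproduces the figure quoted in the text.


def worst_case_queries_per_second(link_bits_per_second: int,
                                  min_packet_bytes: int) -> int:
    """Return the number of whole packets (hence SPD queries) per second."""
    bits_per_packet = min_packet_bytes * 8
    return link_bits_per_second // bits_per_packet


LINK_RATE = 2**30        # assumed binary interpretation of 1 Gbit/s
MIN_PACKET_BYTES = 40    # smallest packet size given in the text

rate = worst_case_queries_per_second(LINK_RATE, MIN_PACKET_BYTES)
print(rate)
```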
In this paper we present a study of a database
query unit for the SAD and the SPD databases; in the
best configuration this unit is able to perform 11
million queries per second. This database unit is
intended for use in IPSec accelerators such as the one
shown in (Alberto Ferrante and Vincenzo Piuri, 2007).
Section 2 describes the different possible
architectural solutions and the different techniques
that can be adopted for the database query. Section 3
presents the model for the simulations and the
obtained results. Section 4 shows an improvement of
the proposed architecture and the related simulations
and results. Section 5 shows a study of the optimal
architecture when an area–delay cost function is
considered.
2 SYSTEM ARCHITECTURE AND DATABASE QUERY TECHNIQUES
As shown in Figure 1, the main databases are stored
in an off-chip memory and are accessed through
on-chip caches, one for the SAD and one for the SPD.
This structure remains the same for both hardware and
software implementations of the query unit. An
off-chip memory for the databases provides flexibility
at the cost of reduced performance: an external
memory is easy to expand, whereas an internal memory
delivers performance that an external one cannot
reach. The main goal of the caches is to mask accesses
to the external memory, thereby reducing the access
time. In our case the total query time is given not
only by the physical access time of the external
memory, but also by the lookup time of the records
that are stored in it.
The two caches are implemented as two Content
Addressable Memories (CAMs) (Kostas Pagiamtzis,
nd) (Pagiamtzis and Sheikholeslami, 2003). With this
kind of memory, cells can be addressed by a part of
their contents. CAMs therefore provide a good way to
implement lookup tables, and for this reason they are
often used in routers and network processors (see, for
example, (Prashant Chandra et al., 2003)). The two
databases can be implemented in external memory in
either a shared or an unshared fashion; even if the
memory is physically shared, the databases should be
considered logically separate.
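The idea of addressing a cell by part of its contents can be illustrated with a small software model. This is a minimal sketch and not the hardware design studied in this paper: the class name `CamSketch` is hypothetical, and the choice of (SPI, destination address, protocol) as the search key is an assumption made only for the example. A hardware CAM compares all entries in parallel, whereas the loop below only models the outcome of that comparison.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str, str]   # assumed key: (SPI, destination address, protocol)


class CamSketch:
    """Toy content-addressable memory: entries are found by key, not by address."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Tuple[Key, Dict]] = []   # (search key, stored record)

    def write(self, key: Key, record: Dict) -> bool:
        """Store a record; return False when the memory is full."""
        for i, (stored_key, _) in enumerate(self.entries):
            if stored_key == key:
                self.entries[i] = (key, record)     # overwrite an existing entry
                return True
        if len(self.entries) >= self.capacity:
            return False                            # no replacement policy modeled
        self.entries.append((key, record))
        return True

    def search(self, key: Key) -> Optional[Dict]:
        """Return the record whose key field matches, or None on a miss."""
        for stored_key, record in self.entries:
            if stored_key == key:
                return record
        return None


cam = CamSketch(capacity=2)
cam.write((0x1001, "192.0.2.1", "ESP"), {"cipher": "example"})
print(cam.search((0x1001, "192.0.2.1", "ESP")))
print(cam.search((0x2002, "192.0.2.9", "AH")))
```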
When a new packet arrives, the database is first
queried in the cache; if a cache miss occurs, then a
query is performed in the main memory. Hence, the
worst-case search time for a record is the sum of the
time required to perform a query in the cache and the
time to perform a query in the main database. The
best-case search time is the time to perform a query
in the cache. Depending on the implementation of the
database query unit, the memory may need more than
one port. Later in this section we discuss different
methods to query the databases and different cache
replacement techniques.
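The cache-first lookup and its best-case and worst-case search times can be sketched as follows. This is a minimal model under stated assumptions: the function name `query`, the use of dictionaries for the cache and the main database, and the time values are hypothetical, and filling the cache on a miss without any eviction is a simplification that does not represent the replacement techniques studied in this paper.

```python
from typing import Dict, Hashable, Optional, Tuple


def query(key: Hashable,
          cache: Dict[Hashable, dict],
          main_db: Dict[Hashable, dict],
          t_cache: float,
          t_main: float) -> Tuple[Optional[dict], float]:
    """Return (record, elapsed time) for a cache-first lookup."""
    elapsed = t_cache                  # the cache is always queried first
    record = cache.get(key)
    if record is not None:
        return record, elapsed         # best case: cache hit
    elapsed += t_main                  # cache miss: query the main database too
    record = main_db.get(key)
    if record is not None:
        cache[key] = record            # assumed fill on miss, no eviction modeled
    return record, elapsed


# Hypothetical time units: 1 for a cache query, 10 for a main database query.
cache: Dict[Hashable, dict] = {}
main_db: Dict[Hashable, dict] = {"policy-1": {"action": "protect"}}

print(query("policy-1", cache, main_db, 1.0, 10.0))   # miss: worst-case time
print(query("policy-1", cache, main_db, 1.0, 10.0))   # hit: best-case time
```

With these values the first call takes the worst-case time (cache query plus main database query) and the second call takes the best-case time (cache query only).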
SECRYPT 2007 - International Conference on Security and Cryptography