data, or transaction log data.
Transparency: This ensures that encryption has
no impact on database schemas and user
applications. In fact, encryption can be thought
of as invisible to them.
Performance: This ensures that data stored in the
database buffer cache remains in clear text.
Consequently, encryption imposes no restrictions
on the database system when it comes to
selecting the most efficient plan to execute a
query (e.g., queries with range predicates).
Compression: Database systems implement
compression techniques to reduce the size of the
data stored on disk. Typically, these techniques
look for repeating patterns in order to avoid
storing all copies of such patterns. Encryption,
by definition, removes all patterns. This means
that the order in which compression and
encryption are performed is important. For
example, if encryption is performed first, then
compression will yield virtually no space
savings, as encryption leaves no patterns for the
compressor to exploit. Thus, placing our
encryption run-time processing just above the
database I/O layer ensures that encryption and
compression can coexist in harmony.
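The ordering argument above can be sketched with Python's standard zlib module and a toy keystream "cipher" standing in for a real block cipher such as AES (the cipher and the data are illustrative assumptions, not the paper's implementation):

```python
import hashlib
import os
import zlib

def xor_keystream(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a SHA-256-derived keystream.

    Insecure and illustrative only -- the point is that its output looks
    random, so it exposes no repeating patterns to a compressor.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

key = os.urandom(16)
page = b"ACCOUNT=12345;BALANCE=100;" * 150   # highly repetitive data

# Compress first, then encrypt: compression still sees the patterns.
good = xor_keystream(zlib.compress(page), key)

# Encrypt first, then compress: patterns are gone, compression gains nothing.
bad = zlib.compress(xor_keystream(page, key))

print(len(page), len(good), len(bad))
```

Compressing first shrinks the repetitive page dramatically, while compressing the already-encrypted page actually produces output slightly larger than the input, which is why the encryption layer must sit below (after) compression in the write path.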
5.2.2 Encryption Run-time Processing
The encryption run-time processing consists of two
functions: Encryption and decryption. Encryption
takes place when the database system is writing data
out to the storage system. Decryption happens when
the database system is reading data in from the
storage system.
While the solution can easily support any
symmetric block cipher for encryption/decryption,
we have chosen to implement support for only AES
and 3DES as they are the most commonly used
block ciphers; AES is in fact the current
standard symmetric block cipher. Block ciphers
support several modes of operation. Electronic Code Book (ECB)
is the easiest mode to implement but is also the
weakest from a security perspective. This is because
in ECB mode the same clear text input will always
result in the same cipher text. This may be fine for
encrypting small pieces of data such as a password,
but not for database encryption as this will introduce
patterns and may compromise the encryption
solution. Instead, we have chosen to use the Cipher
Block Chaining (CBC) mode as it does not introduce
patterns. This means we need to provide an
initialization vector when calling the block cipher in
CBC mode for encryption, as well as maintain that
initialization vector in our meta-data so that it is
available for decryption purposes. Note that the
initialization vector is not meant to be a secret. It
only needs to be random.
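The difference between ECB and CBC can be demonstrated with a toy block cipher (a plain XOR with the key, standing in for AES or 3DES; it is deliberately insecure, but the mode behavior it exhibits is the same):

```python
import os

BLOCK = 16

def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    # Toy stand-in for a real block cipher: XOR the block with the key.
    return bytes(b ^ k for b, k in zip(block, key))

def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # ECB: every block is encrypted independently.
    return b"".join(toy_block_encrypt(data[i:i + BLOCK], key)
                    for i in range(0, len(data), BLOCK))

def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext
    # block (the IV for the first block) before encryption.
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(b ^ p for b, p in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(mixed, key)
        out.append(prev)
    return b"".join(out)

key = os.urandom(BLOCK)
iv = os.urandom(BLOCK)           # random but not secret: kept in meta-data
data = b"same 16-byte blk" * 4   # four identical plaintext blocks

ecb = ecb_encrypt(data, key)
cbc = cbc_encrypt(data, key, iv)

# ECB: identical plaintext blocks yield identical ciphertext blocks.
print(ecb[:BLOCK] == ecb[BLOCK:2 * BLOCK])
# CBC: the chaining makes every ciphertext block differ.
print(cbc[:BLOCK] == cbc[BLOCK:2 * BLOCK])
```

The repeated ciphertext blocks under ECB are exactly the kind of pattern an attacker could exploit in a large database file; under CBC, the chaining and the random IV eliminate them.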
When writing data to the file system, the
database system writes them in chunks to minimize
the I/O overhead. A chunk is a collection of data
pages where each page is 4KB in size. A page is a set
of rows, and a database table is a collection of pages.
This poses an interesting question as to the level of
granularity to adopt for encryption. We have chosen
the data page to be that level of granularity. A row
level granularity would have had a higher impact on
performance as encryption calls would have to be
made for each row separately. A chunk level
granularity would have created a dependency
between the pages in that chunk due to the chaining
inherent to the CBC mode. For example, encrypting
page 5 would require pages 1 through 4 to be
encrypted first, and decrypting page 5 would require
reading the last ciphertext block of page 4 as its
chaining input.
This would have had a negative impact on query
performance as it diminishes the value of index-
based access.
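The page-level design can be sketched as follows: each page is encrypted as an independent CBC stream with its own IV kept in meta-data, so any page can be decrypted directly. The XOR "block cipher" and helper names here are illustrative stand-ins for AES-CBC, not the paper's actual code:

```python
import os

BLOCK, PAGE = 16, 4096

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt_page(page: bytes, key: bytes, iv: bytes) -> bytes:
    # Toy XOR 'block cipher' in CBC mode; a real system would use AES-CBC.
    out, prev = [], iv
    for i in range(0, len(page), BLOCK):
        prev = _xor(_xor(page[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)

def cbc_decrypt_page(ct: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for i in range(0, len(ct), BLOCK):
        block = ct[i:i + BLOCK]
        out.append(_xor(_xor(block, key), prev))
        prev = block
    return b"".join(out)

key = os.urandom(BLOCK)
pages = [os.urandom(PAGE) for _ in range(8)]   # a small 8-page table
ivs = [os.urandom(BLOCK) for _ in pages]       # per-page IVs in meta-data
encrypted = [cbc_encrypt_page(p, key, iv) for p, iv in zip(pages, ivs)]

# Index-based access: decrypt page 5 directly, touching no other page.
recovered = cbc_decrypt_page(encrypted[5], key, ivs[5])
print(recovered == pages[5])
```

Because each page chains only within itself, an index lookup that lands on one page pays exactly one page's worth of decryption, preserving the performance of index-based access.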
It is also worth noting that the data page level
granularity has allowed us to avoid needlessly
increasing the database size due to
encryption. In fact, encryption block ciphers such as
AES and 3DES encrypt data one block at a time. For
example, the block size for AES is 16 bytes. This
means that when the clear text to encrypt is not an
exact multiple of the block size, padding is required
and this padding increases the size of the cipher
text relative to the original clear text. Fortunately, the
choice of a data page for the encryption granularity
avoids this problem as data pages are always an
exact multiple of the encryption block size.
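The padding arithmetic is easy to check. The helper below is a hypothetical illustration (assuming a pad-to-next-block scheme) of why fixed 4KB pages never grow, while small units such as individual rows would:

```python
BLOCK = 16  # AES block size in bytes

def ciphertext_len(clear_len: int) -> int:
    # Pad up to the next multiple of the block size; when the length is
    # already a multiple, no padding is needed since page sizes are fixed
    # and known to the database system.
    return ((clear_len + BLOCK - 1) // BLOCK) * BLOCK

# A 4KB data page is exactly 256 AES blocks: no padding, no size growth.
print(ciphertext_len(4096))   # 4096
# A 100-byte row would grow to 112 bytes: 12% overhead on every row.
print(ciphertext_len(100))    # 112
```

At row-level granularity this per-unit overhead would be paid millions of times across a table, whereas at page-level granularity the overhead is exactly zero.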
5.2.3 Transaction Logs
Transaction logs are files where the database system
logs transactions such as insert, delete, and update
operations. They are a critical component for
ensuring the integrity of the database as well as for
allowing recoverability of the database following a
database crash. The structure of a transaction log file
consists of two pieces: A header which contains
meta-data about the file, and a payload which
contains the actual database transaction details.
In section 5.2.2 above, we have seen how the
placement of the encryption run-time ensures that all
data written to disk, including transaction logs, is
automatically encrypted. However, transaction logs
pose one additional challenge. In a database
recovery scenario, we must be able to decrypt the
transaction logs even when the database system is
down. This means that we cannot rely on the DEK