of indices reside in each index block, data storage
pattern on disk and data access pattern. The
simulator provides output in the form of buffer
cache misses and OS cache misses. The simulator
output is validated against the actual cache misses
incurred by a query during its execution, by
comparing the cache misses collected from the query
execution plan with those obtained from the
simulator. For validation, custom queries on the
TPC-H benchmark schema are used. The simulator
predicts cache misses with an average prediction
error of 2%.
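This section does not state how the 2% average is computed. One common definition, assumed here purely for illustration, is the mean absolute relative error between simulated and observed cache misses over the validation queries. The function name `mean_relative_error` is hypothetical and not part of the simulator.

```python
from typing import Sequence


def mean_relative_error(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Mean of |simulated - observed| / observed over all queries, in percent.

    Assumption: every observed miss count is positive.
    """
    if len(simulated) != len(observed) or len(observed) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    # Relative error of each query's predicted miss count.
    errors = [abs(s - o) / o for s, o in zip(simulated, observed)]
    return 100.0 * sum(errors) / len(errors)
```

For example, predicting 102 and 49 misses for two queries that actually incurred 100 and 50 misses gives a relative error of 2% for each query, so the average is 2%.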
This paper is organized as follows. Section 2
reviews prior work on cache simulation. Section 3
presents the design of the database buffer
simulator: how the database cache works, how the
simulation works and which data access patterns
were taken into account while building the
simulator. Section 4 describes the validation
process and its results. Section 5 concludes the
paper, and the references are listed in the last
section.
2 RELATED WORK
Simulation is a well-established technique for
studying computer hardware and predicting
system performance. Over the years, many
simulation systems with the goal of providing a
general tool for such studies have been developed.
Several studies have addressed operating system
cache simulation (Tao and Weidendorfer, 2004; Tao
and Karl, 2006; Holliday, 1992; Sugumar and
Abraham, 1993). Jie Tao and Wolfgang Karl
simulated caches in detail to detect bottlenecks,
identify the reasons for misses and expose
optimization potential (Tao and Karl, 2006).
Rabin A. Sugumar and Santosh G. Abraham
proposed modifications to the OPT algorithm and
developed an efficient algorithm with which miss
characterization can be performed using
reasonable simulation resources (Sugumar and
Abraham, 1993). Several methods for cache
simulation have been developed, for example,
using address reference traces (Holliday, 1992)
or runtime instrumentation of applications (Tao
and Weidendorfer, 2004).
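The OPT algorithm mentioned above is Belady's optimal replacement policy: on a miss with a full cache, it evicts the resident block whose next reference lies furthest in the future. The sketch below counts OPT misses by replaying a block reference trace, in the trace-driven style cited above. It is a plain textbook version, not the modified algorithm of Sugumar and Abraham, and the function name `opt_misses` is hypothetical.

```python
from typing import Dict, List, Sequence


def opt_misses(trace: Sequence[int], capacity: int) -> int:
    """Count misses for Belady's OPT (MIN) replacement on a block trace."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    n = len(trace)
    # next_use[i] = position of the next reference to trace[i] after i (inf if none).
    next_use: List[float] = [float("inf")] * n
    last_seen: Dict[int, int] = {}
    for i in range(n - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i
    resident: Dict[int, float] = {}  # cached block -> position of its next reference
    misses = 0
    for i, block in enumerate(trace):
        if block not in resident:
            misses += 1
            if len(resident) >= capacity:
                # Evict the block needed furthest in the future (or never again).
                victim = max(resident, key=resident.__getitem__)
                del resident[victim]
        resident[block] = next_use[i]
    return misses
```

For the trace 1, 2, 3, 1, 4, 1, 2 with a three-block cache, OPT evicts block 3 (never referenced again) when block 4 arrives and incurs 4 misses, whereas an LRU cache would evict block 2 and incur 5.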
Besides OS caches, much work in recent years has
also been carried out on web cache simulation
(Cárdenas et al., 2005; Cárdenas et al., 2004).
L.G. Cárdenas and colleagues developed new
techniques for proxy cache simulation (Cárdenas et
al., 2004). In addition, L.G. Cárdenas proposed a
proxy-cache platform for checking the performance
of web objects based on multi-key management
techniques and algorithms. The platform was
developed in a modular way, which allows new
algorithms or policy proposals to be implemented
easily and robustly (Cárdenas et al., 2005).
Prior work in the literature has simulated the
functional behaviour of the database buffer cache;
however, it does not simulate cache misses for
larger data sizes. Daniel Moniz and Paul Fortier
performed a simulation analysis of a real-time
database buffer manager (Moniz and Fortier, 1996).
The authors analysed buffer management policies
and presented two new page replacement
algorithms. However, they did not focus on
database cache hits and misses of data and index
blocks for larger data sizes. Rekha Singhal and
Manoj Nambiar discussed the estimation of I/O
access time at larger data sizes for various disk
access patterns during SQL query execution, but
did not include the delay in I/O access time caused
by cache behaviour at larger data sizes (Singhal
and Nambiar, 2013). The simulator proposed in this
paper derives database cache hits and misses from
the data access pattern, which has not been
analysed earlier.
3 DATABASE CACHE SIMULATION
When a query is executed on the database, several
access paths can be used to locate and retrieve a
row in a table, for example, Full Table Scan,
Row-id Scan and Index Scan. When the database
server performs a full table scan, it reads blocks
sequentially. For an index scan, it first needs to
get the address of a data block from an index
block, and hence reads blocks randomly. Thus, for
each row, two physical blocks are demanded: an
index block and a data block. Both blocks are first
looked up in the database buffer cache and, if not
found there, demanded from the OS. The OS then
looks for the blocks in its own cache and, if they
are still not found, fetches them from the disk
(physical storage) through an I/O operation. An
important point is that an index block can store a
much larger number of indices than the number of
rows stored in a data block. This means that the
probability of repeated access to an index block is
always significantly higher than that of a data
block.
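To make the lookup path and the index-versus-data argument concrete, the sketch below models the database buffer cache and the OS cache as two LRU caches in series and replays a full table scan and an index scan. LRU replacement, the cache sizes, the random placement of rows in data blocks, and all names (`LRU`, `TwoLevelCache`, `full_table_scan`, `index_scan`) are assumptions for illustration; this is not the simulator described in this paper, and real buffer managers use other replacement policies.

```python
import random
from collections import OrderedDict
from typing import Tuple


class LRU:
    """Fixed-capacity set of block ids with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[str, None]" = OrderedDict()

    def lookup(self, block: str) -> bool:
        """Return True on a hit and mark the block most recently used."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def insert(self, block: str) -> None:
        """Add a block that is not yet cached, evicting the LRU one if full."""
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)


class TwoLevelCache:
    """Database buffer cache backed by the OS cache, backed by disk."""

    def __init__(self, db_capacity: int, os_capacity: int) -> None:
        self.db = LRU(db_capacity)
        self.os = LRU(os_capacity)
        self.db_misses = 0  # blocks demanded from the OS
        self.os_misses = 0  # blocks fetched from disk by an I/O operation

    def read(self, block: str) -> str:
        """Return the level that served the block: 'db', 'os' or 'disk'."""
        if self.db.lookup(block):
            return "db"
        self.db_misses += 1
        served = "os"
        if not self.os.lookup(block):
            self.os_misses += 1
            self.os.insert(block)
            served = "disk"
        self.db.insert(block)  # simplification: every block read enters the buffer
        return served


def full_table_scan(num_rows: int, rows_per_data_block: int, cache: TwoLevelCache) -> None:
    """Read every data block once, sequentially in storage order."""
    num_blocks = -(-num_rows // rows_per_data_block)  # ceiling division
    for b in range(num_blocks):
        cache.read(f"data{b}")


def index_scan(num_rows: int, keys_per_index_block: int, rows_per_data_block: int,
               cache: TwoLevelCache, seed: int = 0) -> Tuple[int, int]:
    """Fetch rows in key order: one index block, then one data block per row.

    Returns (index block buffer misses, data block buffer misses).
    """
    rng = random.Random(seed)
    # Hypothetical layout: the storage order of rows is unrelated to key order.
    data_block_of_key = [r // rows_per_data_block for r in range(num_rows)]
    rng.shuffle(data_block_of_key)
    index_misses = data_misses = 0
    for key in range(num_rows):
        if cache.read(f"index{key // keys_per_index_block}") != "db":
            index_misses += 1
        if cache.read(f"data{data_block_of_key[key]}") != "db":
            data_misses += 1
    return index_misses, data_misses
```

With, for example, 1,000 rows, 200 keys per index block, 10 rows per data block and a 20-block buffer cache, the index scan misses on each of its 5 index blocks only once, because consecutive keys keep the current index block recently used. Data block misses, in contrast, number at least 100 (one compulsory miss per data block) and usually far more, since consecutive keys land on different, randomly placed data blocks. The full table scan touches each of the 100 data blocks exactly once.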
The order in which data blocks are accessed,
relative to their storage locations on disk,
impacts the cache behavior.