in_cache_flag
With this database structure, the system can keep track of users’ requests and select the requests with the highest activity rates for caching.
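The selection step can be illustrated with a minimal Java sketch. It assumes each request-summary row has already been reduced to a single activity rate. The record `SummaryRow`, its field `activityRate` and the method `topRequests` are hypothetical names, and how the rate is derived from the table fields is not reproduced here. A min-heap capped at k entries keeps the k most active requests seen so far.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical summary row: the activity rate is assumed to be precomputed.
record SummaryRow(String request, double activityRate) {}

final class CacheSelector {
    private CacheSelector() {}

    // Returns up to k requests with the highest activity rates, most active first.
    static List<String> topRequests(List<SummaryRow> rows, int k) {
        List<String> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        Comparator<SummaryRow> byRate = Comparator.comparingDouble(SummaryRow::activityRate);
        // Min-heap capped at k entries: its head is the least active of the current top k.
        PriorityQueue<SummaryRow> heap = new PriorityQueue<>(byRate);
        for (SummaryRow row : rows) {
            heap.offer(row);
            if (heap.size() > k) {
                heap.poll(); // discard the least active request
            }
        }
        List<SummaryRow> best = new ArrayList<>(heap);
        best.sort(byRate.reversed());
        for (SummaryRow row : best) {
            result.add(row.request());
        }
        return result;
    }
}
```

With k set to the maximum-cache-size, the result could serve as the candidate list for loading into the cache.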
5.2 Data Structures in Main Memory
Two data structures reside in the system memory to support the cache catalogue search: the query-log and the query-cache. Like the request-history table, the query-log has a very simple structure with only two fields:
request
timestamp
The query-log can therefore be implemented as a hash-table or an array-list, since few operations are performed on it. If the query-log is implemented as an array-list, the following command can be used to write a request to the log:
log.add({request, timestamp})
To retrieve an entry from the log, the command is:
log.get(i)
where i is the index into the array-list. The following command empties the log:
log = new ArrayList()
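For illustration, the array-list commands above map onto a small Java wrapper. This is a sketch only: `LogEntry`, `QueryLog` and `empty()` are hypothetical names, and the timestamp is assumed to be a millisecond value stored as a `long`.

```java
import java.util.ArrayList;
import java.util.List;

// One query-log record: the request text and the time it was received.
record LogEntry(String request, long timestamp) {}

final class QueryLog {
    private List<LogEntry> log = new ArrayList<>();

    // Corresponds to log.add({request, timestamp}).
    void add(String request, long timestamp) {
        log.add(new LogEntry(request, timestamp));
    }

    // Corresponds to log.get(i), where i is the array-list index.
    LogEntry get(int i) {
        return log.get(i);
    }

    int size() {
        return log.size();
    }

    // Corresponds to log = new ArrayList(): replace the list with an empty one.
    void empty() {
        log = new ArrayList<>();
    }
}
```

Replacing the list, as the original command does, leaves any list object previously handed to another process untouched; `log.clear()` would instead empty the same list object in place.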
The query-cache holds cached catalogue objects together with their corresponding requests. The query-cache is best implemented as a hash-table, since this data structure makes it easy to search the query-cache and to add or remove entries. Searching and retrieving data from the hash-table (named QC) uses a command like:
data = QC.get(hash-key)
If the query-cache is in hash-table format, it has the following two fields:
request, as the hash key, and
a set of catalogue objects corresponding to the request
With this data structure, the command to retrieve the required catalogue objects is:
QC.get(request)
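A minimal Java sketch of the query-cache follows. It assumes the request string is the key and uses a hypothetical `CatalogueObject` record in place of the real catalogue record, whose fields are not given here. `QC.get(request)` corresponds to `get`, which returns `null` on a cache miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a catalogue record; its fields are assumptions.
record CatalogueObject(String id, String title) {}

final class QueryCache {
    // Key: the request string; value: the catalogue objects that answer it.
    private final Map<String, Set<CatalogueObject>> qc = new HashMap<>();

    // QC.get(request): returns null when the request is not cached.
    Set<CatalogueObject> get(String request) {
        return qc.get(request);
    }

    // Adds (or overwrites) a request with an immutable copy of its results.
    void put(String request, Set<CatalogueObject> results) {
        qc.put(request, Set.copyOf(results));
    }

    boolean contains(String request) {
        return qc.containsKey(request);
    }

    void remove(String request) {
        qc.remove(request);
    }

    int size() {
        return qc.size();
    }
}
```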
In this system, both the query-log and the query-cache need to be global variables. This keeps processes such as the search engine and the cache manager, as well as the system start-up process, very simple, since the global data structures are independent of any particular process.
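Java has no true global variables, so one way to realise this design is a non-instantiable holder class with static fields, as sketched below. The class name `GlobalStructures`, the helper `drainQueryLog` and the choice of thread-safe collections are assumptions: they presume that the search engine and the background cache manager may access the structures concurrently, which the text does not discuss.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Static fields play the role of global variables; no process owns them.
final class GlobalStructures {
    private GlobalStructures() {} // not instantiable

    // query-log: (request, timestamp) pairs in arrival order.
    static final List<Map.Entry<String, Long>> QUERY_LOG =
            Collections.synchronizedList(new ArrayList<>());

    // query-cache: request -> identifiers of its catalogue objects.
    static final Map<String, Set<String>> QUERY_CACHE = new ConcurrentHashMap<>();

    // Copies and empties the log under the list's own lock, so no entry added
    // by the search engine is lost between the copy and the clear.
    static List<Map.Entry<String, Long>> drainQueryLog() {
        synchronized (QUERY_LOG) {
            List<Map.Entry<String, Long>> pending = new ArrayList<>(QUERY_LOG);
            QUERY_LOG.clear();
            return pending;
        }
    }
}
```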
5.3 The Searching Process
When the search engine receives a search request, it uses the request as a hash-key to search the cache (the hash-table). If the search is successful, it retrieves the set of catalogue objects and returns the results to the user. Otherwise, the search engine goes on to search the database. At the end of the process, the search engine always writes the request, together with its timestamp, to the query-log.
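The searching process can be sketched in Java as follows. The database search is passed in as a function because its query is outside the scope of this section; `SearchEngine`, `LogEntry` and the use of `System.currentTimeMillis()` as the timestamp are assumptions for illustration. A cache miss does not add anything to the cache here: loading is left to the cache manager, as described next.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class SearchEngine {
    record LogEntry(String request, long timestamp) {}

    private final Map<String, Set<String>> queryCache;    // request -> catalogue object ids
    private final List<LogEntry> queryLog;                 // shared query-log
    private final Function<String, Set<String>> database;  // hypothetical database search

    SearchEngine(Map<String, Set<String>> queryCache, List<LogEntry> queryLog,
                 Function<String, Set<String>> database) {
        this.queryCache = queryCache;
        this.queryLog = queryLog;
        this.database = database;
    }

    Set<String> search(String request) {
        // The request itself is the hash key into the query-cache.
        Set<String> results = queryCache.get(request);
        if (results == null) {
            // Cache miss: fall back to the database; the cache is not updated here.
            results = database.apply(request);
        }
        // Hit or miss, the request and its timestamp go to the query-log.
        queryLog.add(new LogEntry(request, System.currentTimeMillis()));
        return results;
    }
}
```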
The query log process can be scheduled to run in the background. On each run, the cache manager moves all requests from the query-log to the request-history table. After adding the new requests with their timestamps to the request-history table, the cache manager uses them to update the request-summary table. For a valid request (invalid entries are ignored), the cache manager further checks whether the request already exists in the request-summary table. If it exists, the properties for this request (been_called_total and t_called_time) are updated. If the request is not found in the table, a new entry for this request is created. The cache manager then checks whether the new request has already been loaded into the cache and, if not, loads it.
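One run of the cache manager might look like the Java sketch below. The request-history and request-summary tables are replaced by in-memory collections, the validity test and the database search are supplied as functions, and t_called_time is assumed to hold the time of the most recent call. All of these, along with the class and method names, are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

final class CacheManager {
    record LogEntry(String request, long timestamp) {}

    // In-memory stand-in for one request-summary row.
    static final class Summary {
        long beenCalledTotal; // been_called_total
        long tCalledTime;     // t_called_time, assumed here to be the latest call time

        Summary(long firstCall) {
            this.beenCalledTotal = 1;
            this.tCalledTime = firstCall;
        }
    }

    final List<LogEntry> requestHistory = new ArrayList<>();     // request-history stand-in
    final Map<String, Summary> requestSummary = new HashMap<>(); // request-summary stand-in
    private final List<LogEntry> queryLog;
    private final Map<String, Set<String>> queryCache;
    private final Predicate<String> isValid;                    // hypothetical validity rule
    private final Function<String, Set<String>> database;       // hypothetical database search

    CacheManager(List<LogEntry> queryLog, Map<String, Set<String>> queryCache,
                 Predicate<String> isValid, Function<String, Set<String>> database) {
        this.queryLog = queryLog;
        this.queryCache = queryCache;
        this.isValid = isValid;
        this.database = database;
    }

    // One background run of the query log process.
    void runOnce() {
        // Move (not copy) every logged request into the request-history.
        List<LogEntry> pending = new ArrayList<>(queryLog);
        queryLog.clear();
        requestHistory.addAll(pending);

        for (LogEntry entry : pending) {
            String request = entry.request();
            if (!isValid.test(request)) {
                continue; // invalid entries are ignored
            }
            Summary row = requestSummary.get(request);
            if (row != null) {
                // Known request: update its properties.
                row.beenCalledTotal++;
                row.tCalledTime = entry.timestamp();
            } else {
                // New request: create its entry, then load it into the cache if absent.
                requestSummary.put(request, new Summary(entry.timestamp()));
                if (!queryCache.containsKey(request)) {
                    queryCache.put(request, database.apply(request));
                }
            }
        }
    }
}
```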
6 ADAPTIVE EVICTION PROCESS
The system defines two variables: the maximum-cache-size and the minimum-cache-size. The maximum-cache-size is the maximum number of requests that can be loaded into the cache, while the minimum-cache-size is the minimum number of requests kept in the cache at any time.
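The two bounds imply a simple range for eviction, sketched in Java below under the assumption that both sizes are counted in cached requests. When the cache holds more than maximum-cache-size requests, at least the excess must be evicted, and no more may be evicted than would leave fewer than minimum-cache-size requests. The class `CacheBounds` and its method names are hypothetical, and the values in the test are illustrative only.

```java
// Hypothetical holder for the two cache-size bounds, counted in cached requests.
final class CacheBounds {
    final int minimumCacheSize;
    final int maximumCacheSize;

    CacheBounds(int minimumCacheSize, int maximumCacheSize) {
        if (minimumCacheSize < 0 || minimumCacheSize > maximumCacheSize) {
            throw new IllegalArgumentException(
                    "need 0 <= minimum-cache-size <= maximum-cache-size");
        }
        this.minimumCacheSize = minimumCacheSize;
        this.maximumCacheSize = maximumCacheSize;
    }

    // The check the cache manager makes on each run.
    boolean exceeded(int cachedRequests) {
        return cachedRequests > maximumCacheSize;
    }

    // Fewest evictions that bring the cache back within maximum-cache-size.
    int minEvictions(int cachedRequests) {
        return Math.max(0, cachedRequests - maximumCacheSize);
    }

    // Most evictions allowed without dropping below minimum-cache-size.
    int maxEvictions(int cachedRequests) {
        return Math.max(0, cachedRequests - minimumCacheSize);
    }
}
```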
6.1 Eviction Process
Several cache replacement algorithms are commonly used in information retrieval systems, such as LRU, LFU and various hybrid replacement algorithms. An adaptive cache replacement algorithm offers better performance than these. The algorithm used in the designed system keeps track of both frequently used and recently used cached items, in conjunction with the recent eviction history (Santhanakrishnan, Amer, Chrysanthis and Li 2004).
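The cited algorithm is not specified in this section, so the sketch below does not reproduce it. Instead it shows a different, well-known algorithm built on the same idea: ARC (the Adaptive Replacement Cache of Megiddo and Modha). ARC keeps a recency list and a frequency list of cached requests plus two "ghost" lists of recently evicted keys, and it shifts the target split `p` between recency and frequency whenever a ghost key is requested again. Only keys are tracked; the class name `ArcCache` is hypothetical, and loading the catalogue objects themselves is left out.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

// Simplified ARC over request keys. Each LinkedHashSet acts as an LRU list:
// the first element is least recently used, the last is most recently used.
final class ArcCache {
    private final int c; // capacity in cached requests
    private int p = 0;   // adaptive target size for t1
    private final LinkedHashSet<String> t1 = new LinkedHashSet<>(); // cached, seen once recently
    private final LinkedHashSet<String> t2 = new LinkedHashSet<>(); // cached, seen at least twice
    private final LinkedHashSet<String> b1 = new LinkedHashSet<>(); // ghost keys evicted from t1
    private final LinkedHashSet<String> b2 = new LinkedHashSet<>(); // ghost keys evicted from t2

    ArcCache(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.c = capacity;
    }

    boolean contains(String key) { return t1.contains(key) || t2.contains(key); }

    int size() { return t1.size() + t2.size(); }

    // Records one request; returns true on a cache hit.
    boolean access(String key) {
        if (t1.remove(key) || t2.remove(key)) { // hit: move to the MRU end of t2
            t2.add(key);
            return true;
        }
        if (b1.contains(key)) { // ghost hit in b1: grow the recency target
            p = Math.min(c, p + Math.max(b2.size() / b1.size(), 1));
            replace(false);
            b1.remove(key);
            t2.add(key);
            return false;
        }
        if (b2.contains(key)) { // ghost hit in b2: shrink the recency target
            p = Math.max(0, p - Math.max(b1.size() / b2.size(), 1));
            replace(true);
            b2.remove(key);
            t2.add(key);
            return false;
        }
        // Complete miss.
        int l1 = t1.size() + b1.size();
        int total = l1 + t2.size() + b2.size();
        if (l1 == c) {
            if (t1.size() < c) {
                removeLru(b1);
                replace(false);
            } else {
                removeLru(t1); // b1 is empty: drop t1's LRU key outright
            }
        } else if (total >= c) {
            if (total == 2 * c) {
                removeLru(b2);
            }
            replace(false);
        }
        t1.add(key);
        return false;
    }

    // Evicts one cached key into the matching ghost list, steered by p.
    private void replace(boolean keyInB2) {
        if (!t1.isEmpty() && ((keyInB2 && t1.size() == p) || t1.size() > p)) {
            b1.add(removeLru(t1));
        } else {
            b2.add(removeLru(t2));
        }
    }

    private static String removeLru(LinkedHashSet<String> list) {
        Iterator<String> it = list.iterator();
        String lru = it.next();
        it.remove();
        return lru;
    }
}
```

Because ghost hits move `p`, a burst of one-off requests, such as a scan through many different titles, tends to displace entries in the recency list rather than requests that have been seen repeatedly.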
As mentioned above, the cache manager is also responsible for cache replacement. Each time the process runs, the cache manager checks whether the maximum-cache-size has been exceeded. If it has, the cache
AN OPEN SOURCE SOFTWARE BASED LIBRARY CATALOGUE SYSTEM USING SOUNDEXING RETRIEVAL
AND QUERY CACHING