sive function. Hence, the predicted and proactively
cached entries are more likely to be used. To set up the experiment,
we simulated a workload as follows. We divided the data tuples in the
Protein table into 5 segments, each consisting of 40K tuples. In the
first experiment (labeled “Uniform Workload” in Figure 11), all
segments have the same probability of being involved in (touched by)
a user’s query. Therefore, it is harder for the system to learn a
useful invocation pattern. In contrast, in the second experiment
(labeled “Skewed Workload” in Figure 11), each segment has a
different probability of being touched by a user’s query. We set the
probabilities to be exponentially decreasing, e.g., Segment $s_i$ has
double the probability of Segment $s_{i-1}$ of being queried. A given
user query touches 50,000 tuples across the 5 segments according to
the probabilities set in each experiment.
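The workload described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the generator used in the experiments: the function names, the contiguous tuple-id layout, and the choice to sample tuples independently (so a tuple may be touched more than once by the same query) are assumptions. The segment count, segment size, tuples per query, and the doubling rule come from the text.

```python
import random
from collections import Counter

NUM_SEGMENTS = 5           # from the text
SEGMENT_SIZE = 40_000      # 40K tuples per segment, from the text
TUPLES_PER_QUERY = 50_000  # tuples touched by one query, from the text


def segment_probabilities(skewed):
    """Return per-segment touch probabilities that sum to 1."""
    if skewed:
        # Each segment is twice as likely as the previous one: 1, 2, 4, 8, 16.
        weights = [2 ** i for i in range(NUM_SEGMENTS)]
    else:
        weights = [1] * NUM_SEGMENTS
    total = sum(weights)
    return [w / total for w in weights]


def simulate_query(probs, rng):
    """Return the tuple ids touched by one simulated query.

    Assumption: each touched tuple is drawn independently, first picking a
    segment by its probability, then a tuple uniformly inside that segment.
    """
    segments = rng.choices(range(NUM_SEGMENTS), weights=probs, k=TUPLES_PER_QUERY)
    return [s * SEGMENT_SIZE + rng.randrange(SEGMENT_SIZE) for s in segments]


if __name__ == "__main__":
    rng = random.Random(0)
    for name, skewed in (("uniform", False), ("skewed", True)):
        touched = simulate_query(segment_probabilities(skewed), rng)
        per_segment = Counter(t // SEGMENT_SIZE for t in touched)
        print(name, [per_segment[s] for s in range(NUM_SEGMENTS)])
```

With the doubling rule, the five segment probabilities are 1/31, 2/31, 4/31, 8/31, and 16/31, so roughly half of the tuples touched by a skewed query fall in a single segment.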
As the experiments show, the benefit of proactive execution and the
quality of the predicted invocation pattern depend on the user’s
workload. In the case of Uniform Workload, all tuples have the same
chance of being queried, so a larger number of the predicted and
proactively cached values are wrong predictions. Therefore, the
savings at query time are not large even if we increase the number of
queries before a given invalidation. In contrast, the results for
Skewed Workload show larger savings because the system is able to
identify the invocation patterns that are more likely to be used in
the future, and hence the cache hit rate is higher. The effect is
stronger when the number of queries before the invalidation is
relatively large, e.g., 8 or 16.
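A back-of-the-envelope calculation illustrates why skew helps. The sketch below is an idealized model, not the FunctionGuard predictor and not a reproduction of the measured results: it assumes the cache holds whole segments, that the predictor already knows the true segment weights, and that the skewed weights follow the doubling rule (1, 2, 4, 8, 16). The helper names are hypothetical.

```python
def expected_hit_fraction(weights, cached_segments):
    """Expected fraction of touched tuples that fall in a cached segment."""
    total = sum(weights)
    return sum(weights[s] for s in cached_segments) / total


def top_k_segments(weights, k):
    """Indices of the k segments with the largest touch weight."""
    return sorted(range(len(weights)), key=lambda s: weights[s], reverse=True)[:k]


uniform = [1, 1, 1, 1, 1]   # every segment equally likely
skewed = [1, 2, 4, 8, 16]   # each segment twice as likely as the previous one

for k in (1, 2):
    u = expected_hit_fraction(uniform, top_k_segments(uniform, k))
    s = expected_hit_fraction(skewed, top_k_segments(skewed, k))
    print(f"cache top {k} segment(s): uniform {u:.2f}, skewed {s:.2f}")
```

Under these assumptions, caching one segment (a fifth of the table) covers 1/5 of the touched tuples in the uniform case but 16/31 (about 52%) in the skewed case. This is the same direction as the reported difference in savings.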
7 CONCLUSION
We proposed the FunctionGuard system for efficiently incorporating
expensive functions in relational database queries. FunctionGuard is
distinct from existing systems in that it leverages disk-based caches
in novel ways to speed up query execution by avoiding unnecessary
invocations. In addition, it can be integrated with any of the
state-of-the-art techniques that build optimal query plans in the
presence of expensive functions. The unique features of FunctionGuard
include: (1) automated mechanisms for analyzing expensive functions
and building the corresponding dependency graph between functions and
data sources, (2) cache-aware query processing and optimizations
based on the three-bundle operators to integrate the cached data into
the query pipeline, and (3) mechanisms for updating and refreshing
the disk-based caches in batch-optimized and proactive ways. The
empirical evaluation demonstrated the effectiveness of the proposed
system in speeding up queries and enhancing the utilization of the
existing cache.