Figure 2: Recommendation accuracy with frequent patterns for varying topN (left) and varying kNN (middle), and
with infrequent noisy patterns for varying kNN (right).
ware. This study presented an experimental comparison
between the proposed information retrieval model and
a leading commercial DBMS with two relational
database schemas. Based on these experiments,
the information retrieval model outperforms the
relational schemas in both performance and flexibility.
The experimental results show that pattern analysis
using relational database schemas on a regular
desktop computer begins to have problems at
several million coded data sets. The results also show
that the information retrieval model is able to answer
flexible queries rapidly over 100 million coded data
sets on regular desktop hardware.
Overall, the results show that the information
retrieval recommender provides more accurate
recommendations with a small number of nearest
neighbors (kNN). The model is also able to produce
recommendations from noisy and partially invalid
patterns, which is often the case in the real world,
where the quality of the data is far from perfect.
The proposed model utilizes kernel methods and
vector distance functions for efficient nearest
neighbor queries. Different types of distance functions
enable flexibility for different use cases. In the future,
we will apply the model to healthcare data sets. This
includes enriching the similarity functions with
domain-specific knowledge from the healthcare coding
schemes.
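To illustrate how interchangeable distance functions can drive a nearest neighbor recommender, the following is a minimal sketch and not the implementation evaluated in this study. It assumes baskets are stored as sparse item-to-weight dictionaries; the names cosine_distance, jaccard_distance, and recommend, as well as the sample baskets, are hypothetical. Kernel methods are not covered here.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors (item -> weight)."""
    dot = sum(w * b.get(item, 0.0) for item, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """Return 1 - Jaccard similarity of the item sets (weights ignored)."""
    items_a, items_b = set(a), set(b)
    union = items_a | items_b
    if not union:
        return 0.0
    return 1.0 - len(items_a & items_b) / len(union)


def recommend(query, baskets, k=2, top_n=3, distance=cosine_distance):
    """Score items from the k nearest baskets and return the top_n items.

    Items already present in the query are never recommended. Each
    neighbor votes for its items with weight (1 - distance).
    """
    neighbors = sorted(baskets, key=lambda basket: distance(query, basket))[:k]
    scores = Counter()
    for basket in neighbors:
        weight = 1.0 - distance(query, basket)
        for item in basket:
            if item not in query:
                scores[item] += weight
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical example data, for illustration only.
baskets = [
    {"milk": 1, "bread": 1, "butter": 1},
    {"milk": 1, "bread": 1, "jam": 1},
    {"beer": 1, "chips": 1},
    {"milk": 1, "butter": 1, "cheese": 1},
]
query = {"milk": 1, "bread": 1}

by_cosine = recommend(query, baskets, k=2)
by_jaccard = recommend(query, baskets, k=2, distance=jaccard_distance)
```

Passing the distance function as a parameter is one simple way to obtain the flexibility described above: a domain-specific similarity, for example one that exploits the hierarchy of a coding scheme, could be supplied without changing the recommender itself.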