profiles, automatically generates useful link proposals
for new users and, moreover, gives tourism agents
insight into the users’ preferences. This work
has been done for Bidasoa Turismo, a tourism web-
site in our environment, but it could be extended to
any other environment since it uses the minimum in-
formation stored in any web server (in common log
format). We preprocessed the data and prepared it for
use with machine learning algorithms, divided the
database into two parts (training and test), applied
PAM to the training data to discover groups of users
with similar navigation patterns, and applied SPADE to
discover the profiles associated with each of the
clusters. We further enriched those profiles by adding
semantic information using two options, MG4J and KYWD.
In the exploitation phase we related each test example
to just one of the built profiles (1-NN).
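As an illustration of the exploitation phase, the following minimal
sketch relates the URLs seen so far in a test session to the single
nearest profile (1-NN). It assumes, hypothetically, that each profile
is a non-empty list of frequent URL sequences and that sequences are
compared with a plain edit distance; the function names and the
profile layout are assumptions of this sketch and are not taken from
the system described above.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two URL sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def nearest_profile(prefix: Sequence[str],
                    profiles: Dict[str, List[List[str]]]) -> str:
    """Return the name of the profile closest to the session prefix.

    Assumption: a profile is a non-empty list of URL sequences, and its
    distance to the prefix is the minimum over those sequences.
    """
    best_name, best_dist = None, None
    for name, patterns in profiles.items():
        dist = min(edit_distance(prefix, p) for p in patterns)
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical profiles and test prefix, for illustration only.
profiles = {
    "beaches": [["home", "beach", "hotels"]],
    "hiking": [["home", "trails", "maps"]],
}
print(nearest_profile(["home", "trails"], profiles))
```

Taking the minimum over the sequences of a profile is only one possible
design choice; an average, or a distance to a cluster medoid, could be
substituted without changing the 1-NN step itself.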
We evaluated, based on a hold-out strategy, different
configurations of the system and how it performs at
different stages of the user navigation: 10%, 25% and
50%. We calculated recall and F-measure statistics and
analyzed the semantic profiles.
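The evaluation at different navigation stages can be sketched as
follows. This is a minimal example under stated assumptions: a test
session is a list of URLs, the first 10%, 25% or 50% of it is treated
as already seen, and the proposed links are compared with the URLs
the user visits afterwards. The balanced F-measure (F1), the rounding
rule for the cut point, and the helper names are assumptions of this
sketch, not details confirmed by the evaluation above.

```python
import math
from typing import List, Sequence, Tuple


def split_session(session: Sequence[str],
                  fraction: float) -> Tuple[List[str], List[str]]:
    """Split a session into the seen prefix and the remaining URLs.

    Assumption: at least one URL is always treated as seen.
    """
    cut = max(1, math.ceil(len(session) * fraction))
    return list(session[:cut]), list(session[cut:])


def precision_recall_f1(proposed: Sequence[str],
                        visited: Sequence[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of proposed links against visited URLs."""
    proposed_set, visited_set = set(proposed), set(visited)
    hits = len(proposed_set & visited_set)
    precision = hits / len(proposed_set) if proposed_set else 0.0
    recall = hits / len(visited_set) if visited_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical session and proposals, for illustration only.
session = ["u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"]
for fraction in (0.10, 0.25, 0.50):
    seen, remaining = split_session(session, fraction)
    proposals = ["u3", "u5", "u9", "u10"]  # stand-in for the system output
    print(fraction, len(seen), precision_recall_f1(proposals, remaining))
```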
Results showed that using the semantic knowledge
extracted from the website content improves the
performance of the proposed system, measured by recall
and F-measure values. Moreover, this improvement is
greater at early stages of the navigation, so the
system deals better with the zero-day or cold-start
problem. Furthermore, using content information gives
the option to enrich the generated profiles with
semantic information that can be very useful for
service providers.
This work opens the door to many future tasks. A
deeper analysis of the differences between the two
options implemented for URL content comparison, MG4J
and KYWD, should be done. Moreover, topic modeling
could be another option to extract semantic knowledge
from the websites’ content. More sophisticated
strategies to build semantic profiles could also be
explored.
ACKNOWLEDGEMENTS
This work was funded by the University of the Basque
Country, general funding for research groups, AL-
DAPA (GIU10/02); by the Science and Education
Department of the Spanish Government, ModelAccess
(TIN2010-15549 project); by the Diputación
Foral de Gipuzkoa, Zer4You (DG10/5); and by the
Basque Government’s SAIOTEK program, Datacc
(S-PE11UN097).
REFERENCES
Abou-Shouk, M., Lim, W. M., and Megicks, P. (2012).
Internet adoption by travel agents: a case of Egypt.
International Journal of Tourism Research, pages n/a–n/a.
Aha, D. W., Breslow, L., and Muñoz-Avila, H. (2001).
Conversational case-based reasoning. Appl. Intell.,
14(1):9–32.
Boldi, P. and Vigna, S. (2006). MG4J at TREC 2006. In
Voorhees, E. M. and Buckland, L. P., editors, TREC,
volume Special Publication 500-272. National Insti-
tute of Standards and Technology (NIST).
Chordia, B. S. and Adhiya, K. P. (2011). Grouping web
access sequences using sequence alignment method.
Indian Journal of Computer Science and Engineering
(IJCSE), 2(3):308–314.
Cooley, R., Mobasher, B., and Srivastava, J. (1999). Data
preparation for mining world wide web browsing patterns.
Knowledge and Information Systems, 1:5–32.
Dasarathy, S. (1991). Nearest neighbor (NN) norms : NN
pattern classification techniques. IEEE Computer So-
ciety Press.
GNU (1996). GNU Wget.
Gretzel, U. (2011). Intelligent systems in tourism: A so-
cial science perspective. Annals of Tourism Research,
38(3):757–779.
Gusfield, D. (1997). Algorithms on strings, trees, and se-
quences: computer science and computational biol-
ogy. Cambridge University Press, New York, NY,
USA.
He, D. and Göker, A. (2000). Detecting session boundaries
from web user logs. Proceedings of the 22nd Annual
Colloquium on Information Retrieval Research.
Kaufman, L. and Rousseeuw, P. (1990). Finding Groups
in Data An Introduction to Cluster Analysis. Wiley
Interscience, New York.
Madylova, A. and Öğüdücü, S. G. (2009). A taxonomy based
semantic similarity of documents using the cosine
measure. In ISCIS, pages 129–134. IEEE.
Mobasher, B. (2006). 12 web usage mining. Encyclopedia
of Data Warehousing and Data Mining Idea Group
Publishing, pages 449–483.
Pierrakos, D., Paliouras, G., Papatheodorou, C., and Spy-
ropoulos, C. D. (2003). Web usage mining as a tool for
personalization: A survey. User Modeling and User-
Adapted Interaction, 13(4):311–372.
Schiaffino, S. and Amandi, A. (2009). Artificial intelli-
gence. chapter Intelligent user profiling, pages 193–
216. Springer-Verlag, Berlin, Heidelberg.
Srivastava, T., Desikan, P., and Kumar, V. (2005). Web min-
ing – concepts, applications and research directions.
pages 275–307.
W3C (1995). The world wide web consortium: The com-
mon log format.
Yahoo! (June 15 2011). Term extraction documentation for
Yahoo! Search.
Zaki, M. J. (2001). SPADE: An efficient algorithm for
mining frequent sequences. Mach. Learn., 42(1-2):31–60.
KDIR 2012 - International Conference on Knowledge Discovery and Information Retrieval