In addition, the text files previously stored in the
SE are downloaded and used as input files by each job
execution (9).
At the end of the execution, multiple output files
can be retrieved (one .SQL file per user). Another
script must therefore be launched to assemble these
files into a single one containing the
recommendations for all users. The final step of the
workflow is to update the online database using the
generated .SQL file.
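The assembling step can be illustrated with a minimal sketch. It assumes that the retrieved per-user outputs sit in one local directory and carry a lowercase .sql extension; the function name and the file layout are hypothetical and do not reproduce the script actually used in this work.

```python
from pathlib import Path


def merge_sql_files(output_dir: str, merged_path: str) -> int:
    """Concatenate every per-user .sql file in output_dir into merged_path.

    Returns the number of per-user files found. The directory layout and
    the lowercase extension are assumptions made for this sketch.
    """
    merged = Path(merged_path).resolve()
    # Build the list before opening the target, sort it for a deterministic
    # order, and skip the merged file itself if it lives in the same directory.
    parts = sorted(
        p for p in Path(output_dir).glob("*.sql") if p.resolve() != merged
    )
    with merged.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8").strip()
            if text:  # ignore empty outputs (users without recommendations)
                out.write(text + "\n")
    return len(parts)
```

Sorting the file names keeps the merged file reproducible from one run to the next, which makes it easier to compare the outputs of different executions.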
To keep track of all executions of our
Recommender System, the post-processing script is
also in charge of storing some relevant statistics in
the AMGA Metadata Catalogue, such as the date
and time when the .SQL file was generated, the total
number of ratings, the number of users entitled to get
recommendations, the number of generated
recommendations and so forth.
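As an illustration only, the statistics listed above could be gathered in a small record before being written to the catalogue. The field names below are hypothetical, and the sketch deliberately contains no AMGA calls, since the catalogue schema and client commands used in this work are not described here.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RunStatistics:
    """One record per execution of the Recommender System (hypothetical)."""
    generated_at: str      # ISO 8601 date and time of the .SQL file
    total_ratings: int     # total number of ratings used as input
    eligible_users: int    # users entitled to get recommendations
    recommendations: int   # number of generated recommendations


def build_run_statistics(total_ratings: int, eligible_users: int,
                         recommendations: int,
                         generated_at: Optional[datetime] = None) -> RunStatistics:
    # Default to the current UTC time when no timestamp is supplied.
    when = generated_at or datetime.now(timezone.utc)
    return RunStatistics(when.isoformat(timespec="seconds"),
                         total_ratings, eligible_users, recommendations)
```

Calling asdict() on the record yields plain key/value pairs, which can then be mapped onto whatever attributes the metadata catalogue entry defines.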
5 CONCLUSIONS AND FUTURE WORK
Grids first emerged within scientific communities,
such as High Energy Physics (HEP) experiments.
However, the enormous research activity in recent
years has contributed to the development of new
areas of interest. Commercial users have been
attracted by this technology, which can potentially
be exploited by industries and SMEs to offer new
services with reduced costs and higher performance.
This expansion from science to business is
bringing Grids closer to “utility computing”, where
computing power is viewed as a utility, available on
a pay-as-you-use basis, like gas or electricity. This is
not yet the case, but there are several ongoing
initiatives to develop new tools and Grid services
that should allow the outsourcing of computing
resources in the short term.
Although the case study presented in this paper
did not focus on a comparative analysis between
our approach and other implementations of the CF
algorithm, it is interesting to note that the resources
required to run our Recommender System are owned
by other entities, leaving our own resources
(computers and storage) free to perform other tasks.
This is an immediate benefit of using Grids,
especially for SMEs that cannot afford to have their
own computer farm to run their
simulations/algorithms.
The distributed approach used in this case study
has helped to reduce the execution time of the
original algorithm, since each job submitted to the
Grid is in charge of calculating the recommendations
for a single user, an O(mn) computation. The overall
execution time to generate recommendations for all
users depends on the number of Computing Elements
and Worker Nodes available in the Grid. The best
scenario would be to have all jobs running
simultaneously on different Worker Nodes. The
larger the number of free CPUs in a Grid, the greater
the chance that this scenario occurs.
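The dependence of the overall execution time on the number of free CPUs can be made concrete with a rough estimate. The sketch below assumes that every per-user job takes about the same time and ignores scheduling, queueing and data transfer overheads, which are not negligible on a real Grid; the function and parameter names are hypothetical.

```python
def estimated_wall_time(n_users: int, free_cpus: int,
                        seconds_per_job: float) -> float:
    """Idealised wall-clock time for one job per user on free_cpus CPUs.

    Jobs are assumed to run in "waves" of at most free_cpus jobs each,
    all with the same duration. Overheads are ignored on purpose.
    """
    if n_users < 0 or free_cpus < 1 or seconds_per_job < 0:
        raise ValueError("invalid arguments")
    # Integer ceiling of n_users / free_cpus, without floating point.
    waves = (n_users + free_cpus - 1) // free_cpus
    return waves * seconds_per_job
```

Under these assumptions, the best scenario described above corresponds to free_cpus >= n_users, where the estimate collapses to the duration of a single job; with a single CPU it degenerates to the sequential execution time.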
The strategy of fetching data from the database
and splitting it into several text files is due to the
lack of a Grid-enabled Database Management System
(DBMS) supported by the middleware and available
to the users. However, since many applications need
to access, manage and process huge amounts of data,
several initiatives are trying to address this
important challenge. One of them is the Grid
Relational Catalog Project (GRelC, 2007), which
provides a data access interface to standard
DBMSs (MySQL, PostgreSQL, Oracle etc.) in a Grid
environment.
As future work, we intend to create a new
version of our Recommender System exploring the
GRelC service. We also intend to deploy our system
in the EELA-2 production Grid infrastructure, since
GILDA is devoted to learning purposes only. The
results and discussions of both experiments will be
presented in future papers.
REFERENCES
AMGA. (2006). The gLite Grid Metadata Catalogue.
Retrieved February 8, 2009, from:
http://amga.web.cern.ch/amga/
Barbera, R., Ardizzone, V., Ciuffo, L.N. et al. (2008). Grid
INFN virtual Laboratory for Dissemination Activities -
GILDA. 6th International Conference on Open
Access, Lilongwe, Malawi. GILDA portal available at:
https://gilda.ct.infn.it/
Bégin, M.E. (2008). An EGEE comparative study: Grids
and Clouds - Evolution or Revolution? Retrieved
February 8, 2009, from:
https://edms.cern.ch/file/925013/4/EGEE-Grid-Cloud-v1_2.pdf
Beingrid. (2006). Business Experiments in Grid. Retrieved
February 8, 2009, from: http://www.beingrid.eu/
Biz2Grid. (2008). Moving Business to the Grid. Retrieved
February 12, 2009, from: http://www.biz2grid.de
Ciuffo, L.N. (2001). Cinefilia website. Retrieved February
12, 2009, from: http://canalcinefilia.com.br
ClassAd. (2004). Condor Classified Advertisements.
Retrieved February 12, 2009, from:
http://www.cs.wisc.edu/condor/classad