with similar subjects.
If we compare the results obtained by the (normal)
profiles with those of the “ideal profiles”, we can
observe that the ideal profiles are better, as might have
been expected. However, the differences are small. For
example, for the best configuration, the percentage
differences between ideal and normal profiles are
7.1%, 5.6% and 3.7% for MAP, RPrec and R@10,
respectively. Therefore, normal profiles perform close
to the expected maximum performance offered by the
ideal profiles. Thus, although the categorization
performance of the multilabel classifiers shown in Table 1
was not very high, using the subjects predicted by
these classifiers for MP profile matching yielded
acceptable performance, consistent with the behavior
of the ideal profiles, which are based on the actual
thesaurus subjects assigned manually. This behavior,
along with the previously discussed effect of using
simple matching, leads us to believe that the assignment
of MPs based on subject profiles could be heavily
dominated by the most frequent subjects. These
subjects usually have higher weights in the SF
and SFIDF schemes and are predicted more successfully
by our multilabel classifiers, because they are
supported by a larger set of positive examples
characterizing them.
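The dominance effect described above can be illustrated with a minimal Python sketch. It assumes that SF is the relative frequency of a subject among the documents of an MP, and that SFIDF multiplies SF by log(N / n_s), where N is the number of MPs and n_s is the number of MPs whose profile contains subject s. These definitions, the sum-of-weights form of simple matching, the function names and the toy data are all assumptions made for illustration, and they may differ from the schemes defined earlier in the paper.

```python
import math
from collections import Counter


def sf_weights(docs_subjects):
    """Assumed SF: relative frequency of each subject in an MP's documents."""
    counts = Counter(s for subjects in docs_subjects for s in subjects)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def sfidf_weights(docs_subjects, n_mps, mps_with_subject):
    """Assumed SFIDF: SF scaled by log(N / n_s), by analogy with TF-IDF."""
    sf = sf_weights(docs_subjects)
    return {s: w * math.log(n_mps / mps_with_subject[s]) for s, w in sf.items()}


def simple_match(profile, doc_subjects):
    """Assumed simple matching: sum of the profile weights of shared subjects."""
    return sum(profile.get(s, 0.0) for s in doc_subjects)


# Hypothetical toy data: subjects of four documents attributed to one MP.
mp_docs = [
    ["health", "budget"],
    ["health"],
    ["health", "education"],
    ["health", "transport"],
]
# Hypothetical number of MPs (out of 10) whose profile contains each subject.
mps_with_subject = {"health": 5, "budget": 2, "education": 2, "transport": 1}

sf = sf_weights(mp_docs)
sfidf = sfidf_weights(mp_docs, n_mps=10, mps_with_subject=mps_with_subject)

# A document tagged only with the frequent subject outscores a document
# tagged with all three infrequent subjects.
score_frequent = simple_match(sf, {"health"})
score_rare = simple_match(sf, {"budget", "education", "transport"})
```

In this toy profile the single frequent subject carries more weight than the three infrequent subjects together, so the matching score depends mainly on whether that subject is predicted correctly for the document.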
6 CONCLUSIONS
A content-based filtering method for the
problem of assigning parliamentary documents to
the members of parliament potentially interested in
them has been described and evaluated. User and
document profiles are defined using subjects taken from
a conceptual thesaurus, and document profile
generation is modeled as a multilabel categorization
problem. The proposed method has been validated
using real-world data from a collection of parliamentary
documents, manually annotated by human experts.
Several matching approaches were evaluated, and we
were able to obtain an approximate conceptual
representation of documents and a profile matching method
whose performance measures are not very far from
the “ideal” case. More work needs to be done to
improve the applied multilabel categorization methods
and to evaluate alternative matching functions.
Although a priori the similarity-based expansion of
subject profiles seemed to be a promising way
to obtain more flexible matching, the simple strategy we
proposed was unable to improve profile
matching quality.
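To make the idea of similarity-based expansion concrete, the following Python sketch shows one plausible form it could take. It is not the strategy evaluated in the paper: the function name, the threshold, the rule of weighting an added subject by the product of the source weight and the similarity, and the toy similarity values are all hypothetical.

```python
def expand_profile(profile, similarity, threshold=0.5):
    """Add subjects similar to those already in the profile.

    profile:    dict mapping subject -> weight
    similarity: dict mapping subject -> {other subject -> similarity in [0, 1]}
    An added subject receives weight * similarity (hypothetical rule); if it
    is reachable in several ways, the largest resulting weight is kept.
    """
    expanded = dict(profile)
    for subject, weight in profile.items():
        for other, sim in similarity.get(subject, {}).items():
            if sim >= threshold:
                candidate = weight * sim
                expanded[other] = max(expanded.get(other, 0.0), candidate)
    return expanded


# Hypothetical profile and subject-to-subject similarities.
profile = {"health": 0.8, "budget": 0.2}
similarity = {
    "health": {"hospitals": 0.9, "sports": 0.3},
    "budget": {"taxes": 0.6},
}

expanded = expand_profile(profile, similarity)
```

Here "hospitals" and "taxes" are added with reduced weights, while "sports" falls below the threshold and is left out. An expansion of this kind makes matching more flexible but also adds subjects the MP never used, which may be one reason why a simple strategy fails to improve matching quality.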
ACKNOWLEDGEMENTS
Paper supported by the Spanish “Ministerio de
Economía y Competitividad” under projects
TIN2013-42741-P and FFI2014-51978-C2-1.
KDIR 2015 - 7th International Conference on Knowledge Discovery and Information Retrieval