User Profiling: On the Road from URLs to Semantic Features
Claudio Barros, Perrine Moreau
2022
Abstract
Text data is undoubtedly one of the richest and most distinctive sources of information available. It comes in many forms, each requiring specific treatment according to its nature, in order to create meaningful features that can subsequently be used in predictive modelling. URLs in particular are quite specific and require adapted processing compared with usual text corpora. In this paper, we review different ways in which we have used URLs to create meaningful features, both by exploiting the URL itself and by scraping the content of the page it points to. We additionally attempt to measure, in a predictive modelling use case, the impact of adding each of the groups of features created.
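The abstract does not describe how features are extracted from the URL string itself. As an illustration only, the following Python sketch assumes a simple bag-of-tokens approach: the host is split on dots, the path on any non-alphanumeric separator, and only query parameter names are kept. Purely numeric tokens and a small stop-list of uninformative tokens are dropped. The names `url_tokens` and `URL_STOPWORDS`, and the contents of the stop-list, are hypothetical and are not taken from the paper.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical stop-list of URL tokens that carry little meaning on their own.
URL_STOPWORDS = {"www", "http", "https", "com", "org", "net", "html", "htm", "php", "index"}


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase word-like tokens from its host, path and query keys.

    A minimal sketch assuming a bag-of-tokens representation; not the authors' method.
    """
    parts = urlsplit(url)
    # Host: "www.example.com" -> ["www", "example", "com"]
    raw = parts.hostname.split(".") if parts.hostname else []
    # Path: split on "/", "-", "_", "." and any other non-alphanumeric separator
    raw += re.split(r"[^0-9a-zA-Z]+", parts.path)
    # Query: keep parameter names only; values are often IDs or tracking codes
    raw += [key for key, _ in parse_qsl(parts.query)]

    tokens = []
    for tok in raw:
        tok = tok.lower()
        # Drop empty strings, pure numbers and stop-listed tokens
        if tok and not tok.isdigit() and tok not in URL_STOPWORDS:
            tokens.append(tok)
    return tokens


print(url_tokens("https://www.example.com/sport/football-results_2022.html?team=lyon&page=2"))
# → ['example', 'sport', 'football', 'results', 'team', 'page']
```

The resulting token lists could then be fed to any standard text vectoriser (for example a bag-of-words or TF-IDF model) to produce per-user features; which representation the paper actually uses is not stated in the abstract.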
Paper Citation
in Harvard Style
Barros C. and Moreau P. (2022). User Profiling: On the Road from URLs to Semantic Features. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-583-8, pages 227-235. DOI: 10.5220/0011139900003269
in Bibtex Style
@conference{data22,
author={Claudio Barros and Perrine Moreau},
title={User Profiling: On the Road from URLs to Semantic Features},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2022},
pages={227-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011139900003269},
isbn={978-989-758-583-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - User Profiling: On the Road from URLs to Semantic Features
SN - 978-989-758-583-8
AU - Barros C.
AU - Moreau P.
PY - 2022
SP - 227
EP - 235
DO - 10.5220/0011139900003269