User Profiling: On the Road from URLs to Semantic Features

Claudio Barros, Perrine Moreau

2022

Abstract

Text data is undoubtedly one of the richest and most peculiar sources of information there is. It can come in many forms, each requiring specific treatment according to its nature in order to create meaningful features that can subsequently be used in predictive modelling. URLs in particular are quite specific and require adapted processing compared to usual text corpora. In this paper, we review different ways in which we have used URLs to create meaningful features, both by exploiting the URL itself and by scraping its page content. We additionally attempt to measure the impact of adding the different groups of created features in a predictive modelling use case.
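The abstract mentions deriving features from the URL string itself, without detailing how. As a hedged illustration only (not the authors' pipeline), a minimal sketch might split each URL into word tokens and aggregate them per user into a bag-of-words profile. The function names `url_tokens` and `user_profile`, the stop list, and the choice to drop purely numeric tokens are all hypothetical assumptions made for this example.

```python
import re
from collections import Counter
from urllib.parse import urlparse, unquote

# Hypothetical stop list of low-information URL fragments (an assumption).
STOP_TOKENS = {"www", "com", "html", "htm", "php"}


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase word tokens taken from host, path and query."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # drop the common "www." prefix
    # Decode percent-escapes so "%20" and similar become separators.
    text = " ".join([host, unquote(parsed.path), unquote(parsed.query)]).lower()
    # Any run of non-alphanumeric characters acts as a token separator.
    tokens = re.split(r"[^a-z0-9]+", text)
    # Keep non-empty, non-stop-list, non-numeric tokens (numeric IDs/dates dropped).
    return [t for t in tokens if t and t not in STOP_TOKENS and not t.isdigit()]


def user_profile(urls: list[str]) -> Counter:
    """Aggregate token counts over all URLs visited by one (hypothetical) user."""
    counts: Counter = Counter()
    for url in urls:
        counts.update(url_tokens(url))
    return counts


if __name__ == "__main__":
    print(url_tokens("https://www.example.com/sports/football-news/2022/article.html?id=42"))
    # → ['example', 'sports', 'football', 'news', 'article', 'id']
```

Such token counts could then feed a standard text representation (e.g. TF-IDF or embeddings), whereas features based on page content would require scraping each page first.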

Paper Citation


in Harvard Style

Barros C. and Moreau P. (2022). User Profiling: On the Road from URLs to Semantic Features. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-583-8, pages 227-235. DOI: 10.5220/0011139900003269


in Bibtex Style

@conference{data22,
author={Claudio Barros and Perrine Moreau},
title={User Profiling: On the Road from URLs to Semantic Features},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2022},
pages={227-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011139900003269},
isbn={978-989-758-583-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 11th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI  - User Profiling: On the Road from URLs to Semantic Features
SN  - 978-989-758-583-8
AU  - Barros C.
AU  - Moreau P.
PY  - 2022
SP  - 227
EP  - 235
DO  - 10.5220/0011139900003269
ER  - 