the complete set of features, we obtained a satisfactory
result, with F = 0.86. The precision of P = 0.88
suggests that the proposed solution can be adopted in
real-world scenarios. Adding the anchor-text features
improves both P and R, raising the overall F from 0.72
to 0.86. This latter observation deserves particular
attention, since it justifies the definition of the
proposed novel ML model.
Moreover, the experimental results show that using the
f and f visual features further improves recognition
performance, in particular Recall, which rises from
0.81 to 0.84.
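As a quick consistency check on the reported figures, the sketch below assumes that F denotes the balanced F-measure (F1, the harmonic mean of precision and recall) and that the Recall of 0.84 refers to the complete feature set. Under those assumptions, P = 0.88 and R = 0.84 give F of about 0.86, and inverting the formula recovers R from P and F. The helper names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure (assumed here): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, i.e. R = F1 * P / (2P - F1); needs 2P > F1."""
    return f1 * precision / (2 * precision - f1)


# Values reported above for the complete feature set (assumed to belong together).
P, R, F = 0.88, 0.84, 0.86

print(round(f1_score(P, R), 2))        # 0.86
print(round(implied_recall(P, F), 2))  # 0.84
```

The two reported values agree up to rounding, which supports reading F as F1 here.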
In this paper we have shown that ML techniques can
be combined with LA to automatically recognize the
main entity in pages with a given topic and a
well-known, web-usability-driven structure.
The experimental results on the proposed dataset are
encouraging and highlight the advantage of combining
the two sources of information: text blocks with their
visual formatting styles, and incoming anchor texts.
Future work includes experimenting on other website
domains and extending the current set of general-purpose
features with additional domain-specific features.
