REFERENCES
Ahmed, Z. and Singh, H. (2019). Text extraction and clustering for multimedia: A review on techniques and challenges. In 2019 International Conference on Digitization (ICD), pages 38–43. IEEE.
Benfenati, D., Montanaro, M., Rinaldi, A. M., Russo, C., and Tommasino, C. (2023). Using focused crawlers with obfuscation techniques in the audio retrieval domain. In International Conference on Management of Digital, pages 3–17. Springer.
Bergman, M. K. (2001). White paper: the deep web: surfacing hidden value. Journal of Electronic Publishing, 7(1).
Bhatt, D., Vyas, D. A., and Pandya, S. (2015). Focused web crawler. Algorithms, 5:18.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Bosch, A., Zisserman, A., and Munoz, X. (2007). Representing shape with a spatial pyramid kernel. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pages 401–408.
Chakrabarti, S., Van den Berg, M., and Dom, B. (1999). Focused crawling: a new approach to topic-specific web resource discovery. Computer Networks, 31(11–16):1623–1640.
Cheng, H., Liu, S., Sun, W., and Sun, Q. (2023). A neural topic modeling study integrating SBERT and data augmentation. Applied Sciences, 13(7).
Clinchant, S., Ah-Pine, J., and Csurka, G. (2011). Semantic combination of textual and visual information in multimedia retrieval. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, pages 1–8.
Danilak, M. (2017). Langdetect 1.0.7. Python Package Index.
Farag, M. M., Lee, S., and Fox, E. A. (2018). Focused crawler for events. International Journal on Digital Libraries, 19:3–19.
Fatima, N., Faheem, M., and Dar, M. Z. N. (2023). Optimized focused crawling for web page classification. In 2023 International Conference on Energy, Power, Environment, Control, and Computing (ICEPECC), pages 1–6.
Fernández-Cañellas, D., Marco Rimmek, J., Espadaler, J., Garolera, B., Barja, A., Codina, M., Sastre, M., Giro-i-Nieto, X., Riveiro, J. C., and Bou-Balust, E. (2020). Enhancing online knowledge graph population with semantic knowledge. In The Semantic Web – ISWC 2020: 19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings, Part I, pages 183–200. Springer.
Fu, T., Abbasi, A., and Chen, H. (2010). A focused crawler for dark web forums. Journal of the American Society for Information Science and Technology, 61(6):1213–1231.
Hajba, G. L. (2018). Website scraping with Python. Berkeley: Apress.
Hassan, T., Cruz, C., and Bertaux, A. (2017). Ontology-based approach for unsupervised and adaptive focused crawling. In Proceedings of the International Workshop on Semantic Big Data, pages 1–6.
Hinz, T., Heinrich, S., and Wermter, S. (2020). Semantic object accuracy for generative text-to-image synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1552–1565.
K, N. T., S, C., G, B., Dharani, C., and Karishma, M. S. (2023). Comparative analysis of various web crawler algorithms.
Kittler, J., Hatef, M., Duin, R. P., and Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239.
Kumar, N. and Aggarwal, D. (2023). Learning-based focused web crawler. IETE Journal of Research, 69(4):2037–2045.
Kunder, M. d. (2018). The size of the world wide web (the internet). Retrieved from: http://www.worldwidewebsize.com/ (19.01.2017).
Landauer, T. K., Foltz, P. W., and Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3):259–284.
Liu, J., Li, X., Zhang, Q., and Zhong, G. (2022). A novel focused crawler combining web space evolution and domain ontology. Knowledge-Based Systems, 243:108495.
Lowe, G. (2004). SIFT: the scale invariant feature transform. Int. J, 2(91–110):2.
Mary, J. D. P. N. R., Balasubramanian, S., and Raj, R. S. P. (2022). An enhanced focused web crawler for biomedical topics using attention enhanced Siamese long short-term memory networks. Brazilian Archives of Biology and Technology, 64:e21210163.
Mohandes, M., Deriche, M., and Aliyu, S. O. (2018). Classifiers combination techniques: A comprehensive review. IEEE Access, 6:19626–19639.
Pant, G. and Srinivasan, P. (2005). Learning to crawl: Comparing classification schemes. ACM Transactions on Information Systems (TOIS), 23(4):430–462.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR.
Ridnik, T., Ben-Baruch, E., Noy, A., and Zelnik-Manor, L. (2021). ImageNet-21K pretraining for the masses. arXiv preprint arXiv:2104.10972.
Rinaldi, A. M. (2014). Using multimedia ontologies for automatic image annotation and classification. In 2014 IEEE International Congress on Big Data, pages 242–249. IEEE.
Rinaldi, A. M. and Russo, C. (2021). Using a multimedia semantic graph for web document visualization and summarization. Multimedia Tools and Applications, 80(3):3885–3925.
Rinaldi, A. M., Russo, C., and Tommasino, C. (2021a).
A semantic approach for document classification us-
KDIR 2024 - 16th International Conference on Knowledge Discovery and Information Retrieval