
Figure 5: Extracted topics using the proposed model. (a) English COVID-19 related topics; (b) Arabic COVID-19 related topics.
preserving semantic regularities between the learned
word vectors when extracting latent topics from a
collection of social data. The proposed model was
evaluated against a set of baseline models on English
and Arabic COVID-19 datasets collected from Facebook.
In these experiments, the proposed model outperformed
all baselines. The topics it discovered achieved high
scores on all three evaluation metrics used, indicating
that they are more coherent and more interpretable to
humans. Finally, this model can be used for recommender
systems on social networks, text mining, and other
statistical analyses of corpora of unstructured text.
ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence