Dinh, D., & Tamine, L. (2012). Towards a context sensitive
approach to searching information based on domain
specific knowledge sources. Journal of Web Semantics,
12–13, 41–52. https://doi.org/10.1016/j.websem.2011.11.009
Fu, M., Qu, H., Huang, L., & Lu, L. (2018). Bag of meta-
words: A novel method to represent document for the
sentiment classification. Expert Systems with
Applications, 113, 33–43. https://doi.org/10.1016/j.eswa.2018.06.052
Fu, Y., Feng, Y., & Cunningham, J. P. (2020). Paraphrase
Generation with Latent Bag of Words. arXiv preprint.
Hansen, P. C. (1987). The truncated SVD as a method for regularization. BIT Numerical Mathematics, 27(4), 534–553. https://doi.org/10.1007/BF01937276
Harris, Z. S. (1954). Distributional Structure. WORD, 10(3), 146–162. https://doi.org/10.1080/00437956.1954.11659520
Jacobi, C., Van Atteveldt, W., & Welbers, K. (2016).
Quantitative analysis of large amounts of journalistic
texts using topic modelling. Digital Journalism, 4(1),
89–106.
https://doi.org/10.1080/21670811.2015.1093271
Joachims, T. (1997). A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. International Conference on Machine Learning (ICML).
Kadriu, A., Abazi, L., & Abazi, H. (2019). Albanian Text
Classification: Bag of Words Model and Word
Analogies. Business Systems Research Journal, 10(1),
74–87. https://doi.org/10.2478/bsrj-2019-0006
Karpov, I., & Goroslavskiy, A. (2012). Application of BIRCH to text clustering. https://www.researchgate.net/publication/286672732_Application_of_BIRCH_to_text_clustering
Klinger, J. (2019). Big, fast human-in-the-loop NLP with Elasticsearch. Towards Data Science. https://towardsdatascience.com/big-fast-nlp-with-elasticsearch-72ffd7ef8f2e
Kowsari, K., Meimandi, K. J., Heidarysafa, M., Mendu, S.,
Barnes, L. E., & Brown, D. E. (2019). Text
Classification Algorithms: A Survey. Information
(Switzerland), 10(4). https://doi.org/10.3390/info10040150
Le, Q. V., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. Proceedings of the 31st International Conference on Machine Learning (ICML), 1188–1196.
Liu, Q., Kusner, M. J., & Blunsom, P. (2020). A Survey on Contextual Embeddings. arXiv preprint.
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142–150.
Malzer, C., & Baum, M. (2019). A Hybrid Approach To Hierarchical Density-based Cluster Selection. 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), 223–228. https://doi.org/10.1109/MFI49285.2020.9235263
McInnes, L., Healy, J., & Melville, J. (2018). UMAP:
Uniform Manifold Approximation and Projection for
Dimension Reduction. https://doi.org/10.48550/arXiv.1802.03426
Mikolov, T., Chen, K., Corrado, G. S., & Dean, J. (2013).
Efficient Estimation of Word Representations in Vector
Space. ICLR, 1–12.
Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135. https://doi.org/10.1561/1500000011
Parsons, L., Haque, E., & Liu, H. (2004). Subspace Clustering for High Dimensional Data: A Review. SIGKDD Explorations, 6(1), 90–105.
Pennington, J., Socher, R., & Manning, C. (2014). Glove:
Global Vectors for Word Representation. Proceedings
of the 2014 Conference on Empirical Methods in
Natural Language Processing (EMNLP), 1532–1543.
https://doi.org/10.3115/v1/D14-1162
Shawon, A., Zuhori, S. T., Mahmud, F., & Rahman, J.
(2018). Website Classification Using Word Based
Multiple N-Gram Models And Random Search
Oriented Feature Parameters. 2018 21st International Conference of Computer and Information Technology (ICCIT), 1–6. https://doi.org/10.1109/ICCITECHN.2018.8631907
Steinbach, M., Kumar, V., & Ertöz, L. (2003). The Challenges of Clustering High Dimensional Data. https://doi.org/10.1007/978-3-662-08968-2_16
Taloba, A. I., Eisa, D. A., & Ismail, S. S. I. (2018). A
Comparative Study on using Principle Component
Analysis with Different Text Classifiers.
Tong, Z., & Zhang, H. (2016). A Text Mining Research
Based on LDA Topic Modelling. 201–210.
https://doi.org/10.5121/csit.2016.60616
Ulčar, M., & Robnik-Šikonja, M. (2020). High quality ELMo embeddings for seven less-resourced languages. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings, 4731–4738. http://hdl.handle.net/11356/1064
Vola, S. (2017). How to use ElasticSearch for Natural Language Processing and Text Mining - Part 2. Dataconomy. https://dataconomy.com/2017/05/use-elasticsearch-nlp-text-mining-part-2/
Warrens, M. J., & van der Hoef, H. (2020). Understanding the Rand index. Studies in Classification, Data Analysis, and Knowledge Organization, 301–313. https://doi.org/10.1007/978-981-15-3311-2_24
Yellai, M. (2016). bayzee: Text classification using Naive Bayes and Elasticsearch [GitHub repository]. https://github.com/pandastrike/bayzee
Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH:
An Efficient Data Clustering Method for Very Large
Databases. SIGMOD Record (ACM Special Interest
Group on Management of Data), 25(2), 103–114.
https://doi.org/10.1145/235968.233324