Authors:
Yahya Emara¹; Tristan Weger¹; Ryan Rubadue¹; Rishabh Choudhary¹; Simona Doboli² and Ali Minai¹
Affiliations:
¹University of Cincinnati, Cincinnati, OH 45221, U.S.A.; ²Hofstra University, Hempstead, NY 11549, U.S.A.
Keyword(s):
Semantic Spaces, Cognitive Maps, Semantic Clustering, Language Models, Interpretable Vector-Space Embeddings.
Abstract:
With the emergence of deep learning-based semantic embedding models, it has become possible to extract large-scale semantic spaces from text corpora. Semantic elements such as words, sentences, and documents can be represented as embedding vectors in these spaces, allowing their use in many applications. However, these semantic spaces are very high-dimensional, and the embedding vectors are hard for humans to interpret. In this paper, we demonstrate a method for obtaining more meaningful, lower-dimensional semantic spaces, or cognitive maps, through the semantic clustering of the high-dimensional embedding vectors obtained from a real-world corpus. A key obstacle is the presence of semantic noise in real-world document corpora. We show that pre-filtering the documents for semantic relevance can alleviate this problem and lead to highly interpretable cognitive maps.
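To illustrate the kind of semantic clustering the abstract describes, the sketch below groups embedding vectors with plain k-means. This is only a minimal illustration under assumptions: the toy Gaussian "embeddings" and the k-means routine are stand-ins, not the paper's actual corpus, embedding model, or clustering pipeline.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Cluster rows of X into k groups with Lloyd's k-means.

    A stand-in for the semantic clustering step: in the paper's setting,
    X would hold high-dimensional document embeddings from a real corpus.
    """
    # Deterministic init: spread initial centers across the data.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest center (squared Euclidean).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy "embeddings": two well-separated Gaussian blobs in 5 dimensions,
# mimicking two semantic clusters in a (much lower-dimensional) space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 5)),
               rng.normal(3.0, 0.1, (20, 5))])
labels, centers = kmeans(X, 2)
```

Each resulting cluster center can then serve as one interpretable dimension of a lower-dimensional cognitive map, with a document's coordinates given by its similarity to each center.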