6 CONCLUSION
In this paper, we proposed a visualization pipeline for
textual corpora analysis that uses FCA instead of the
usual LDA for the topic modeling step. Mutual impact
was computed within FCA to produce the matrix used
by the force-based graph algorithm. The pipeline
produces a map that can be used in two ways:
• The main keywords are placed by order of importance,
allowing the reader to quickly get an idea
of the topics contained in the corpus.
• Documents are placed based on their relevance to
the keywords found, allowing the reader to spot
possible discrepancies among the chosen texts.
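As an illustration of the matrix mentioned above, the following minimal Python sketch computes a document-keyword mutual impact matrix on a toy context. It assumes that the mutual impact of a document and a keyword is the number of formal concepts containing both, divided by the number of concepts containing at least one of them. This definition, the toy data, and the function names `formal_concepts` and `mutual_impact` are illustrative assumptions, not the implementation used in the pipeline.

```python
from itertools import combinations

# Toy binary context (hypothetical data): document -> set of keywords.
context = {
    "php_course_1": {"php", "variable", "function"},
    "php_course_2": {"php", "function", "database"},
    "java_course": {"java", "variable", "function"},
}


def formal_concepts(ctx):
    """Enumerate all formal concepts (extent, intent) by brute force."""
    objects = list(ctx)
    all_attrs = set().union(*ctx.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # Intent: keywords shared by every document in the subset.
            intent = set(all_attrs)
            for o in subset:
                intent &= ctx[o]
            # Extent: every document that has all keywords of the intent.
            extent = frozenset(o for o in objects if intent <= ctx[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


def mutual_impact(ctx):
    """Assumed definition: |concepts with o and a| / |concepts with o or a|."""
    concepts = formal_concepts(ctx)
    attrs = sorted(set().union(*ctx.values()))
    matrix = {}
    for o in ctx:
        for a in attrs:
            both = sum(1 for ext, itt in concepts if o in ext and a in itt)
            either = sum(1 for ext, itt in concepts if o in ext or a in itt)
            matrix[(o, a)] = both / either if either else 0.0
    return matrix


if __name__ == "__main__":
    for (doc, keyword), value in sorted(mutual_impact(context).items()):
        print(f"{doc:14s} {keyword:10s} {value:.2f}")
```

Under these assumptions, a keyword that never occurs in a document gets a value of zero, so the two would not be pulled together in a force-based layout that used the matrix as edge weights. The brute-force enumeration is only practical for very small contexts.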
The map presents a visualization of the corpus as
a whole. Removing a document changes the visualization
both because its node disappears and because
the topic modeling step then runs on a different set of
texts. To evaluate these claims, we presented a case
study on multiple PHP courses and an intruder Java
course (also about programming). First, a map displayed
the most important keywords and how they vary
with and without the Java course. Then, one of
the PHP courses, in which more than half of the text
covered out-of-scope topics, was corrected, which
significantly improved the output.
We consider FCA an exciting method for topic
modeling and plan to try other metrics on the lattice
in order to find more possible usages. Multiple
usages and combinations have already been proposed
(Poelmans et al., 2013), but we plan to use the
conceptual similarity metric (Jaffal et al., 2015) for
an even more precise combination of terms. A
deeper comparison with LDA and with newer methods
such as neural networks would also be of interest, since
FCA builds its results without relying on probabilities
and is fully transparent thanks to the set theory
behind it. Numerous applications in digital humanities
can be considered with FCA because transparency,
and therefore explainability, is built into
its design, which highlights relationships between
objects.
REFERENCES
Ahn, J.-w. and Brusilovsky, P. (2009). Adaptive visualiza-
tion of search results: Bringing user models to visual
analytics. Information Visualization, 8(3):167–179.
Akhtar, N., Javed, H., and Ahmad, T. (2019). Hierarchical
summarization of text documents using topic model-
ing and formal concept analysis. In Data Manage-
ment, Analytics and Innovation: Proceedings of ICD-
MAI 2018, Volume 2, pages 21–33. Springer.
Alghamdi, R. and Alfalqi, K. (2015). A survey of topic
modeling in text mining. Int. J. Adv. Comput. Sci.
Appl.(IJACSA), 6(1).
Andrienko, G., Andrienko, N., Drucker, S. M., Fekete, J.-
D., Fisher, D., Idreos, S., Kraska, T., Li, G., Ma, K.-
L., Mackinlay, J. D., et al. (2020). Big data visual-
ization and analytics: Future research challenges and
emerging applications. BigVis 2020: Big data visual
exploration and analytics.
Assa, J., Cohen-Or, D., and Milo, T. (1997). Displaying
data in multidimensional relevance space with 2d vi-
sualization maps. In Proceedings. Visualization’97
(Cat. No. 97CB36155), pages 127–134. IEEE.
Bastian, M., Heymann, S., and Jacomy, M. (2009). Gephi:
an open source software for exploring and manipulat-
ing networks. In Third international AAAI conference
on weblogs and social media.
Belohlavek, R. (2008). Introduction to formal concept anal-
ysis. Palacky University, Department of Computer
Science, Olomouc, 47.
Blei, D. and Lafferty, J. (2006). Correlated topic models.
Advances in neural information processing systems,
18:147.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent
dirichlet allocation. Journal of machine Learning re-
search, 3(Jan):993–1022.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D.,
Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., et al. (2020). Language models are few-
shot learners. Advances in neural information pro-
cessing systems, 33:1877–1901.
Carpineto, C. and Romano, G. (2004). Concept data anal-
ysis: Theory and applications. John Wiley & Sons.
De Sisto, M., Hernández-Lorenzo, L., De la Rosa, J., Ros,
S., and González-Blanco, E. (2024). Understanding
poetry using natural language processing tools: a
survey. Digital Scholarship in the Humanities, page
fqae001.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer,
T. K., and Harshman, R. (1990). Indexing by latent
semantic analysis. Journal of the American society
for information science, 41(6):391–407.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). Bert: Pre-training of deep bidirectional trans-
formers for language understanding. arXiv preprint
arXiv:1810.04805.
di Sciascio, C., Mayr, L., and Veas, E. (2017). Exploring
and summarizing document colletions with multiple
coordinated views. In Proceedings of the 2017 ACM
Workshop on Exploratory Search and Interactive Data
Analytics, pages 41–48.
Eisenstein, J., Chau, D. H., Kittur, A., and Xing, E. (2012).
Topicviz: Interactive topic exploration in document
collections. In CHI’12 Extended Abstracts on Human
Factors in Computing Systems, pages 2177–2182.
Elliott, S. (2021). Proof of concept research. Philosophy of
Science, 88(2):258–280.
Engebretsen, M. and Kennedy, H. (2020). Data visualiza-
tion in society. Amsterdam university press.
KMIS 2024 - 16th International Conference on Knowledge Management and Information Systems