Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis
Fabrice Boissier, Irina Rychkova, Benedicte Le Grand
2024
Abstract
Corpus analysis is a common task in digital humanities that benefits from advances in topic modeling and visualization in the computer science and information systems fields. Topic modeling is often done using methods from the Latent Dirichlet Allocation (LDA) family, and visualizations usually offer views based on the input documents and the topics found. In this paper, we explore the use of Formal Concept Analysis (FCA) as a replacement for LDA in order to visualize, first, the most important keywords and, second, the relevance of multiple documents covering closely related topics. FCA offers another method for analyzing texts, one based not on probabilities but on the analysis of a lattice and its formal concepts. The processing pipeline is as follows: first, documents are cleaned using TreeTagger and BabelFy; next, a lattice is built; then, the mutual impact is calculated as part of the FCA process; finally, a force-based graph is generated. The output map is a graph that displays keywords in rings of importance and positions documents according to their relevance. Three experiments are presented to evaluate the keywords displayed and how well relevance evolves on the output map.
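To make the lattice-building step more concrete, the sketch below enumerates the formal concepts of a tiny document-keyword context. It is only an illustration of the general FCA notion of a formal concept (a set of documents paired with exactly the keywords they all share), not the authors' implementation: the toy data and the helper names `common_keywords`, `matching_documents`, and `formal_concepts` are assumptions, and the brute-force enumeration is practical only for very small inputs.

```python
from itertools import combinations

# Toy formal context (hypothetical data): document -> set of keywords.
context = {
    "doc1": {"lattice", "concept"},
    "doc2": {"lattice", "topic"},
    "doc3": {"lattice", "concept", "topic"},
}


def common_keywords(docs, context):
    """Keywords shared by every document in `docs` (all keywords if `docs` is empty)."""
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return shared


def matching_documents(keywords, context):
    """Documents that contain every keyword in `keywords`."""
    return {doc for doc, kws in context.items() if keywords <= kws}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Brute force and exponential in the number of documents: a toy sketch only.
    """
    docs = sorted(context)
    concepts = set()
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_keywords(subset, context)
            extent = matching_documents(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering these pairs by inclusion of their document sets gives the concept lattice. In this toy case, the keyword shared by all documents sits at the top and the most specific keyword combination at the bottom.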
Paper Citation
in Harvard Style
Boissier F., Rychkova I. and Le Grand B. (2024). Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS; ISBN 978-989-758-716-0, SciTePress, pages 120-129. DOI: 10.5220/0013047800003838
in Bibtex Style
@conference{kmis24,
author={Fabrice Boissier and Irina Rychkova and Benedicte Le Grand},
title={Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS},
year={2024},
pages={120-129},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013047800003838},
isbn={978-989-758-716-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS
TI - Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis
SN - 978-989-758-716-0
AU - Boissier F.
AU - Rychkova I.
AU - Le Grand B.
PY - 2024
SP - 120
EP - 129
DO - 10.5220/0013047800003838
PB - SciTePress