Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis

Fabrice Boissier, Irina Rychkova, Benedicte Le Grand

2024

Abstract

Corpus analysis is a common task in digital humanities that benefits from advances in topic modeling and visualization in the computer science and information systems fields. Topic modeling is often done using methods from the Latent Dirichlet Allocation (LDA) family, and visualizations usually offer views based on the input documents and the topics found. In this paper, we explore the use of Formal Concept Analysis (FCA) as a replacement for LDA, first to visualize the most important keywords and then to visualize the relevance of multiple documents on closely related topics. FCA offers another method for analyzing texts, based not on probabilities but on the analysis of a lattice and its formal concepts. The processing pipeline has four steps: documents are cleaned using TreeTagger and BabelFy; a lattice is built; the mutual impact is calculated as part of the FCA process; and a force-based graph is generated. The output map is a graph that displays keywords as rings of importance and positions documents according to their relevance. Three experiments are presented to evaluate the displayed keywords and how well relevance evolves on the output map.
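To make the lattice-building step more concrete, the following is a minimal sketch of how formal concepts can be derived from a document-keyword context. It is not the authors' implementation: the toy documents, the keywords, and the helper names (`common_keywords`, `documents_with`, `formal_concepts`) are assumptions made for illustration, the enumeration is a naive one that only suits tiny contexts, and the cleaning, mutual impact, and graph layout steps are not covered.

```python
from itertools import combinations

# Toy formal context: documents (objects) mapped to keywords (attributes).
# Both the documents and the keywords are invented for illustration.
context = {
    "doc1": {"lattice", "concept"},
    "doc2": {"lattice", "topic"},
    "doc3": {"lattice", "concept", "topic"},
}


def common_keywords(docs):
    """Keywords shared by every document in `docs` (all keywords if empty)."""
    if not docs:
        return set().union(*context.values())
    return set.intersection(*(context[d] for d in docs))


def documents_with(keywords):
    """Documents that contain every keyword in `keywords`."""
    return {d for d, kws in context.items() if keywords <= kws}


def formal_concepts():
    """Naively enumerate (extent, intent) pairs by closing each document subset."""
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_keywords(subset)   # keywords common to the subset
            extent = documents_with(intent)    # all documents sharing them
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts()
# List concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the lattice: a maximal set of documents together with the keywords they all share. Ordering these pairs by inclusion of their document sets gives the lattice structure that the later pipeline steps build on.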

Paper Citation


in Harvard Style

Boissier F., Rychkova I. and Le Grand B. (2024). Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS; ISBN 978-989-758-716-0, SciTePress, pages 120-129. DOI: 10.5220/0013047800003838


in Bibtex Style

@conference{kmis24,
author={Fabrice Boissier and Irina Rychkova and Benedicte Le Grand},
title={Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis},
booktitle={Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS},
year={2024},
pages={120-129},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013047800003838},
isbn={978-989-758-716-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS
TI - Using Formal Concept Analysis for Corpus Visualisation and Relevance Analysis
SN - 978-989-758-716-0
AU - Boissier F.
AU - Rychkova I.
AU - Le Grand B.
PY - 2024
SP - 120
EP - 129
DO - 10.5220/0013047800003838
PB - SciTePress