Large Language Models for Summarizing Czech Historical Documents and Beyond

Václav Tran; Jakub Šmíd; Jiří Martínek; Ladislav Lenc; Pavel Král

Topics: Deep Learning; Machine Learning; Natural Language Processing; Neural Networks

In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, 798-804, 2025, Porto, Portugal

Authors: Václav Tran ¹; Jakub Šmíd ¹,²; Jiří Martínek ¹,²; Ladislav Lenc ¹,² and Pavel Král ¹,²

Affiliations: ¹ Department of Computer Science and Engineering, University of West Bohemia in Pilsen, Univerzitní, Pilsen, Czech Republic; ² NTIS - New Technologies for the Information Society, University of West Bohemia in Pilsen, Univerzitní, Pilsen, Czech Republic

Keyword(s): Czech Text Summarization, Deep Neural Networks, Mistral, mT5, Posel od Čerchova, SumeCzech, Transformer Models.

Abstract: Text summarization is the task of shortening a larger body of text into a concise version while retaining its essential meaning and key information. While summarization has been explored extensively for English and other high-resource languages, Czech text summarization, particularly for historical documents, remains underexplored due to linguistic complexities and a scarcity of annotated datasets. Large language models such as Mistral and mT5 have demonstrated excellent results on many natural language processing tasks and languages. Therefore, we employ these models for Czech summarization, resulting in two key contributions: (1) achieving new state-of-the-art results on the modern Czech summarization dataset SumeCzech using these advanced models, and (2) introducing a novel dataset called Posel od Čerchova for summarization of historical Czech documents, together with baseline results. Together, these contributions offer great potential for advancing Czech text summarization and open new avenues for research in Czech historical text processing.

CC BY-NC-ND 4.0

Paper citation in several formats:

Tran, V., Šmíd, J., Martínek, J., Lenc, L. and Král, P. (2025). Large Language Models for Summarizing Czech Historical Documents and Beyond. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-737-5; ISSN 2184-433X, SciTePress, pages 798-804. DOI: 10.5220/0013374100003890

@conference{icaart25,
author={Václav Tran and Jakub Šmíd and Ji\v{r}í Martínek and Ladislav Lenc and Pavel Král},
title={Large Language Models for Summarizing Czech Historical Documents and Beyond},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2025},
pages={798-804},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013374100003890},
isbn={978-989-758-737-5},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Large Language Models for Summarizing Czech Historical Documents and Beyond
SN - 978-989-758-737-5
IS - 2184-433X
AU - Tran, V.
AU - Šmíd, J.
AU - Martínek, J.
AU - Lenc, L.
AU - Král, P.
PY - 2025
SP - 798
EP - 804
DO - 10.5220/0013374100003890
PB - SciTePress