Large Language Models for Summarizing Czech Historical Documents and Beyond

Václav Tran¹, Jakub Šmíd¹,², Jiří Martínek¹,², Ladislav Lenc¹,² and Pavel Král¹,²

¹ Department of Computer Science and Engineering, University of West Bohemia in Pilsen, Univerzitní, Pilsen, Czech Republic
² NTIS - New Technologies for the Information Society, University of West Bohemia in Pilsen, Univerzitní, Pilsen, Czech Republic

nuva@students.zcu.cz, {jaksmid, llenc, jimar, pkral}@kiv.zcu.cz
Keywords: Czech Text Summarization, Deep Neural Networks, Mistral, mT5, Posel od Čerchova, SumeCzech, Transformer Models.
Abstract:
Text summarization is the task of shortening a larger body of text into a concise version while retaining
its essential meaning and key information. While summarization has been significantly explored in English
and other high-resource languages, Czech text summarization, particularly for historical documents, remains
underexplored due to linguistic complexities and a scarcity of annotated datasets. Large language models
such as Mistral and mT5 have demonstrated excellent results on many natural language processing tasks and
languages. Therefore, we employ these models for Czech summarization, resulting in two key contributions:
(1) achieving new state-of-the-art results on the modern Czech summarization dataset SumeCzech using these
advanced models, and (2) introducing a novel dataset called Posel od Čerchova for the summarization of historical Czech documents, with baseline results. Together, these contributions offer great potential for advancing Czech text summarization and open new avenues for research in Czech historical text processing.
1 INTRODUCTION
The rapid evolution of Natural Language Processing
(NLP) techniques has elevated the performance of
text summarization systems. While most advances
focus on high-resource languages like English, the
Czech language, particularly historical variations, re-
mains underrepresented. Historical Czech documents
pose unique challenges due to linguistic shifts, out-
dated vocabulary, and inconsistent syntax. These nu-
ances create a significant gap in the development of
automated summarization systems capable of han-
dling this domain effectively.
Therefore, this paper addresses two interlinked
challenges. First, it seeks to establish new state-of-
the-art benchmarks on SumeCzech, the most com-
prehensive dataset for modern Czech text summariza-
tion using modern Large Language Models (LLMs),
namely Mistral (Jiang et al., 2023) and mT5 (Xue et al., 2021b). Second, recognizing the lack of resources tailored for historical Czech, we introduce a newly created dataset derived from the historical journal Posel od Čerchova. The dataset is specifically designed to facilitate summarization tasks in historical contexts, enabling future researchers to address the linguistic complexities inherent in this domain. This corpus is freely available for research purposes at https://corpora.kiv.zcu.cz/posel od cerchova/.
By combining model advancements and dataset
innovation, this research aims to drive progress in the
Czech summarization field and open avenues for ap-
plications in cultural preservation, historical research,
and digital humanities.
2 RELATED WORK
Text summarization methods can be categorized into
abstractive and extractive ones. Extractive sum-
marization selects the most representative sentences
from the source document, while abstractive summa-
rization generates summaries composed of newly cre-
ated sentences.
Early summarization methods were extractive and relied on statistical and graph-based techniques such as TF-IDF (Term Frequency-Inverse Document
Frequency) (Christian et al., 2016), which scores sen-
tence importance based on term frequency relative to
rarity across a corpus. Similarly, TextRank (Mihal-
cea and Tarau, 2004) represents sentences as nodes
in a graph and ranks them using the PageRank algo-
rithm (Page et al., 1999).
Neural networks advanced both extractive
and abstractive summarization by model-
ing sequences with Recurrent Neural Networks
(RNNs) (Elman, 1990). One extractive approach
involves sequence-to-sequence architectures where
LSTM models capture the contextual importance of
each sentence within a document (Nallapati et al.,
2017). Hierarchical attention networks combine
sentence-level and word-level attention to better
capture document structure and relevance for sum-
marization (Yang et al., 2016). This approach
has proven effective in summarizing longer and
more complex documents. Hybrid approaches
combining BERT embeddings (Devlin et al., 2019)
with K-Means clustering (Lloyd, 1982) to identify
key sentences (Miller, 2019) have shown excellent
performance for extractive summarization.
Advances in sequence-to-sequence Transformer-
based models (Vaswani et al., 2017) have revolution-
ized abstractive summarization. Recent models like
T5 (Raffel et al., 2020a) adopt a text-to-text frame-
work and excel in various tasks, including summa-
rization, due to pre-training on the C4 dataset. PE-
GASUS (Zhang et al., 2019) introduces gap-sentence generation, which masks key sentences during pre-training, achieving strong performance on 12
datasets. Similarly, BART (Lewis et al., 2019) uses
denoising objectives for robust text summary gener-
ation. Multilingual models such as mT5 (Xue et al.,
2021b) and mBART (Liu et al., 2020) extend these
capabilities to multiple languages, including Czech,
through datasets like mC4 (Xue et al., 2021a) and the multilingual Common Crawl (http://commoncrawl.org/).
However, these models often underperform on
non-English corpora without fine-tuning.
3 DATASETS
The following section provides a brief review of the
primary existing summarization datasets. Moreover, the created Posel od Čerchova corpus is also detailed at the end of this section.
3.1 English Datasets
CNN/Daily Mail (Hermann et al., 2015): consists of over 300,000 English news articles, each
paired with highlights written by the article authors. It
has been widely used in summarization and question-
answering tasks, evolving through several versions
tailored for specific NLP tasks.
XSum (Narayan et al., 2018): contains 226,000
single-sentence summaries paired with BBC articles
covering diverse domains such as news, sports, and
science. Its focus on single-sentence summarization
makes it less biased toward extractive methods.
Arxiv Dataset (Cohan et al., 2018): includes 215,000
pairs of scientific papers and their abstracts sourced
from arXiv. It has been cleaned and formatted to en-
sure standardization, with sections like figures and ta-
bles removed.
BOOKSUM (Kryscinski et al., 2022): a dataset
tailored for summarizing long texts like novels, plays,
and stories, with summaries provided at paragraph,
chapter, and book levels. Texts and summaries
were sourced from Project Gutenberg and other web
archives, supporting both extractive and abstractive
summarization.
3.2 Multilingual Datasets
XLSum (Hasan et al., 2021): provides over one mil-
lion article-summary pairs across 44 languages, rang-
ing from low-resource languages like Bengali and
Swahili to high-resource languages such as English
and Russian. Extracted from various BBC sites, this
dataset is a valuable resource for multilingual summa-
rization research.
MLSUM (Scialom et al., 2020): consists of 1.5 mil-
lion article-summary pairs in five languages: German,
Russian, French, Spanish, and Turkish. The dataset
was created by archiving news articles from well-
known newspapers, including Le Monde and El Pais,
with a focus on ensuring broad topic coverage.
The above-mentioned datasets are for English
summarization, and some are multilingual; however,
Czech resources remain very limited.
3.3 SumeCzech
The large-scale SumeCzech dataset (Straka et al., 2018) is
a notable exception to the scarcity of Czech-specific
resources. This dataset was created at the Institute
of Formal and Applied Linguistics at Charles Uni-
versity and is tailored for summarization tasks in the
Czech language. It comprises one million Czech news
articles. These articles are sourced from five ma-
jor Czech news sites: České Noviny, Deník, iDNES, Lidovky, and Novinky.cz. Each document is struc-
tured in JSONLines format, with fields for the URL,
headline, abstract, text, subdomain, section, and pub-
lication date. The preprocessing includes language
recognition, duplicate removal, and filtering out en-
tries with empty or excessively short headlines, ab-
stracts, or texts.
This dataset supports multiple summarization
tasks, such as headline generation and multi-sentence
abstract generation. The training, development, and
testing splits follow a roughly 86.5/4.5/4.5 ratio. The
average word count is 409 for full texts and 38 for
abstracts.
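For illustration, a minimal sketch of reading such JSONLines records and forming the two task targets is shown below; the exact JSON key names (e.g., "headline", "abstract", "text") and the file name are assumptions inferred from the field list above, not specifics confirmed by the dataset documentation.

```python
import json

def read_sumeczech(path: str):
    """Yield one article per line from a SumeCzech JSONLines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Key names are assumed from the fields listed above (URL, headline, abstract,
# text, subdomain, section, publication date); the file name is hypothetical.
for article in read_sumeczech("sumeczech-train.jsonl"):
    headline_pair = (article["text"], article["headline"])  # headline generation task
    abstract_pair = (article["text"], article["abstract"])   # multi-sentence abstract task
```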
Nevertheless, this dataset caters exclusively to
modern Czech and fails to address the needs of his-
torical text processing.
3.4 Posel od Čerchova
To construct the dataset, we used data from the historical journal Posel od Čerchova (POC), which is available on the archival portal Porta fontium (https://www.portafontium.eu).
The construction of the dataset involved address-
ing the challenge of creating summaries for the pro-
vided texts, which were composed in historical Czech
and, in some rare cases, even German. The texts also
covered a variety of different topics, from local news
surrounding Domažlice (a historic town in the Czech Republic), opinion pieces, and various local advertisements to domestic and international politics and feuil-
letons. Furthermore, it was important to construct a
dataset of sufficient size to ensure the accuracy and re-
liability of the evaluation. These aspects added com-
plexity to the summarization task.
To overcome the mentioned issues, we employed
state-of-the-art (SOTA) LLMs GPT-4 (OpenAI, 2024)
and Claude 3 Opus (Anthropic, 2024), hereafter Opus (specifically the claude-3-opus-20240229 version), for initial text summary creation. These models were se-
lected based on their SOTA performance in many
NLP tasks and excellent performance in some prelim-
inary summarization experiments.
While generating the summaries, it was essential
to ensure conciseness. Since most of the implemented
methods were fine-tuned on the SumeCzech dataset,
we aimed to maintain consistency by creating sum-
maries in a journalistic style, reflecting the dataset’s
characteristics. To achieve this, the prompts for gen-
erating the summaries included explicit instructions,
as shown below:
Vytvoř shrnutí následujícího textu ve stylu novináře. Počet vět <= 5; (EN: Create a summary of the following text in the style of a journalist. Number of sentences <= 5)
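For illustration, a minimal sketch of issuing this prompt to the claude-3-opus-20240229 model through the Anthropic Python client follows; the generation parameters (e.g., max_tokens) are assumed values rather than settings reported in this work, and an analogous request can be made to GPT-4.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

PROMPT = ("Vytvoř shrnutí následujícího textu ve stylu novináře. "
          "Počet vět <= 5\n\n{text}")

def summarize_page(page_text: str) -> str:
    """Request a journalistic summary of at most five sentences for one page."""
    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=512,  # assumed limit; the exact value used is not reported
        messages=[{"role": "user", "content": PROMPT.format(text=page_text)}],
    )
    return response.content[0].text
```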
During the summarization task, we observed that
while both models produced summaries of very good
quality, Opus tended to create more succinct and
stylistically appropriate ones, closely aligning with
the news reporter format. However, there were in-
stances where summaries generated by Opus exhib-
ited an excessive focus on a single topic.
On the other hand, GPT-4 aimed to incorporate
a greater level of detail within the five-sentence con-
straint but occasionally deviated from the specified
stylistic prompt.
If the model-generated summary exhibited signifi-
cant stylistic deviations or excessive focus on a single
topic, we either modified or regenerated it until a cor-
rect version was achieved.
Summaries were created at two levels: the first at the page level, and the second for a whole issue, which is usually composed of several pages. We thus summarized 432 pages, resulting in 100 issue summaries. The
subset containing page summaries is hereafter re-
ferred to as POC-P, while the issue summaries are
referred to as POC-I. Note that all created summaries
were checked and corrected manually by two native
Czech speakers.
The dataset is in the .json format and contains the
following information:
text. Text extracted from the given page, a digital
rendition of the original printed content;
summary. Summary of the page, which is no
more than 5 sentences long;
year. Publication year of the journal;
journal. Specification of the source journal: the
day, month, and the number of the issue is con-
tained within this identifier;
page src. Name of the source image file con-
verted into the text;
page num. Page number.
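To make the schema concrete, a hypothetical record and a minimal loading sketch are given below; the field values are invented, and the underscored key spellings (page_src, page_num) as well as the assumption of one JSON array per file are illustrative guesses, not specifics taken from the corpus.

```python
import json

# Hypothetical record; values are invented, and the key spellings
# page_src / page_num are assumptions based on the field list above.
example_record = {
    "text": "Plné znění stránky časopisu Posel od Čerchova ...",
    "summary": "Novinářské shrnutí stránky o délce nejvýše pěti vět.",
    "year": 1890,
    "journal": "identifier with day, month, and issue number",
    "page_src": "scan_0001.jpg",
    "page_num": 1,
}

def load_poc(path: str) -> list:
    """Load the POC corpus, assuming one JSON array of page records per file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```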
This dataset is designed to support summarization
tasks within Czech historical contexts, providing re-
searchers with the tools to tackle the linguistic chal-
lenges unique to this domain. The corpus is freely
accessible for research purposes at https://corpora.kiv.zcu.cz/posel od cerchova/.
4 MODELS
The experiments employ two advanced Transformer-
based models, Multilingual Text-to-Text Transfer
Transformer (mT5) (Xue et al., 2021b) and Mistral
7B (Jiang et al., 2023).
4.1 Multilingual Text-to-Text Transfer
Transformer
The Multilingual Text-to-Text Transfer Transformer
(mT5) is a variant of the T5 model designed for mul-
tilingual tasks. This model is trained on the multi-
lingual mC4 dataset (Xue et al., 2021a), which in-
cludes Czech, and effectively handles a wide range
of languages. The model is based on Transformer
encoder-decoder architecture and uses a Sentence-
Piece tokenizer (Kudo and Richardson, 2018) to pro-
cess complex language structures, including Czech
morphology. Pre-trained using a span corruption ob-
jective (Raffel et al., 2020b), mT5 predicts masked
spans of text, enabling it to learn semantic and con-
textual relationships.
The mT5 model is available in various sizes, from
small with 300 million parameters to XXL with 13
billion parameters, and can therefore be adapted to different computational needs. The base variant of mT5, which contains 580 million parameters, is used in our experiments.
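As a minimal sketch, the base checkpoint can be loaded from the HuggingFace hub as follows; note that the raw google/mt5-base model has to be fine-tuned on SumeCzech before its output is a usable summary, and the generation parameters below are illustrative only.

```python
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

# google/mt5-base is the public 580M-parameter checkpoint; in this setting it is
# fine-tuned on SumeCzech before being used for summary generation.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

def summarize(text: str, max_new_tokens: int = 64) -> str:
    """Generate an abstract-style summary with beam search (illustrative settings)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```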
4.2 Mistral Language Model
The Mistral Language Model (Mistral LM) is a highly
efficient large language model known for its robust
performance across diverse natural language process-
ing tasks. It is designed to combine high accuracy
with computational efficiency, achieving state-of-the-
art results in reasoning, text generation, summariza-
tion, and other NLP applications. Mistral 7B, with its
7 billion parameters, strikes a balance between com-
putational efficiency and task performance, surpassing larger models such as Llama 2 13B and Llama 1 34B on several benchmarks.
This model utilizes advanced attention mecha-
nisms like Grouped-Query Attention (GQA) (Ainslie
et al., 2023) and Sliding Window Attention
(SWA) (Beltagy et al., 2020). GQA speeds up inference by letting groups of query heads share the same key and value heads, while SWA reduces computa-
tional costs by limiting token attention to nearby to-
kens. The model supports techniques such as quanti-
zation (Gholami et al., 2021) and Low-Rank Adapta-
tion (LoRA) (Hu et al., 2021) for efficient fine-tuning on limited hardware, while its sliding window attention enables it to handle longer inputs effectively.
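To illustrate the effect of SWA, a small didactic sketch of the corresponding banded attention mask is shown below; this is not the actual Mistral implementation, and the window size of 4 is chosen only for readability (Mistral 7B uses a much larger window).

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where entry [q, k] is True if query position q may attend
    to key position k, i.e. k is one of the `window` most recent positions."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions as rows
    k = torch.arange(seq_len).unsqueeze(0)  # key positions as columns
    return (q - k >= 0) & (q - k < window)

# Each row shows which earlier tokens the corresponding query position can attend to.
print(sliding_window_mask(6, 4).int())
```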
5 EXPERIMENTS
5.1 Evaluation Metrics
The following evaluation metrics are used.
ROUGE (Recall-Oriented Understudy for Gisting
Evaluation) (Lin, 2004) is a set of metrics used to
evaluate the quality of summaries by comparing n-
gram overlaps between a system-generated summary
and reference texts. Key ROUGE metrics include
ROUGE-N (for n-gram overlap) and ROUGE-L (for
the longest common subsequence).
ROUGE_RAW (Straka and Straková, 2018) is a
variant of ROUGE that evaluates raw token-level
overlaps between predicted and reference texts with-
out any preprocessing like stemming or lemmatiza-
tion. It measures exact matches of tokens, making
it suitable for tasks where precise token alignment is
important.
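As an illustration of the preprocessing-free, token-level matching that ROUGE_RAW performs, a minimal re-implementation of the unigram variant is sketched below; whitespace tokenization is a simplification, and the official SumeCzech evaluation script should be used to obtain numbers comparable to the tables that follow.

```python
from collections import Counter

def rouge_raw_1(prediction: str, reference: str):
    """Unigram precision, recall, and F1 on raw (un-stemmed, un-lemmatized) tokens.

    Whitespace tokenization is a simplification of the official ROUGE_RAW tokenizer.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```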
5.2 Set-up
We used the AdamW optimizer (Loshchilov and Hutter, 2017) with a learning rate of 0.001, as suggested by the authors of mT5 (Xue et al., 2021b), to train this model. For Mistral 7B, we utilized
QLoRA (Dettmers et al., 2024), a method that inte-
grates a 4-bit quantized model with a small, newly
introduced set of learnable parameters. During fine-
tuning, only these additional parameters are updated
while the original model remains frozen, thereby sub-
stantially reducing memory requirements. We em-
ploy the models from the HuggingFace Transformers
library (Wolf et al., 2020). For training both mod-
els, we used a single NVIDIA A40 GPU with 45 GB
VRAM.
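A minimal sketch of this QLoRA setup with the HuggingFace Transformers and PEFT libraries is given below; the checkpoint name, LoRA rank, target modules, and other hyperparameters are assumptions for illustration, as the exact values are not reported here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base model (the "Q" in QLoRA); requires bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed public checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Small set of trainable low-rank adapters; rank and target modules are
# illustrative choices, not values reported in this paper.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are updated during fine-tuning
```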
5.3 Model Variants
We use three variants of the models in our experi-
ments:
M7B-SC: The Mistral 7B model fine-tuned on the
SumeCzech dataset;
M7B-POC: The Mistral 7B model further fine-
tuned on the POC dataset;
mT5-SC: The mT5 model fine-tuned on the
SumeCzech dataset.
Table 1: Results of various methods on the SumeCzech dataset with precision (P), recall (R), and F1-score (F).

Method                               ROUGE_RAW-1         ROUGE_RAW-2         ROUGE_RAW-L
                                     P     R     F       P     R     F       P     R     F
M7B-SC                               24.4  19.7  21.2    6.5   5.3   5.7     17.8  14.5  15.5
mT5-SC                               22.0  17.9  19.2    5.3   4.3   4.6     16.1  13.2  14.1
HT2A-S (Krotil, 2022)                22.9  16.0  18.2    5.7   4.0   4.6     16.9  11.9  13.5
First (Straka et al., 2018)          13.1  17.9  14.4    0.1   9.8   0.2     1.1   8.8   0.9
Random (Straka et al., 2018)         11.7  15.5  12.7    0.1   2.0   0.1     0.7   10.3  0.8
Textrank (Straka et al., 2018)       11.1  20.8  13.8    0.1   6.0   0.3     0.7   13.4  0.8
Tensor2Tensor (Straka et al., 2018)  13.2  10.5  11.3    0.1   2.0   0.1     0.2   8.1   0.8
Table 2: Results of implemented methods on the POC-P subset from the Posel od Čerchova dataset with precision (P), recall (R), and F1-score (F).

Method     ROUGE_RAW-1         ROUGE_RAW-2         ROUGE_RAW-L
           P     R     F       P     R     F       P     R     F
M7B-POC    23.5  17.4  19.6    4.8   3.5   4.0     16.6  12.2  13.8
mT5-SC     20.2  8.2   11.1    1.4   0.5   0.7     14.9  6.1   8.2
Table 3: Results of implemented methods on the POC-I subset from the Posel od Čerchova dataset with precision (P), recall (R), and F1-score (F).

Method     ROUGE_RAW-1         ROUGE_RAW-2         ROUGE_RAW-L
           P     R     F       P     R     F       P     R     F
M7B-POC    19.3  17.6  18.0    3.2   2.8   2.9     13.7  12.4  12.8
mT5-SC     18.2  5.9   8.6     1.0   0.3   0.4     14.0  4.5   6.5
5.4 Results on the SumeCzech Dataset
This experiment compares the results of the proposed mT5-SC and M7B-SC models with related work on the SumeCzech dataset (see Table 1).
The first comparative method, HT2A-S (Krotil,
2022), is based on the mBART model, which is fur-
ther fine-tuned on the SumeCzech dataset. The other
methods provided by the authors of the SumeCzech
dataset (Straka et al., 2018) are as follows: First, Ran-
dom, Textrank and Tensor2Tensor (Vaswani et al.,
2018).
Table 1 demonstrates that the proposed M7B-SC
method is very efficient, outperforming all other base-
lines and achieving new state-of-the-art results on this
dataset. Furthermore, the second proposed approach,
mT5-SC, also performs remarkably well, consistently
obtaining the second-best results.
5.5 Results on the Posel od Čerchova Dataset
This section evaluates the proposed methods on the
Posel od Čerchova dataset. Table 2 shows the results on the POC-P subset, which contains summaries for individual pages (106 pages), while Table 3 reports the results on the POC-I subset, which is composed of summaries of whole issues (25 issues).
These tables clearly show that, as in the previous case, the M7B-POC model gives significantly better results than the mT5-SC model, by a very large margin.
6 CONCLUSIONS
This paper explored the application of state-of-the-
art large language models, specifically Mistral 7B
and mT5, for summarization of Czech texts, ad-
dressing both modern and historical contexts. Our
experiments demonstrated that the proposed M7B-
SC model establishes a new benchmark for the
SumeCzech dataset, achieving state-of-the-art per-
formance, while the mT5-SC model also performed
strongly, consistently ranking second.
Furthermore, we introduced a novel dataset, Posel od Čerchova, dedicated to the summarization of historical Czech documents. By leveraging this dataset,
we provided baseline results and highlighted the
unique challenges posed by historical Czech texts.
These contributions not only advance the field of
Czech text summarization but also pave the way for
future research in processing historical documents,
offering significant opportunities in cultural preserva-
tion and digital humanities. Future work could focus
on further enhancing summarization quality, explor-
ing hybrid modeling approaches, and extending the
dataset for multilingual and cross-temporal studies.
ACKNOWLEDGEMENTS
This work was created with the partial support of
the project R&D of Technologies for Advanced Dig-
italization in the Pilsen Metropolitan Area (Dig-
iTech) No. CZ.02.01.01/00/23 021/0008436 and by
the Grant No. SGS-2022-016 Advanced methods
of data processing and analysis. Computational re-
sources were provided by the e-INFRA CZ project
(ID:90254), supported by the Ministry of Education,
Youth and Sports of the Czech Republic.
REFERENCES
Ainslie, J., Lee-Thorp, J., de Jong, M., Zemlyanskiy, Y.,
Lebrón, F., and Sanghai, S. (2023). GQA: Train-
ing generalized multi-query transformer models from
multi-head checkpoints.
Anthropic (2024). The Claude 3 Model Family: Opus, Son-
net, Haiku.
Beltagy, I., Peters, M. E., and Cohan, A. (2020). Long-
former: The long-document transformer.
Christian, H., Agus, M., and Suhartono, D. (2016). Sin-
gle document automatic text summarization using
term frequency-inverse document frequency (tf-idf).
ComTech: Computer, Mathematics and Engineering
Applications, 7:285.
Cohan, A., Dernoncourt, F., Kim, D. S., Bui, T., Kim, S.,
Chang, W., and Goharian, N. (2018). A discourse-
aware attention model for abstractive summarization
of long documents. In Walker, M., Ji, H., and Stent,
A., editors, Proceedings of the 2018 Conference of
the North American Chapter of the Association for
Computational Linguistics: Human Language Tech-
nologies, Volume 2 (Short Papers), pages 615–621,
New Orleans, Louisiana. Association for Computa-
tional Linguistics.
Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer,
L. (2024). Qlora: efficient finetuning of quantized
llms. In Proceedings of the 37th International Con-
ference on Neural Information Processing Systems,
NIPS ’23, Red Hook, NY, USA. Curran Associates
Inc.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). Bert: Pre-training of deep bidirectional trans-
formers for language understanding.
Elman, J. L. (1990). Finding structure in time. Cognitive
Science, 14(2):179–211.
Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M. W.,
and Keutzer, K. (2021). A survey of quantization
methods for efficient neural network inference.
Hasan, T. et al. (2021). Xlsum: A multilingual dataset
for summarization. In Findings of the Association
for Computational Linguistics: EMNLP 2021, pages
2133–2149.
Hermann, K. M., Kočiský, T., Grefenstette, E., Espeholt,
L., Kay, W., Suleyman, M., and Blunsom, P. (2015).
Teaching machines to read and comprehend. In
Advances in Neural Information Processing Systems
(NeurIPS), pages 1693–1701.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang,
S., Wang, L., and Chen, W. (2021). Lora: Low-rank
adaptation of large language models.
Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford,
C., Chaplot, D. S., de las Casas, D., Bressand, F.,
Lengyel, G., Lample, G., Saulnier, L., Lavaud, L. R.,
Lachaux, M.-A., Stock, P., Scao, T. L., Lavril, T.,
Wang, T., Lacroix, T., and Sayed, W. E. (2023). Mis-
tral 7b.
Krotil, M. (2022). Text summarization methods in czech.
Bachelor’s thesis, Czech Technical University in
Prague, Faculty of Electrical Engineering, Depart-
ment of Cybernetics.
Kryscinski, W., Rajani, N., Agarwal, D., Xiong, C., and
Radev, D. (2022). BOOKSUM: A collection of
datasets for long-form narrative summarization. In
Goldberg, Y., Kozareva, Z., and Zhang, Y., edi-
tors, Findings of the Association for Computational
Linguistics: EMNLP 2022, pages 6536–6558, Abu
Dhabi, United Arab Emirates. Association for Com-
putational Linguistics.
Kudo, T. and Richardson, J. (2018). SentencePiece: A sim-
ple and language independent subword tokenizer and
detokenizer for neural text processing. In Blanco, E.
and Lu, W., editors, Proceedings of the 2018 Confer-
ence on Empirical Methods in Natural Language Pro-
cessing: System Demonstrations, pages 66–71, Brus-
sels, Belgium. Association for Computational Lin-
guistics.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mo-
hamed, A., Levy, O., Stoyanov, V., and Zettlemoyer,
L. (2019). Bart: Denoising sequence-to-sequence pre-
training for natural language generation, translation,
and comprehension.
Lin, C.-Y. (2004). ROUGE: A package for automatic evalu-
ation of summaries. In Text Summarization Branches
Out, pages 74–81, Barcelona, Spain. Association for
Computational Linguistics.
Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvinine-
jad, M., Lewis, M., and Zettlemoyer, L. (2020). Mul-
tilingual denoising pre-training for neural machine
translation. Transactions of the Association for Com-
putational Linguistics, 8:726–742.
Lloyd, S. (1982). Least squares quantization in pcm. IEEE
Transactions on Information Theory, 28(2):129–137.
Loshchilov, I. and Hutter, F. (2017). Decoupled weight de-
cay regularization. arXiv preprint arXiv:1711.05101.
Mihalcea, R. and Tarau, P. (2004). TextRank: Bringing or-
der into text. In Lin, D. and Wu, D., editors, Pro-
ceedings of the 2004 Conference on Empirical Meth-
ods in Natural Language Processing, pages 404–411,
Barcelona, Spain. Association for Computational Lin-
guistics.
Miller, D. (2019). Leveraging bert for extractive text sum-
marization on lectures.
Nallapati, R., Zhai, F., and Zhou, B. (2017). Summarunner:
A recurrent neural network based sequence model for
extractive summarization of documents. In Proceed-
ings of the Thirty-First AAAI Conference on Artificial
Intelligence (AAAI), pages 3075–3081.
Narayan, S., Cohen, S. B., and Lapata, M. (2018). Ex-
treme summarization (xsum). In Proceedings of the
2018 Conference on Empirical Methods in Natural
Language Processing, pages 931–936.
OpenAI (2024). Gpt-4 technical report.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1999).
The pagerank citation ranking : Bringing order to the
web. In The Web Conference.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S.,
Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020a).
Exploring the limits of transfer learning with a unified
text-to-text transformer. Journal of Machine Learning
Research, 21(140):1–67.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S.,
Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020b).
Exploring the limits of transfer learning with a unified
text-to-text transformer. Journal of Machine Learning
Research, 21(140):1–67.
Scialom, T. et al. (2020). Mlsum: Multilingual summariza-
tion dataset. In Proceedings of the 2020 Conference
on Empirical Methods in Natural Language Process-
ing, pages 2146–2161.
Straka, M., Mediankin, N., Kocmi, T., Žabokrtský, Z., Hudeček, V., and Hajič, J. (2018). SumeCzech: Large
Czech news-based summarization dataset. In Pro-
ceedings of the Eleventh International Conference on
Language Resources and Evaluation (LREC 2018),
Miyazaki, Japan. European Language Resources As-
sociation (ELRA).
Straka, M. and Straková, J. (2018). Rougeraw: Language-
agnostic evaluation for summarization. Proceedings
of the International Conference on Computational
Linguistics.
Vaswani, A., Bengio, S., Brevdo, E., Chollet, F., Gomez,
A. N., Gouws, S., Jones, L., Kaiser, L., Kalchbrenner,
N., Parmar, N., Sepassi, R., Shazeer, N., and Uszkor-
eit, J. (2018). Tensor2tensor for neural machine trans-
lation. CoRR, abs/1803.07416.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L. u., and Polosukhin,
I. (2017). Attention is all you need. In Guyon,
I., Luxburg, U. V., Bengio, S., Wallach, H., Fer-
gus, R., Vishwanathan, S., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 30. Curran Associates, Inc.
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue,
C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtow-
icz, M., Davison, J., Shleifer, S., von Platen, P., Ma,
C., Jernite, Y., Plu, J., Xu, C., Scao, T. L., Gugger,
S., Drame, M., Lhoest, Q., and Rush, A. M. (2020).
Transformers: State-of-the-art natural language pro-
cessing. In Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Processing:
System Demonstrations, pages 38–45, Online. Asso-
ciation for Computational Linguistics.
Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou,
R., Siddhant, A., Barua, A., and Raffel, C. (2021a).
mC4: A massively multilingual cleaned crawl corpus.
In Proceedings of the 2021 Conference on Empirical
Methods in Natural Language Processing (EMNLP),
pages 7517–7532, Online and Punta Cana, Dominican
Republic. Association for Computational Linguistics.
Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou,
R., Siddhant, A., Barua, A., and Raffel, C. (2021b).
mT5: A massively multilingual pre-trained text-to-
text transformer. In Toutanova, K., Rumshisky,
A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I.,
Bethard, S., Cotterell, R., Chakraborty, T., and Zhou,
Y., editors, Proceedings of the 2021 Conference of the
North American Chapter of the Association for Com-
putational Linguistics: Human Language Technolo-
gies, pages 483–498, Online. Association for Compu-
tational Linguistics.
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., and Hovy,
E. (2016). Hierarchical attention networks for docu-
ment classification. In Proceedings of the 2016 Con-
ference of the North American Chapter of the Associa-
tion for Computational Linguistics: Human Language
Technologies, pages 1480–1489.
Zhang, J., Zhao, Y., Saleh, M., and Liu, P. J. (2019). Pe-
gasus: Pre-training with extracted gap-sentences for
abstractive summarization.