
notation, our dataset is of great value for the research field of Digital Jewish Studies.

We have shown that applying decoration labels to such a real-world dataset is not only resource-intensive but also a process of highly variable difficulty and consensus. Our machine learning scenarios indicate a need for distinctive, high-quality labeling data, despite the highly unbalanced decoration data. Further concepts of semi-automatic labeling might be necessary to facilitate high-quality, large-scale data input and to enable scholarly input and plausibility checks.
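Consensus between annotators, mentioned above as highly variable, can be made measurable. The sketch below uses Cohen's kappa, which corrects raw agreement between two annotators for the agreement expected by chance. The paper does not state that this measure was used. The label names ("tagin", "plain") are hypothetical placeholders for decoration classes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, two annotators who agree on 5 of 6 hypothetical letter crops (`["tagin", "tagin", "plain", "plain", "tagin", "plain"]` versus `["tagin", "plain", "plain", "plain", "tagin", "plain"]`) reach a raw agreement of 0.83 but a kappa of only about 0.67. This is because half of that agreement would be expected by chance given their label frequencies.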
The evaluation already shows promising results in terms of decoration recognition. We find it encouraging that smaller but more clear-cut labeling sets outperform larger datasets with less careful balancing and selection. Counterintuitively, creating data for multiple letter classes simultaneously helps with classification training for a single letter and makes training with these data more robust. Overall, this encourages working towards high-quality labels that include detailed scholarly input and a finer classification of tagin variations, in order to move beyond a mere glimpse of the scribal intentions towards a more comprehensive insight into the tradition of historical Torah scrolls.
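The observation that smaller, carefully balanced labeling sets outperform larger unbalanced ones can be illustrated with one simple balancing strategy: random downsampling of every class to the size of the rarest class. The following is a minimal sketch, not the authors' documented selection procedure. It assumes the labeled data is available as hypothetical `(crop_id, label)` pairs.

```python
import random
from collections import defaultdict


def balance_by_downsampling(samples, seed=0):
    """Downsample (item, label) pairs so every label keeps as many items as the rarest label."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)
    if not by_label:
        return []
    # Size of the rarest class; every class is cut down to this many items.
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the same subset is drawn on every run
    balanced = []
    for label in sorted(by_label):
        for item in rng.sample(by_label[label], target):
            balanced.append((item, label))
    rng.shuffle(balanced)  # avoid label-sorted order when the list is fed to training
    return balanced
```

With 8 hypothetical "plain" crops and 2 "tagin" crops, the function returns 4 pairs, 2 per label. The result is smaller than the input, which matches the trade-off described above. Downsampling discards data. Class weighting or oversampling are common alternatives when the rare class is very small.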
ACKNOWLEDGEMENTS
This work was funded by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01UL2202B and supported by the Helmholtz Association Initiative and Networking Fund on the HAICORE@KIT partition.
Appendix A: Image Sources
Internal ID | Library ID | Library | IIIF Manifest
- | 2° Ms. theol. 1 | UB Kassel | https://orka.bibliothek.uni-kassel.de/viewer/api/v1/records/1337850581405/manifest/
- | 2° Ms. theol. 303 | UB Kassel | https://orka.bibliothek.uni-kassel.de/viewer/api/v1/records/1314262537823/manifest/
- | Christ Church MS 201a | Bodleian | https://iiif.bodleian.ox.ac.uk/iiif/manifest/a6202c0a-5a65-4da8-a5fc-2fcdc8d8c784.json
- | Cod. hebr. 225 | Austrian National Library | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990037225270205171-1/manifest
- | Cod. hebr. 226 | Austrian National Library | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990038531430205171-1/manifest
- | Cod. hebr. 240 | Austrian National Library | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990026665790205171-1/manifest
- | Cod. Parm. 3598 | Biblioteca Palatina Parma | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990001745980205171-1/manifest
d8 | Cod.hebr. 488 | BSB Munich | https://api.digitale-sammlungen.de/iiif/presentation/v2/bsb00151486/manifest
d13 | Hs. or. 14091 | Berlin State Library | https://content.staatsbibliothek-berlin.de/dc/1809307333/manifest
d6 | BL Add. 11828 | London British Library | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990001223160205171-1/manifest
- | BL Or. 1085 | London British Library | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990001223180205171-1/manifest
- | Ms. Heb. 24°9084 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990044275740205171-1/manifest
d15 | Ms. Heb. 4°1408 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990000448750205171-1/manifest
d17 | Ms. Heb. 4°1459 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990000449050205171-1/manifest
- | Ms. Heb. 4°6066 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990025691070205171-1/manifest
d20 | Ms. Heb. 4°7156 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990000417490205171-1/manifest
d23 | Ms. Heb. 4°7247 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990025626850205171-1/manifest
- | Ms. Heb. 4°8457 | Klein Charitable Foundation | -
- | Ms. Heb. 4°9859 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS997012714175205171-1/manifest
- | Ms. Hebr. 34°8421 | NLI Jerusalem | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990035376570205171-1/manifest
- | Ms. Oct. 19 | UB Frankfurt Main | https://iiif.nli.org.il/IIIFv21/DOCID/PNX_MANUSCRIPTS990001378080205171-1/manifest
- | Ms. or. fol. 1216 | Berlin State Library | https://content.staatsbibliothek-berlin.de/dc/666097267/manifest
- | Ms. or. fol. 1217 | Berlin State Library | https://content.staatsbibliothek-berlin.de/dc/666097291/manifest
d27 | Ms. or. fol. 1218 | Berlin State Library | https://content.staatsbibliothek-berlin.de/dc/66609733X/manifest
- | Ms. or. fol. 133 | Berlin State Library | https://content.staatsbibliothek-berlin.de/dc/859630935/manifest
d25 | Ms. or. fol. 134 | Berlin State Library | https://content.staatsbibliothek-berlin.de/dc/859632946/manifest
- | Ms. Rhineland 1217 | Private collection | -
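Each manifest listed above is a JSON document that enumerates the page images (canvases) of one manuscript. The sketch below extracts `(canvas label, image URL)` pairs from a manifest that has already been decoded. It follows the IIIF Presentation API 2.x layout (`sequences` → `canvases` → `images` → `resource`) and the 3.0 layout (`items` → annotation page → annotation → `body`). Some assumptions apply. The helper names `load_manifest` and `canvas_images` are hypothetical, and the sketch is not the authors' pipeline. Which API version each listed provider serves has not been verified here. Under 3.0, a canvas label is a language map rather than a plain string, and it is returned unchanged.

```python
import json
import urllib.request


def load_manifest(url):
    """Fetch and decode a IIIF manifest (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def canvas_images(manifest):
    """Return (canvas label, image URL) pairs from a IIIF Presentation 2.x or 3.0 manifest."""
    pairs = []
    if "sequences" in manifest:
        # Presentation API 2.x: sequences -> canvases -> images -> resource["@id"]
        for sequence in manifest["sequences"]:
            for canvas in sequence.get("canvases", []):
                for image in canvas.get("images", []):
                    pairs.append((canvas.get("label"), image["resource"]["@id"]))
    else:
        # Presentation API 3.0: canvas -> AnnotationPage -> Annotation -> body["id"]
        for canvas in manifest.get("items", []):
            for page in canvas.get("items", []):
                for annotation in page.get("items", []):
                    body = annotation.get("body", {})
                    # A body may be a single resource or a list of resources.
                    for resource in body if isinstance(body, list) else [body]:
                        pairs.append((canvas.get("label"), resource.get("id")))
    return pairs
```

A typical call would be `canvas_images(load_manifest("https://api.digitale-sammlungen.de/iiif/presentation/v2/bsb00151486/manifest"))`, which fetches the BSB Munich manifest from the table and lists its page images. Parsing is kept separate from fetching so that manifests can also be cached on disk and processed offline.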
Towards a Dataset for Paleographic Details in Historical Torah Scrolls