Authors: Aditya W Mahastama and Lucia D Krisnawati

Affiliation: Informatics Dept., Faculty of Information Technology, Universitas Kristen Duta Wacana, Indonesia

Keyword(s): OCR, Character Segmentation, Projection Profile Cutting, Outlier Analysis

Abstract: The emergence of non-Latin scripts in the Unicode character set has opened up the possibility of performing Optical Character Recognition (OCR) on manuscripts written in non-alphabetic scripts. Javanese is one of the Southeast Asian languages with vast collections of manuscripts. Unfortunately, these manuscripts are prone to damage due to lack of maintenance. Therefore, digitising them through OCR has become the most obvious option. This research focuses on the segmentation process of our OCR project, which implements Projection-Profile Cutting (PPC). The rationale is that PPC is well known for its low computational cost. As the object of segmentation, we sampled 72 scanned pages of Serat Mangkunegara IV, Wulang Maca, and Kitab Rum. Our preliminary evaluation showed that implementing PPC per se yields unsatisfactory results. Hence, we refined it by applying a statistical analysis to segment lines of characters that are spaced too closely together. The proposed algorithm produces 19,112 segments. To evaluate the system outputs, we conducted two levels of evaluation: line segmentation and character segmentation. The refinement of PPC increased the line segmentation accuracy by 32.84%. To evaluate the character segmentation, we collaborated with the Javanese Wikipedia Community, which verified the segments manually in 4 batches. Only 15,386 segments were verified, of which 73.59% (11,322) are correctly segmented, 22.5% (3,464) are over-segmented, 1.3% (206) are under-segmented, and the rest were not labelled with any of the three categories above.
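
To make the idea in the abstract concrete, the following is a minimal sketch of projection-profile cutting for line segmentation, followed by one possible statistical refinement. It is not the authors' implementation: the function names, the toy image, the median-based outlier rule, and the `factor` parameter are all assumptions chosen for illustration. The image is assumed to be binarised, with 1 for ink and 0 for background.

```python
from statistics import median

def row_profile(image):
    """Horizontal projection profile: number of ink pixels (1s) in each row."""
    return [sum(row) for row in image]

def cut_runs(profile, threshold=0):
    """Plain PPC: return (start, end) row ranges (end exclusive) whose
    profile value stays above the threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def split_tall_runs(profile, runs, factor=1.5):
    """Hypothetical refinement: treat a run as an outlier when its height
    exceeds factor * median height, and split it at the interior row
    with the least ink (the valley of the profile)."""
    if not runs:
        return []
    typical = median(end - start for start, end in runs)
    refined = []
    for start, end in runs:
        height = end - start
        if height > factor * typical and height >= 3:
            cut = min(range(start + 1, end - 1), key=lambda r: profile[r])
            refined.extend([(start, cut), (cut, end)])
        else:
            refined.append((start, end))
    return refined

# Toy page: four text lines, where the 2nd and 3rd touch at row 6.
image = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],  # a descender bridges the gap, so plain PPC cannot cut here
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

profile = row_profile(image)
runs = cut_runs(profile)                   # [(1, 3), (4, 9), (10, 12)]
refined = split_tall_runs(profile, runs)   # [(1, 3), (4, 6), (6, 9), (10, 12)]
print(runs)
print(refined)
```

In this toy case plain PPC merges two touching lines into one run of height 5, while the other runs have height 2. The sketch flags the tall run as an outlier and cuts it at the row with the least ink. The same profile-and-cut step can be applied column-wise within each line to obtain character candidates.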

CC BY-NC-ND 4.0

Paper citation in several formats:
Mahastama, A. and Krisnawati, L. (2020). Improving Projection Profile for Segmenting Characters from Javanese Manuscripts. In Proceedings of the 1st International Conference on Intermedia Arts and Creative Technology - CREATIVEARTS; ISBN 978-989-758-430-5, SciTePress, pages 77-82. DOI: 10.5220/0008526900770082

@conference{creativearts20,
author={Aditya W Mahastama and Lucia D Krisnawati},
title={Improving Projection Profile for Segmenting Characters from Javanese Manuscripts},
booktitle={Proceedings of the 1st International Conference on Intermedia Arts and Creative Technology - CREATIVEARTS},
year={2020},
pages={77-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008526900770082},
isbn={978-989-758-430-5},
}

TY - CONF

JO - Proceedings of the 1st International Conference on Intermedia Arts and Creative Technology - CREATIVEARTS
TI - Improving Projection Profile for Segmenting Characters from Javanese Manuscripts
SN - 978-989-758-430-5
AU - Mahastama, A.
AU - Krisnawati, L.
PY - 2020
SP - 77
EP - 82
DO - 10.5220/0008526900770082
PB - SciTePress