This paper presents a comprehensive description of the GIDOC prototype, with special
emphasis on parts not previously described [4, 7, 8]. After an overview of GIDOC
in Section 2, its main functions are described in Sections 3 (block and line detection),
4 (HTK training) and 5 (transcription). Experiments are reported in Section 6, and con-
clusions are discussed in Section 7.
2 System Overview
As indicated by its name, GIDOC has been implemented on top of the well-known
GNU Image Manipulation Program (GIMP). Like GIMP, GIDOC is licensed under the
GNU General Public License, and it can be downloaded from [6]. To run GIDOC, we
must first run GIMP and open a document image. GIMP will come up with its high-end
user interface, which is often configured to only show the main toolbox (with docked
dialogs) and an image window. GIDOC can be accessed from the menubar of the image
window (see Fig. 1).
As shown in Fig. 1, the GIDOC menu includes six entries: Advanced options,
0: Preferences, 1: Block Detection, 2: Line Detection, 3: HTK Training, and 4: Transcription.
Advanced options is a second-level menu where experimental features are grouped.
Preferences opens a dialog to configure global options, as well as more specific options
for preprocessing, training and recognition. Some of these options are discussed below,
together with the menu entries that follow Preferences.
3 Block and Line Detection
During its development, GIDOC has been mainly tested on an old book in which most
pages only contain nearly calligraphed text written on ruled sheets of well-separated
lines, as in the example shown in Fig. 1. As said in the introduction, GIDOC is de-
signed to work with such homogeneous documents and, indeed, it takes advantage of
their homogeneity. In particular, the Block Detection entry in the GIDOC menu uses a
novel text block detection method in which conventional, memoryless techniques are
improved with a “history” model of text block positions. Please see [4] for more infor-
mation.
Given a text block, the Line Detection entry in the GIDOC menu detects all its
text baselines, which are marked as straight paths. The result can be clearly observed
in the example of Fig. 1. Each baseline has handles to correct its position graphically,
although the implemented baseline detection method works quite well, at least in pages
like that of the example. It is a rather standard projection-based method [2]. First,
pixel values or black/white transitions are averaged horizontally, along each image row,
to obtain a vertical histogram. Then, this histogram is smoothed and analyzed so as
to locate baselines accurately. Two preprocessing options are included in Preferences:
the first selects the histogram type (pixel values or black/white transitions), and the
second defines the maximum number of baselines to be found. Concretely, this number
is used to help the projection-based method in locating (nearly) blank lines.
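
The projection-based procedure described above can be illustrated with a minimal
sketch. This is not the GIDOC implementation. The function names, the grayscale
threshold, the moving-average smoothing, the relative level used to separate text
bands, and the use of the maximum number of baselines as a simple cap are all
assumptions made for illustration only. The image is taken to be a list of
non-empty rows of grayscale values, with dark pixels representing ink.

```python
def row_histogram(image, kind="pixels", threshold=128):
    """Return one value per image row (the vertical histogram).

    kind="pixels":      fraction of dark pixels in the row.
    kind="transitions": number of black/white changes along the row.
    The threshold of 128 is an assumed binarization level.
    """
    hist = []
    for row in image:
        ink = [1 if p < threshold else 0 for p in row]
        if kind == "pixels":
            hist.append(sum(ink) / len(ink))
        elif kind == "transitions":
            hist.append(sum(1 for a, b in zip(ink, ink[1:]) if a != b))
        else:
            raise ValueError("kind must be 'pixels' or 'transitions'")
    return hist


def smooth(values, radius=2):
    """Moving average over a window of up to 2 * radius + 1 rows."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def detect_baselines(image, kind="pixels", max_baselines=10,
                     radius=2, rel_level=0.5):
    """Estimate baseline rows as the bottom row of each text band.

    A text band is a run of rows whose smoothed histogram value is at
    least rel_level times the maximum (an assumed criterion). Only the
    max_baselines bands with the largest total mass are kept.
    """
    hist = smooth(row_histogram(image, kind), radius)
    peak = max(hist, default=0)
    if peak == 0:
        return []
    level = rel_level * peak
    bands = []  # (first_row, last_row, total_mass)
    start = None
    for y, v in enumerate(hist + [0.0]):  # trailing 0.0 closes the last band
        if v >= level and start is None:
            start = y
        elif v < level and start is not None:
            bands.append((start, y - 1, sum(hist[start:y])))
            start = None
    strongest = sorted(bands, key=lambda b: b[2], reverse=True)[:max_baselines]
    return sorted(last for _, last, _ in strongest)


if __name__ == "__main__":
    white, black = [255] * 8, [0] * 8
    # Synthetic page: two text bands, rows 2-3 and rows 7-9.
    page = [white, white, black, black, white, white, white,
            black, black, black, white, white]
    print(detect_baselines(page, kind="pixels", radius=0))
```

In this sketch, each detected baseline is a row index. In GIDOC, baselines are
instead marked as straight paths that can be adjusted by hand.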