Removal of Historical Document Degradations using Conditional GANs

Veeru Dumpala, Sheela Raju Kurupathi, Syed Saqib Bukhari and Andreas Dengel

University of Kaiserslautern, Germany

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Keywords:

Historical Documents, Degradations, Document Binarization, Conditional GANs.

Abstract:

One of the most crucial problems in document analysis and OCR pipelines is document binarization. Over the past few decades, many traditional algorithms such as Sauvola, Niblack and Otsu have been used for binarization, but they give insufficient results for historical texts with degradations. Recently, many attempts have been made to solve binarization using deep learning approaches such as autoencoders and FCNs. However, these models do not generalize well qualitatively to real-world historical document images. In this paper, we propose a model based on the conditional GAN, which is well known for high-resolution image synthesis. Here, the proposed model is used for an image manipulation task that removes different degradations in historical documents, such as stains, bleed-through and non-uniform shading. The proposed model outperforms recent state-of-the-art models for document image binarization. We support our claims by benchmarking the proposed model on the publicly available PHIBC 2012, DIBCO (2009-2017) and Palm Leaf datasets. The main objective of this paper is to illuminate the advantages of generative modeling and adversarial training for document image binarization in a supervised setting, which shows good generalization capabilities on different inter/intra-class domain document images.

1 INTRODUCTION

Nowadays, documents are found in many areas of daily life and take the form of journals, manuscripts, invoices, quotes, contracts, certificates etc. Many document analysis pipelines for OCR (Bukhari et al., 2017), (Jenckel et al., 2016), (Breuel et al., 2013), (Breuel, 2008) require binarization as an initial pre-processing step for document images. The resulting binarized images are then used by the rest of the pipeline to transform the degraded document image into digital text. Binarization means assigning each pixel to either black (foreground) or white (background). Generating a cleaner version of handwritten or machine-printed historical documents is challenging because of degradations such as noise, non-uniform illumination, stains and non-uniform shading (see Figure 1). Therefore, in order to extract the text from these noisy document images, it is very important to differentiate the background from the foreground text. For clean scanned document images this is simple, but when the documents are noisy, separating the background from the foreground pixels is difficult. To be successful in document binarization, one has to remove the historical artifacts while preserving the most meaningful content of the document image, which can be seen as an ill-posed problem in document analysis. In this paper, we show that the proposed model learns the historical degradations and removes the noise while preserving most of the relevant information.

The most commonly used binarization techniques can be classified as global (Level Otsu, 1979), (Tensmeyer and Martinez, 2017), local (Niblack, 1986), (Mitianoudis and Papamarkos, 2015) and hybrid (Biswas et al., 2014), (Zemouri et al., 2014) thresholding. Global thresholding methods use a single threshold value for the entire document image. Local thresholding methods, unlike global ones, divide the image into blocks and use a local threshold value for each block of pixels. Hybrid thresholding methods combine local and global thresholding to preserve the advantages of both. Examples of these methods include Nick, Otsu, Sauvola, Niblack, Bradley, Bernsen and local adaptive thresholding. Although such techniques work well for normal degradations, they fail in some cases of historical document degradations. Each method has its own pros and cons; we cannot claim that a single technique is best suited for all degraded

Dumpala, V., Kurupathi, S., Bukhari, S. and Dengel, A.

Removal of Historical Document Degradations using Conditional GANs.

DOI: 10.5220/0007367701450154

In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2019), pages 145-154

ISBN: 978-989-758-351-3


Figure 1: Examples of machine-printed and handwritten historical document degradations from the DIBCO (panels b, c, d, e, f, h, i, j, k, l), PHIBC and Palm Leaf datasets, where (l) shows the clean binarized document image. Panels: (a) aging effect, (b) bleed-through text, (e) non-uniform illumination, (f) non-uniform text, (g) palm leaf degradations, (h) scanned artifacts, (i) smudged text, (j) stain, (k) textured bleed-through, (l) clean binary document.

documents. There are also other techniques, such as fuzzy-logic-based (Farahmand et al., 2017), gradient-based (Pardhi and Kharat, 2017) and RNN-based (Westphal et al., 2018) methods. Selecting an optimal method for different degradations is still an open issue.
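To make the notion of a global threshold concrete, the following is a minimal pure-Python sketch of Otsu's method, which picks the single threshold that maximises the between-class variance of the grayscale histogram. The function name and the list-of-rows image representation are assumptions made for illustration; this is not the implementation used in the paper.

```python
def otsu_threshold(gray):
    """Return the threshold t in 0..255 that maximises between-class variance.

    `gray` is a list of rows, each a list of integer intensities in 0..255.
    Pixels with value <= t would be treated as one class (e.g. dark ink).
    """
    # Build the 256-bin intensity histogram.
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

A local method would instead compute a separate threshold for each block or sliding window, which is why it copes better with non-uniform illumination but loses the page-level context.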

With the advent of deep learning approaches based on Convolutional Neural Networks (CNNs), many problems related to binarization have been tackled using different architecture designs and training procedures (Tensmeyer and Martinez, 2017), (Ayyalasomayajula et al., 2018). One emerging model that has been widely studied in recent years is the Generative Adversarial Network (GAN) (Goodfellow et al., 2014). GANs have been applied across many image domains to solve tasks such as image generation, image manipulation and semantic manipulation. Even though there have been quite a few attempts in recent years, most of the research on GANs is restricted to natural images (Wang et al., 2018), (Zhu et al., 2017) and (Kim et al., 2017). In this paper, we show the usage and advantages of conditional GANs for the document image binarization problem, which can be treated as a high-resolution document image manipulation task. A major problem in applying deep learning methods to document binarization is acquiring clean ground truth for degraded documents. This can largely be solved by using well-known document image degradation tools such as Ocrodeg (Breuel, 2018) and applying these degradations to the already available clean UW-III (Phillips, 1996) and UNLV (Rice et al., 1996) datasets. Although a synthetic dataset largely solves the problem of acquiring ground truth for historical degraded documents, it can in turn make the learning model prone to overfitting the synthetically generated data, so that it does not generalize well to real-world historical documents.

In this paper, we show that the proposed model, when trained on synthetic datasets, also binarizes unseen real-world document images well. The proposed conditional generative adversarial model learns the document degradations by mapping 1024x1024x3 degraded color images to 1024x1024x1 grayscale images. At test time, we binarize the resulting grayscale image using the default pixel value of 127 as a threshold. Following recent global-to-local binarization techniques (Biswas et al., 2014), we use the multi-resolution generator architecture of the pix2pixHD model (Wang et al., 2018) for document image binarization. Note that, unlike other global-to-local binarization techniques, the proposed model can easily be trained end-to-end.

The main idea behind using the output of the proposed

ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods


model to map to a single channel instead of a binary output is that the grayscale representation space allows the model to learn robustly and make decisions based on confidence values, rather than merely classifying each pixel of the historical degraded image as foreground or background. Recent approaches using a grayscale representation of the degraded documents have been shown to improve document image binarization on historical documents (Calvo-Zaragoza and Gallego, 2018), (Peng et al., 2017), (Hedjam et al., 2015). Unlike the original semantic manipulation model, our final proposed model also uses F-measure as an error function (Pastor-Pellicer et al., 2013). Finally, we show our binarization results on the publicly available historical degraded datasets DIBCO-2017 (Pratikakis et al., 2017), PHIBC 2012 (Ayatollahi and Nafchi, 2013) and Palm Leaf (Burie et al., 2016), which show that the proposed method for document image binarization outperforms recent state-of-the-art methods both quantitatively and qualitatively.

2 RELATED WORK

Over the years, various methods have been proposed and widely researched for the document image binarization problem. From the perspective of this paper, they can be classified into two classes: data-driven and heuristic-based approaches. Although non-data-driven approaches work well for normally degraded documents, they fail to achieve good binarization results on heavily degraded historical documents. This has led the document analysis community to focus on data-driven approaches for document image binarization.

Many global and local/adaptive thresholding methods have been proposed over the past few decades to solve the binarization problem. Otsu (Level Otsu, 1979) is a popular global method from the image processing community; it calculates a single optimal threshold value to convert a grayscale image to a binary image. Sauvola (Sauvola and Pietikäinen, 2000) is a local method that uses the context of the local neighborhood for binarizing documents. Although these methods provide good results for normal documents, they fail to produce acceptable output if a document contains degradations, and they are not acceptable in complex historical degradation scenarios such as smudges, bleed-through, non-uniform shading and stains. This led researchers in the document analysis and recognition community to focus on local thresholding approaches, which vary from simple window-based techniques to pixel-level classification.

Following the progress of deep learning on different computer vision tasks, successful methods have been adopted and adapted for documents. SAE (Calvo-Zaragoza and Gallego, 2018) uses a convolutional auto-encoder whose output activations indicate the likelihood of a pixel being either foreground or background. (Peng et al., 2017) is another such encoder-decoder network. PDNet (Ayyalasomayajula et al., 2018) is based on a network architecture that uses FCNs with an unrolled primal-dual network. Although these models outperform hand-crafted or non-data-driven models, the results from these networks are still qualitatively poor. Going from global to local, these models lose global information that can be useful for classifying a pixel as foreground or background (see Figure 4). This can be seen as the Global-to-Local generalization problem. In the past few years, researchers have proposed global-to-local binarization approaches that use pixel information both globally and locally to threshold degraded documents. Although these binarization algorithms use local and global features, they are still far from generalizing well to degraded documents from different inter-class domains. Given the success of high-resolution image synthesis using conditional generative adversarial networks (Wang et al., 2018), (Gulrajani et al., 2017), our proposed model uses a similar network architecture for document image binarization. Here, the primary goal of the generator is to provide a binary result for the input degraded document. We also show that our model, trained on synthetic datasets, can generalize well even to real-world historical document datasets. The main idea behind using GANs for the document image binarization task is that the generator does not see the binary version of the corresponding degraded document; rather, it learns to differentiate between good and bad binarizations with the help of multi-scale discriminators. Incorporating additional losses such as the F-measure loss (Pastor-Pellicer et al., 2013) in the GAN objective function also provides better gradient flow and faster convergence during training.

Thanks to the DIBCO contests (Gatos et al., 2009), (Ntirogiannis et al., 2014) and (Pratikakis et al., 2017), many different binarization approaches can be benchmarked on a common scale using widely accepted evaluation metrics such as Precision, Recall, F-Measure, pseudo-F-Measure, Peak Signal-to-Noise Ratio (PSNR) and Distance Reciprocal Distortion (DRD). As recent approaches show, one measure alone does not indicate how well a binarization algorithm works. Hence, the proposed models should provide not only better quantitative results but


also better qualitative results, as this could improve the performance of the overall document image analysis pipeline.

3 DATASETS

The dataset plays a critical role in training deep learning models, which in turn influences the overall performance of the models. A vital task in training deep models is to create a sufficient amount of training data so that the model can learn efficiently. In this paper, we use various synthetically generated, publicly available and real-world datasets such as UW-III, PHIBC, DIBCO and Palm Leaf, which vary in size (ranging from 300x400 to 2500x3300), fonts and styles, and which contain degradations due to aging effects, bleed-through and physical damage from corrosion and fire. One of the challenges of historical document image binarization is gathering the ground truth. This can be solved partly by using publicly available libraries such as Ocrodeg (Breuel, 2018) or by simply applying alpha-channel blending to clean document images. To create the synthetic dataset (see Figure 2), we used the UW-III dataset, which contains 1600 document images that are clean from the perspective of an Optical Character Recognition (OCR) pipeline. We applied various Ocrodeg degradations, and alpha-channel blending to create bleed-through degraded images. A total of 1500 corresponding images are used for training and the rest for testing. Even with manually generated degraded documents, we could not reproduce some of the most challenging historical degradations, such as non-uniform shading, smudges and uneven pen strokes. Therefore, we used the publicly available DIBCO datasets from 2009 to 2016, PHIBC 2012 and Palm Leaf for training, and benchmarked the results on the DIBCO-2017 dataset.
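As an illustration of the alpha-channel blending step, the sketch below blends a horizontally mirrored second page into a clean page to imitate bleed-through. The function name, the mirroring of the back page and the default `alpha` value are assumptions made for this example; the paper does not specify these details.

```python
def alpha_blend_bleed_through(front, back, alpha=0.3):
    """Blend a mirrored `back` page into `front` to imitate bleed-through.

    Both pages are lists of rows of grayscale values in 0..255 and must
    have the same shape. `alpha` (an assumed value) is the opacity of the
    back page; 0 leaves the front page unchanged.
    """
    if len(front) != len(back) or any(
        len(f_row) != len(b_row) for f_row, b_row in zip(front, back)
    ):
        raise ValueError("front and back pages must have the same shape")
    blended = []
    for f_row, b_row in zip(front, back):
        mirrored = b_row[::-1]  # reverse-side text appears flipped
        blended.append(
            [round((1.0 - alpha) * f + alpha * b) for f, b in zip(f_row, mirrored)]
        )
    return blended
```

In such a setup the clean front page serves as the ground truth for the blended, degraded image.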

4 DEEP LEARNING MODELS

Today, deep learning models are used to solve various problems in fields such as Computer Vision, Image Processing, Robotics, Social Networking and Astronomy. Commonly used deep learning architectures include AlexNet, ResNet and GoogLeNet. We use a Generative Adversarial Network architecture to address document degradation. GANs (Goodfellow et al., 2014) are a class of artificial intelligence algorithms that consist mainly of two neural networks, a generator and a discriminator. The generator network is responsible for generating synthetic instances from random noise or conditioned on an input image. The discriminator evaluates the synthetic instances by learning to distinguish them from real data. The generator and discriminator networks compete with each other so that the synthesized data becomes as similar as possible to the real data. In effect, these networks model and learn to mimic the data distribution.

Generative adversarial networks have many applications; one of them is to take semantic label maps as input and generate photo-realistic images. For our problem, we have chosen image-to-image translation GANs, which take input from one domain and translate it into another domain. In this way, the degraded documents are converted to clean binarized documents.

4.1 Conditional GANs

The objective of conditional GANs is to model the conditional distribution of real images given the degraded images. The pix2pix method (Isola et al., 2017) uses a U-Net as the generator G and a patch-based fully convolutional network as the discriminator D. In the supervised setting, corresponding pairs of clean and degraded document images are required. The main drawbacks of this model are that it cannot generate high-resolution images of good quality, it loses finer details, and its training is unstable. To overcome these disadvantages of the pix2pix model, we use the modified pix2pixHD model (Wang et al., 2018), which consists of a coarse-to-fine generator, a multi-scale discriminator and a robust adversarial learning objective function.
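For reference, the standard conditional GAN objective that pix2pix and pix2pixHD build on can be written as below. The symbols $s$ (degraded input image) and $x$ (corresponding clean image) are notation assumed here for illustration and are not taken from this paper.

```latex
\min_{G} \max_{D} \; L_{GAN}(G, D), \qquad
L_{GAN}(G, D) = \mathbb{E}_{(s, x)}\left[\log D(s, x)\right]
              + \mathbb{E}_{s}\left[\log\left(1 - D(s, G(s))\right)\right]
```

The discriminator $D$ is trained to score real pairs $(s, x)$ highly and generated pairs $(s, G(s))$ low, while the generator $G$ is trained to make its outputs indistinguishable from real clean images.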

4.2 Proposed Model

Inspired by high-resolution image synthesis, we propose a document image binarization technique that uses the previously described architecture as the baseline model.

4.2.1 Architectural Details

The building block of the proposed binarization framework for generating cleaned versions of degraded documents is an auto-encoder with residual blocks. Following the baseline model, we use a coarse-to-fine generator architecture to achieve better binarization results even on high-resolution degraded document images. From an evaluation point of view, we observe that on some of the publicly available datasets the best accuracy is obtained only when the local receptive field of the model is 256x256. Because of this, there were more generalization errors


(a) clean UW-III document patch (b) alpha-channel blending on clean patch

Figure 2: Examples of synthetically generated dataset from UW-III (Phillips, 1996), alpha-channel blending and Ocrodeg

(Breuel, 2018) document degradations.

in the recent state-of-the-art methods when tested on other inter-domain datasets. However, this problem can be solved by using incremental training.

We propose a conditional generative adversarial network that contains generator and discriminator modules. The generator module is divided into two sub-networks, {G1} (see Figure 3) and {G2} (see Figure 4). Both are based on an auto-encoder architecture with residual blocks (Johnson et al., 2016), and one works on top of the other; the local receptive fields of the two sub-networks are 512 and 1024 respectively. This generator topology is intended to work not only with lower-resolution documents but also with high-resolution historically degraded images.

Similar to the generator module, the discriminator module consists of multi-scale discriminators, each working at a different scale. For the generator sub-module {G1}, we use 2 discriminators: one works at the model resolution and the other at a resolution downsampled by a factor of 2. The whole generator {G1, G2} uses 3 discriminators: one at the model resolution and two at resolutions downsampled by factors of 2 and 4. We downsample the real and synthesized cleaned versions of the degraded documents before passing them to the multi-scale discriminators, which differentiate between real and fake degradation-free document images. This allows the generator to learn the historical document degradations at different scales and to generate a better binary image.
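The downsampling that feeds the multi-scale discriminators can be sketched as an image pyramid. The example below uses 2x2 average pooling, which is an assumption made for illustration (the paper does not state the downsampling filter), and the function names are hypothetical.

```python
def downsample_by_2(img):
    """Halve height and width by averaging each 2x2 block.

    `img` is a list of rows of numeric pixel values; a trailing odd row
    or column is cropped.
    """
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def discriminator_inputs(img, num_scales=3):
    """Return the image at full, 1/2, 1/4, ... resolution, one per discriminator."""
    pyramid = [img]
    for _ in range(num_scales - 1):
        pyramid.append(downsample_by_2(pyramid[-1]))
    return pyramid
```

With `num_scales=3` and a 1024x1024 input, the three entries would have sizes 1024x1024, 512x512 and 256x256, matching the scales of D1, D2 and D3 for the full generator; `num_scales=2` on a 512x512 input corresponds to the {G1} setting.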

4.2.2 Model Extension

The proposed model, based on the conditional GAN framework, has input and output resolutions of 1024x1024x3 and 1024x1024x1 respectively. The model output is therefore a single-channel grayscale image rather than a binary one, so we apply global thresholding to the resulting grayscale image with a fixed global value of 127. This is to ensure that our model learns to differentiate between foreground and background pixels more robustly.
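The final thresholding step is simple enough to state directly. In the sketch below, the convention that values above the threshold become white background (255) and all others become black foreground (0) is an assumption, since the paper only gives the threshold value of 127.

```python
def binarize_fixed(gray, threshold=127):
    """Map a grayscale image (rows of 0..255 values) to a binary image.

    Assumed convention: values > threshold become 255 (background),
    values <= threshold become 0 (foreground).
    """
    return [[255 if v > threshold else 0 for v in row] for row in gray]
```

Because the generator output is a confidence-like gray level, this fixed cut-off only has to separate values that the model has already pushed towards the two ends of the range.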

4.2.3 F-Measure as Loss Function $L_{f-m}$

With recent advances in the generative adversarial framework, there have been many attempts to define the GAN objective function with different losses, each used to tackle a different optimization problem. The baseline pix2pixHD model uses not only the GAN loss but also a feature matching loss and a VGG loss. Since we use the GAN framework for document image binarization, we replace the VGG loss with an F-measure error function, which has previously been shown to obtain better binarization results on challenging


Figure 3: Global network with Convolution Layers, Residual Blocks and Transpose-Convolution Layers as Generator {G1}

and Multi-Scale Discriminators D1 and D2, which work at 512 X 512 and 256 X 256 respectively.

datasets (Pastor-Pellicer et al., 2013). We also show the evaluation results with and without $L_{f-m}$ in the GAN scenario.

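The overall objective function of the proposed model can be described as below:

$$\min_{G} \left( \left( \max_{D_1, \ldots, D_n} \sum_{i=1}^{n} L_{GAN}(G, D_i) \right) + \lambda \sum_{i=1}^{n} L_{FM}(G, D_i) + \sum_{i=1}^{n} L_{f-m}(G, D_i) \right) \quad (1)$$

where $L_{FM}$ is a feature matching loss, $L_{f-m}$ is an F-measure loss and $n$ is the number of multi-scale discriminators.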

4.2.4 Training Procedure

Initially, we train {G1} and {G2} with the help of their respective multi-scale discriminators, in order of their resolutions, and then fine-tune all the networks accordingly. With this multi-resolution pipeline, the proposed document image binarization model works well on a wide range of historically degraded documents.

5 EXPERIMENTS

In this section, we show the evaluation results of the proposed model with different configurations and compare them with other state-of-the-art methods for document image binarization. In 5.2.1, we report the model performance with and without F-measure as an error function. In 5.2.2, we show the test results of the generator with varying input resolutions. In 5.2.3 and 5.2.4, we present both quantitative and qualitative evaluation results on various publicly available datasets and on the synthetically generated dataset.

5.1 Evaluation Metrics

To benchmark the proposed model, we use widely known evaluation metrics from previous document image binarization contests (DIBCO): F-Measure (harmonic mean of Precision and Recall), DRD (measures the visual distortion of binary document images), pseudo-F-Measure (harmonic mean of pseudo-Precision and pseudo-Recall, which use weighted distances to GT contours) and PSNR (peak signal-to-noise ratio between GT and predicted binary document images). Although these measures are widely accepted, we also show qualitative results of the generator with adversarial training, which are visually appealing compared to present state-of-the-art approaches.
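Two of these metrics are easy to state in code. The sketch below computes F-Measure and PSNR for binary images encoded as 1 = foreground and 0 = background; this encoding and the function names are assumptions for illustration. DRD and pseudo-F-Measure need additional weighting matrices and are not shown.

```python
import math


def confusion_counts(pred, gt):
    """Count true positives, false positives and false negatives."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    return tp, fp, fn


def f_measure(pred, gt):
    """F-Measure in percent: harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(pred, gt)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


def psnr(pred, gt):
    """PSNR in dB for 0/1 images, where the peak signal difference is 1."""
    n = sum(len(row) for row in gt)
    mse = sum(
        (p - g) ** 2 for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row)
    ) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(1.0 / mse)
```

Higher is better for both measures, whereas for DRD a lower value indicates less visual distortion.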

5.2 Quantitative and Qualitative Results

5.2.1 With and Without $L_{f-m}$

Since we work on historical document images, where the final goal of the proposed model is to output a clean binary image without any degradations, we use the F-measure error function and evaluate its effect within the generative adversarial framework. From Table 1, it is clear that the F-measure drops by 1-2% when only the GAN loss and feature matching loss are considered. This shows the importance of $L_{f-m}$ when building deep learning architectures for document image binarization, where the task can be generalized as identifying relevant (foreground) and non-relevant (background) pixels in the degraded documents. We also provide a comparison of F-measure accuracies on various datasets as defined in (Calvo-Zaragoza and Gallego, 2018) in Table 2. In the case of the DIBCO datasets (D14 and D16), we use all the images from the rest of the contest editions as the training set. PL-I and PL-II are already provided with train and test partitions. In the case of PHIBC, we randomly split the corpus into train and test partitions of 80% and 20% respectively. It is clear that the models trained on PHIBC, PL-I and PL-II with $L_{f-m}$ generalize well to the D14 and D16 test sets. This shows the better generalization of adversarial training compared with other state-of-the-art end-to-end learning approaches. For the Palm Leaf datasets, we show {G1} and {G1, G2} results separately because the aspect ratio of the PL-I and PL-II datasets differs substantially from that of the other datasets.

Figure 4: Local enhancing network with Convolution Layers, element-wise sum between intermediate G2 feature maps and G1 last feature maps, Residual Blocks and Transpose-Convolution Layers as Generator {G1, G2} and Multi-Scale Discriminators D1, D2 and D3, which work at 1024 X 1024, 512 X 512 and 256 X 256 respectively.

Table 1: Comparison of our proposed model with and without $L_{f-m}$ on DIBCO and PHIBC datasets based on F-Measure accuracy.

Trained on         Tested on DIBCO-2017 (with / without $L_{f-m}$)
DIBCO 2009-2016    89.2 / 87.1
PHIBC              83.8 / 82.77

5.2.2 {G1} and {G1, G2} Evaluation Results

From Table 3, we can see that the proposed model with {G1} alone outperforms the DIBCO 2017 benchmark challenge winner (Pratikakis et al., 2017), which is based on the U-Net convolutional architecture, by 0.5% in F-measure. Here, the training set consists of the DIBCO contest datasets from 2009 to 2016, and the model is tested on DIBCO 2017. From Figure 5 and Figure 6, it should also be noted that the obtained results outperform the other method not only quantitatively but also qualitatively. However, increasing the model size by fine-tuning with G2 reduced the F-measure by 1.42%, although the qualitative results are still better in comparison.

5.2.3 On Synthetically Generated Data and Manually Collected Dataset

Here, we compare the proposed model, trained on degraded documents synthetically generated from the UW-III and UNLV datasets using libraries such as Ocrodeg, with the percentile-based method (Afzal et al., 2013), which is a robust non-deep-learning approach for document image binarization. We achieve 80.7% F-measure even on the most challenging DIBCO 2017 binarization dataset (see Table 4). This implies that a model trained in an adversarial manner has a lower generalization error and avoids overfitting to the training set, in this case the synthetically generated degraded dataset. It should be noted that the model used for this evaluation does not incorporate $L_{f-m}$ during training. With its inclusion in the full objective function, we obtained an F-measure of 81.2%.

6 CONCLUSION

In this paper, we presented a conditional generative model that exploits the power of deep neural networks to remove various challenging degradations from historical documents, such as stains, bleed-through and non-uniform shading, and obtain a high-resolution binarized result. We also observed that integrating $L_{f-m}$ into the objective function enhanced the learning behaviour of the model. Without much data augmentation (we used only horizontal flips), the proposed


(a) Source (b) state-of-the-art

(e) Ground Truth

Figure 5: Qualitative results of the proposed model, trained on DIBCO 2009-2016 and evaluated on DIBCO-2017 handwritten text, which outperforms the recent state-of-the-art (Pratikakis et al., 2017).

(a) Source (b) state-of-the-art (c) our {G1} (d) our {G1, G2} (e) Ground Truth

Figure 6: Qualitative results of the proposed model, trained on DIBCO 2009-2016 and evaluated on DIBCO-2017 machine-printed text, which outperforms the recent state-of-the-art (Pratikakis et al., 2017).


Table 2: Proposed model trained and tested on defined datasets (Calvo-Zaragoza and Gallego, 2018). An asterisk (*) indicates better performance over the other method.

Train \ Test         D14      D16      PL-I     PL-II    PHIBC    Avg
D14                  92.4     -        49.72    49.41    89.32    70.21
D16                  -        85.2     46.09    45.78    89.11*   66.55
PL-I                 55.46*   50.05*   49.81    49.9     60.66    53.18
PL-I (only {G1})     57.56*   46.5*    57.6     57.6     37.67    51.39
PL-II                57.7*    54.59*   53.06    53.2     54.48    54.61
PL-II (only {G1})    60.28*   55.77*   57.18    57.27    48.60    55.82
PHIBC                90.18*   85.81*   52.4*    52.53*   87.73    73.73

Table 3: Quantitative results of {G1} and {G1, G2}, trained on DIBCO 2009-2016 and evaluated on DIBCO-2017, which outperform the recent state-of-the-art (Pratikakis et al., 2017) in document image binarization.

Metric               only {G1}    {G1, G2}    state-of-the-art
F-Measure            91.53*       90.11       91.04
pseudo-F-Measure     94.2*        91.72       92.86
DRD                  2.820*       3.803       3.40
PSNR                 18.241*      17.544      18.28

Table 4: F-Measure results of our proposed model trained on synthetic dataset with and without $L_{f-m}$ and compared to percentile based method.

Method                                  Tested on DIBCO 2017
Ours (with / without $L_{f-m}$)         81.2 / 79.9
(Afzal et al., 2013)                    82.3

model converges and also generalizes well to unseen data. We showed that our model outperforms the recent state-of-the-art for document image binarization by providing benchmark results on publicly available and manually acquired datasets. From our experiments, we conclude that previous state-of-the-art models, which were trained end-to-end without a coarse-to-fine architecture, are prone to a problem that we call the Global-to-Local generalization problem. We also presented results showing that our model offers a qualitative improvement over other methods. We have demonstrated the benefits of generative modeling with adversarial training for document image binarization in a supervised and incremental setting, which provides good generalization capabilities on different inter- and intra-class domain document images. In future work, we would like to further improve the model efficiency by optimizing its architecture. Although significant work has been done on using deep learning architectures for document binarization, there is still a lot to be explored in adapting these models to solve various handwritten and machine-printed document analysis and recognition problems.

REFERENCES

Afzal, M. Z., Kr

amer, M., Bukhari, S. S., Youseﬁ, M. R.,

Shafait, F., and Breuel, T. M. (2013). Robust bina-

rization of stereo and monocular document images us-

ing percentile ﬁlter. In International Workshop on

Camera-Based Document Analysis and Recognition,

pages 139–149. Springer.

Ayatollahi, S. M. and Nafchi, H. Z. (2013). Persian heritage

image binarization competition (phibc 2012). arXiv

preprint arXiv:1306.6263.

Ayyalasomayajula, K. R., Malmberg, F., and Brun, A.

(2018). Pdnet: Semantic segmentation integrated with

a primal-dual network for document binarization. Pat-

tern Recognition Letters.

Biswas, B., Bhattacharya, U., and Chaudhuri, B. B. (2014). A global-to-local approach to binarization of degraded document images. In Pattern Recognition (ICPR), 2014 22nd International Conference on, pages 3008–3013. IEEE.

Breuel, T. M. (2008). The OCRopus open source OCR system. In Document Recognition and Retrieval XV, volume 6815, page 68150F. International Society for Optics and Photonics.

Breuel, T. M. (2018). Document image degradation for data augmentation for handwriting recognition and OCR applications. https://github.com/NVlabs/ocrodeg. Accessed: 2018-05-30.

Breuel, T. M., Ul-Hasan, A., Al-Azawi, M. A., and Shafait, F. (2013). High-performance OCR for printed English and Fraktur using LSTM networks. In Document Analysis and Recognition (ICDAR), 2013 12th International Conference on, pages 683–687. IEEE.

Bukhari, S. S., Kadi, A., Jouneh, M. A., Mir, F. M., and Dengel, A. (2017). anyOCR: An open-source OCR system for historical archives. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, pages 305–310. IEEE.

Burie, J.-C., Coustaty, M., Hadi, S., Kesiman, M. W. A., Ogier, J.-M., Paulus, E., Sok, K., Sunarya, I. M. G., and Valy, D. (2016). ICFHR2016 competition on the analysis of handwritten text in images of Balinese palm leaf manuscripts. In Frontiers in Handwriting Recognition (ICFHR), 2016 15th International Conference on, pages 596–601. IEEE.

Calvo-Zaragoza, J. and Gallego, A.-J. (2018). A selectional auto-encoder approach for document image binarization. Pattern Recognition.

Farahmand, A., Sarrafzadeh, H., and Shanbehzadeh, J. (2017). Noise removal and binarization of scanned document images using clustering of features.

Gatos, B., Ntirogiannis, K., and Pratikakis, I. (2009). ICDAR 2009 document image binarization contest (DIBCO 2009). In Document Analysis and Recognition, 2009. ICDAR’09. 10th International Conference on, pages 1375–1382. IEEE.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777.

Hedjam, R., Nafchi, H. Z., Kalacska, M., and Cheriet, M. (2015). Influence of color-to-gray conversion on the performance of document image binarization: toward a novel optimization problem. IEEE Transactions on Image Processing, 24(11):3637–3651.

Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. arXiv preprint.

Jenckel, M., Bukhari, S. S., and Dengel, A. (2016). anyOCR: A sequence learning based OCR system for unlabeled historical documents. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 4035–4040. IEEE.

Johnson, J., Alahi, A., and Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer.

Kim, T., Cha, M., Kim, H., Lee, J. K., and Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. arXiv preprint arXiv:1703.05192.

Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern, 9(1):62–66.

Mitianoudis, N. and Papamarkos, N. (2015). Document image binarization using local features and Gaussian mixture modeling. Image and Vision Computing, 38:33–51.

Niblack, W. (1986). An introduction to digital image processing, volume 34. Prentice-Hall Englewood Cliffs.

Ntirogiannis, K., Gatos, B., and Pratikakis, I. (2014). ICFHR2014 competition on handwritten document image binarization (H-DIBCO 2014). In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pages 809–813. IEEE.

Pardhi, S. and Kharat, G. (2017). An improved binarization method for degraded document. In National Conference MOMENTUM, volume 17.

Pastor-Pellicer, J., Zamora-Martínez, F., España-Boquera, S., and Castro-Bleda, M. J. (2013). F-measure as the error function to train neural networks. In International Work-Conference on Artificial Neural Networks, pages 376–384. Springer.

Peng, X., Cao, H., and Natarajan, P. (2017). Using convolutional encoder-decoder for document image binarization. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, pages 708–713. IEEE.

Phillips, I. (1996). Users reference manual for the UW English/technical document image database III. UW-III English/Technical Document Image Database Manual.

Pratikakis, I., Zagoris, K., Barlas, G., and Gatos, B. (2017). ICDAR2017 competition on document image binarization (DIBCO 2017). In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pages 1395–1403. IEEE.

Rice, S. V., Jenkins, F. R., and Nartker, T. A. (1996). The fifth annual test of OCR accuracy. Information Science Research Institute.

Sauvola, J. and Pietikäinen, M. (2000). Adaptive document image binarization. Pattern Recognition, 33(2):225–236.

Tensmeyer, C. and Martinez, T. (2017). Document image binarization with fully convolutional neural networks. In Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, volume 1, pages 99–104. IEEE.

Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., and Catanzaro, B. (2018). High-resolution image synthesis and semantic manipulation with conditional GANs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, page 5.

Westphal, F., Lavesson, N., and Grahn, H. (2018). Document image binarization using recurrent neural networks. In 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pages 263–268. IEEE.

Zemouri, E., Chibani, Y., and Brik, Y. (2014). Enhancement of historical document images by combining global and local binarization technique. International Journal of Information and Electronics Engineering, 4(1):1.

Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint.

ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods
