TWO METHODS FOR FILLED-IN DOCUMENT IMAGE IDENTIFICATION USING LOCAL FEATURES

Diego Carrion-Robles, Vicent Castello-Fos, Juan-Carlos Perez-Cortes and Joaquim Arlandis ∗

Institut Tecnològic d'Informàtica, Universitat Politècnica de València, Camí de Vera s/n, 46022 València, Spain

Keywords:

Document identiﬁcation, Filled-in documents, Local features, Reject option.

Abstract:

In this work, we address document image classification, in particular for pre-printed forms, where a large part of the document can be filled in, potentially yielding a very different image. A method for the selection of discriminative local features is presented and tested along with two different classification algorithms. The first one is an incremental version of the method proposed in (Arlandis et al., 2009), based on similarity searching around a set of anchor points, and the second one is based on a direct voting scheme (Arlandis et al., 2011). Experiments on a document database consisting of real office documents with a very high variability, as well as on the NIST SD6 database, are presented. A confidence measure intended to reject unknown documents (those that have not been indexed in advance as a given document class) is also proposed and tested.

1 INTRODUCTION

A common and practical task where a speciﬁc prob-

lem of document classiﬁcation arises is the organiza-

tion of bills, forms, invoices, legal, medical or admin-

istrative documents for processing (e.g. OCR), stor-

ing or archiving. Since these documents can be cate-

gorized (i.e. “Bill from supplier A”, “Tax form num-

ber X”, etc.), a typical classiﬁcation method could in

principle be applied, but in this case, only a part of

the document is kept the same, and the rest changes

in every instance. The conserved part can be different

from document to document and signiﬁcantly smaller

than the variable area that can be composed of large

handwritten, typed or stamped regions.

Traditional approaches of document categoriza-

tion have addressed the problem as a clustering task,

where documents having a certain degree of semantic

similarity are assigned to the same class or category.

In our case, the task is one of supervised classiﬁca-

tion, since we need to identify the class of the image

among a number of known document classes.

The use of textual data from OCR or the global image structure is also not adequate in our case, since the variable information can significantly alter these features.

∗ Work partially supported by the Spanish MICINN grants TIN2009-14205-C04-02 and Consolider Ingenio 2010: MIPRCV (CSD2007-00018) and by IMPIVA and the E.U. by means of the ERDF in the context of the R+D Program for Technological Institutes of IMPIVA network for 2011.

Another classical approach relies on the segmen-

tation and the analysis of the layout, but large marks

or ﬁlled-in areas can introduce changes and errors in

that step, so we propose to use only visual features

and not the results of structural layout analysis.

A typical document image consists of white background pixels and black foreground pixels, although

other combinations like gray-scale, colour, or com-

plex backgrounds and foregrounds can occur. The

foreground is mostly composed of text (in many cases

having different appearances like typed fonts, hand-

writing styles, case letters, bolded text, sizes, etc.), al-

though other objects like images, graphics, logos, or

frames are frequent, too. Usually, the text areas also

include background patterns interleaved, and some

background pattern can also be present in most of the

surface of a document.

In summary, a ﬁlled-in document can be seen as

an image having static (ﬁxed, pre-printed) and vari-

able contents (machine printed, handwritten, marked,

stamped, covered with adhesive labels, etc.). Under

this deﬁnition, a category or class of documents is

deﬁned as the set of images having different static

content from the other classes and a specific, approximately equal, intra-class static content. The variable content, as has been pointed out, can vary significantly in size and content for different documents within a class. In Figure 1, some filled-in document types are shown.

Carrion-Robles D., Castello-Fos V., Perez-Cortes J. and Arlandis J.

TWO METHODS FOR FILLED-IN DOCUMENT IMAGE IDENTIFICATION USING LOCAL FEATURES.

DOI: 10.5220/0003884004810487

In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods (IATMLRP-2012), pages 481-487

ISBN: 978-989-8425-98-0

© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

Given the speciﬁc nature of the task, the ap-

proaches proposed and compared in this work use lo-

cal representations to describe the document classes.

One of them is based on automatically ﬁnding a num-

ber of adequate anchor points for each class, and the

other uses a direct voting scheme of local-feature vec-

tors using a k-nearest neighbors classiﬁer. A common

step previous to both techniques is the selection of

candidate points. The experiments carried out test the

robustness of the approaches, taking into account that

no ﬁlled-in contents or representations are used in the

training phase.

2 RELATED WORK

The image features proposed in the literature on document analysis and classification are many and varied. Some are related to the document layout, frame detection, salient visual features, character

recognition, texture primitives, shape codes, global

image transformations and projections, or semantic

block structure detection.

Within works in the domain of Information Re-

trieval, where the concept of static content against

ﬁlled-in data is not dealt with, document identiﬁca-

tion is referred to as a duplicate detection task. In

that case, the approaches focus on the correct classi-

ﬁcation in spite of differences among document in-

stances, like resolution, skew, distortions and image

quality. Speed and robustness are key elements, as

well as the ability to handle very large databases.

Most works dealing with ﬁlled-in documents are

related to form identiﬁcation. Many of them are

based on analyzing global and local structures. Struc-

tural features are usually limited to documents hav-

ing frames, cells, lines, blocks, or similar items, and

they may fail when different documents have very

similar structures. Other works rely on character and string codes to identify the document (Sako et al., 2003), or on computing pixel densities from image regions (Heroux et al., 1998). Within form-type documents, specific applications address coupon (Nagasaki et al., 2006), banking (Ogata et al., 2003) or business (Ting and Leung, 1996) form identification.

Figure 1: Document examples. The first one is a form where the static contents encompass most of the document. The second is a form page with a large number of cells that can be filled in or not. The third is a business document with few structural patterns and static contents (located in the header), while the variable part can cover the rest of the image.

More recent and closely related works are the ones presented by Parker (Parker, 2010) and Sarkar (Sarkar, 2006), (Sarkar, 2010). Sarkar (Sarkar, 2006) presents a methodology to select and classify anchor points from document images. The anchor point selection is based on the use of thresholded Viola & Jones rectangular salient visual features in the luminance channel (Viola and Jones, 2001). For each document class, a probability distribution of the list

ICPRAM 2012 - International Conference on Pattern Recognition Applications and Methods


of local features (including global location coordinates) is obtained by a latent conditional independence (LCI) model. An image is classified by matching its resulting feature list to category-specific generative models by means of a maximum likelihood criterion, and it is assigned to the category whose distribution is closest, in the Kullback-Leibler sense, to the empirical distribution. This correspondence is well known in the text categorization/retrieval community, where observations are variable-length lists of words. More recently, Sarkar (Sarkar, 2010) proposed a complete methodology to select anchor points based on randomly picked sub-images and applying successive refinements by expanding and ranking the candidates using two alternative quality measures.

Parker (Parker, 2010) proposes and compares

three methods for selecting anchor points. The ﬁrst

is based on two criteria: “graphical action” and

intra-class distance minimization. The second and

third methods try to select the anchor points that

maximize the KL-divergence (Kullback-Leibler di-

vergence) function, a measure of the separation of

two distributions: one of the distances among anchor

points within a sample of a given document class,

and the other one of the distances from those an-

chor points to documents of different classes. Parker

claims that the performance of the proposed form

identiﬁcation system can be estimated in a theoreti-

cal way by using the KL-divergence. He shows the

results of experiments of the three methods using a

customized database of forms extracted from the IRS

(Internal Revenue Service, the revenue service of the

United States federal government), where only one

document type having ﬁlled-in data was used and ten

completed forms were used to train the system. The

main conclusion of the experiments is that the use of

inter-class information to select the anchor points of

a class improves the performance of the system (esti-

mated by means of the KL-divergence). This method

implies the use of several documents of each class to

train the system, and a high number of correlation op-

erations can be required to select anchor points to be

robust against image translations, as needed in the op-

erating phase.

3 APPROACHES

A method to select a set of potentially discriminant

reference points to be used in a document identiﬁca-

tion task is presented in section 3.1. This method has

been used to extract features and classify documents

by means of two approaches: a new incremental ver-

sion of the method proposed in (Arlandis et al., 2009),

based on the cross matching between pairs of docu-

ments (section 3.2), and the method proposed in (Ar-

landis et al., 2011), which relies on the combination

of the evidence contributed by multiple local features

and a direct voting scheme (section 3.3).

3.1 Reference Point Selection

The goal of this phase is to obtain an ordered list of

small sub-images of a ﬁxed size from the reference

image of each class. These sub-images should be rep-

resentative of that image. Thus, a selection criterion is

necessary to ensure that these sub-images are located

in the most informative regions of the reference image

in order to retain the areas with clear graphical con-

tent such as text or any other potentially discrimina-

tive pattern, avoiding uniform areas or uninformative

background regions. This decision can be made on

the basis of image contrast, or variance, or on more

complex operators, like textures, corner detection or

speciﬁc ﬁlters.

In order to avoid uniform areas, a good approx-

imation can be obtained by sorting by variance all

the possible sub-images of the desired size, possibly

using subsampling to reduce the computational cost.

The problem with using this method alone is that some uninformative regions, like borders between very dark and very light areas of the document, usually have a high variance and end up in the top positions of that list. Patterns of this kind are undesirable as discriminative local features because many different documents are bound to have them, for instance, borders of tables or scanning artifacts like shadows or edges.
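The variance ranking described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `rank_subimages_by_variance`, the representation of the image as a list of rows of gray values, and the `step` parameter used for subsampling are all assumptions made for the example.

```python
from statistics import pvariance


def rank_subimages_by_variance(image, win_w, win_h, step=1):
    """Return (variance, x, y) for every window, highest variance first.

    `image` is a list of rows of gray values (hypothetical representation).
    A `step` larger than 1 subsamples the window positions, which reduces
    the number of windows to evaluate.
    """
    rows, cols = len(image), len(image[0])
    ranked = []
    for y in range(0, rows - win_h + 1, step):
        for x in range(0, cols - win_w + 1, step):
            pixels = [image[y + dy][x + dx]
                      for dy in range(win_h) for dx in range(win_w)]
            ranked.append((pvariance(pixels), x, y))
    # Sort by variance only, so ties keep their scan order.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 255],
        [0, 0, 255, 0],
    ]
    for variance, x, y in rank_subimages_by_variance(page, 2, 2)[:3]:
        print(x, y, variance)
```

Uniform windows end up at the bottom of the list with zero variance. As the text notes, a sharp dark/light border would also score highly here, which is why a second criterion is needed.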

To avoid this, a more sophisticated second pass

can be carried out, using speciﬁc features, like Har-

alick descriptors (Haralick et al., 1973). Particularly,

establishing a limit in the autocorrelation value and

the entropy difference has been found to be helpful

in eliminating the sub-images that have high variance

but a low discriminative potential. In Figure 2, some

examples of selection criteria are shown.
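As an illustration of such a second pass, the sketch below computes a Haralick-style difference entropy over horizontally adjacent pixel pairs and keeps only the patches above a lower bound. Several points are assumptions rather than details given in the text: the use of a single horizontal offset, the quantization to 8 gray levels, the direction of the limit (a lower bound), and the names `difference_entropy` and `keep_textured`.

```python
import math
from collections import Counter


def difference_entropy(patch, levels=8, max_gray=255):
    """Entropy of the distribution of |a - b| over horizontal neighbor pairs.

    `patch` is a list of rows of gray values. Values are first quantized to
    `levels` gray levels, as is usual when building co-occurrence statistics.
    """
    quant = [[min(v * levels // (max_gray + 1), levels - 1) for v in row]
             for row in patch]
    diffs = Counter(abs(row[i] - row[i + 1])
                    for row in quant for i in range(len(row) - 1))
    total = sum(diffs.values())
    if total == 0:  # patch narrower than two pixels
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in diffs.values())


def keep_textured(patches, min_entropy):
    """Discard patches whose difference entropy falls below `min_entropy`."""
    return [p for p in patches if difference_entropy(p) >= min_entropy]


if __name__ == "__main__":
    border = [[0, 0, 255, 255]] * 4    # sharp dark/light edge
    texture = [[0, 64, 255, 128]] * 4  # varied gray-level transitions
    print(difference_entropy(border), difference_entropy(texture))
```

A clean dark/light border produces only two kinds of neighbor difference (none, or one large jump), so its difference entropy stays low even though its variance is high, while text-like content tends to produce a wider spread of differences.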

3.2 Approach 1: Cross Matching of

Document Pairs

This is an incremental version of the method presented in (Arlandis et al., 2009). It drastically reduces the time that method needs to train a high number of document classes, particularly when the documents to be indexed are not very similar. This is because, on one hand, candidate images are pre-selected (as explained in the previous section), which increases the likelihood of finding a discriminant feature faster.


Figure 2: Examples of selection of candidate sub-images.

The ﬁrst ﬁgure is a detail from the original reference image

for a class. The second represents the 80x30 sub-images

that have higher variance in that portion of the reference

image. Note that most of them are in borders between dark

and light sectors. The third image represents the 80x30 sub-

images that had higher variance and also passed certain en-

tropy difference criteria.

On the other hand, in the presented method, a local feature is selected based on its discriminative power between pairs of images, instead of requiring it to be discriminative against all the remaining classes.

Similarly to (Arlandis et al., 2009), this approach relies on the fact that the discriminant power of a sub-image of a fixed size $I^{c}_{x,y}$ on the reference image of class $c$ with respect to the reference image of another class $c'$ can be expressed as:

$$r^{c,c'}_{x,y} = \min_{\substack{-w \le i \le w \\ -w \le j \le w}} d\left(I^{c}_{x,y},\, I^{c'}_{x+i,\,y+j}\right)$$

where $d$ is a distance function used to measure dissimilarity between two sub-images and $w$ is the half size of the search window, i.e., the matching area around $(x, y)$ needed to compensate for image distortions and translations.
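The window search in the expression above can be sketched as follows. This is only an illustration: the text does not fix the distance function $d$, so a mean absolute gray-level difference is assumed here, and the names `patch_distance`, `crop` and `discriminant_power` are hypothetical.

```python
def patch_distance(a, b):
    """Mean absolute gray-level difference between two equal-size patches.

    This choice of distance is an assumption made for the example.
    """
    n = len(a) * len(a[0])
    return sum(abs(p - q)
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b)) / n


def crop(image, x, y, width, height):
    """Sub-image of size width x height with top-left corner at (x, y)."""
    return [row[x:x + width] for row in image[y:y + height]]


def discriminant_power(ref_c, ref_other, x, y, width, height, w):
    """Minimum distance from the patch at (x, y) in `ref_c` to any patch of
    `ref_other` displaced by at most `w` pixels in each direction."""
    patch = crop(ref_c, x, y, width, height)
    rows, cols = len(ref_other), len(ref_other[0])
    best = float("inf")
    for j in range(-w, w + 1):
        for i in range(-w, w + 1):
            xx, yy = x + i, y + j
            # Skip displacements that fall outside the other image.
            if 0 <= xx <= cols - width and 0 <= yy <= rows - height:
                candidate = crop(ref_other, xx, yy, width, height)
                best = min(best, patch_distance(patch, candidate))
    return best


if __name__ == "__main__":
    class_a = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
    shifted = [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 0, 0]]
    print(discriminant_power(class_a, shifted, 1, 1, 2, 2, w=0))
    print(discriminant_power(class_a, shifted, 1, 1, 2, 2, w=1))
```

In the demo, the same block appears one pixel to the right in the second image, so the distance drops to zero as soon as the search window is wide enough to absorb the translation.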

Then, $r^{c,c'}_{x,y}$ represents the distance from $I^{c}_{x,y}$ to the most similar sub-image found in an image from the class $c'$ around $(x, y)$. Therefore, $r^{c,c'}_{x,y}$ is a good estimator of the minimum distance that is expected to be found when comparing $I^{c}_{x,y}$ to an image belonging to class $c'$. That suggests that higher values of $r^{c,c'}_{x,y}$ give rise to a higher discriminant power of $I^{c}_{x,y}$ with respect to $c'$.

The set of final δ-landmarks (or anchor points) of a class $c$, $L^{c}$, can be defined as the set of the sub-images $I^{c}_{x,y}$ that have a significant dissimilarity with respect to each and every one of the other known classes,

$$L^{c} = \left\{\, I^{c}_{x,y} \;:\; r^{c,c'}_{x,y} > T \quad \forall\, c' \neq c \,\right\}$$

where the threshold $T$ can be empirically set. The cardinality of this set will depend on two factors:

• A sub-image of a class may be discriminant enough ($r^{c,c'}_{x,y} > T$) with respect to more than one class, so δ-landmarks from different pairs of classes can be shared.

• It is possible to enforce a minimum number of δ-landmarks for each pair of classes by appropriately tuning $T$.

Training

To find the sub-images that discriminate between two given classes, the candidate features of the new class found in the δ-landmark selection phase are tested for minimum normalized distance in a search window (relative to the coordinates of the feature) of the other class. If the distance found is greater than a predetermined threshold $T$, that feature is annotated as discriminating between the two classes. Afterwards, the roles of the classes are reversed and the process is repeated (notice that, in general, $r^{c,c'}_{x,y} \neq r^{c',c}_{x,y}$). Finally, the selected features are consolidated by eliminating the repeated ones, which happens when a candidate feature has been found to be discriminative for more than one pair of classes.
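A minimal sketch of this pairwise selection and consolidation is given below. It assumes that the discriminant power of every candidate against every other class has already been computed, and that a candidate is kept when its value exceeds the threshold, following the definition of the δ-landmark set. The function name `select_landmarks`, the dictionary layout of `r`, and the `per_pair` parameter are hypothetical.

```python
def select_landmarks(candidates, r, other_classes, threshold, per_pair=2):
    """Pick up to `per_pair` discriminating candidates for each class pair.

    candidates:    ordered list of (x, y) positions (best candidates first).
    r:             dict mapping ((x, y), other_class) to discriminant power.
    other_classes: labels of the classes already indexed.
    Returns the consolidated list of landmarks without repetitions.
    """
    selected = []
    for other in other_classes:
        picked = 0
        for position in candidates:
            if r[(position, other)] > threshold:
                # A landmark shared by several pairs is stored only once.
                if position not in selected:
                    selected.append(position)
                picked += 1
                if picked == per_pair:
                    break
    return selected


if __name__ == "__main__":
    candidates = [(0, 0), (5, 5), (9, 9)]
    r = {
        ((0, 0), "B"): 10, ((5, 5), "B"): 1, ((9, 9), "B"): 8,
        ((0, 0), "C"): 2, ((5, 5), "C"): 7, ((9, 9), "C"): 9,
    }
    print(select_landmarks(candidates, r, ["B", "C"], threshold=5))
```

In the demo, the candidate at (9, 9) discriminates against both other classes and is therefore shared, so three landmarks cover two class pairs instead of four.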

Test

Testing in this case is straightforward: each local fea-

ture from each training class is compared against the

test document in a search window around the feature

point coordinates to ﬁnd the minimum distance. From

this, an average distance for each class can be com-

puted. Finally, the test document is assigned to the

class that gets the minimum average distance.
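The decision rule of the test phase can be sketched as below, assuming the minimum distance of each landmark has already been found by the window search. The function name `classify_by_average_distance` and the input layout are hypothetical.

```python
def classify_by_average_distance(min_distances):
    """Assign the test document to the class with the lowest mean distance.

    min_distances: dict mapping each class label to the list of minimum
    distances found for that class's landmarks on the test document.
    Returns the winning label and the per-class averages.
    """
    averages = {label: sum(values) / len(values)
                for label, values in min_distances.items()}
    return min(averages, key=averages.get), averages


if __name__ == "__main__":
    label, averages = classify_by_average_distance({
        "invoice_A": [0.1, 0.3, 0.2],
        "tax_form_X": [0.6, 0.8, 0.7],
    })
    print(label, averages)
```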

3.3 Approach 2: Direct Voting Scheme

The second approach tested was proposed in (Arlan-

dis et al., 2011). In this case, the identiﬁcation of a

test document relies on the combination of the evi-

dence contributed by a high number of local features

(sub-images).

In the training phase, a high number of sub-images

from each class are selected from its reference im-

ages, and a local feature vector from each sub-image

is obtained expanding the gray values of pixels in a

row vector. The sub-image selection criterion should

retain areas with potentially discriminative content, as


explained in section 3.1. In the test phase, a high number of sub-images are also selected from a test image and used to classify it. In this case, a higher number of sub-images is required, since the test images are filled-in documents and have more potentially discriminative content.

The discriminative power of the local features ex-

tracted from the selected sub-images is improved by

taking into account non-local, or global, geometric in-

formation. Thus, the coordinates of each sub-image

are added as two new components to the feature vec-

tors, and properly normalized before classiﬁcation.

To tune the effect of these global features with respect to the rest of the components, two empirically estimated weighting factors ($\alpha_x$, $\alpha_y$) are applied.
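The construction of such a feature vector can be sketched as follows. The normalization used here (gray values divided by the maximum gray level, coordinates divided by the page size) is an assumption, since the text only states that the components are properly normalized. The function name `local_feature_vector` and its parameters are hypothetical.

```python
def local_feature_vector(patch, x, y, page_w, page_h,
                         alpha_x=1.0, alpha_y=1.0, max_gray=255):
    """Flatten a sub-image into a row vector and append weighted coordinates.

    patch:   list of rows of gray values of the sub-image.
    (x, y):  position of the sub-image on the page.
    alpha_x, alpha_y: weights of the two global (position) components.
    """
    grays = [value / max_gray for row in patch for value in row]
    return grays + [alpha_x * x / page_w, alpha_y * y / page_h]


if __name__ == "__main__":
    vector = local_feature_vector([[0, 255], [255, 0]], x=50, y=100,
                                  page_w=100, page_h=200,
                                  alpha_x=2.0, alpha_y=0.5)
    print(vector)
```

Larger weights make two sub-images with similar appearance but different positions lie further apart in feature space, so the nearest-neighbor search favors matches at consistent locations.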

Each vector is classified according to the k-nearest neighbors rule, and finally, the class with the largest number of votes is obtained. More formally, the classification procedure used is related to the methods often referred to as direct voting schemes (Mohr et al., 1997). Given a prototype set representing the reference classes $\omega_1, \ldots, \omega_d$, and a set of $m_Y$ feature vectors $\{y_1, \ldots, y_{m_Y}\}$ extracted from a test image $Y$, the classifier can be written as a linear combination of $m_Y$ classifiers, one for each feature vector of $Y$ (Kittler et al., 1998). The so-called sum rule, often used in practical applications, can be used to optimally classify an image $Y$ in a class $\hat{\omega}$:

$$\hat{\omega} = \operatorname*{argmax}_{1 \le j \le d} \sum_{i=1}^{m_Y} P(\omega_j \mid y_i)$$

Assuming that the number of vectors of each class in the prototype set is fixed according to the a priori probabilities of the classes, the following classification rule can be used:

$$\hat{\omega} = \operatorname*{argmax}_{1 \le j \le d} \sum_{i=1}^{m_Y} k_{ij}$$

where $k_{ij}$ is the number of neighbors of $y_i$ belonging to the class $\omega_j$ provided by the k-nearest neighbors rule. That is, the class $\hat{\omega}$ with the largest number of votes accumulated over all vectors extracted from the test image is selected.
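The voting rule can be sketched with a brute-force nearest-neighbor search, as below. This is an illustration only: it uses an exhaustive Euclidean search instead of an indexed structure, and the function name `knn_vote` and the prototype layout are hypothetical.

```python
import math
from collections import Counter


def knn_vote(prototypes, test_vectors, k=4):
    """Accumulate k-nearest-neighbor votes over all vectors of a test image.

    prototypes:   list of (vector, class_label) pairs from the reference images.
    test_vectors: feature vectors extracted from the test image.
    Returns the class with the most votes and the full vote count.
    """
    votes = Counter()
    for y in test_vectors:
        # The k prototypes closest to y each cast one vote for their class.
        nearest = sorted(prototypes, key=lambda p: math.dist(p[0], y))[:k]
        votes.update(label for _, label in nearest)
    return votes.most_common(1)[0][0], votes


if __name__ == "__main__":
    prototypes = [
        ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
        ((10, 10), "B"), ((10, 11), "B"), ((11, 10), "B"),
    ]
    test_vectors = [(0.2, 0.1), (0.5, 0.5), (10, 10.2)]
    print(knn_vote(prototypes, test_vectors, k=2))
```

In the demo, one of the three test vectors matches class B, which plays the role of a vector extracted from filled-in content, but the accumulated votes still favor class A.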

Note that the selection criterion, as proposed in section 3.1, is applied within a class. Therefore, similar sub-images from different classes may still be found at similar locations, although casual matchings of such “extraneous” vectors are expected to be distributed among the different classes when a large number of vectors is used.

4 EXPERIMENTS

Two different sources of documents have been used

to test the accuracy of the proposed methods. The

ﬁrst is the SD6 NIST database (Dimmick and Garris,

1992) consisting of 5590 binary ﬁlled-in forms, 300

dpi, US Letter size, from 20 different classes. The

second one (IDF1) is a document collection obtained

from an actual ofﬁce setting. It consists of 683 binary

and gray-scale documents from 47 classes including

invoices, bank documents, personal documents, and a

variety of forms of different sizes and aspect ratios,

but mostly 300 dpi DIN A4 size, portrait orientation,

and various amounts of ﬁlled-in contents.

From these databases, two data sets have been

built:

• SD6. Used as a baseline for comparison against

other approaches (see (Sarkar, 2006), (Sarkar,

2010)). Experiments with different number of ref-

erence images were carried out.

• IDF1+SD6. Composed by the union of both

databases, it is used to obtain results using

the maximum number of classes available (67

classes). One reference image per class was se-

lected for the training set, while 6 to 10 images

per class were used for testing. The reference im-

ages have been checked and, if necessary, manu-

ally cleaned to remove ﬁlled-in contents.

Several preprocessing techniques have been tested

with all image sets. Automatic orientation normal-

ization was applied to the documents and each image

was size normalized (to an equivalent of an A4 page at

300 dpi area) preserving its original aspect ratio. Fi-

nally, to reduce the processing time, the images were

scaled by a factor of 0.25.
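The size normalization step can be illustrated with the sketch below. It assumes that "an equivalent of an A4 page at 300 dpi area" means rescaling each image so that its pixel area matches that of an A4 page at 300 dpi (2480 × 3508 pixels) while keeping its aspect ratio. The function name `normalized_size` is hypothetical.

```python
import math

# A4 (210 mm x 297 mm) at 300 dpi, rounded to whole pixels.
A4_300DPI = (2480, 3508)


def normalized_size(width, height, scale=0.25):
    """Target size with the pixel area of an A4 page at 300 dpi, the original
    aspect ratio, and an additional down-scaling factor `scale`."""
    target_area = A4_300DPI[0] * A4_300DPI[1]
    factor = math.sqrt(target_area / (width * height)) * scale
    return round(width * factor), round(height * factor)


if __name__ == "__main__":
    print(normalized_size(2480, 3508))  # A4 page scanned at 300 dpi
    print(normalized_size(2550, 3300))  # US Letter page scanned at 300 dpi
```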

The list of candidate reference points was ob-

tained and ordered as explained in section 3.1. To

select candidate sub-images, several textural features

were computed on local windows of 80 × 30 pixels.

Some tests were performed to measure the capability of several textural features to select the most discriminative sub-images, and it was found that variance, autocorrelation coefficient and entropy difference worked better than the rest. The entropy difference was finally used in the experiments.

In the particular case of the Cross Matching clas-

siﬁer, to account for potential translations of the test

image relative to the reference images, a search area

of 200 × 200 pixels around the landmark coordinates

was used. The threshold T

was empirically set, and

the ﬁrst two landmarks of the candidate list having a

correlation index over T

were selected for each pair

of classes, which led to a total average number of 29.9


(SD6) and 52.2 (IDF1+SD6) landmarks per class. Us-

ing this setting, all documents from both SD6 and

IDF1+SD6 test sets were correctly classiﬁed.

In the case of the Direct Voting Scheme classiﬁer,

a sub-sampling was applied in order to ensure that

any selected sub-image having static contents from

a test image was included in the training set. After

an empirical initial evaluation, a ﬁxed number of 300

reference points were selected from each training image and 400 points from each test image. A PCA dimensionality reduction was applied to the selected local features, resulting in 15-dimensional vectors. Four nearest neighbours were considered in the sum classification rule described in section 3.3. A kd-tree data structure provided fast approximate k-nearest neighbor search. All documents from the combined IDF1+SD6 test set were correctly classified using two reference images.

4.1 Reject Option

For a given test set, the distribution of the reliabil-

ity indices of well classiﬁed documents should not

overlap with the distribution of the reliabilities of mis-

classiﬁed and non-indexed documents (unknown doc-

uments). Obviously, the more separated both distribu-

tions are, the better generalization is to be expected.

Thus, 205 randomly selected document images, mostly forms, not belonging to any of the reference classes, were collected and used as a test set of unknown documents. The reliability of a class for a given document was computed as follows:

• Cross Matching classiﬁer (CM): The mean of the

correlation index obtained for all the landmarks of

the class.

• Direct Voting Scheme (DVS): The class posterior

probability provided by the sum rule classiﬁer.

Table 1 shows the results obtained on the SD6 and

IDF1+SD6 databases, along with the unknown doc-

ument set. Using the reliability indices deﬁned, the

recall at 100% precision, and the KL-divergence ob-

tained are shown, as well as the error rate (no un-

known documents considered). The KL-divergence

was computed using the abovementioned reliability distributions. Because the KL-divergence is not symmetric, the minimum of the two divergences between both reliability distributions is shown.
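The asymmetry of the KL-divergence, and the choice of the smaller of the two directions, can be illustrated as follows. The estimation details are assumptions, since the text does not specify them: reliabilities are taken to lie in [0, 1] and are binned into histograms, natural logarithms are used, and empty bins of the second distribution are floored at a small `eps` to keep the value finite. All function names are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)


def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Normalized histogram of values assumed to lie in [lo, hi]."""
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


def min_kl(known, unknown, bins=10):
    """Smaller of the two KL-divergences between two reliability samples."""
    p, q = histogram(known, bins), histogram(unknown, bins)
    return min(kl_divergence(p, q), kl_divergence(q, p))


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.9, 0.1]
    print(kl_divergence(p, q), kl_divergence(q, p))  # the two directions differ
    print(min_kl([0.9, 0.95, 0.8], [0.1, 0.2, 0.15]))
```

With this flooring, well separated reliability distributions give a large but finite value, so the number depends on `eps` and on the binning and is only comparable between runs that share those settings.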

Table 1: Error rate, recall at 100% precision, and KL-divergence measured on the SD6 and IDF1+SD6 databases for the best parameter sets.

                   Error rate (%)    Recall at 100% precision (%)    KL-divergence
                   CM      DVS       CM       DVS                    CM      DVS
SD6                0       0         100      99.8                   40.8    36.7
Both (IDF1+SD6)    0       0         99.9     99.9                   38.0    37.3

On the one hand, the results show that the combination of the reference point selection method used, along with the two classifiers described, provided a 100% recognition rate on the two sets tested. On the other hand, the recall, precision and KL-divergence values obtained suggest that the reliability measure provided by both classifiers is able to correctly rank the known and unknown documents, and therefore allows the rejection of the unknown ones without significantly increasing the rejection of indexed documents.
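One way to compute a recall at 100% precision from the reliability indices is sketched below. The interpretation is an assumption: the rejection threshold is placed at the highest reliability obtained by any document that should be rejected (unknown or misclassified), and recall is the fraction of correctly classified indexed documents whose reliability lies strictly above it. The function name `recall_at_full_precision` is hypothetical.

```python
def recall_at_full_precision(accept_scores, reject_scores):
    """Recall obtained with the lowest threshold that rejects every unwanted
    document.

    accept_scores: reliabilities of correctly classified indexed documents.
    reject_scores: reliabilities of unknown and misclassified documents.
    Returns (recall, threshold).
    """
    threshold = max(reject_scores)
    accepted = sum(1 for score in accept_scores if score > threshold)
    return accepted / len(accept_scores), threshold


if __name__ == "__main__":
    recall, threshold = recall_at_full_precision(
        accept_scores=[0.9, 0.8, 0.7, 0.4],
        reject_scores=[0.5, 0.2],
    )
    print(recall, threshold)
```

In the demo, one indexed document scores below the best unknown document and is lost, so the recall is 3 out of 4 at a threshold of 0.5.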

The processing speed measured on a 64-bit, 4-CPU, 3 GHz AMD machine for the DVS method was

1.6 doc/s in both data sets. In the case of CM, the

speed was 0.47 doc/s for the IDF1+SD6 database and

1.02 doc/s for the SD6 database.

5 CONCLUSIONS

Two approaches to deal with the task of classifying

documents with total ﬂexibility of designs, layouts,

sizes, and amount of ﬁlled-in contents in an efﬁcient

way have been tested. A common method for select-

ing the best reference points in the document images

has been used to improve the results.

Experiments on document identiﬁcation were car-

ried out, and all the documents from both SD6 and

the combined database were correctly classiﬁed, and

good performances on the rejection rates of non-

indexed document images were also achieved. Train-

ing and test computation times were within the de-

mands of a real workﬂow in document processing.

REFERENCES

Arlandis, J., Castello-Fos, V., and Pérez-Cortes, J. C. (2011). Filled-in document identification using local features and a direct voting scheme. In Vitrià, J., Sanches, J. M. R., and Hernández, M., editors, IbPRIA, volume 6669 of Lecture Notes in Computer Science, pages 548–555. Springer.

Arlandis, J., Perez-Cortes, J.-C., and Ungria, E. (2009). Identification of very similar filled-in forms with a reject option. In Document Analysis and Recognition, 2009. ICDAR ’09. 10th International Conference on, pages 246–250.

Dimmick, D. L. and Garris, M. D. (1992). Structured forms database 2, NIST special database 6.

Haralick, R. M., Shanmugam, K., and Dinstein, I. (1973). Textural features for image classification. Systems, Man and Cybernetics, IEEE Transactions on, 3(6):610–621.


Heroux, P., Diana, S., Ribert, A., and Trupin, E. (1998). Classification method study for automatic form class identification. In Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, volume 1, pages 926–928.

Kittler, J., Hatef, M., Duin, R., and Matas, J. (1998). On combining classifiers. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 20(3):226–239.

Mohr, R., Picard, S., and Schmid, C. (1997). Bayesian decision versus voting for image retrieval. In Proc. of the CAIP-97, pages 376–383.

Nagasaki, T., Marukawa, K., Kagehiro, T., and Sako, H. (2006). A coupon classification method based on adaptive image vector matching. In Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, volume 3, pages 280–283.

Ogata, H., Watanabe, S., Imaizumi, A., Yasue, T., Furukawa, N., Sako, H., and Fujisawa, H. (2003). Form-type identification for banking applications and its implementation issues. In DRR’03, pages 208–218.

Parker, C. (2010). Anchor point selection by KL-divergence. In Image Processing Workshop (WNYIPW), 2010 Western New York, pages 42–45.

Sako, H., Seki, M., Furukawa, N., Ikeda, H., and Imaizumi, A. (2003). Form reading based on form-type identification and form-data recognition. In Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2, ICDAR ’03, pages 926–, Washington, DC, USA. IEEE Computer Society.

Sarkar, P. (2006). Image classification: Classifying distributions of visual features. In Proceedings of the 18th International Conference on Pattern Recognition - Volume 02, ICPR ’06, pages 472–475, Washington, DC, USA. IEEE Computer Society.

Sarkar, P. (2010). Learning image anchor templates for document classification and data extraction. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 3428–3431.

Ting, A. and Leung, M. (1996). Business form classification using strings. In Pattern Recognition, 1996., Proceedings of the 13th International Conference on, volume 2, pages 690–694.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 1:511–518.
