Similarity-based Image Retrieval for

Revealing Forgery of Handwritten Corpora

Ilaria Bartolini

DISI, Universit`a di Bologna, Bologna, Italy

Keywords:

Similarity-based Image Retrieval, Low-level Features, k-Nearest Neighbor, Handwritten Corpora, Forgery.

Abstract:

Authorship attribution is a problem with a long history and a wide range of applications. Recent works in

non-traditional authorship attribution contexts demonstrate the practicality of automatic analysis of documents

based on authorial style. However, such analyses are difﬁcult to apply and few “best practices” are available. In

this paper, we show how quantitative techniques based on image similarity search can be proﬁtably exploited

for revealing forgery of handwritten corpora. More in details, we explore the case where a document is

represented by means of the image of the document itself. Preliminary experimental results conducted on real

data demonstrate the effectiveness of the proposed approach.

1 INTRODUCTION

Authorship attribution is the science of inferring iden-

tity of an author from the characteristics of documents

written by the author itself. It represents a problem

with a long history and a wide range of applications.

Recent works in non-traditional authorship attribu-

tion domains demonstrate the practicality of auto-

matic analysis of documents based on authorial style.

Such analyses are however difﬁcult to apply, little is

known about types or rates of error, and few “best

practices” are available (David and Karl, 2014).

The speciﬁc type of automatic document analy-

sis depends on the meaning we convey to the word

“document”. The variance of this concept, in fact,

implies that different methods could be exploited for

automatically managing corpora with the ﬁnal aim to

reveal forgery. Computer scientists, mathematicians,

philologists, quantitative linguists and digital human-

ists have different points of view on what a document

is. This entails different approaches that could be con-

structively integrated in order to help with complex

problems such as forgery (Tomasi et al., 2013).

In the following, we will focus on how to ex-

ploit traditional computer science practices in order

to reveal forgery of handwritten corpora and show

how quantitative techniques based on image similar-

ity search can be proﬁtably exploited for this purpose.

More in details, we explore the case where a hand-

written document is represented by means of the im-

age of the document itself, that is, the digital represen-

tation of a manuscript page. The reference problem

is thus turned into a similarity-based image retrieval

one (Smeulders et al., 2000).

We demonstrate how the automatic characteriza-

tion of the content of document pages, in terms of

“low-level features”, can effectively contribute to au-

thorship attribution. In doing this, we need to inves-

tigate a set of relevant related issues. In particular,

(1) Which low-level features should be used for rep-

resenting pages? (2) How to compare such automat-

ically extracted features? (3) How to asses if two

pages are “(dis)similar”, thus establishing whether a

manuscript corpus is authentic or not with respect to

a speciﬁc author?

In tackling above issues, we also need to consider

that similarity could not be always a satisfactory cri-

terion in order to attribute authorship in the case of

suspected forgery: it is not surprising that a forgery

is similar to the author’s work. The problem is how

to verify if a document is too similar to the author’s

work, and if such types of similarity cannot be found

elsewhere in the remaining corpus.

Further,we haveto consider that manuscript pages

could be analyzed at different levels of granularity,

each one deﬁning an image element: syllables, words,

sentences, paragraphs, whole pages, etc.

From each element, visual characteristics, able to

deﬁne speciﬁc graphic aspects of an author (thus, able

to differentiate her/his handwriting), are extracted.

Among them, well known examples coming from the

traditional paleography context are leading, writing

104

Bartolini I..

Similarity-based Image Retrieval for Revealing Forgery of Handwritten Corpora.

DOI: 10.5220/0005564401040112

In Proceedings of the 12th International Conference on Signal Processing and Multimedia Applications (SIGMAP-2015), pages 104-112

ISBN: 978-989-758-118-2

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

body, writing angle, density, word’sspacing, and duc-

tus. Although effective, however, all such features

represent a completely manual solution to the prob-

lem, since this requires human experts, possibly sup-

ported by some image editing tools (David and Karl,

2014), to extract them.

In order to automate the management of handwrit-

ten corpora, we propose a completely automatic rep-

resentation of elements based on the notion of hand-

writing shape. To model the writing shape, a set of

effective visual characteristics (called local features)

are extracted from each element using speciﬁc image

analysis techniques like, for example, SIFT (Lowe,

1999) and SURF (Bay et al., 2008). So-obtained lo-

cal features are then compared in order to establish

their degree of (dis)similarity, with the ﬁnal aim to

establish whether a corpus is authentic or not with

respect to a speciﬁc author. This implies a pre-

processing phase where the analysis of some hand-

written pages of authentic writings is executed in

order to build a “ground truth” reference informa-

tion for comparing suspicious writings to the authen-

tic ones. Given an input page, composed by a set

of target elements, and an element distance function

that measures the (dis)similarity of a given pair of

elements using their local features, we want to de-

termine automatically if the query manuscript page

could be considered authentic with respect to a spe-

ciﬁc author. The (dis)similarity between pages is nu-

merically assessed by way of a page distance func-

tion that somehow “combines” the single element dis-

tances into an overall value. Preliminary experimen-

tal results conducted on a software implementation of

the proposed solution, namely WRITINGSIMILARI-

TYSEARCH (WSS), and using real data demonstrate

the effectiveness of our method and encourage further

investigations on this direction.

The rest of the paper is organized as follows: Sec-

tion 2 reports the related work; Section 3 details the

proposed similarity-based image retrieval approach.

In Section 4 we describe WSS, whereas in Section 5

we comment some preliminary experimental results

based on real document collections. Finally, Section 6

concludes the paper.

2 RELATED WORK

In this section we report state-of-the-art solutions to

authorship attribution with respect to the speciﬁc ﬁeld

of image analysis.

OCR is a traditional approach based on pattern

recognition techniques that enable a computer to read

texts (i.e., scanned images of a texts) (Bunke and

Wang, 1997). However, if this is a feasible solution

on printed text, its use for manuscripts is rather prob-

lematic. In general, OCR applied to handwritten texts

is far from being perfect because of the issue of “vari-

ations”. For example, the same letter drawn by the

same person is slightly different each time, as well

as letters drawn by different hands. These variations

make it hard for the computer to read the writing cor-

rectly and to make a successful match in the context

of authorship attribution.

Due to poor recognition results provided by OCR,

handwritten document image retrieval remains a very

challenging problem: keeping documents as image

format is a more economical and ﬂexible alternative

than converting image documents into text format by

OCR; furthermore, it is more robust for different vari-

ations and degradations (David and Karl, 2014).

In (Aiolli and Ciula, 2009), the authors propose

the tool System for Paleographic Inspections (SPI).

SPI solves the problem of variations by training and

working on prototypes of letters, i.e., collecting ab-

stracted models of a single person’s handwriting. The

prototype comes with a predeﬁned set of limits be-

tween which the letter belonging to the unidentiﬁed

document may deviate from the prototype. The main

limit of SPI is that the segmentation process focuses

on the shape of individual letters only. Thus, the over-

all appearance of the manuscript page, at the different

levels of granularity (i.e., sentences, words, etc.), and

its immediate context are completely ignored.

In (Rath et al., 2004), a probabilistic annotation

model for word matching in written documents is

presented. Word images are represented by means

of Fourier coefﬁcients. A learning model is trained

to map any given word image to a speciﬁc word

from a vocabulary with a probability. At query time,

the model estimates the probability of a query word

and a sequence of feature vector occurring together.

The method is pure text-based retrieval and achieves

multiple-words query tasks; however, it suffers from

queries which do not appear in the training set.

Finally, (Cao et al., 2011) propose an adapted vec-

tor model for word retrieval, where documents and

queries are represented by means of a vector space of

term frequency (TF) and inverse document frequency

(IDF) for each term in the vocabulary. TFs and IDFs

are estimated by means of word segmentation and

recognition likelihood. Retrieval is achieved by mea-

suring the similarity between vectors of query and

data documents with a ranked list. Similarly to (Rath

et al., 2004), also this approach is impracticable when

queries do not belong to the vocabulary.

To the best of our knowledge, WRITINGSIMI-

LARITYSEARCH is the ﬁrst attempt to provide a thor-

Similarity-basedImageRetrievalforRevealingForgeryofHandwrittenCorpora

105

ough solution to the problem of revealing forgery of

handwritten corpora based on image similarity.

3 THE SIMILARITY-BASED

IMAGE RETRIEVAL

APPROACH

In this section, we detail our content-based im-

age similarity solution to the authorship attribution

problem. In our model, each document is repre-

sented by means of the image of the document it-

self, that is, through the digital representation of

the manuscript page. By applying image analysis

methods and (dis)similarity search techniques, we

automatically characterize the content of each page

through “low-level features” and easily retrieve the

most (dis)similar pages to the target (i.e., a speciﬁc

author) one, following the k-Nearest Neighbor (k-

NN) search paradigm (Baeza-Yates and Ribeiro-Neto,

1999).

The basic idea of our approach is summarized in

Figure 1: we build a database of training features

extracted from some handwritten pages of authentic

writings (what we call “ground truth”). Then we ex-

tract the same features from a (suspicious) test page

and compute the (dis)similarity between above fea-

tures in order to establish the paternity of the test page

with respect to the target author.

User

Test image

Training image DB

Authorship attribution

based on features

similarity (e.g., writing

shape, body, angle, …)

Figure 1: The basic idea of content-based image similarity

search for authorship attribution.

Data and comparison models are inspired by the

Windsurf ones (Bartolini et al., 2011)

With respect to the data model, images are com-

posed by elements, that is, relevant parts of the hand-

written page; each element is described by means of

automatically extracted low-level features that repre-

sent, in an appropriate way, the content of the element

itself (i.e., the handwriting style of an author with re-

spect to a speciﬁc set of graphic aspects).

As for the comparison model, given an input

(query) manuscript page, composed of m relevant ele-

ments, and an element distance function d

that mea-

sures the dissimilarity of a given pair of elements us-

ing their features, we want to automatically deter-

mine if the query manuscript page could be consid-

ered authentic with respect to a speciﬁc author. The

(dis)similarity between manuscript pages is numeri-

cally assessed by means of a page distance function

d that somehow “combines” the single element dis-

tances into an overall value. The efﬁcient resolution

of comparisons over features is ensured by an index

structure (e.g., M-tree (Ciaccia et al., 1997)) built on

top of elements (for example, words).

Pragmatically, manuscript pages are ﬁrst seg-

mented in parts (e.g., syllables, words, sentences, and

also the whole page). From each element, visual

salient characteristics, able to deﬁne speciﬁc graphic

aspects, and thus differentiate the handwriting of an

author, are extracted. Image elements are then com-

pared according to their visual features according to

an ad-hoc distance metric d

. Elements scores are ﬁ-

nally appropriately matched to aggregate distance val-

ues of matched elements using the page distance func-

tion d (e.g., the average).

Among relevant visual features, we have charac-

teristics traditionally used in paleography literature

like (1) leading: vertical distance between a line and

the next one of a document page; (2) writing body:

height of the body of the letters (except ascenders and

descenders); (3) writing angle: angle of inclination of

the pen (except ascenders and descenders); (4) den-

sity: ﬁlling factor of selected area (black pixels vs.

white ones); (5) words’ spacing: horizontal distance

between two consecutive words; (6) ductus: quali-

ties and characteristics of writing instantiated in the

ﬂow of writing the text. Note that, some of the above

features refer to the whole document page (e.g., lead-

ing, density, and ductus), while others work at a ﬁner

granularity level (e.g., writing body, writing angle,

and words’ spacing). Each feature is represented as a

numerical value (1-D feature vector); comparison be-

tween elements is then assessed as the absolute value

of the difference between single feature values.

Although effective, and thus used in paleogra-

phy contexts, all above features share the limit of

requiring a manual extraction process by human ex-

perts, eventually supported by image editing tools like

Graphoskop.

In the latter case only, we can refer to

paleography features as semi-automatic ones.

To scale and integrate traditional methods, we

propose a completely automatic solution for the rep-

Graphoskop library: www.palaeographia.org/

graphoskop/

SIGMAP2015-InternationalConferenceonSignalProcessingandMultimediaApplications

106

resentation of each element based on its content (i.e.,

writing shape) description. In details, we model writ-

ing shape through a set of local features (or salient

points) automatically exacted using image analysis

techniques like, for example, SIFT (Lowe, 1999) and

SURF (Bay et al., 2008). So-obtained salient points

are then compared in order to establish their degree

of (dis)similarity, with the ﬁnal aim of establishing

whether a corpus is authentic or not with respect to a

speciﬁc author. We refer to writing shape features as

automatic features.

As above mentioned, SIFT and SURF techniques

represent the image content by means of a (large) set

of local salient points (e.g., corners, blobs, and T-

conjunctions). Each point is usually represented by

a 128-D feature vector. Similarity between images

is then assessed by matching visual characteristics of

their salient points based on Euclidean or quadratic

distance functions and aggregating (using the average

function) local scores to a global value.

Clearly, high dimensionality can be an issue work-

ing with both SIFT and SURF: e.g., 200/1000 salient

points, each represented by a 128-D vector, can be

used for representing a single image element. To

overcome this problem, approximate solutions to the

k-NN search problem in high-dimensional spaces are

applied: the “best bin ﬁrst” 1-1 matching (instead

of exact one) is adopted together with a simple Eu-

clidean distance function as similarity criterion. The

rationale of the approximate matching algorithm is to

match each salient point of the query element, start-

ing from the ﬁrst one, to the “best” (i.e., most similar)

salient point of the target DB element.

From the complexity point of view, using the “best

bin ﬁrst” approximation allows us to bound the global

cost to O(N

), denoting with N the (average) number

of salient points in an image (Rui et al., 1998) (the

complexity of 1-1 exact matching, e.g., by means of

the Hungarian algorithm, is O(N

)).

4 WRITINGSIMILARITYSEARCH

This section describes WRITINGSIMILARITY-

SEARCH (WSS), the software architecture that

implements the proposed approach and that is able to

automatically analyze and classify manuscript pages.

WSS is basically composed of two phases:

We note that our approach is completely independent

from speciﬁc features, distance functions, and matching al-

gorithms; this is due to the nice property of the Windsurf

framework to be parametric with respect to all such dimen-

sions.

Training Phase: the ground truth is built on a repre-

sentative subset of the documents collection (i.e., au-

thentic writings) by extracting features from elements

at different level of granularity; each image is also

labeled by means of the reference “author class” that,

in the more general case, is represented by a keyword.

The class can be also modeled at a ﬁner level of gran-

ularity; this is possible by associating to each image

a path of an “author taxonomy”, (i.e., “author/date”,

“author/word”, “author/word/date”, etc.).

Test Phase: at run time, given a suspicious im-

age element, its features are extracted and compared

to the training features in order to compute their

(dis)similarity scores; paternity of the test element

is established based on a k-NN classiﬁer, as we will

present in the following.

In doing this, WSS exploits both “automatic”

extracted features (i.e., writing shape) and “semi-

automatic” extracted paleography characteristics (i.e.,

leading, body, angle, density, words’ spacing, mar-

gins, etc.), and provides the user with several func-

tionalities, such as persistent (MySQL-based) fea-

tures extraction, features comparison, k-NN querying

and classiﬁcation, that are available through an intu-

itive and user-friendly GUI (see Figure 2).

Figure 2: WRITINGSIMILARITYSEARCH’sinterface.

WSS is based on the Windsurf software li-

brary (Bartolini et al., 2011), freely available at

URI http://www-db.disi.unibo.it/Windsurf/. Further,

it exploits the graphical functionalities offered by

Graphoskop to deal with traditional paleography fea-

tures in a semi-automatic way.

Windsurf provides a general framework for

region-based image retrieval that we have oppor-

tunely instantiated and extended for the speciﬁc con-

text of handwritten documents collections. We also

have included a software module that offers to the

user speciﬁc graphical tools to extract paleography

features.

Focusing on writing shape, given an image of

Similarity-basedImageRetrievalforRevealingForgeryofHandwrittenCorpora

107

a document element (for example, the Italian word

“punto” depicted in Figure 2), SIFT/SURF fea-

tures are automatically derived and persistently main-

tained.

The process can be recursively repeated for

whole folders of images by using the Add Folder but-

ton in the GUI instead of the Add Image one.

The basic ingredients of the Windsurf framework

are instantiated within our new context as follows:

image regions correspond to elements features (e.g.,

SURF salient points); the region (dis)similarity func-

tion d

is the Euclidean distance; the matching prob-

lem is solved by means of the best bin ﬁrst 1-1 match-

ing and using the average aggregation function for

computing (dis)similarity between images.

To complete the WSS description, we detail how

the k-NN classiﬁer works (Baeza-Yates and Ribeiro-

Neto, 1999). This is based on the notion of k-NN

similarity search, where, given a query (test) image

and a numeric value k, the k most similar images of

the training set with respect to the query are returned

in descending order of similarity (or, equivalently, the

k less dissimilar, in ascending order of dissimilarity)

(see Figure 3, for a concrete example).

Figure 3: k-NN similarity search example in WSS: 9-NN

results for the query “punto” (top-left image) depicted in

Figure 2.

Images in the above ﬁgure have been highlighted

with different colors to emphasize their relevance

with respect to the query image. The ﬁrst three im-

ages (colored in green) have been written by the same

author of the query, while other images (colored in

red) have been written by different authors. In this

case, therefore, we are able to automatically (and cor-

rectly) predict that the author of the query image is the

author of the green-colored images (the most similar

to the query).

We experimentally found that a good number of salient

points for representing the shape of a document element,

such as a word, ranges from 200 to 500 points.

Since the k-NN similarity query is performed on

images that are part of the ground truth, the classiﬁer

simply analyzes the paternity classes associated to the

k returned images, assigning to the (suspicious) test

image the most frequent class over the k results. Of

course, the choice of the “optimal” value of k is an

issue that requires to be experimentally investigated

for our k-NN classiﬁer in the proposed context.

We note that the same approach can also be ex-

ploited at a cluster level of training features, instead

that using individual elements, as described above. In

this case, a clustering phase is integrated within the

training process in order to derive clusters of elements

at ﬁxed levels of paternity classes (e.g., at level of

author/, author/word, author/date, author/date/word,

etc.). For each of so-derived cluster, a centroid is

computed as the average of the included elements fea-

tures. Since all images belonging to a single class

are now compacted into the corresponding centroid,

the k-NN search is now reduced to a 1-NN search,

and the predicted class is the one of the most similar

centroid. As a main consequence, here the selection

of the numeric value k is no longer an issue; further-

more, the comparison process becomes simpler than

before, since it now involves a lower number of ele-

ments (clusters) instead of the whole dataset of indi-

vidual training elements. On the other hand, it has to

be noted that the effectiveness of a cluster-based clas-

siﬁer is strongly inﬂuenced by the “quality” of cen-

troids in representing all cluster elements, thus in the

variance of the involved features, i.e., classes having

a high variance in elements’ features are expected to

be less effective than homogeneous classes.

In the following, we will refer to the two classi-

fying modalities as cluster-based and element-based

classiﬁers, respectively.

5 EXPERIMENTAL RESULTS

We include here the results of preliminary experi-

ments performed on real handwritten corpora consist-

ing of about 200 JPEG image documents (320x240

pixels in size). The reference corpora includes doc-

uments by a speciﬁc target author X and suspicious

documents written in a very similar handwriting style

to the target author, that we refer as X’.

In details, paleography features (i.e., leading,

body, angle, and density) were computed on the

whole documents using WSS provided graphical

tools, while writing shape features were automatically

extracted from speciﬁc word elements characterizing

the writing style of the target author X. In our speciﬁc

image context, we experimentally found similar re-

SIGMAP2015-InternationalConferenceonSignalProcessingandMultimediaApplications

108

sults in effectiveness using SIFT and SURF features.

Thus, for efﬁciency reason, we decided to base our

feature extraction process on the SURF algorithm. In

particular, about 200 salient points were extracted for

each word. Several tests demonstrated that 200 salient

points represented a good compromisebetween effec-

tiveness and efﬁciency for representing the content of

single words. Finally, in order to add noise to our

initial corpora, a large number of handwritten docu-

ments (representing both whole document pages and

same special words as above), written by different au-

thors (named A, B, and C), were added to the refer-

ence DB. A sample of representative query images,

not belonging to the training set and for which pa-

ternity classes (at different levels of granularity) were

associated for evaluation purposes, was randomly se-

lected to test our classiﬁer.

We start our investigation by trying to answer to

a preliminary important question: “Does the way we

obtain the digital representation of an original hand-

written page, e.g., scan of the original page, photo of

the original page, scan/photo of a photocopy of the

original page, etc., affect the target features represen-

tation?”

We experimentally found that both traditional pa-

leography and writing shape features are invariant to

the digital representation of the original handwritten

page. In fact, the absolute value of the differences be-

tween single feature values representing paleography

characteristics, when extracted from scan, photo and

scan/photo of the original page, were almost equal to

zero. The same behavior was conﬁrmed for shape

features: k-NN searches based on any of above dig-

ital representations of the original page (image query)

provided the same ranking of returned images.

We are now ready to demonstrate the effectiveness

of the WSS classiﬁer in predicting the paternity of

a suspicious test writing. In particular, we provide

the results of both WSS element-based and cluster-

based classiﬁers. In doing this, we use both traditional

paleography features and writing shape ones.

We start by using traditional paleography fea-

tures. In details, we clustered individual features so

as to train WSS for each author; then we applied the

cluster-based classiﬁer.

Table 1 shows the effectiveness of WSS in dis-

tinguish handwritings from different authors: training

author was in this case set to B, whereas the test image

(query) was written by author A.

To check paternity, absolute differences in indi-

vidual paleography features of test image and training

class are compared to experimentally derived thresh-

old values. In the proposed example, we postulate that

the test image (written by author A) belongs to train-

Table 1: Discriminative power of WSS in distinguishing

handwritings coming from different authors: training author

is B; test image is associated to author A by the ground truth.

Target Features Training (B?) Test (A) Abs. Diff.

Leading (mm) 5.01 5.00 0.01

Writing Body (mm) 4.45 2.90 1.55

Writing Ascender (mm) 1.50 1.30 0.20

Writing Descender (mm) 1.57 1.30 0.27

Writing Angle (

◦

) 89.40 76.00 13.40

Writing Ascender Angle (

◦

) 88.70 76.00 12.70

Writing Descender Angle (

◦

) 86.90 77.00 9.90

Space between Words (mm) 2.65 3.40 0.75

Density (%) 10.10 6.00 4.09

ing class B. However, the table highlights high differ-

ence values for the following features: writing body

(> 0.5 mm), all writing angles (> 5

◦

), space between

words (> 0.5 mm), and density (> 2%). Since the

number of sufﬁciently dissimilar features between the

test image and the training (average) image is high,

we conclude that the test image does not belong to

the test class B. In this case, WSS is thus able to infer

that the test image has not the same handwriting than

the training author.

If we apply the WSS classiﬁer to the whole clus-

ter dataset, predicted author for the test image is A,

since the most similar centroid of all trained authors

is the one corresponding to cluster A and the number

of differences in features exceeding the threshold is

low. Again, WSS is able to classify the given test

image to the correct class.

In Table 2, we report the results of a similar ex-

periment: this time, we trained WSS on a subset of

writings (i.e., from a speciﬁc decade) of the author X

and on a sample writings of the suspicious author X’

coming from the same period. Then, we set author X

as training and provide a test image (query) written

by author X’ to WSS. In this case, we want to investi-

gate if WSS is able to distinguish (possibly existing)

differences between the two handwritings corpora.

Table 2: Discriminative power of WSS in distinguishing

handwritings coming from similar authors: training author

is X; test image is associated to author X’ by the ground

truth.

Target Features Training (X?) Test (X’) Abs. Diff.

Leading (mm) 7.70 7.10 0.30

Writing Body (mm) 2.23 2.10 0.13

Writing Ascender (mm) 2.07 2.40 0.33

Writing Descender (mm) 2.36 2.60 0.23

Writing Angle (◦) 71.00 73.00 2.00

Writing Ascender Angle (◦) 75.42 74.00 1.42

Writing Descender Angle (◦) 80.26 77.00 3.26

Space between Words (mm) 2.92 2.60 0.32

Density (%) 8.83 10.00 1.17

Similarity-basedImageRetrievalforRevealingForgeryofHandwrittenCorpora

109

As we can observe from statistics shown in table,

differences here are very small. The classiﬁer would

thus conﬁrm the hypothesis (i.e., X’=X).

We now show the performance of WSS with re-

spect to writing shape features, showing how the sys-

tem is able to discriminate handwritings of different

authors by exploiting k-NN search. In particular, we

evaluate the effectiveness of k-NN search using the

well known precision (P) and recall (R) metrics. Pre-

cision measures the number of relevant (i.e., the same

word written by the same author) images to the query,

over the k returned images, while recall is the number

of relevant images to the query over the total number

of relevant samples in the training set.

Table 3 shows the average results of P and R ob-

tained on our test queries representing words. In this

case, the k-NN search is performed over the query

word written by different authors.

Table 3: k-NN average precision (P) and recall (R) values

vs. k.

k P R

1 1 0.1937

2 1 0.3875

4 0.9375 0.7437

6 0.783 0.8562

8 0.5625 0.8562

10 0.475 0.8562

15 0.3162 0.9062

20 0.2567 0.9375

Considering that classes in our dataset have (on

average) 5.5 relevant samples in the training set, we

observe how the quality of the results is quite good:

WSS is able to return most of the relevant images in

the ﬁrst positions; further, almost all relevant samples

are localized by the system very soon (we get values

of recall close to 1 in the ﬁrst 15 − 20 results). This

demonstrates the effectiveness of both local features

and comparison method adopted by WSS. Similar re-

sults were conﬁrmed when searching for the author

of any word. Even when a query word is not included

in the training set, WSS is therefore able to correctly

predict its author, thus obtaining good performance at

different levels of granularity.

Next, we test the quality of WSS classiﬁer. For

evaluating the quality (Q) of the classiﬁer, we as-

signed the binary value 1, when the class prediction

for a test image was correct with respect to the ground

truth, and 0 otherwise. With respect to the choice of

the value k, we experimentally found that a reasonable

solution is to bound k to the average value of the total

number of relevant images with respect to the query

in the training set.

For the element-based approach, our experiments

produce 100% accuracy for any value of k in the range

1 − 8. This is expected because of the very good

performance of the k-NN search demonstrated earlier

(see P and R values in Table 3).

Table 4 shows the average value of Q for the

cluster-based working modality (Figure 4).

Table 4: Average quality results of WSS cluster-based clas-

siﬁer: Q vs. clustering level.

Clustering Level Q (Cluster-based)

author 0.2

author/word 1

We observe how the quality decreases to 20%

when working at the “author” level; as we argued

in Section 4, this is due to the high variance in el-

ements’ features of author classes. In fact, as soon

as the classiﬁer is set to work at the “author/word”

level, Q reaches optimal values, as for the case of the

element-based classiﬁer.

To complete our analysis, we report two visual ex-

amples of the WSS classiﬁer at work, with the aim to

show how WSS could be helpful in predicting the pa-

ternity of a suspicious test image by author X’ (see

Figures 4 and 5).

Figure 4: Visual example of k-NN classiﬁcation based on

elements (k = 4).

In details, Figure 4 shows the prediction according

to the most frequent class among the results of a 4-NN

individual elements query (i.e., element-based classi-

ﬁer). Since the four ranked results are all of distinct

authors (C, X, A, and B, respectively), the classiﬁer

would select for the test image the class author asso-

ciated to the most similar image (i.e., C). Thus, the

classiﬁer would have refused the thesis X’=X.

In Figure 5 the classiﬁcation is based on the class

of a 1-NN cluster query; note that, further results in-

cluded in Figure 5 (with k = 4) show that WSS con-

siders, in this case, the handwriting from X’ (query

image) very far from the handwriting from author X

(which is ranked only at position 4). Even in this case,

the classiﬁer would have refused the thesis X’=X.

SIGMAP2015-InternationalConferenceonSignalProcessingandMultimediaApplications

110

Figure 5: Visual example of k-NN classiﬁcation based on

clusters (at author/word level).

Summarizing, provided results conﬁrm the effec-

tiveness of traditional manual practices of human ex-

perts in the context of paleography (i.e., manual ex-

traction and human observation of speciﬁc corpora

characteristics), practices here permitted by WSS in

a semi-automatic way and with the ﬁnal aim to clas-

sify suspicious writing to the proper class. Further,

provided tests demonstrate how the same ability to

discriminate between different handwritings can be

obtained in a completely automatic way by exploit-

ing writing shape features, meaning that the WSS

classiﬁer could be a very convenient and smart refer-

ence software architecture for paleography experts for

tackling the authorship attribution problem. In par-

ticular, we envisage two possible uses of the WSS

classiﬁer: (1) The two classiﬁers (the one using pa-

leographic features and the one using writing shape

features) could be used separately to predict a class

for a same query image; if the two predicted classes

are different, then we alert the handwriting expert that

the query image is suspect and that further investiga-

tion is needed. (2) Since the classiﬁer using paleo-

graphic features requires additional parameters (i.e.,

the (dis)similarity thresholds) for which an appropri-

ate value can be hard to be derived, we can use the

classiﬁer using writing shape features (which is com-

pletely automatic) to predict the class of the query

image; then, the handwriting expert can restrict her

search only on the author (or word) that has been pre-

dicted by WSS.

6 CONCLUSIONS

In this paper, we proposed a novel approach to the au-

thorship attribution problem based on image similar-

ity search. In details, we presented WSS, a software

architecture able to automatically predict the pater-

nity of a (suspicious) test document exploiting both

automatic and semi-automatic features. Preliminary

experimental results conducted on real data demon-

strated the effectiveness of our classiﬁer and encour-

age further investigations on this direction.

In the future, we plan to study and compare alter-

native representations for writing shapes, e.g., based

on global features, like wavelet and Fourier coefﬁ-

cients. Finally, we advocate the creation of bench-

marks consisting of large handwriting corpora allow-

ing the comparison of existing approaches. In doing

this, our intention is clearly to involve human experts

in the validation process.

ACKNOWLEDGEMENTS

Part of this research has been supported by the FARB

Project “Authorship, variants, style: textual analysis

between philology, linguistics, mathematics and com-

puter science”. A special thank to the students that

participated in the WSS software development.

REFERENCES

Aiolli, F. and Ciula, A. (2009). A case study on the system

for paleographic inspections (SPI): Challenges and

new developments. In Proc. Conf. Comp. Intell. and

Bioeng., pp. 53–66, Amsterdam, The Netherlands.

Baeza-Yates, R. A. and Ribeiro-Neto, B. (1999). Modern

Information Retrieval. Addison-Wesley.

Bartolini, I., Patella, M., and Stromei, G. (2011). The

Windsurf library for the efﬁcient retrieval of multime-

dia hierarchical data. In SIGMAP 2011, pp. 139–148,

Seville, Spain.

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008).

Speeded-up robust features (SURF). Comput. Vis. Im-

age Underst., 110(3):346–359.

Bunke, H. and Wang, P. S. P., editors (1997). Handbook of

character recognition and document image analysis.

World Scientiﬁc.

Cao, H., Govindaraju, V., and Bhardwaj, A. (2011). Un-

constrained handwritten document retrieval. IJDAR,

14(2):145–157.

Ciaccia, P., Patella, M., and Zezula, P. (1997). M-tree: An

efﬁcient access method for similarity search in metric

spaces. In VLDB ’97, pp. 426–435, Athens, Greece.

David, D. and Karl, T. (2014). Handbook of Document Im-

age Processing and Recognition. Springer.

Lowe, D. G. (1999). Object recognition from local scale-

invariant features. In ICCV ’99, pp. 1150–1157.

Rath, T. M., Manmatha, R., and Lavrenko, V. (2004). A

search engine for historical manuscript images. In SI-

GIR 2004, pp. 369–376, Shefﬁeld, UK.

Rui, Y., Huang, T. S., Ortega, M., and Mehrotra, S. (1998).

Relevance feedback: A power tool for interactive

content-based image retrieval. Trans. on Circuits and

Systems for Video Technology, 8(5):644–655.

Similarity-basedImageRetrievalforRevealingForgeryofHandwrittenCorpora

111

Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A.,

and Jain, R. (2000). Content-based image retrieval at

the end of the early years. TPAMI, 22(12):1349–1380.

Tomasi, F., Bartolini, I., Condello, F., Esposti, M. D.,

Garulli, V., and Viale, M. (2013). Towards a taxonomy

of suspected forgery in authorship attribution ﬁeld: A

case: Montale’s diario postumo. In DH-CASE ’13, pp.

10:1–10:8, Florence, Italy.

SIGMAP2015-InternationalConferenceonSignalProcessingandMultimediaApplications

112