Recovering and Visualizing Deformation in 3D Aegean Sealings

Bartosz Bogacz¹, Nikolas Papadimitriou², Diamantis Panagiotopoulos² and Hubert Mara¹

¹Forensic Computational Geometry Laboratory, Heidelberg University, Germany
²Corpus der minoischen und mykenischen Siegel, Heidelberg University, Germany

npapad2007@gmail.com, diamantis.panagiotopoulos@zaw.uni-heidelberg.de
Keywords:
Computational Reconstruction, Feature Extraction, Pattern Recognition, 3D Cultural Heritage.
Abstract:
Archaeological research into Aegean sealings and sigils reveals valuable insights into the Aegean socio-
political organization and administration. An important question arising is the determination of authorship
and origin of seals. The similarity of sealings is a key factor as it can indicate different seals with the same
depiction or the same seal imprinted by different persons. Analyses of authorship and workmanship require
the comparison of shared patterns and detection of differences between these artifacts. These are typically
performed qualitatively by manually discovering and observing shared visual traits. In our work, we quantify
and highlight visual differences, by exposing and directly matching shared features. Further, we visualize and
measure the deformation of shape necessary to match sigils. The sealings used in our dataset are 3D struc-
tured light scans of plasticine and latex molds of originals. We compute four different feature descriptors on
the projected surfaces and their curvature. Then, these features are matched with a rigid RANSAC estimation
before a non-rigid thin-plate spline (TPS) matching is performed to fine-tune the deformation. We evaluate
our approach by synthesizing artificial deformations on real world data and measuring the distance to the
re-constructed deformation.
1 INTRODUCTION
A seal is a small portable artifact mostly made of
stone but also found in other materials, such as
bone/ivory, metal, and various artificial pastes. It
displays engraved motifs and is generally perforated
so that it can be suspended e.g. on a necklace.
Seals have played an important role in Aegean so-
ciety and, as functional objects, they have served
three main purposes: securing, marking, and autho-
rizing. Their study provides important insight into
the Aegean socio-political organization and administration (further introductory information can be found at https://www.uni-heidelberg.de/fakultaeten/philosophie/zaw/cms/seals/sealsAbout.html).
One important question arising in this study is the
determination of authorship and origin of seals. Given
two or more visually similar but not equal seals, pos-
sibly of different provenance, do they originate from
the same stamp or the same author? Research on seal-
ing artisan craftwork is typically concerned with qual-
itative judgments based on manually discovered and
observed traits. Our quantifiable results and visual-
izations can serve as foundations for evidence-based
reasoning. Thus, we address the challenge of ana-
lyzing patterns and deformations common to a pair of sealings.
The contributions of our work are (i) a pre-
processing and feature extraction pipeline for analyz-
ing the similarity and unique features of 3D scans of
sealings, (ii) the evaluation of this pipeline with syn-
thetic data for estimating deformations between seal-
ings, and (iii) the visualization of the deformation and
differences between sealings for Archaeological ap-
plications.
This paper is structured as follows: In Section 2
we present related work on the challenge to gen-
erate correspondences between sets of image pairs
of related but visually differing content. In Sec-
tion 3 we introduce our dataset and the necessary pre-
processing of the 3D data for further image process-
ing. Then, in Section 4 we describe four feature ex-
traction approaches to generate and weight correspon-
dence hypotheses. In Section 5, based on the pro-
posed correspondences, first a rigid mapping is esti-
Figure 1: Three stages of preprocessing are shown from left to right. (i) The original scanned 3D data with texture and
light. (ii) The surface curvature of the seal 3D data rendered into a raster image. (iii) The seal image after local histogram
normalization and padding.
mated and hypotheses are filtered to match the esti-
mated model. On the remaining hypotheses a non-
rigid model is fine-tuned. In Section 6, we visualize
and describe the application of our results in an Ar-
chaeological context. A summary and conclusion is
given in Section 7.
2 RELATED WORK
Finding image correspondences and image registra-
tion is a challenging task usually approached by
research in medical settings (Holden, 2008), Geo-
sciences (Mongus and Žalik, 2012; Chen and Li,
2012; Tennakoon et al., 2013), and stereo match-
ing (Tola et al., 2010). In these cases, the images be-
ing compared depict the same scene and vary only in
small detail or pose. The large-scale content and iden-
tifying features are common to both images. How-
ever, in our setting, both sealing images differ in the
depicted content and correspondence metrics must
consider high-level semantic similarity.
In (Ham et al., 2016) an approach is presented
for correspondence matching and image deformation
based on a set of region proposals called Proposal
Flow. The authors proceed in two steps to match two
images. First, a set of multi-scale region proposals
are generated and matched. The matching consid-
ers the spatial support of the region pair, i.e. its
geometric closeness, and its visual similarity, com-
puted with Spatial-Pyramid Matching (SPM) (Lazeb-
nik et al., 2006) or a convolutional neural network
(CNN) (Krizhevsky et al., 2012). The spatial support
is computed and weighted with neighboring matches
in mind, i.e. support is low if neighbors disagree. This
scheme enforces local smoothness of matches. Sec-
ondly, from the set of matches, the authors compute
a dense flow field for each pixel partaking in a cor-
respondence. The source image is then transformed
according to the flow field to match the target image.
In (Rocco et al., 2017) the authors enhance the
classical computer vision correspondence pipeline of
extracting descriptors and estimating a transform by
making it fully differentiable and end-to-end train-
able. The feature extraction is handled by cropping
the classification layer from a pre-trained CNN. Then,
the feature maps of two images processed by these
networks are correlated with a scalar product and
passed to a regression CNN that estimates rigid trans-
formation parameters. This approach is repeated with
a second regression CNN that estimates non-rigid pa-
rameters for a thin-plate spline (TPS) based trans-
form. The main advantage of this approach is the abil-
ity for end-to-end training of the CNNs and the sin-
gle step estimation of the rigid and non-rigid transfor-
mation parameters, i.e. no optimization is performed
since the neural network predicts the parameters in
one forward pass.
A typical approach for robust estimation is
RANSAC. However, due to the adaptable complex-
ity of TPS models a minimal subset of points cannot
be defined, i.e. a minimal deformation can be found
for any pair of point-sets. However, in (Tran et al.,
2012) the authors show that one can reject outliers
reliably by fitting a hyperplane in the feature space
of correspondences. A separation exists between the
distribution of inliers and the distribution of outliers.
In our approach, due to the nature of our data,
i.e. historical artifacts, no training data is available.
While following the classical approach of extracting
descriptors, computing correspondences and estimat-
ing a transformation, we also adopt the two-stage ap-
proach of Rocco et al. by first estimating a rigid trans-
formation and then fine-tuning with TPS.
3 DATASET
The CMS project (Corpus der minoischen und
mykenischen Siegel), with its archive of physical
sigils and sealings currently residing in Heidelberg,
Germany, systematically documents and publishes all
known ancient Aegean seals and sealings. The CMS
consists of a physical archive, published volumes,
and an open-access digital database (https://www.uni-heidelberg.de/fakultaeten/philosophie/zaw/cms/databases/databasesfull.html). The database
contains photos and tracings of seals and sealings
with associated manually added meta-data in line with
current Aegean glyptic research.
Within an interdisciplinary research project we are
gathering hundreds of sealings from a collection holding approx. 12,000. We make use of the physical
archive and the manufactured impressions of sealings
in plasticine, silicone, and gypsum. The impressions
of the sealings are their negative imprint. That way,
the motif and detail can be discerned more clearly.
The surfaces of the plasticine negatives, i.e. the sealing impressions, are acquired at 800 dpi resolution using a
structured-light 3D scanner. In this work we use the
words sealings and sigils interchangeably as our ap-
proach is applicable to any artifacts of similar shape.
3.1 Pre-processing
The resulting 3D data is processed in GigaMesh (https://gigamesh.eu) by
computing the surface curvature of the impressions.
The computation proceeds using either Multi-Scale Integral Invariants (MSII) (Mara and Krömker, 2017; Mara, 2016) or Ambient Occlusion (AO) (Miller, 1994). While MSII provides better detail for small-scale surface features, for aligning sealing impressions we use AO, which provides smoother, medium-scale surface curvature.
The surface data augmented with surface curva-
ture is projected into a raster image of 400 × 600 pixels. We then apply local pixel histogram scal-
ing with a disk of 50 pixels, thus further emphasizing
medium-scale surface curvature. Finally, the images
are padded to provide the necessary space for subse-
quent deformation operations. The process from a 3D
sealing with texture data to a pre-processed sealing
raster image is shown in Figure 1.
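As an illustration, the 2D part of this pre-processing can be sketched in Python with scikit-image; the curvature rendering itself is performed in GigaMesh and is assumed here to already be available as a raster image, and the file name and padding width are illustrative assumptions.

import numpy as np
from skimage import io, img_as_ubyte
from skimage.filters import rank
from skimage.morphology import disk

# Curvature rendering produced by GigaMesh, loaded as an 8-bit grayscale image
# (assumed file name).
curvature = img_as_ubyte(io.imread("sealing_curvature.png", as_gray=True))

# Local pixel histogram scaling with a disk of radius 50 pixels, emphasizing
# medium-scale surface curvature. (Older scikit-image versions use the keyword
# "selem" instead of "footprint".)
normalized = rank.equalize(curvature, footprint=disk(50))

# Pad the image to leave room for the subsequent deformation operations
# (padding width is an assumption).
padded = np.pad(normalized, pad_width=100, mode="constant", constant_values=0)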
4 CORRESPONDENCE
The underlying assumption of our work is that
pairs of images under study, in our case impressions
of sealings, share visually similar and semantically
equivalent regions. If we then identify and align these
region pairs, the deformation that has been applied to
one of the sealings will become apparent. This iden-
tification requires a discriminative distance metric be-
tween image patches. We evaluate four different dis-
tance metrics of increasing complexity.
In this work we define the left sealing image as the
sealing being deformed to match the right sealing im-
age. For reasons of brevity, in the following sections,
we only give definitions of the feature descriptors for
the left sealing image, ${}^{\{D,Y,Z,V,N\}}f_i$, and omit the definitions for the features of the right sealing image, ${}^{\{D,Y,Z,V,N\}}g_j$, as they are identical up to interchanged variables.
4.1 Direct Matching
In the baseline approach sub-regions are compared di-
rectly by pixel value using the Euclidean distance. For
the pair of sealing images under study $I, J$ we generate two sets $A, B$ of key-points $a_i, b_j \in \mathbb{R}^2$ arranged in a grid with $60 \times 60$ grid points overlaid on the sealing images. At each key-point we extract sub-regions of the sealing images, the patches $p_i, q_j \in \mathbb{R}^{\alpha \times \beta}$ with height $\alpha$ and width $\beta$, centered around the key-points. By flattening these patches, we compute the respective feature vectors ${}^{D}f_i, {}^{D}g_j \in \mathbb{R}^{\alpha\beta}$. Then the distance ${}^{D}d_{ij}$ between two image patches is given by the Euclidean distance between their feature vectors.

$${}^{D}d_{ij} = \| {}^{D}f_i - {}^{D}g_j \| \quad (1)$$
Since our image data depicts local surface curva-
ture, the direct comparison of pixel values represents
a direct comparison of curvature values of the original
surfaces.
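A minimal sketch of this baseline in Python is given below; the grid construction, patch extraction, and distance computation follow Equation 1, while the function names and the memory-inefficient layout are purely illustrative.

import numpy as np
from scipy.spatial.distance import cdist

def grid_keypoints(image, n=60):
    # Place an n x n grid of key-points over the image.
    h, w = image.shape
    ys = np.linspace(0, h - 1, n).astype(int)
    xs = np.linspace(0, w - 1, n).astype(int)
    return np.array([(y, x) for y in ys for x in xs])

def direct_features(image, keypoints, alpha=128, beta=128):
    # Extract an alpha x beta patch centered at each key-point and flatten it.
    padded = np.pad(image, ((alpha // 2, alpha // 2), (beta // 2, beta // 2)))
    return np.stack([padded[y:y + alpha, x:x + beta].ravel().astype(float)
                     for y, x in keypoints])

# Pairwise Euclidean distances d_ij (Eq. 1) between left and right features:
# F, G = direct_features(left, kps_left), direct_features(right, kps_right)
# D = cdist(F, G)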
4.2 DAISY Descriptor
The DAISY image descriptor (Tola et al., 2010)
is a reformulation of the SIFT (Lowe, 2004) and
GLOH (Mikolajczyk and Schmid, 2005) descriptors
efficiently computable for each pixel in an image.
This is achieved by computing multi-scale histograms
of oriented gradients only once per image region shar-
ing these among neighboring pixels. Similar to the direct comparison of image patches, we extract DAISY descriptors ${}^{Y}f_i, {}^{Y}g_j \in \mathbb{R}^{\delta\gamma}$ with $\delta$ orientations and $\gamma$ rings, centered at the key-points $a_i, b_j$.

$${}^{Y}d_{ij} = \| {}^{Y}f_i - {}^{Y}g_j \| \quad (2)$$
The distance between the feature descriptors is
computed using the Euclidean distance.
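For illustration, dense DAISY descriptors can be computed with scikit-image using the parameter values reported in Section 4.5, i.e. eight orientations, four rings, and a radius of 100 pixels; the step size and the variable image are assumptions.

from skimage.feature import daisy

# Dense DAISY descriptors for the whole sealing image; "descs" has shape
# (rows, cols, D), and the descriptor of a key-point is read off by mapping
# its pixel coordinates to the corresponding descriptor grid cell.
descs = daisy(image, step=10, radius=100, rings=4,
              histograms=8, orientations=8, normalization="l2")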
4.3 BOVW Descriptor
We aggregate locally bounded sets of DAISY fea-
tures into feature vectors to better capture and de-
Recovering and Visualizing Deformation in 3D Aegean Sealings
459
scribe repeating higher-order visual patterns. Such an
approach was first introduced as Bag-of-Visual-Words (BOVW) in (Fei-Fei and Perona, 2005).
We denote the union of both sets of DAISY descriptors ${}^{Y}f_i$ and ${}^{Y}g_j$ as $E = \{{}^{Y}f_i\} \cup \{{}^{Y}g_j\}$. These descriptors are then quantized into the sets ${}^{Z}f_i$ and ${}^{Z}g_j$. The quantization dictionary is computed with k-means (MacQueen, 1967), which minimizes the following term to find a set of visual words $v_k \in V$. The size of the dictionary, i.e. the size of the set $|V|$, is chosen as $\kappa$.

$$\operatorname*{argmin}_{V} \sum_{e \in E} \min_{v \in V} \| e - v \| \quad (3)$$

Then, for each DAISY descriptor ${}^{Y}f_i$ and ${}^{Y}g_j$ we find the closest visual word $v_k$ and record it in the scalars ${}^{Z}f_i, {}^{Z}g_j \in \{1, \dots, \kappa\}$.

$${}^{Z}f_i = \operatorname*{argmin}_{k} \| {}^{Y}f_i - v_k \| \quad (4)$$
The radius of such a patch of visual words centered at a key-point is given by $\theta$.

$${}^{V}f_i = \begin{pmatrix} |\{\, a_l \in A \,:\, {}^{Z}f_l = 1 \,\wedge\, \|a_i - a_l\| < \theta \,\}| \\ \vdots \\ |\{\, a_l \in A \,:\, {}^{Z}f_l = \kappa \,\wedge\, \|a_i - a_l\| < \theta \,\}| \end{pmatrix} \quad (5)$$
The BOVW feature vectors ${}^{V}f_i, {}^{V}g_j \in \mathbb{N}^{\kappa}$ count the occurrences of visual words ${}^{Z}f_i, {}^{Z}g_j$ close to a key-point $a_i, b_j$.

$${}^{V}d_{ij} = \| {}^{V}f_i - {}^{V}g_j \| \quad (6)$$
Comparison is performed using the Euclidean dis-
tance between the counts of visual words.
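A sketch of this quantization and counting step, corresponding to Equations 3 to 6, is given below; the dictionary is fitted on the pooled descriptors of both sealings, and all function and variable names are illustrative.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def bovw_dictionary(pooled_descriptors, kappa=512):
    # k-means over the union E of left and right DAISY descriptors (Eq. 3).
    return MiniBatchKMeans(n_clusters=kappa, random_state=0).fit(pooled_descriptors)

def bovw_features(kmeans, descriptors, keypoints, theta=30):
    # descriptors: (N, D) DAISY descriptors, one per key-point
    # keypoints:   (N, 2) array of pixel coordinates of the key-points
    kappa = kmeans.n_clusters
    words = kmeans.predict(descriptors)            # closest visual word (Eq. 4)
    feats = np.zeros((len(keypoints), kappa))
    for i, p in enumerate(keypoints):
        near = np.linalg.norm(keypoints - p, axis=1) < theta
        feats[i] = np.bincount(words[near], minlength=kappa)  # local counts (Eq. 5)
    return feats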
4.4 DenseNet Descriptor
To capture abstract visual concepts and meaningful
objects in the sealing images, e.g. heads, arms, or
legs, we employ CNNs. A CNN captures increas-
ingly abstract concepts and patterns that are used to
describe and classify an image.
An approach to use CNNs as feature extractors
without a labeled dataset for the target domain is in-
troduced in (Razavian et al., 2014). By pre-training
the CNN with a common large natural image dataset
with ground-truth labels, common visual patterns are
learned. The authors make use of this high-level pat-
tern description as means for embedding images into
a feature-space by removing the last fully-connected
classification layer.
We denote all but the last fully-connected layer of the network as the function $W : \mathbb{R}^{H \times W} \to \mathbb{R}^{1000}$. The dimensionality of the target domain $\mathbb{R}^{1000}$ is the count of classes used in the training dataset. The features ${}^{N}f_i \in \mathbb{R}^{1000}$ of the image patches $p_i, q_j$ are computed as follows:

$${}^{N}f_i = W(p_i) \quad (7)$$
Then, the feature vectors of two image patches are compared using the Euclidean distance.

$${}^{N}d_{ij} = \| {}^{N}f_i - {}^{N}g_j \| \quad (8)$$
The network W uses the DenseNet architecture
(Huang et al., 2017) trained on the ImageNet (Deng
et al., 2009) dataset.
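As one possible realization of this descriptor, the sketch below takes the 1000-dimensional output of a torchvision DenseNet pre-trained on ImageNet as the patch feature; replicating the single-channel curvature patches to three input channels is an assumption.

import torch
from torchvision import models, transforms

densenet = models.densenet121(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),   # replicate the curvature channel
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def densenet_feature(patch):
    # patch: 2D uint8 array of 224 x 224 pixels; returns a 1000-d vector (Eq. 7).
    with torch.no_grad():
        x = preprocess(patch).unsqueeze(0)
        return densenet(x).squeeze(0).numpy()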
4.5 Evaluation
For our purposes the suitability of an image descriptor is given by two properties: (i) the correctness of semantic equivalence, e.g. a point on the horse head on the left sealing is assigned to a point on the horse head on the right sealing, and (ii) the uniqueness of the assignment, i.e. a point in the left image is assigned to a dense, unimodal concentration of points in the right image.
We visually evaluate the performance of the four
presented image patch comparison methods by sam-
pling points-of-interest from the left sealing and visu-
alizing the descriptor distances on the right sealing as
shown in Figure 2.
We experimented with different values for image
patch sizes, DAISY parameters and counts of visual
words. The following values were chosen based on
the quantitative evaluation performed in Section 5.3.
We used an image patch size of α = 128, β = 128
pixels for all descriptors but the DenseNet based ap-
proach, which used the architecture native patch size
of 224× 224 pixels. For both of the DAISY based de-
scriptors we used δ = 8 orientations and γ = 4 rings
with a radius of 100 pixels for the pure DAISY de-
scriptor and 30 pixels for the BOVW descriptor. Finally, we clustered the DAISY descriptors into κ = 512 distinct visual words for the BOVW description.
Based on Figure 2, we see that the direct patch
comparison (a) performs well when large-scale fea-
tures are compared, i.e. the back of a horse, but fails at
uniquely identifying features with fine detail, such as
the head of a horse. Since pixel values are compared
directly, even small misalignments cause significant
differences between two image patches. The DAISY
feature descriptor (b) and the BOVW feature descrip-
tor (c), are robust against small offsets and perform
significantly better with small and large feature struc-
tures. Finally, the CNN (d) feature descriptor neither captures the semantics of the image well nor yields an assignment concentrated in one place. We assume
Figure 2: Point query responses of the four presented de-
scriptors. Given a point on a left sealing image, points on
the right sealing image are shown that are closest in feature
space. Brighter colors indicate less distance, i.e. more simi-
larity. From top to bottom: Original image with query point
in red, (i) Euclidean distance of patches, (ii) DAISY feature
descriptor, (iii) Bag-of-Visual-Words, and (iv) DenseNet
pre-trained on ImageNet.
that our dataset, consisting of patches of filtered and normalized surface curvature of a mesh, lies significantly outside the distribution of the training dataset, and that the learned visual patterns are therefore not transferable.
5 ALIGNMENT
After computing the visual distance between image patches, or circular regions in the case of BOVW, at the grid key-points $a_i$ and $b_j$, we have to find unique correspondences. We compute the matches using a greedy approach by determining for each left feature vector $f_i$ the visually closest right feature vector $g_j$ and vice versa. If either disagrees, the match is discarded. Then, assuming a generic distance metric $d_{ij}$ between feature descriptors, the set of matches, denoted by $m_{ij} \in \{0, 1\}$, is computed as follows.

$$m_{ij} = \begin{cases} 1 & \text{if } i = \operatorname*{argmin}_{i'} d_{i'j} \;\wedge\; j = \operatorname*{argmin}_{j'} d_{ij'} \\ 0 & \text{otherwise} \end{cases} \quad (9)$$

For each match in $m_{ij}$, the left visual descriptor is closest to its right counterpart and vice versa.
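A sketch of this mutual-nearest-neighbor test, corresponding to Equation 9, is given below for a pairwise distance matrix computed with any of the four descriptors; the function name is illustrative.

import numpy as np

def mutual_matches(d):
    # d[i, j]: distance between left feature i and right feature j.
    best_right = d.argmin(axis=1)   # closest right feature for each left feature
    best_left = d.argmin(axis=0)    # closest left feature for each right feature
    # Keep (i, j) only if both directions agree (Eq. 9).
    return [(i, j) for i, j in enumerate(best_right) if best_left[j] == i]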
5.1 Rigid Pre-alignment
We align the sealing images in two stages. First, a
rigid alignment model is estimated and only corre-
spondences matching that model within a small error
ε are kept. In a second stage, a non-rigid fine-tuning
fits the remaining correspondences to minimize the
previously allowed error.
We estimate the rigid transformation model using the RANSAC algorithm (Fischler and Bolles, 1980). Let $R \in \mathbb{R}^{2 \times 2}$ denote a rotation matrix in Euclidean space and let $v \in \mathbb{R}^2$ denote a translation. Then, the RANSAC algorithm minimizes the error between the correspondences $m_{ij}$ of the key-point sets $a_i$ and $b_j$.

$$\min_{(R, v)} \sum_i^M \sum_j^N m_{ij} \, \| (R a_i + v) - b_j \| \quad (10)$$
Given the rigid transformation model $(R, v)$ estimated with RANSAC, we discard correspondences $m_{ij}$ that deviate from the estimated model by more than the aforementioned threshold $\varepsilon$.

$$\bar{m}_{ij} = \begin{cases} m_{ij} & \text{if } \| (R a_i + v) - b_j \| < \varepsilon \\ 0 & \text{otherwise} \end{cases} \quad (11)$$
The remaining correspondences $\bar{m}_{ij}$ are used for the subsequent non-rigid fine-tuning. With $\bar{a}_i \in \bar{A}$ and $\bar{b}_j \in \bar{B}$ we denote the grid points which are part of the remaining correspondences, subsets $\bar{A} \subseteq A$ and $\bar{B} \subseteq B$ of the original points.
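A minimal sketch of this pre-alignment with scikit-image's RANSAC and a Euclidean, i.e. rotation and translation, model is given below; src and dst denote the matched key-point coordinates and ε = 30 pixels is the inlier threshold reported in Section 5.3.

import numpy as np
from skimage.measure import ransac
from skimage.transform import EuclideanTransform

def rigid_prealign(src, dst, eps=30):
    # Robustly estimate rotation and translation (Eq. 10) and keep only the
    # correspondences within the error threshold eps (Eq. 11).
    model, inliers = ransac((src, dst), EuclideanTransform,
                            min_samples=2, residual_threshold=eps,
                            max_trials=1000)
    return model, src[inliers], dst[inliers]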
5.2 Non-rigid Fine-tuning
The non-rigid transformation is based on a direct es-
timation of a thin-plate spline model closely follow-
ing the approach presented in (Chui and Rangarajan,
2003). However, since we already computed correspondences $\bar{m}_{ij}$, we skip the expectation-maximization procedure and the outlier detection to directly estimate the model.
Figure 3: Predicted deformation visualized with a colored
grid on the deformed left sealing image. Hotter colors on
the deformation grid indicate a higher amount of compres-
sion and cooler colors indicate a higher amount of stretch-
ing. Then, deformed left and original right sealing images
overlaid with 50% opacity.
The TPS model minimizes the following equation to find a transfer function $t_{d,w} : \mathbb{R}^3 \to \mathbb{R}^3$ that maps matching source key-points $a_i$ to target key-points $b_j$. The operator $L$ follows the definition in the work of Chui et al.; it is a double integral of the square of the second-order derivatives of the mapping function and is used as regularization. The parameter $\lambda$ controls the strength of the regularization and enforces the chosen smoothness.

$$\min_{t_{d,w}} \sum_i \sum_j \| \bar{a}_i - t_{d,w}(\bar{b}_j) \| + \lambda \| L t_{d,w} \| \quad (12)$$
We follow the procedure outlined in the work of Chui et al. to minimize this term and compute the parameters $d$ and $w$. The parameter $d \in \mathbb{R}^{3 \times 3}$ denotes a rigid transform matrix in homogeneous coordinates, and the parameter $w \in \mathbb{R}^{M \times 3}$ denotes the non-rigid spline transformation parameters. The TPS kernel $\Phi$ contains the spatial relationships of the control points, i.e. the left grid points of the remaining correspondences.

$$\Phi_{ij} = \| \bar{a}_i - \bar{a}_j \|^2 \, \log \| \bar{a}_i - \bar{a}_j \| \quad (13)$$
The mapping function for a homogeneous point $x \in \mathbb{R}^3$ is written as follows:

$$t_{d,w}(x) = x \cdot d + \Phi \cdot w \quad (14)$$

Given the computed parameters $d$ and $w$, we can now transform pixel positions of the left sealing image to semantically meaningful positions in the right sealing image.
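The fine-tuning itself follows the formulation of Chui and Rangarajan; as an approximation, a regularized thin-plate spline mapping with the same role can be sketched with SciPy, where the smoothing term stands in for the regularization weight λ.

import numpy as np
from scipy.interpolate import RBFInterpolator

def fit_tps(src, dst, lam=1000.0):
    # Map the surviving left key-points onto their right counterparts with a
    # thin-plate-spline kernel; the smoothing term penalizes bending, playing
    # the role of the regularization weight lambda in Eq. 12.
    return RBFInterpolator(src, dst, kernel="thin_plate_spline", smoothing=lam)

# Usage: warp arbitrary pixel positions, e.g. a regular grid, with the spline.
# tps = fit_tps(kept_src, kept_dst)
# warped_grid = tps(grid_points)    # grid_points: (N, 2) array of positions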
Figure 4: Mean distance between the vertices of the syn-
thesized deformation grid and the respective vertices of the
predicted deformation grid.
Figure 5: Mean pixel-wise difference between synthesized
deformed image, i.e. the target image in the evaluation test,
and the estimated deformation applied to the source image.
5.3 Evaluation
The nature of the dataset and the research question posed in this work does not admit a quantifiable ground-truth of expected alignments. Nevertheless, to be able
to draw conclusions from our research, we evaluate
the precision of our deformation estimation approach
by synthesizing artificial deformations.
Deformation synthesis is based on the same TPS
modeling as our estimation procedure. We manually
create a grid of 12 points on the left sealing image
which are randomly perturbed. We chose a low count
of points, since natural deformations of sealings, e.g. due to heat or mishandling, are large-scale.
Let $c_i$ denote a manually chosen grid point. Then $\tilde{c}_i$ is the respective point perturbed by a uniform distribution in $[-50, 50]$. Using these sets of points, we estimate a synthetic transform $(d, w)$. The pixel values of the synthesized image are computed by sampling the right sealing image at the positions indicated by the transfer function.
We evaluate our approach by comparing the align-
Figure 6: Left: Synthesized expected deformation grid.
Right: Estimated deformation grid based on the presented
approach using Bag-of-Visual-Words (BOVW) feature ex-
traction. Hotter colors indicate compression, cooler colors
indicate stretching.
ment predicted by our approach to the synthesized
ground-truth alignment. We set the non-rigid defor-
mation regularization to λ = 1000 and the allowed
rigid model error to $\varepsilon = 30$. Other values, such as $\lambda \in \{1, 100, 10000\}$ and $\varepsilon \in \{10, 100\}$, resulted in worse performance. Given the initial grid of key-points $a_i$ of the left sealing and the predicted transformation parameters $(\hat{d}, \hat{w})$, we compute the deformed grid $\hat{a}_i$.

$$\hat{a}_i = t_{\hat{d},\hat{w}}(a_i) \quad (15)$$
Then, the error ${}^{\text{grid}}E$ to the expected transformation $\tilde{a}$, the artificially synthesized transformation, is computed as the mean of Euclidean distances between the respective grid points.

$${}^{\text{grid}}E = \frac{1}{|A|} \sum_i \| \hat{a}_i - \tilde{a}_i \| \quad (16)$$
We compare the presented feature extractors and alignment stages, rigid and non-rigid, by their prediction error with respect to the expected transformation. A lower prediction error indicates better performance.
Additionally, we consider the mean pixel-level difference ${}^{\text{pixel}}E$ between the deformed left sealing image $I$ and the original right sealing image $J$. The set $X$, with $|X|$ the count of elements in the set, gives the pixel positions common to both images, i.e. when both images are overlaid on top of each other as shown in Figure 3.

$${}^{\text{pixel}}E = \frac{1}{|X|} \sum_{x \in X} \left| I\!\left(t_{\hat{d},\hat{w}}(x)\right) - J(x) \right| \quad (17)$$
We consider the error to be predictive of alignment
performance. The pixel values of the sealing images denote the local curvature of the original 3D-scanned surface; they do not represent texture or lighting. Therefore, no undue biases are introduced by such a comparison. Figure 4 and Figure 5 compare the prediction performance of the presented descriptors over 10 repeated synthesized deformations.
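A sketch of this evaluation is given below; the control-point positions are randomized here for brevity, whereas in the paper the twelve points are placed manually, and all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def synthesize_control_points(image_shape, n=12, amplitude=50):
    # n coarse control points, perturbed by uniform noise in [-amplitude, amplitude].
    h, w = image_shape
    c = np.column_stack([rng.uniform(0, w, n), rng.uniform(0, h, n)])
    c_tilde = c + rng.uniform(-amplitude, amplitude, size=c.shape)
    return c, c_tilde

def grid_error(a_hat, a_tilde):
    # Mean Euclidean distance between predicted and synthesized grid points (Eq. 16).
    return np.mean(np.linalg.norm(a_hat - a_tilde, axis=1))

def pixel_error(warped_left, right, mask):
    # Mean absolute pixel difference over the overlapping region X (Eq. 17).
    return np.mean(np.abs(warped_left[mask].astype(float) - right[mask].astype(float)))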
Figure 7: Vector field plot of the differences between the
synthesized expected deformation grid and the estimated
deformation grid. Red arrows are positioned on estimated
grid points and point toward expected grid points. Most of
the prediction error is concentrated outside of the sealing
image, where no correspondences are present for a defor-
mation estimation.
5.4 Visualization
In addition to the evaluation metrics, we visualize the
computed transformations for visual inspection. In
particular, we are interested in the non-rigid part of the transformation. We transform a regular grid of points with the predicted $(d, w)$ and colorize the grid points by the Gaussian-weighted difference of distances $e_i$ to their neighbors before and after transformation.
$$e_i = \sum_{j \ne i} \left( \| \hat{a}_j - \hat{a}_i \| - \| a_j - a_i \| \right) e^{-\frac{\| a_j - a_i \|}{\sigma}} \quad (18)$$
The value σ controls the fuzziness of the coloring
of the deformations and is set to σ = 0.01. The result-
ing visualization is shown in Figure 6. We also visu-
alize the difference between the predicted grid points
and the expected grid points using a quiver plot. At
each grid point of the predicted grid an arrow points
in the direction of the respective expected grid point,
as shown in Figure 7.
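A sketch of the coloring of Equation 18 and of the quiver plot of Figure 7 is given below; the coordinate scaling of the grid is assumed to match the one used for σ = 0.01, and the function names are illustrative.

import numpy as np
import matplotlib.pyplot as plt

def deformation_colors(a, a_hat, sigma=0.01):
    # a, a_hat: (N, 2) grid points before and after transformation (Eq. 18).
    d_before = np.linalg.norm(a[:, None] - a[None, :], axis=-1)
    d_after = np.linalg.norm(a_hat[:, None] - a_hat[None, :], axis=-1)
    weights = np.exp(-d_before / sigma)
    np.fill_diagonal(weights, 0.0)                  # exclude the point itself
    return ((d_after - d_before) * weights).sum(axis=1)

def quiver_errors(a_hat, a_tilde):
    # Arrows from estimated grid points toward expected grid points (Figure 7).
    delta = a_tilde - a_hat
    plt.quiver(a_hat[:, 0], a_hat[:, 1], delta[:, 0], delta[:, 1],
               color="red", angles="xy", scale_units="xy", scale=1)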
6 RESULTS
Our work provides two directly applicable results for
historical research: (i) the grid visualizations of de-
formation, as shown in Figure 3, necessary to align
two seal images in either direction, and (ii) overlays of deformed and target seal images, as shown in Figure 8 and Figure 9, highlighting concrete differences.
Figure 8: Visualization of pixel-wise differences for both
deformation directions. For both configurations of left seal-
ing and right sealing, the deformed left sealing image has
been overlaid on top of the right original sealing image.
Brighter colors indicate higher pixel-wise differences.
Our overlay visualizations embed a deformed seal image over the target image using a smooth and natural, i.e. differentiable and regularized, deformation. Medium-scale differences can be easily spotted and further investigated. The overlay provides a view uncluttered by non-informative small differences due to scanning noise and damage.
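A sketch of how such an overlay can be produced is given below, assuming key-point coordinates in (x, y) order; since skimage.transform.warp expects the inverse mapping from output to input coordinates, the spline is fitted in the opposite direction.

import numpy as np
from scipy.interpolate import RBFInterpolator
from skimage import img_as_float
from skimage.transform import warp

def overlay(left, right, src_xy, dst_xy, lam=1000.0):
    # Fit the spline from right-image key-points back to left-image key-points,
    # warp the left sealing into the frame of the right one, and blend at 50%.
    inverse_tps = RBFInterpolator(dst_xy, src_xy,
                                  kernel="thin_plate_spline", smoothing=lam)
    warped_left = warp(img_as_float(left), inverse_tps, output_shape=right.shape)
    return 0.5 * warped_left + 0.5 * img_as_float(right)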
If the characteristic influences of damage or manufacturing handicraft on the shape of the seals, and the resulting deformations, are known, e.g. from examples or from real-world experiments, the deformations estimated by our approach serve as quantitative evidence substantiating expert hypotheses.
7 SUMMARY & OUTLOOK
Our work addresses the need for quantifiable research results and evidence gathering in the
analysis of authorship and origin of Aegean seals and
sealings. In particular, we are interested in com-
monalities and differences in shape and expressions
of visual features. However, due to different manufacturing techniques or damage to the material, the artifacts may be deformed. We approach this challenge by 3D scanning the seals and sealings and pre-processing the data to arrive at high-contrast 2D images of the surface curvature. Then, we evalu-
ate four visual feature descriptors, direct compari-
son of image patches, Bag-of-Visual-Words (BoVW),
DAISY descriptors and pre-trained DenseNet, with
respect to their ability to find semantically meaningful
correspondences. Given the set of correspondences,
we first estimate a rigid transform with RANSAC
and proceed with fine-tuning residual alignment error
with Thin-plate splines (TPS). We evaluate the accu-
racy of the alignment by generating synthetic defor-
mations of our dataset and comparing the expected
deformation to the estimated deformation. Thus, we
avoid the need for ground-truth that is often not avail-
able for historical artifacts. Finally, we visualize our results with a grid of the estimated deformation, colored by local compression and stretching, and with an overlay highlighting differences between the aligned images.
In future work, we are interested in physically
manufacturing ground-truth of characteristic defor-
mations by creating copies of chosen seals and seal-
ings and then intentionally damaging and deforming
those. This would allow us to analyze and then au-
tomatically classify the type of damage inflicted, by
estimating the induced deformations and comparing
these to the manufactured ones. Further, the presented pipeline is amenable to a fully 3D setting, where projecting into 2D raster images becomes unnecessary, by using 3D surface feature descriptors (Bogacz and Mara, 2018; Fey et al., 2018).
ACKNOWLEDGMENTS
We genuinely thank and greatly appreciate the efforts
of Maria Anastasiadou supervising the “Corpus der
minoischen und mykenischen Siegel” (CMS). We sin-
cerely thank Markus Kühn for his contributions to our tooling, Katharina Anders for her feedback on related work, and ZuK 5.4 and BMBF eHeritage II for partially funding this work.
REFERENCES
Bogacz, B. and Mara, H. (2018). Feature Descriptors for
Spotting 3D Characters on Triangular Meshes. In-
ternational Conference on Frontiers in Handwriting
Recognition.
Chen, C. and Li, Y. (2012). A robust method of thin plate
spline and its application to DEM construction. Com-
puters & Geosciences.
Chui, H. and Rangarajan, A. (2003). A new point matching
algorithm for non-rigid registration. Computer Vision
and Image Understanding.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. Computer Vision and Pattern Recognition.

Figure 9: Pairwise comparison of four supposedly identical sealings in our dataset. Comparison is not symmetric as in one case the left sealing is being deformed and in the other case the top sealing is being deformed. Shown are additive sealing overlays where the left sealing is deformed to match the top sealing. Subject pose is therefore constant in columns and varying in rows. The feature descriptor used is DAISY. Mean of pixel-wise differences, with each pixel in [0, 255], is given on the bottom left of the respective overlaid sealing pair.
Fei-Fei, L. and Perona, P. (2005). A Bayesian Hierarchical
Model for Learning Natural Scene Categories. Com-
puter Vision and Pattern Recognition.
Fey, M., Lenssen, J. E., Weichert, F., and Müller, H. (2018). SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels. Computer Vision and Pattern Recognition.
Fischler, M. A. and Bolles, R. C. (1980). Random Sample
Consensus: A Paradigm for Model Fitting with Ap-
plications to Image Analysis and Automated Cartog-
raphy. SRI International.
Ham, B., Cho, M., Schmid, C., and Ponce, J. (2016). Pro-
posal Flow. Computer Vision and Pattern Recogni-
tion.
Holden, M. (2008). A Review of Geometric Transforma-
tions for Nonrigid Body Registration. Transactions
on Medical Imaging.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger,
K. Q. (2017). Densely Connected Convolutional Net-
works. Computer Vision and Pattern Recognition.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
ageNet Classification with Deep Convolutional Neu-
ral Networks. Neural Information Processing Sys-
tems.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
Bags of Features: Spatial Pyramid Matching for Rec-
ognizing Natural Scene Categories. Computer Society
Conference on Computer Vision and Pattern Recogni-
tion.
Lowe, D. (2004). Distinctive Image Features from Scale
Invariant Keypoints. Computer Vision.
MacQueen, J. B. (1967). Some Methods for Classification
and Analysis of Multivariate Observations. Berkeley
Symposium on Mathematical Statistics and Probabil-
ity.
Mara, H. (2016). Made in humanities: Dual integral in-
variants for efficient edge detection. Journal on it
Information Technology.
Mara, H. and Krömker, S. (2017). Visual Computing for
Archaeological Artifacts with Integral Invariant Fil-
ters in 3D. Eurographics Workshop on Graphics and
Cultural Heritage.
Mikolajczyk, K. and Schmid, C. (2005). A Performance
Evaluation of Local Descriptors. Pattern Analysis and
Machine Intelligence.
Miller, G. (1994). Efficient Algorithms for Local and
Global Accessibility Shading. Computer Graphics
and Interactive Techniques.
Mongus, D. and Žalik, B. (2012). Parameter-free ground fil-
tering of LiDAR data for automatic DTM generation.
Journal of Photogrammetry and Remote Sensing.
Razavian, A. S., Azizpour, H., Sullivan, J., and Carlsson,
S. (2014). CNN Features off-the-shelf: an Astound-
ing Baseline for Recognition. Computer Vision and
Pattern Recognition.
Rocco, I., Arandjelović, R., and Sivic, J. (2017). Con-
volutional Neural network architecture for geometric
matching. Computer Vision and Pattern Recognition.
Tennakoon, R. B., Bab-Hadiashar, A., Suter, D., and Cao,
Z. (2013). Robust Data Modelling Using Thin Plate
Splines. Digital Image Computing: Techniques and
Applications.
Tola, E., Lepetit, V., and Fua, P. (2010). DAISY: An Ef-
ficient Dense Descriptor Applied to Wide-Baseline
Stereo. Pattern Analysis and Machine Intelligence.
Tran, Q.-H., Chin, T.-J., Carneiro, G., Brown, M. S., and Suter,
D. (2012). In Defence of RANSAC for Outlier Re-
jection in Deformable Registration. European Con-
ference on Computer Vision.