Recovering and Visualizing Deformation in 3D Aegean Sealings

Bartosz Bogacz¹, Nikolas Papadimitriou², Diamantis Panagiotopoulos² and Hubert Mara¹

¹Forensic Computational Geometry Laboratory, Heidelberg University, Germany
²Corpus der minoischen und mykenischen Siegel, Heidelberg University, Germany

npapad2007@gmail.com, diamantis.panagiotopoulos@zaw.uni-heidelberg.de
Keywords:
Computational Reconstruction, Feature Extraction, Pattern Recognition, 3D Cultural Heritage.
Abstract:
Archaeological research into Aegean sealings and sigils reveals valuable insights into the Aegean socio-
political organization and administration. An important question arising is the determination of authorship
and origin of seals. The similarity of sealings is a key factor as it can indicate different seals with the same
depiction or the same seal imprinted by different persons. Analyses of authorship and workmanship require
the comparison of shared patterns and detection of differences between these artifacts. These are typically
performed qualitatively by manually discovering and observing shared visual traits. In our work, we quantify
and highlight visual differences, by exposing and directly matching shared features. Further, we visualize and
measure the deformation of shape necessary to match sigils. The sealings used in our dataset are 3D struc-
tured light scans of plasticine and latex molds of originals. We compute four different feature descriptors on
the projected surfaces and their curvature. Then, these features are matched with a rigid RANSAC estimation
before a non-rigid thin-plate spline (TPS) matching is performed to fine-tune the deformation. We evaluate
our approach by synthesizing artificial deformations on real world data and measuring the distance to the
re-constructed deformation.
1 INTRODUCTION
A seal is a small portable artifact mostly made of
stone but also found in other materials, such as
bone/ivory, metal, and various artificial pastes. It
displays engraved motifs and is generally perforated
so that it can be suspended e.g. on a necklace.
Seals have played an important role in Aegean so-
ciety and, as functional objects, they have served
three main purposes: securing, marking, and autho-
rizing. Their study provides important insight into
the Aegean socio-political organization and administration (further introductory information can be found at https://www.uni-heidelberg.de/fakultaeten/philosophie/zaw/cms/seals/sealsAbout.html).
One important question arising in this study is the
determination of authorship and origin of seals. Given
two or more visually similar but not equal seals, pos-
sibly of different provenance, do they originate from
the same stamp or the same author? Research on seal-
ing artisan craftwork is typically concerned with qual-
itative judgments based on manually discovered and
observed traits. Our quantifiable results and visual-
izations can serve as foundations for evidence-based
reasoning. Thus, we address the challenge of ana-
lyzing patterns and deformations common to a pair of sealings.
The contributions of our work are (i) a pre-
processing and feature extraction pipeline for analyz-
ing the similarity and unique features of 3D scans of
sealings, (ii) the evaluation of this pipeline with syn-
thetic data for estimating deformations between seal-
ings, and (iii) the visualization of the deformation and
differences between sealings for Archaeological ap-
plications.
This paper is structured as follows: In Section 2
we present related work on the challenge to gen-
erate correspondences between sets of image pairs
of related but visually differing content. In Sec-
tion 3 we introduce our dataset and the necessary pre-
processing of the 3D data for further image process-
ing. Then, in Section 4 we describe four feature ex-
traction approaches to generate and weight correspon-
dence hypotheses. In Section 5, based on the pro-
posed correspondences, first a rigid mapping is esti-
Figure 1: Three stages of preprocessing are shown from left to right. (i) The original scanned 3D data with texture and
light. (ii) The surface curvature of the seal 3D data rendered into a raster image. (iii) The seal image after local histogram
normalization and padding.
mated and hypotheses are filtered to match the esti-
mated model. On the remaining hypotheses a non-
rigid model is fine-tuned. In Section 6, we visualize
and describe the application of our results in an Ar-
chaeological context. A summary and conclusion is
given in Section 7.
2 RELATED WORK
Finding image correspondences and image registra-
tion is a challenging task usually approached by
research in medical settings (Holden, 2008), Geo-
sciences (Mongus and Žalik, 2012; Chen and Li,
2012; Tennakoon et al., 2013), and stereo match-
ing (Tola et al., 2010). In these cases, the images be-
ing compared depict the same scene and vary only in
small detail or pose. The large-scale content and iden-
tifying features are common to both images. How-
ever, in our setting, both sealing images differ in the
depicted content and correspondence metrics must
consider high-level semantic similarity.
In (Ham et al., 2016) an approach is presented
for correspondence matching and image deformation
based on a set of region proposals called Proposal
Flow. The authors proceed in two steps to match two
images. First, a set of multi-scale region proposals
are generated and matched. The matching consid-
ers the spatial support of the region pair, i.e. its
geometric closeness, and its visual similarity, com-
puted with Spatial-Pyramid Matching (SPM) (Lazeb-
nik et al., 2006) or a convolutional neural network
(CNN) (Krizhevsky et al., 2012). The spatial support
is computed and weighted with neighboring matches
in mind, i.e. support is low if neighbors disagree. This
scheme enforces local smoothness of matches. Sec-
ondly, from the set of matches, the authors compute
a dense flow field for each pixel partaking in a cor-
respondence. The source image is then transformed
according to the flow field to match the target image.
In (Rocco et al., 2017) the authors enhance the
classical computer vision correspondence pipeline of
extracting descriptors and estimating a transform by
making it fully differentiable and end-to-end train-
able. The feature extraction is handled by cropping
the classification layer from a pre-trained CNN. Then,
the feature maps of two images processed by these
networks are correlated with a scalar product and
passed to a regression CNN that estimates rigid trans-
formation parameters. This approach is repeated with
a second regression CNN that estimates non-rigid pa-
rameters for a thin-plate spline (TPS) based trans-
form. The main advantage of this approach is the abil-
ity for end-to-end training of the CNNs and the sin-
gle step estimation of the rigid and non-rigid transfor-
mation parameters, i.e. no optimization is performed
since the neural network predicts the parameters in
one forward pass.
A typical approach for robust estimation is
RANSAC. However, due to the adaptable complex-
ity of TPS models a minimal subset of points cannot
be defined, i.e. a minimal deformation can be found
for any pair of point-sets. However, in (Tran et al.,
2012) the authors show that one can reject outliers
reliably by fitting a hyperplane in the feature space
of correspondences. A separation exists between the
distribution of inliers and the distribution of outliers.
In our approach, due to the nature of our data,
i.e. historical artifacts, no training data is available.
While following the classical approach of extracting
descriptors, computing correspondences and estimat-
ing a transformation, we also adopt the two-stage ap-
proach of Rocco et al. by first estimating a rigid trans-
formation and then fine-tuning with TPS.
3 DATASET
The CMS project (Corpus der minoischen und
mykenischen Siegel), with its archive of physical
sigils and sealings currently residing in Heidelberg,
Germany, systematically documents and publishes all
known ancient Aegean seals and sealings. The CMS
consists of a physical archive, published volumes,
and an open-access digital database (https://www.uni-heidelberg.de/fakultaeten/philosophie/zaw/cms/databases/databasesfull.html). The database
contains photos and tracings of seals and sealings
with associated manually added meta-data in line with
current Aegean glyptic research.
Within an interdisciplinary research project we are
gathering hundreds of sealings from a collection holding approx. 12,000. We make use of the physical
archive and the manufactured impressions of sealings
in plasticine, silicone, and gypsum. The impressions
of the sealings are their negative imprint. That way,
the motif and detail can be discerned more clearly.
The surfaces of the plasticine negatives, i.e. the sealing impressions, are acquired at 800 dpi resolution using a
structured-light 3D scanner. In this work we use the
words sealings and sigils interchangeably as our ap-
proach is applicable to any artifacts of similar shape.
3.1 Pre-processing
The resulting 3D data is processed in GigaMesh (https://gigamesh.eu) by
computing the surface curvature of the impressions.
The computation proceeds using either Multi-Scale Integral Invariants (MSII) (Mara and Krömker, 2017; Mara, 2016) or Ambient Occlusion (AO) (Miller, 1994). While MSII provides better detail for small-scale surface features, for aligning sealing impressions we use AO, which provides smoother, medium-scale surface curvature.
The surface data augmented with surface curva-
ture is projected into a raster image of 400 × 600 pixels. We then apply local pixel histogram scal-
ing with a disk of 50 pixels, thus further emphasizing
medium-scale surface curvature. Finally, the images
are padded to provide the necessary space for subse-
quent deformation operations. The process from a 3D
sealing with texture data to a pre-processed sealing
raster image is shown in Figure 1.
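As an illustration, the 2D part of this pre-processing can be sketched in Python with scikit-image; the curvature rendering itself is performed in GigaMesh and is assumed here to already be available as a raster image, and the file name and padding width are illustrative assumptions.

import numpy as np
from skimage import io, img_as_ubyte
from skimage.filters import rank
from skimage.morphology import disk

# Curvature rendering produced by GigaMesh, loaded as an 8-bit grayscale image
# (assumed file name).
curvature = img_as_ubyte(io.imread("sealing_curvature.png", as_gray=True))

# Local pixel histogram scaling with a disk of radius 50 pixels, emphasizing
# medium-scale surface curvature. (Older scikit-image versions use the keyword
# "selem" instead of "footprint".)
normalized = rank.equalize(curvature, footprint=disk(50))

# Pad the image to leave room for the subsequent deformation operations
# (padding width is an assumption).
padded = np.pad(normalized, pad_width=100, mode="constant", constant_values=0)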
4 CORRESPONDENCE
The underlying assumption of our work is that
pairs of images under study, in our case impressions
of sealings, share visually similar and semantically
equivalent regions. If we then identify and align these
region pairs, the deformation that has been applied to
one of the sealings will become apparent. This iden-
tification requires a discriminative distance metric be-
tween image patches. We evaluate four different dis-
tance metrics of increasing complexity.
In this work we define the left sealing image as the
sealing being deformed to match the right sealing im-
age. For reasons of brevity, in the following sections,
we only give definitions of the feature descriptors for
the left sealing image, ${}^{\{D,Y,Z,V,N\}}f_i$, and omit the definitions for the features of the right sealing image, ${}^{\{D,Y,Z,V,N\}}g_j$, as they are identical up to interchanged variables.
4.1 Direct Matching
In the baseline approach sub-regions are compared di-
rectly by pixel value using the Euclidean distance. For
the pair of sealing images under study $I, J$ we generate two sets $A, B$ of key-points $a_i, b_j \in \mathbb{R}^2$ arranged in a grid with $60 \times 60$ grid points overlaid on the sealing images. At each key-point we extract sub-regions of the sealing images, the patches $p_i, q_j \in \mathbb{R}^{\alpha \times \beta}$ with height $\alpha$ and width $\beta$, centered around the key-points. By flattening these patches, we compute the respective feature vectors ${}^{D}f_i, {}^{D}g_j \in \mathbb{R}^{\alpha\beta}$. Then the distance ${}^{D}d_{ij}$ between two image patches is given by the Euclidean distance between their feature vectors.

$${}^{D}d_{ij} = \| {}^{D}f_i - {}^{D}g_j \| \quad (1)$$
Since our image data depicts local surface curva-
ture, the direct comparison of pixel values represents
a direct comparison of curvature values of the original
surfaces.
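A minimal sketch of this baseline in Python is given below; the grid construction, patch extraction, and distance computation follow Equation 1, while the function names and the memory-inefficient layout are purely illustrative.

import numpy as np
from scipy.spatial.distance import cdist

def grid_keypoints(image, n=60):
    # Place an n x n grid of key-points over the image.
    h, w = image.shape
    ys = np.linspace(0, h - 1, n).astype(int)
    xs = np.linspace(0, w - 1, n).astype(int)
    return np.array([(y, x) for y in ys for x in xs])

def direct_features(image, keypoints, alpha=128, beta=128):
    # Extract an alpha x beta patch centered at each key-point and flatten it.
    padded = np.pad(image, ((alpha // 2, alpha // 2), (beta // 2, beta // 2)))
    return np.stack([padded[y:y + alpha, x:x + beta].ravel().astype(float)
                     for y, x in keypoints])

# Pairwise Euclidean distances d_ij (Eq. 1) between left and right features:
# F, G = direct_features(left, kps_left), direct_features(right, kps_right)
# D = cdist(F, G)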
4.2 DAISY Descriptor
The DAISY image descriptor (Tola et al., 2010)
is a reformulation of the SIFT (Lowe, 2004) and
GLOH (Mikolajczyk and Schmid, 2005) descriptors
efficiently computable for each pixel in an image.
This is achieved by computing multi-scale histograms
of oriented gradients only once per image region shar-
ing these among neighboring pixels. Similar to the direct comparison of image patches, we extract DAISY descriptors ${}^{Y}f_i, {}^{Y}g_j \in \mathbb{R}^{\delta\gamma}$ with $\delta$ orientations and $\gamma$ rings, centered at the key-points $a_i, b_j$.

$${}^{Y}d_{ij} = \| {}^{Y}f_i - {}^{Y}g_j \| \quad (2)$$
The distance between the feature descriptors is
computed using the Euclidean distance.
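For illustration, dense DAISY descriptors can be computed with scikit-image using the parameter values reported in Section 4.5, i.e. eight orientations, four rings, and a radius of 100 pixels; the step size and the variable image are assumptions.

from skimage.feature import daisy

# Dense DAISY descriptors for the whole sealing image; "descs" has shape
# (rows, cols, D), and the descriptor of a key-point is read off by mapping
# its pixel coordinates to the corresponding descriptor grid cell.
descs = daisy(image, step=10, radius=100, rings=4,
              histograms=8, orientations=8, normalization="l2")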
4.3 BOVW Descriptor
We aggregate locally bounded sets of DAISY fea-
tures into feature vectors to better capture and de-
Recovering and Visualizing Deformation in 3D Aegean Sealings
459
scribe repeating higher-order visual patterns. Such an
approach was first introduced as Bag-of-Visual-Words (BOVW) in (Fei-Fei and Perona, 2005).
We denote the union of both sets of DAISY descriptors ${}^{Y}f_i$ and ${}^{Y}g_j$ as $E = \{{}^{Y}f_i\} \cup \{{}^{Y}g_j\}$. These descriptors are then quantized into the sets ${}^{Z}f_i$ and ${}^{Z}g_j$. The quantization dictionary is computed with k-means (MacQueen, 1967), which minimizes the following term to find a set of visual words $v_k \in V$. The size of the dictionary, i.e. the size of the set $|V|$, is chosen as $\kappa$.

$$\operatorname*{argmin}_{V} \sum_{e \in E} \min_{v \in V} \| e - v \| \quad (3)$$

Then, for each DAISY descriptor ${}^{Y}f_i$ and ${}^{Y}g_j$ we find the closest visual word $v_k$ and record it in the scalars ${}^{Z}f_i, {}^{Z}g_j \in \{1, \dots, \kappa\}$.

$${}^{Z}f_i = \operatorname*{argmin}_{k} \| {}^{Y}f_i - v_k \| \quad (4)$$
The radius of such a patch of visual words centered at a key-point is given by $\theta$.

$${}^{V}f_i = \begin{pmatrix} |\{\, a_l \in A \,:\, {}^{Z}f_l = 1 \,\wedge\, \|a_i - a_l\| < \theta \,\}| \\ \vdots \\ |\{\, a_l \in A \,:\, {}^{Z}f_l = \kappa \,\wedge\, \|a_i - a_l\| < \theta \,\}| \end{pmatrix} \quad (5)$$
The BOVW feature vectors ${}^{V}f_i, {}^{V}g_j \in \mathbb{N}^{\kappa}$ count the occurrences of visual words ${}^{Z}f_i, {}^{Z}g_j$ close to a key-point $a_i, b_j$.

$${}^{V}d_{ij} = \| {}^{V}f_i - {}^{V}g_j \| \quad (6)$$
Comparison is performed using the Euclidean dis-
tance between the counts of visual words.
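A sketch of this quantization and counting step, corresponding to Equations 3 to 6, is given below; the dictionary is fitted on the pooled descriptors of both sealings, and all function and variable names are illustrative.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def bovw_dictionary(pooled_descriptors, kappa=512):
    # k-means over the union E of left and right DAISY descriptors (Eq. 3).
    return MiniBatchKMeans(n_clusters=kappa, random_state=0).fit(pooled_descriptors)

def bovw_features(kmeans, descriptors, keypoints, theta=30):
    # descriptors: (N, D) DAISY descriptors, one per key-point
    # keypoints:   (N, 2) array of pixel coordinates of the key-points
    kappa = kmeans.n_clusters
    words = kmeans.predict(descriptors)            # closest visual word (Eq. 4)
    feats = np.zeros((len(keypoints), kappa))
    for i, p in enumerate(keypoints):
        near = np.linalg.norm(keypoints - p, axis=1) < theta
        feats[i] = np.bincount(words[near], minlength=kappa)  # local counts (Eq. 5)
    return feats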
4.4 DenseNet Descriptor
To capture abstract visual concepts and meaningful
objects in the sealing images, e.g. heads, arms, or
legs, we employ CNNs. A CNN captures increas-
ingly abstract concepts and patterns that are used to
describe and classify an image.
An approach to use CNNs as feature extractors
without a labeled dataset for the target domain is in-
troduced in (Razavian et al., 2014). By pre-training
the CNN with a common large natural image dataset
with ground-truth labels, common visual patterns are
learned. The authors make use of this high-level pat-
tern description as means for embedding images into
a feature-space by removing the last fully-connected
classification layer.
We denote all but the last fully-connected layer of the network as the function $W : \mathbb{R}^{H \times W} \to \mathbb{R}^{1000}$. The dimensionality of the target domain $\mathbb{R}^{1000}$ is the count of classes used in the training dataset. The features ${}^{N}f_i \in \mathbb{R}^{1000}$ of the image patches $p_i, q_j$ are computed as follows:

$${}^{N}f_i = W(p_i) \quad (7)$$
Then, the feature vectors of two image patches are compared using the Euclidean distance.

$${}^{N}d_{ij} = \| {}^{N}f_i - {}^{N}g_j \| \quad (8)$$
The network W uses the DenseNet architecture
(Huang et al., 2017) trained on the ImageNet (Deng
et al., 2009) dataset.
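As one possible realization of this descriptor, the sketch below takes the 1000-dimensional output of a torchvision DenseNet pre-trained on ImageNet as the patch feature; replicating the single-channel curvature patches to three input channels is an assumption.

import torch
from torchvision import models, transforms

densenet = models.densenet121(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),   # replicate the curvature channel
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def densenet_feature(patch):
    # patch: 2D uint8 array of 224 x 224 pixels; returns a 1000-d vector (Eq. 7).
    with torch.no_grad():
        x = preprocess(patch).unsqueeze(0)
        return densenet(x).squeeze(0).numpy()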
4.5 Evaluation
For our purposes the suitability of an image descriptor is given by two properties: (i) the correctness of semantic equivalence, e.g. a point on the horse head on the left sealing is assigned to a point on the horse head on the right sealing, and (ii) the uniqueness of the assignment, i.e. a point in the left image is assigned to a dense, unimodal concentration of points in the right image.
We visually evaluate the performance of the four
presented image patch comparison methods by sam-
pling points-of-interest from the left sealing and visu-
alizing the descriptor distances on the right sealing as
shown in Figure 2.
We experimented with different values for image
patch sizes, DAISY parameters and counts of visual
words. The following values were chosen based on
the quantitative evaluation performed in Section 5.3.
We used an image patch size of α = 128, β = 128
pixels for all descriptors but the DenseNet based ap-
proach, which used the architecture native patch size
of 224× 224 pixels. For both of the DAISY based de-
scriptors we used δ = 8 orientations and γ = 4 rings
with a radius of 100 pixels for the pure DAISY de-
scriptor and 30 pixels for the BOVW descriptor. Finally, we clustered the DAISY descriptors into κ = 512 distinct visual words for the BOVW description.
Based on Figure 2, we see that the direct patch
comparison (a) performs well when large-scale fea-
tures are compared, i.e. the back of a horse, but fails at
uniquely identifying features with fine detail, such as
the head of a horse. Since pixel values are compared
directly, even small misalignments cause significant
differences between two image patches. The DAISY
feature descriptor (b) and the BOVW feature descrip-
tor (c), are robust against small offsets and perform
significantly better with small and large feature struc-
tures. Finally, the CNN (d) feature descriptor neither captures the semantics of the image well nor yields an assignment concentrated in one place. We assume
Figure 2: Point query responses of the four presented de-
scriptors. Given a point on a left sealing image, points on
the right sealing image are shown that are closest in feature
space. Brighter colors indicate less distance, i.e. more simi-
larity. From top to bottom: Original image with query point
in red, (i) Euclidean distance of patches, (ii) DAISY feature
descriptor, (iii) Bag-of-Visual-Words, and (iv) DenseNet
pre-trained on ImageNet.
that our dataset, consisting of patches of filtered and normalized surface curvature of a mesh, lies significantly outside the distribution of the training dataset, and that the learned visual patterns are therefore not transferable.
5 ALIGNMENT
After computing the visual distance between image patches, or circular regions in the case of BOVW, at the grid key-points $a_i$ and $b_j$, we have to find unique correspondences. We compute the matches using a greedy approach by determining for each left feature vector $f_i$ the visually closest right feature vector $g_j$ and vice versa. If either disagrees, the match is discarded. Then, assuming a generic distance metric $d_{ij}$ between feature descriptors, the set of matches, denoted by $m_{ij} \in \{0, 1\}$, is computed as follows.

$$m_{ij} = \begin{cases} 1 & \text{if } i = \operatorname*{argmin}_{i'} d_{i'j} \;\wedge\; j = \operatorname*{argmin}_{j'} d_{ij'} \\ 0 & \text{otherwise} \end{cases} \quad (9)$$

For each match in $m_{ij}$, the left visual descriptor is closest to its right counterpart and vice versa.
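A sketch of this mutual-nearest-neighbor test, corresponding to Equation 9, is given below for a pairwise distance matrix computed with any of the four descriptors; the function name is illustrative.

import numpy as np

def mutual_matches(d):
    # d[i, j]: distance between left feature i and right feature j.
    best_right = d.argmin(axis=1)   # closest right feature for each left feature
    best_left = d.argmin(axis=0)    # closest left feature for each right feature
    # Keep (i, j) only if both directions agree (Eq. 9).
    return [(i, j) for i, j in enumerate(best_right) if best_left[j] == i]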
5.1 Rigid Pre-alignment
We align the sealing images in two stages. First, a
rigid alignment model is estimated and only corre-
spondences matching that model within a small error
ε are kept. In a second stage, a non-rigid fine-tuning
fits the remaining correspondences to minimize the
previously allowed error.
We estimate the rigid transformation model using the RANSAC algorithm (Fischler and Bolles, 1980). Let $R \in \mathbb{R}^{2 \times 2}$ denote a rotation matrix in Euclidean space and let $v \in \mathbb{R}^2$ denote a translation. Then, the RANSAC algorithm minimizes the error between the correspondences $m_{ij}$ of the key-point sets $a_i$ and $b_j$.

$$\min_{(R, v)} \sum_i^M \sum_j^N m_{ij} \, \| (R a_i + v) - b_j \| \quad (10)$$
Given the rigid transformation model $(R, v)$ estimated with RANSAC, we discard correspondences $m_{ij}$ that deviate from the estimated model by more than the aforementioned threshold $\varepsilon$.

$$\bar{m}_{ij} = \begin{cases} m_{ij} & \text{if } \| (R a_i + v) - b_j \| < \varepsilon \\ 0 & \text{otherwise} \end{cases} \quad (11)$$
The remaining correspondences $\bar{m}_{ij}$ are used for the subsequent non-rigid fine-tuning. With $\bar{a}_i \in \bar{A}$ and $\bar{b}_j \in \bar{B}$ we denote the grid points which are part of the remaining correspondences, subsets $\bar{A} \subseteq A$ and $\bar{B} \subseteq B$ of the original points.
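A minimal sketch of this pre-alignment with scikit-image's RANSAC and a Euclidean, i.e. rotation and translation, model is given below; src and dst denote the matched key-point coordinates and ε = 30 pixels is the inlier threshold reported in Section 5.3.

import numpy as np
from skimage.measure import ransac
from skimage.transform import EuclideanTransform

def rigid_prealign(src, dst, eps=30):
    # Robustly estimate rotation and translation (Eq. 10) and keep only the
    # correspondences within the error threshold eps (Eq. 11).
    model, inliers = ransac((src, dst), EuclideanTransform,
                            min_samples=2, residual_threshold=eps,
                            max_trials=1000)
    return model, src[inliers], dst[inliers]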
5.2 Non-rigid Fine-tuning
The non-rigid transformation is based on a direct es-
timation of a thin-plate spline model closely follow-
ing the approach presented in (Chui and Rangarajan,
2003). However, since we already computed correspondences $\bar{m}_{ij}$, we skip the expectation-maximization procedure and the outlier detection to directly estimate the model.
Figure 3: Predicted deformation visualized with a colored
grid on the deformed left sealing image. Hotter colors on
the deformation grid indicate a higher amount of compres-
sion and cooler colors indicate a higher amount of stretch-
ing. Then, deformed left and original right sealing images
overlaid with 50% opacity.
The TPS model minimizes the following equation to find a transfer function $t_{d,w} : \mathbb{R}^3 \to \mathbb{R}^3$ that maps matching source key-points $a_i$ to target key-points $b_j$. The operator $L$ follows the definition in the work of Chui et al.; it is a double integral of the square of the second-order derivatives of the mapping function and is used as regularization. The parameter $\lambda$ controls the strength of the regularization and enforces the chosen smoothness.

$$\min_{t_{d,w}} \sum_i \sum_j \| \bar{a}_i - t_{d,w}(\bar{b}_j) \| + \lambda \| L t_{d,w} \| \quad (12)$$
We follow the procedure outlined in the work of Chui et al. to minimize this term and compute the parameters $d$ and $w$. The parameter $d \in \mathbb{R}^{3 \times 3}$ denotes a rigid transform matrix in homogeneous coordinates, and the parameter $w \in \mathbb{R}^{M \times 3}$ denotes the non-rigid spline transformation parameters. The TPS kernel $\Phi$ contains the spatial relationships of the control points, i.e. the left grid points of the remaining correspondences.

$$\Phi_{ij} = \| \bar{a}_i - \bar{a}_j \|^2 \, \log \| \bar{a}_i - \bar{a}_j \| \quad (13)$$
The mapping function for a homogeneous point $x \in \mathbb{R}^3$ is written as follows:

$$t_{d,w}(x) = x \cdot d + \Phi \cdot w \quad (14)$$

Given the computed parameters $d$ and $w$, we can now transform pixel positions of the left sealing image to semantically meaningful positions in the right sealing image.
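The fine-tuning itself follows the formulation of Chui and Rangarajan; as an approximation, a regularized thin-plate spline mapping with the same role can be sketched with SciPy, where the smoothing term stands in for the regularization weight λ.

import numpy as np
from scipy.interpolate import RBFInterpolator

def fit_tps(src, dst, lam=1000.0):
    # Map the surviving left key-points onto their right counterparts with a
    # thin-plate-spline kernel; the smoothing term penalizes bending, playing
    # the role of the regularization weight lambda in Eq. 12.
    return RBFInterpolator(src, dst, kernel="thin_plate_spline", smoothing=lam)

# Usage: warp arbitrary pixel positions, e.g. a regular grid, with the spline.
# tps = fit_tps(kept_src, kept_dst)
# warped_grid = tps(grid_points)    # grid_points: (N, 2) array of positions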
Figure 4: Mean distance between the vertices of the syn-
thesized deformation grid and the respective vertices of the
predicted deformation grid.
Figure 5: Mean pixel-wise difference between synthesized
deformed image, i.e. the target image in the evaluation test,
and the estimated deformation applied to the source image.
5.3 Evaluation
The nature of the dataset and the research question posed in this work does not admit a quantifiable ground-truth of expected alignments. Nevertheless, to be able
to draw conclusions from our research, we evaluate
the precision of our deformation estimation approach
by synthesizing artificial deformations.
Deformation synthesis is based on the same TPS
modeling as our estimation procedure. We manually
create a grid of 12 points on the left sealing image
which are randomly perturbed. We chose a low count
of points, since natural deformations of sealings, e.g. due to heat or mishandling, are large-scale.
Let $c_i$ denote a manually chosen grid point. Then $\tilde{c}_i$ is the respective point perturbed by a uniform distribution in $[-50, 50]$. Using these sets of points, we estimate a synthetic transform $(d, w)$. The pixel values of the synthesized image are computed by sampling the right sealing image at the positions indicated by the transfer function.
We evaluate our approach by comparing the align-
Figure 6: Left: Synthesized expected deformation grid.
Right: Estimated deformation grid based on the presented
approach using Bag-of-Visual-Words (BOVW) feature ex-
traction. Hotter colors indicate compression, cooler colors
indicate stretching.
ment predicted by our approach to the synthesized
ground-truth alignment. We set the non-rigid defor-
mation regularization to λ = 1000 and the allowed
rigid model error to $\varepsilon = 30$. Other values, such as $\lambda \in \{1, 100, 10000\}$ and $\varepsilon \in \{10, 100\}$, resulted in worse performance. Given the initial grid of key-points $a_i$ of the left sealing and the predicted transformation parameters $(\hat{d}, \hat{w})$, we compute the deformed grid $\hat{a}_i$.

$$\hat{a}_i = t_{\hat{d},\hat{w}}(a_i) \quad (15)$$
Then, the error ${}^{\text{grid}}E$ to the expected transformation $\tilde{a}$, the artificially synthesized transformation, is computed as the mean of Euclidean distances between the respective grid points.

$${}^{\text{grid}}E = \frac{1}{|A|} \sum_i \| \hat{a}_i - \tilde{a}_i \| \quad (16)$$
We compare the presented feature extractors and alignment stages, rigid and non-rigid, by their prediction error with respect to the expected transformation. A lower prediction error indicates better performance.
Additionally, we consider the mean pixel-level difference ${}^{\text{pixel}}E$ between the deformed left sealing image $I$ and the original right sealing image $J$. The set $X$, with $|X|$ the count of elements in the set, gives the pixel positions common to both images, i.e. when both images are overlaid on top of each other as shown in Figure 3.

$${}^{\text{pixel}}E = \frac{1}{|X|} \sum_{x \in X} \left| I\!\left(t_{\hat{d},\hat{w}}(x)\right) - J(x) \right| \quad (17)$$
We consider the error to be predictive of alignment
performance. The pixel values of the sealing images denote the local curvature of the original 3D-scanned surface; they do not represent texture or lighting. Therefore, no undue biases are introduced by such a comparison. Figure 4 and Figure 5 compare the prediction performance of the presented descriptors over 10 repeated synthesized deformations.
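A sketch of this evaluation is given below; the control-point positions are randomized here for brevity, whereas in the paper the twelve points are placed manually, and all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def synthesize_control_points(image_shape, n=12, amplitude=50):
    # n coarse control points, perturbed by uniform noise in [-amplitude, amplitude].
    h, w = image_shape
    c = np.column_stack([rng.uniform(0, w, n), rng.uniform(0, h, n)])
    c_tilde = c + rng.uniform(-amplitude, amplitude, size=c.shape)
    return c, c_tilde

def grid_error(a_hat, a_tilde):
    # Mean Euclidean distance between predicted and synthesized grid points (Eq. 16).
    return np.mean(np.linalg.norm(a_hat - a_tilde, axis=1))

def pixel_error(warped_left, right, mask):
    # Mean absolute pixel difference over the overlapping region X (Eq. 17).
    return np.mean(np.abs(warped_left[mask].astype(float) - right[mask].astype(float)))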
Figure 7: Vector field plot of the differences between the
synthesized expected deformation grid and the estimated
deformation grid. Red arrows are positioned on estimated
grid points and point toward expected grid points. Most of
the prediction error is concentrated outside of the sealing
image, where no correspondences are present for a defor-
mation estimation.
5.4 Visualization
In addition to the evaluation metrics, we visualize the
computed transformations for visual inspection. In
particular, we are interested in the non-rigid part of the transformation. We transform a regular grid of points with the predicted $(d, w)$ and colorize the grid points by the Gaussian-weighted difference of distances $e_i$ to their neighbors before and after transformation.
$$e_i = \sum_{j \ne i} \left( \| \hat{a}_j - \hat{a}_i \| - \| a_j - a_i \| \right) e^{-\frac{\| a_j - a_i \|}{\sigma}} \quad (18)$$
The value σ controls the fuzziness of the coloring
of the deformations and is set to σ = 0.01. The result-
ing visualization is shown in Figure 6. We also visu-
alize the difference between the predicted grid points
and the expected grid points using a quiver plot. At
each grid point of the predicted grid an arrow points
in the direction of the respective expected grid point,
as shown in Figure 7.
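A sketch of the coloring of Equation 18 and of the quiver plot of Figure 7 is given below; the coordinate scaling of the grid is assumed to match the one used for σ = 0.01, and the function names are illustrative.

import numpy as np
import matplotlib.pyplot as plt

def deformation_colors(a, a_hat, sigma=0.01):
    # a, a_hat: (N, 2) grid points before and after transformation (Eq. 18).
    d_before = np.linalg.norm(a[:, None] - a[None, :], axis=-1)
    d_after = np.linalg.norm(a_hat[:, None] - a_hat[None, :], axis=-1)
    weights = np.exp(-d_before / sigma)
    np.fill_diagonal(weights, 0.0)                  # exclude the point itself
    return ((d_after - d_before) * weights).sum(axis=1)

def quiver_errors(a_hat, a_tilde):
    # Arrows from estimated grid points toward expected grid points (Figure 7).
    delta = a_tilde - a_hat
    plt.quiver(a_hat[:, 0], a_hat[:, 1], delta[:, 0], delta[:, 1],
               color="red", angles="xy", scale_units="xy", scale=1)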
6 RESULTS
Our work provides two directly applicable results for
historical research: (i) the grid visualizations of de-
formation, as shown in Figure 3, necessary to align
two seal images in either direction, and (ii) overlays of deformed and target seal images, as shown in Figure 8 and Figure 9, highlighting concrete differences.
Figure 8: Visualization of pixel-wise differences for both
deformation directions. For both configurations of left seal-
ing and right sealing, the deformed left sealing image has
been overlaid on top of the right original sealing image.
Brighter colors indicate higher pixel-wise differences.
Our overlay visualizations embed a deformed seal image over the target image using a smooth and natural, i.e. differentiable and regularized, deformation. Medium-scale differences can be easily spotted and further investigated. The overlay provides a view uncluttered by non-informative small differences due to scanning noise and damage.
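A sketch of how such an overlay can be produced is given below, assuming key-point coordinates in (x, y) order; since skimage.transform.warp expects the inverse mapping from output to input coordinates, the spline is fitted in the opposite direction.

import numpy as np
from scipy.interpolate import RBFInterpolator
from skimage import img_as_float
from skimage.transform import warp

def overlay(left, right, src_xy, dst_xy, lam=1000.0):
    # Fit the spline from right-image key-points back to left-image key-points,
    # warp the left sealing into the frame of the right one, and blend at 50%.
    inverse_tps = RBFInterpolator(dst_xy, src_xy,
                                  kernel="thin_plate_spline", smoothing=lam)
    warped_left = warp(img_as_float(left), inverse_tps, output_shape=right.shape)
    return 0.5 * warped_left + 0.5 * img_as_float(right)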
If the characteristic influences of damage or manufacturing handicraft on the shape of the seals, and the resulting deformations, are known, e.g. from examples or from real-world experiments, the deformations estimated by our approach serve as quantitative evidence substantiating expert hypotheses.
7 SUMMARY & OUTLOOK
Our work addresses the need for quantifiable research results and evidence gathering in the
analysis of authorship and origin of Aegean seals and
sealings. In particular, we are interested in com-
monalities and differences in shape and expressions
of visual features. However, due to different manufacturing techniques or damage to the material, the artifacts may be deformed. We approach this challenge by 3D scanning the seals and sealings and pre-processing the data to arrive at high-contrast 2D images of the surface curvature. Then, we evalu-
ate four visual feature descriptors, direct compari-
son of image patches, Bag-of-Visual-Words (BoVW),
DAISY descriptors and pre-trained DenseNet, with
respect to their ability to find semantically meaningful
correspondences. Given the set of correspondences,
we first estimate a rigid transform with RANSAC
and proceed with fine-tuning residual alignment error
with Thin-plate splines (TPS). We evaluate the accu-
racy of the alignment by generating synthetic defor-
mations of our dataset and comparing the expected
deformation to the estimated deformation. Thus, we
avoid the need for ground-truth that is often not avail-
able for historical artifacts. Finally, we visualize our results with a grid of the estimated deformation, colored by local compression and stretching, and with an overlay highlighting differences between the aligned images.
In future work, we are interested in physically
manufacturing ground-truth of characteristic defor-
mations by creating copies of chosen seals and seal-
ings and then intentionally damaging and deforming
those. This would allow us to analyze and then au-
tomatically classify the type of damage inflicted, by
estimating the induced deformations and comparing
these to the manufactured ones. Further, the presented pipeline is amenable to a fully 3D setting, where projecting into 2D raster images becomes unnecessary, by using 3D surface feature descriptors (Bogacz and Mara, 2018; Fey et al., 2018).
ACKNOWLEDGMENTS
We genuinely thank and greatly appreciate the efforts
of Maria Anastasiadou supervising the “Corpus der
minoischen und mykenischen Siegel” (CMS). We sin-
cerely thank Markus Kühn for his contributions to our tooling, Katharina Anders for her feedback on related work, and ZuK 5.4 and BMBF eHeritage II for partially funding this work.
REFERENCES
Bogacz, B. and Mara, H. (2018). Feature Descriptors for
Spotting 3D Characters on Triangular Meshes. In-
ternational Conference on Frontiers in Handwriting
Recognition.
Chen, C. and Li, Y. (2012). A robust method of thin plate
spline and its application to DEM construction. Com-
puters & Geosciences.
Chui, H. and Rangarajan, A. (2003). A new point matching
algorithm for non-rigid registration. Computer Vision
and Image Understanding.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. Computer Vision and Pattern Recognition.

Figure 9: Pairwise comparison of four supposedly identical sealings in our dataset. Comparison is not symmetric as in one case the left sealing is being deformed and in the other case the top sealing is being deformed. Shown are additive sealing overlays where the left sealing is deformed to match the top sealing. Subject pose is therefore constant in columns and varying in rows. The feature descriptor used is DAISY. Mean of pixel-wise differences, with each pixel in [0, 255], is given on the bottom left of the respective overlaid sealing pair.
Fei-Fei, L. and Perona, P. (2005). A Bayesian Hierarchical
Model for Learning Natural Scene Categories. Com-
puter Vision and Pattern Recognition.
Fey, M., Lenssen, J. E., Weichert, F., and Müller, H. (2018). SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels. Computer Vision and Pattern Recognition.
Fischler, M. A. and Bolles, R. C. (1980). Random Sample
Consensus: A Paradigm for Model Fitting with Ap-
plications to Image Analysis and Automated Cartog-
raphy. SRI International.
Ham, B., Cho, M., Schmid, C., and Ponce, J. (2016). Pro-
posal Flow. Computer Vision and Pattern Recogni-
tion.
Holden, M. (2008). A Review of Geometric Transforma-
tions for Nonrigid Body Registration. Transactions
on Medical Imaging.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger,
K. Q. (2017). Densely Connected Convolutional Net-
works. Computer Vision and Pattern Recognition.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Im-
ageNet Classification with Deep Convolutional Neu-
ral Networks. Neural Information Processing Sys-
tems.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
Bags of Features: Spatial Pyramid Matching for Rec-
ognizing Natural Scene Categories. Computer Society
Conference on Computer Vision and Pattern Recogni-
tion.
Lowe, D. (2004). Distinctive Image Features from Scale
Invariant Keypoints. Computer Vision.
MacQueen, J. B. (1967). Some Methods for Classification
and Analysis of Multivariate Observations. Berkeley
Symposium on Mathematical Statistics and Probabil-
ity.
Mara, H. (2016). Made in humanities: Dual integral in-
variants for efficient edge detection. Journal on it
Information Technology.
Mara, H. and Krömker, S. (2017). Visual Computing for
Archaeological Artifacts with Integral Invariant Fil-
ters in 3D. Eurographics Workshop on Graphics and
Cultural Heritage.
Mikolajczyk, K. and Schmid, C. (2005). A Performance
Evaluation of Local Descriptors. Pattern Analysis and
Machine Intelligence.
Miller, G. (1994). Efficient Algorithms for Local and
Global Accessibility Shading. Computer Graphics
and Interactive Techniques.
Mongus, D. and Žalik, B. (2012). Parameter-free ground fil-
tering of LiDAR data for automatic DTM generation.
Journal of Photogrammetry and Remote Sensing.
Razavian, A. S., Azizpour, H., Sullivan, J., and Carlsson,
S. (2014). CNN Features off-the-shelf: an Astound-
ing Baseline for Recognition. Computer Vision and
Pattern Recognition.
Rocco, I., Arandjelović, R., and Sivic, J. (2017). Con-
volutional Neural network architecture for geometric
matching. Computer Vision and Pattern Recognition.
Tennakoon, R. B., Bab-Hadiashar, A., Suter, D., and Cao,
Z. (2013). Robust Data Modelling Using Thin Plate
Splines. Digital Image Computing: Techniques and
Applications.
Tola, E., Lepetit, V., and Fua, P. (2010). DAISY: An Ef-
ficient Dense Descriptor Applied to Wide-Baseline
Stereo. Pattern Analysis and Machine Intelligence.
Tran, Q.-H., Chin, T.-J., Carneiro, G., Brown, M. S., and Suter,
D. (2012). In Defence of RANSAC for Outlier Re-
jection in Deformable Registration. European Con-
ference on Computer Vision.