standard deviation of the pixel values in the training
set. In some experiments, we also apply a random ro-
tation to the training images. This is done before the
random crop. Note that each image is accompanied by a file that contains the coordinates of the diamond centroid (see Sec. 5.2). These coordinates are transformed along with the image, so that the position of the centroid relative to the diamond does not change and the polar transformation can be correctly applied to the input batch. Polar warping is performed after the data transformation pipeline.
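Transforming the centroid alongside a spatial augmentation can be sketched as follows (a minimal illustration with hypothetical names, not the actual pipeline code):

```python
import random

import numpy as np

def random_crop_with_centroid(img, centroid, crop_size):
    """Crop a random square region and express the centroid in the
    crop's coordinate frame, so it keeps pointing at the diamond.
    (Illustrative sketch; names are ours, not the paper's code.)"""
    h, w = img.shape[:2]
    top = random.randint(0, h - crop_size)
    left = random.randint(0, w - crop_size)
    crop = img[top:top + crop_size, left:left + crop_size]
    cx, cy = centroid  # (x, y) in the original image
    return crop, (cx - left, cy - top)
```

Each spatial augmentation (crop, rotation) would apply the corresponding transform to the centroid, so that the subsequent polar warp stays centred on the diamond.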
6.2 Ablation Study
We explore the effect of adding the polar warping de-
scribed in Sec. 4, along with other hyperparameters.
Table 1 summarizes this ablation experiment. We re-
port the mAP after one epoch of training and after ten
epochs of training, averaged over 5 runs (mean ± std.
dev.). To compute the mAP, we first pass both the
query images and the gallery images (see Sec. 3.2)
through the model, obtaining an embedding for each
image. Then, we compute a similarity matrix between
the query embeddings and the gallery embeddings
from their cosine similarities. For every query, we
sort the similarities from most to least similar to the
query. From these sorted similarity sequences, along
with the ground truth query and gallery labels, we can
compute an AP for each query. The mAP is then ob-
tained by averaging all APs.
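The evaluation procedure above can be sketched as follows (our own illustrative implementation of the described mAP computation, not the paper's code; labels are assumed to be NumPy arrays):

```python
import numpy as np

def mean_average_precision(query_emb, gallery_emb, query_labels, gallery_labels):
    """mAP from cosine similarities, following the steps in the text."""
    # L2-normalize so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T  # similarity matrix: queries x gallery
    aps = []
    for i in range(len(q)):
        order = np.argsort(-sims[i])  # most to least similar
        hits = gallery_labels[order] == query_labels[i]
        if hits.sum() == 0:
            continue  # query has no relevant gallery image
        precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1)
        aps.append((precision_at_k * hits).sum() / hits.sum())
    return float(np.mean(aps))
```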
From Table 1, we can see that when the base-
line is trained with random resized cropping (Base-
line+RRC), the model performs significantly worse
than when we apply random cropping without random resizing (Baseline). In random resized cropping, the input images are first resized to a random size, after which a region of a fixed size is cropped out at a random location. This data augmentation technique is typically used to make a model invariant to scale changes. However, due to our camera set-up, the same diamond always appears at the same scale in the image, so the model can safely use the scale of a diamond as a descriptive feature. This is confirmed by the drop of more than 10 percentage points when we add random resized cropping to the baseline.
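The variant described above can be sketched as follows (nearest-neighbour resize for brevity; the function name and scale range are our own assumptions, not the training code):

```python
import random

import numpy as np

def random_resized_crop(img, crop_size, scale=(0.8, 1.2)):
    """Resize the image by a random factor, then crop a fixed-size
    region at a random location (sketch of the augmentation variant
    described in the text)."""
    s = random.uniform(*scale)
    h, w = img.shape[:2]
    nh = max(crop_size, int(round(h * s)))
    nw = max(crop_size, int(round(w * s)))
    # Nearest-neighbour resize via integer index arrays.
    ys = np.arange(nh) * h // nh
    xs = np.arange(nw) * w // nw
    resized = img[ys][:, xs]
    top = random.randint(0, nh - crop_size)
    left = random.randint(0, nw - crop_size)
    return resized[top:top + crop_size, left:left + crop_size]
```

Because the random factor changes the apparent size of the diamond, this augmentation destroys the scale cue discussed above.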
A slight increase in mAP after 1 epoch of train-
ing is found when we replace ImageNet normaliza-
tion with the mean and standard deviation of the dia-
mond dataset itself (Baseline+Norm). The images in
our dataset contain a lot more dark areas than typi-
cal ImageNet (Deng et al., 2009) images. Therefore,
the mean red, green and blue pixel values are much
smaller than in ImageNet. After 10 epochs, however, once the BatchNorm (Ioffe and Szegedy, 2015) layers in the model have adapted to the data distribution, the mAP difference from the baseline becomes negligible.
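Computing the per-channel statistics used for such dataset-specific normalization can be sketched as (illustrative only; a real implementation would accumulate over batches rather than stack the dataset in memory):

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std over a list of uint8 H x W x 3 images,
    scaled to [0, 1] as is conventional for normalization transforms."""
    stacked = np.stack(images).astype(np.float64) / 255.0  # N, H, W, 3
    mean = stacked.mean(axis=(0, 1, 2))
    std = stacked.std(axis=(0, 1, 2))
    return mean, std
```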
The largest increase with respect to the baseline
is seen when the input is transformed with the polar
mapping presented in Sec. 4 (Baseline+Norm+Polar
and Baseline+Norm+Rot+Polar), with an addi-
tional increase when we add random rotations.
Note that these random rotations result in random
horizontal shifts of the diamond in the warped
image. During training, there were two individ-
ual runs—one Baseline+Norm+Polar model, one
Baseline+Norm+Rot+Polar—that achieved 100%
mAP. These models are able to find, for each of
17 480 query images of unseen diamonds, the 10 out
of 2050 gallery images of the same diamond. None
of the other methods had a run that performed this well at any point during training.
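The rotation-to-shift property noted above is easy to check numerically with a minimal nearest-neighbour polar warp (our own sketch, not the PyTorch implementation of Sec. 4.2): rotating the input about the polar origin cyclically shifts the columns of the warped image.

```python
import numpy as np

def polar_warp(img, origin, radius, out_h, out_w):
    """Nearest-neighbour polar warp: rows index radius, columns index
    angle. (Minimal sketch; the paper's version uses PyTorch.)"""
    cy, cx = origin
    rs = np.linspace(0, radius, out_h, endpoint=False)[:, None]
    thetas = np.linspace(0, 2 * np.pi, out_w, endpoint=False)[None, :]
    ys = np.clip(np.round(cy + rs * np.sin(thetas)).astype(int), 0, img.shape[0] - 1)
    xs = np.clip(np.round(cx + rs * np.cos(thetas)).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(101, 101))
p = polar_warp(img, (50, 50), 40, 20, 64)
p_rot = polar_warp(np.rot90(img), (50, 50), 40, 20, 64)
# A 90-degree rotation about the origin appears as a cyclic shift of
# one quarter of the angular (horizontal) axis.
shift_match = (p_rot == np.roll(p, -16, axis=1)).mean()
```

Up to single-pixel rounding effects, `shift_match` is 1: a rotation of the input becomes a pure horizontal shift after the warp, which is exactly what the random rotations exercise during training.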
This result greatly surpasses that of De Feyter et al. (2019), who needed a kNN with k = 5 to achieve 100% mAP on their small dataset of 64 diamond classes.
As shown in Table 2, training for 10 epochs takes about a third longer when the polar transformation is performed before passing the input to the model. However, from Table 1 we know that, after only a single
epoch, models with polar transformation already per-
form on par with non-polar methods trained for 10
epochs. So, in an mAP per time sense, the polar meth-
ods clearly outperform the non-polar methods.
6.3 Polar Warp Comparison
We measure the duration of our polar warp im-
plementation (see Sec. 4.2) under different settings
and compare it to the warpPolar() function of
OpenCV (Bradski, 2000). We select 16 random im-
ages from our dataset (size 2448 × 2048) and apply
a polar transformation using the detected centroid of
the diamond (see Sec. 5.2) as polar origin and a fixed
radius of 1024. As can be seen from Table 3, our Py-
Torch implementation runs about 1.2 times faster than
OpenCV on CPU (Intel Xeon E5-2630 v2) and more
than 200 times faster on GPU (NVIDIA GeForce
GTX 1180). When precomputing a base flow grid, as presented in Sec. 4.2, our method runs as much as 750 times faster than OpenCV. The polar warps created
by OpenCV and by our own PolarTorch are visually
identical, as demonstrated in Fig. 8. Subtracting the pixel values of the two outputs did reveal small differences, but we consider these negligible.
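The gain from precomputation comes from building the sampling grid once and reusing it for every image with the same origin and radius; a NumPy stand-in for the idea (the actual implementation precomputes a flow grid for PyTorch, and these names are ours):

```python
import numpy as np

def make_polar_grid(origin, radius, out_h, out_w, img_shape):
    """Build the (row, column) sampling indices once, up front."""
    cy, cx = origin
    rs = np.linspace(0, radius, out_h, endpoint=False)[:, None]
    thetas = np.linspace(0, 2 * np.pi, out_w, endpoint=False)[None, :]
    ys = np.clip(np.round(cy + rs * np.sin(thetas)).astype(int), 0, img_shape[0] - 1)
    xs = np.clip(np.round(cx + rs * np.cos(thetas)).astype(int), 0, img_shape[1] - 1)
    return ys, xs

def apply_polar(img, grid):
    """Per image, only this cheap gather runs; no trigonometry."""
    ys, xs = grid
    return img[ys, xs]
```

All trigonometry and index arithmetic is paid once in `make_polar_grid`; each subsequent warp is a single indexed gather, which is where the additional speed-up over recomputing the grid per call comes from.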
Rotation Equivariance for Diamond Identification